Got Questions? Get Answers.
Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Sort vector after relative frequency

Subject: Sort vector after relative frequency

From: Christian B.

Date: 9 Aug, 2012 09:57:10

Message: 1 of 10

Hello!

I have a 1D-vector containing numbers, and I would like to sort them so that they are not scrambled any more. It should be optimized to the effect that the least differences of positions between original and sorted vector are necessary, but also each number occurs contiguously in one piece (which is a must).

An example makes the situation clearer:

Source:
[11 11 3 11 11 3 3 3 17 17 3 11 17 17 17]
Sorted:
[11 11 11 11 11 3 3 3 3 3 17 17 17 17 17]

Source:
[ 2 3 3 4 4 4 2 4 4 2 ]
Sorted:
[ 3 3 4 4 4 4 4 2 2 2]

The numbers represent IDs, so the value is irrelevant - it just marks the difference between them.
I also know that in some situations there may be very bad solutions when looking at the number of differences between source and sorted vector, but an optimization is possible in each case.

Help on this would be appreciated!

Thanks.

Subject: Sort vector after relative frequency

From: Torsten

Date: 9 Aug, 2012 13:20:05

Message: 2 of 10

On 9 Aug., 11:57, "Christian B." <christian.backfrie...@fh-
hagenberg.at> wrote:
> Hello!
>
> I have a 1D-vector containing numbers, and I would like to sort them so that they are not scrambled any more. It should be optimized to the effect that the least differences of positions between original and sorted vector are necessary, but also each number occurs contiguously in one piece (which is a must).
>
> An example makes the situation clearer:
>
> Source:
> [11 11 3 11 11 3 3 3 17 17 3 11 17 17 17]
> Sorted:
> [11 11 11 11 11 3 3 3 3 3 17 17 17 17 17]
>
> Source:
> [ 2 3 3 4 4 4 2 4 4 2 ]
> Sorted:
> [ 3 3 4 4 4 4 4 2 2 2]
>
> The numbers represent IDs, so the value is irrelevant - it just marks the difference between them.
> I also know that in some situations there may be very bad solutions when looking at the number of differences between source and sorted vector, but an optimization is possible in each case.
>
> Help on this would be appreciated!
>
> Thanks.

Google
"sorting an array with minimum number of swaps"

Best wishes
Torsten.

Subject: Sort vector after relative frequency

From: Christian B.

Date: 9 Aug, 2012 13:45:16

Message: 3 of 10

Torsten <Torsten.Hennig@umsicht.fraunhofer.de> wrote in message <2571f90e-001c-4ec9-8bbc-f2d0e9201722@f2g2000vbm.googlegroups.com>...
>
> Google
> "sorting an array with minimum number of swaps"
>
> Best wishes
> Torsten.

Thanks for your answer, but that's not what I'm looking for. The number of swaps does not matter, much more the number of differences between the original and the reordered vector plays a role.

Maybe sorting is the wrong word, reorder may describe it better. It has nothing to do with ascending or descending, the goal is just to have all numbers of one value contiguously next to each other, while the number of differences between the original and reordered vector should be minimal. Or speaking in matlab code: sum(original==reordered) should be maximized

Subject: Sort vector after relative frequency

From: Kwen

Date: 9 Aug, 2012 13:53:13

Message: 4 of 10

"Christian B." wrote in message <k00etc$6en$1@newscl01ah.mathworks.com>...
> Torsten <Torsten.Hennig@umsicht.fraunhofer.de> wrote in message <2571f90e-001c-4ec9-8bbc-f2d0e9201722@f2g2000vbm.googlegroups.com>...
> >
> > Google
> > "sorting an array with minimum number of swaps"
> >
> > Best wishes
> > Torsten.
>
> Thanks for your answer, but that's not what I'm looking for. The number of swaps does not matter, much more the number of differences between the original and the reordered vector plays a role.
>
> Maybe sorting is the wrong word, reorder may describe it better. It has nothing to do with ascending or descending, the goal is just to have all numbers of one value contiguously next to each other, while the number of differences between the original and reordered vector should be minimal. Or speaking in matlab code: sum(original==reordered) should be maximized

http://www.mathworks.com/help/techdoc/ref/sortrows.html

Why would sort not do this? The intention is for it to sort by value, so wouldn't all the values end up being placed together? This won't minimize the number of changes but you can index where the changes are if you were needing to go back to it?

Subject: Sort vector after relative frequency

From: Christian B.

Date: 9 Aug, 2012 14:12:16

Message: 5 of 10

>
> http://www.mathworks.com/help/techdoc/ref/sortrows.html
>
> Why would sort not do this? The intention is for it to sort by value, so wouldn't all the values end up being placed together? This won't minimize the number of changes but you can index where the changes are if you were needing to go back to it?

I don't need to go back to the original, the major criterion is to reorder the vector to have contiguous blocks of equal numbers, while maximizing the optimization criterion sum(vector==reordered), which is not optimal when simply using matlab's sort function.

My initial example as an explanation:

x=[11 11 3 11 11 3 3 3 17 17 3 11 17 17 17];
s=[11 11 11 11 11 3 3 3 3 3 17 17 17 17 17]; %the intended output

sum(x==s)
10
sum(x==sort(x))
4

Subject: Sort vector after relative frequency

From: dpb

Date: 9 Aug, 2012 14:39:28

Message: 6 of 10

On 8/9/2012 9:12 AM, Christian B. wrote:
...

> I don't need to go back to the original, the major criterion is to
> reorder the vector to have contiguous blocks of equal numbers, while
> maximizing the optimization criterion sum(vector==reordered), which is
> not optimal when simply using matlab's sort function.
...

I don't know of anything otomh other than a heuristic search or LP
solution...

You can start w/ unique(), histc(), sort() or whatever to select the
values but if the permutation desired must meet the other criterion
that's a separate solution space.

It might(?) be sufficient to simply select the first in the list and
work through collecting matching until get to end if the absolute
minimum isn't absolutely required??? I've not considered at all whether
there are multiple equivalent solutions altho my initial guess is there
are in general.

--

Subject: Sort vector after relative frequency

From: Christian B.

Date: 9 Aug, 2012 14:58:10

Message: 7 of 10

dpb <none@non.net> wrote in message <k00i33$kr8$1@speranza.aioe.org>...
>
> I don't know of anything otomh other than a heuristic search or LP
> solution...
>
> You can start w/ unique(), histc(), sort() or whatever to select the
> values but if the permutation desired must meet the other criterion
> that's a separate solution space.
>
> It might(?) be sufficient to simply select the first in the list and
> work through collecting matching until get to end if the absolute
> minimum isn't absolutely required??? I've not considered at all whether
> there are multiple equivalent solutions altho my initial guess is there
> are in general.
>
> --

Thanks for your answer, I feared that its not that easy. The problem with a loop solution is that I guess it will take inacceptably long to find the optimum solution. And I agree with you - there will be multiple equivalent solutions in some cases.

The solution I first thought of is to reorder the groups and check if the differences between original and reordered are really a minimum, but this solution will result in exponential time increase if the number of different values rises

Subject: Sort vector after relative frequency

From: dpb

Date: 9 Aug, 2012 18:39:42

Message: 8 of 10

On 8/9/2012 9:58 AM, Christian B. wrote:
> dpb <none@non.net> wrote in message <k00i33$kr8$1@speranza.aioe.org>...
>>
>> I don't know of anything otomh other than a heuristic search or LP
>> solution...
>>
>> You can start w/ unique(), histc(), sort() or whatever to select the
>> values but if the permutation desired must meet the other criterion
>> that's a separate solution space.
>>
>> It might(?) be sufficient to simply select the first in the list and
>> work through collecting matching until get to end if the absolute
>> minimum isn't absolutely required??? ...
>
> Thanks for your answer, I feared that its not that easy. The problem
> with a loop solution is that I guess it will take inacceptably long to
> find the optimum solution. And I agree with you - there will be multiple
> equivalent solutions in some cases.
>
> The solution I first thought of is to reorder the groups and check if
> the differences between original and reordered are really a minimum, but
> this solution will result in exponential time increase if the number of
> different values rises

That's what happens w/ LP programming in general...it's somewhat akin to
the traveling salesman problem in that you're asking the maximum
distance of group instead of minimum but since there's no preset
arrangement (apparently) of the starting sequence I don't see any way to
arbitrarily solve for it other than it would appear that starting w/ the
beginning and working in order would likely be a near-optimal solution
unless the arrangement is far more random than it appears from the
sample data.

One other heuristic thought I'd have would be to look at the length of
the difference vector and start w/ the largest sequence(s) of zeros in
the looping instead of just linearly.

--

Subject: Sort vector after relative frequency

From: Bruno Luong

Date: 10 Aug, 2012 07:24:19

Message: 9 of 10

a = [ 2 3 3 4 4 4 2 4 4 2 ]

[u I J] = unique(a);
n = histc(a,u);
c = arrayfun(@(x,n) x+zeros(1,n), u, n, 'unif', 0);
[~, is] = sort(accumarray(J(:),(1:length(J))',[],@median));
b = [c{is}]

% Bruno

Subject: Sort vector after relative frequency

From: Christian B.

Date: 16 Aug, 2012 09:23:07

Message: 10 of 10

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <k02cv3$o2t$1@newscl01ah.mathworks.com>...
> a = [ 2 3 3 4 4 4 2 4 4 2 ]
>
> [u I J] = unique(a);
> n = histc(a,u);
> c = arrayfun(@(x,n) x+zeros(1,n), u, n, 'unif', 0);
> [~, is] = sort(accumarray(J(:),(1:length(J))',[],@median));
> b = [c{is}]
>
> % Bruno

Thanks a lot for this implementation, this does exactly what I was looking for.

Perfect!

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us