Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
combining arrays

Subject: combining arrays

From: David Epstein

Date: 2 Oct, 2010 19:58:03

Message: 1 of 17

I have six arrays, A1, ... ,A6, each with about 50K rows and exactly two columns. The first column of each array is a strictly increasing sequence of non-negative integers. The second column consists of doubles between 0 and 1.

I want as output an array with 7 columns. The set of first column entries is equal to the union of the six sets of first column entries, that is, NO repetition, for example as given by the Matlab function unique. In position (k,i+1), we have 0 if k does not appear in the first column of Ai, and Ai(k,2) if k does appear. Here i = 1,2,3,4,5 or 6.

What is a good way to carry out this computation?
I can do it using accumarray six times, but maybe in the future I will have to deal with more than 6 arrays, so I'm not very happy with this "solution".

thanks for any help.
David

Subject: combining arrays

From: Roger Stafford

Date: 2 Oct, 2010 21:12:04

Message: 2 of 17

"David Epstein" <David.Epstein.spam@remove.warwick.ac.uk> wrote in message <i882sb$hfj$1@fred.mathworks.com>...
> I have six arrays, A1, ... ,A6, each with about 50K rows and exactly two columns. The first column of each array is a strictly increasing sequence of non-negative integers. The second column consists of doubles between 0 and 1.
>
> I want as output an array with 7 columns. The set of first column entries is equal to the union of the six sets of first column entries, that is, NO repetition, for example as given by the Matlab function unique. In position (k,i+1), we have 0 if k does not appear in the first column of Ai, and Ai(k,2) if k does appear. Here i = 1,2,3,4,5 or 6.
>
> What is a good way to carry out this computation?
> I can do it using accumarray six times, but maybe in the future I will have to deal with more than 6 arrays, so I'm not very happy with this "solution".
>
> thanks for any help.
> David
- - - - - - - - - -
  Instead of trying to give you a best solution, David, I will be content with making comments on algorithmic efficiency. If the sum, N, of the various column lengths in your Ai's is very large, the greatest time consumer will be determining that first column in the result. If 'unique' or 'sort' are used on the concatenation of the first columns, you will have an order(N*log(N)) algorithm on your hands which overshadows all other subsequent computations with the second columns, these presumably being only order(N). What is needed in my opinion is a method of using the already sorted first columns in some kind of repeated pairwise merge operation that therefore takes advantage of already being well along on a sorting operation. Unfortunately, I don't know of any matlab functions that do this efficiently, which suggests that perhaps you ought to be thinking along the lines of doing the
merging using a good mex function. In your particular problem you would merge the first pair of columns, then the second pair, and then the third pair. Finally it will take two more merges to finish the operation and you would have something like an order(3*N) algorithm instead of order(N*log(N)).

Roger Stafford

Subject: combining arrays

From: Bruno Luong

Date: 2 Oct, 2010 21:47:03

Message: 3 of 17

"David Epstein" <David.Epstein.spam@remove.warwick.ac.uk> wrote in message <i882sb$hfj$1@fred.mathworks.com>...
> I have six arrays, A1, ... ,A6, each with about 50K rows and exactly two columns. The first column of each array is a strictly increasing sequence of non-negative integers. The second column consists of doubles between 0 and 1.
>
> I want as output an array with 7 columns. The set of first column entries is equal to the union of the six sets of first column entries, that is, NO repetition, for example as given by the Matlab function unique. In position (k,i+1), we have 0 if k does not appear in the first column of Ai, and Ai(k,2) if k does appear. Here i = 1,2,3,4,5 or 6.
>
> What is a good way to carry out this computation?
> I can do it using accumarray six times, but maybe in the future I will have to deal with more than 6 arrays, so I'm not very happy with this "solution".
>
> thanks for any help.
> David

A1 = ceil(20*rand(12,1));
A2 = ceil(20*rand(5,1));
A3 = ceil(20*rand(22,1));
A4 = ceil(20*rand(11,1));
A5 = ceil(20*rand(7,1));
A6 = ceil(20*rand(2,1));
A1(:,2) = rand(size(A1));
A2(:,2) = rand(size(A2));
A3(:,2) = rand(size(A3));
A4(:,2) = rand(size(A4));
A5(:,2) = rand(size(A5));
A6(:,2) = rand(size(A6));

% Engine
A = {A1 A2 A3 A4 A5 A6};

id = arrayfun(@(k) A{k}(:,1), 1:length(A), 'UniformOutput', false);
J = arrayfun(@(k) k+zeros(size(A{k},1),1), 1:length(A), 'UniformOutput', false);
data = arrayfun(@(k) A{k}(:,2), 1:length(A), 'UniformOutput', false);

id= cat(1,id{:});
[id, ~, I] = unique(id);
J = cat(1,J{:});
data = cat(1,data{:});
B = [id accumarray([I J], data)]

% Bruno

Subject: combining arrays

From: Oleg Komarov

Date: 2 Oct, 2010 22:22:05

Message: 4 of 17

> A1 = ceil(20*rand(12,1));
> A2 = ceil(20*rand(5,1));
> A3 = ceil(20*rand(22,1));
> A4 = ceil(20*rand(11,1));
> A5 = ceil(20*rand(7,1));
> A6 = ceil(20*rand(2,1));
> A1(:,2) = rand(size(A1));
> A2(:,2) = rand(size(A2));
> A3(:,2) = rand(size(A3));
> A4(:,2) = rand(size(A4));
> A5(:,2) = rand(size(A5));
> A6(:,2) = rand(size(A6));
>
> % Engine
> A = {A1 A2 A3 A4 A5 A6};
>
> id = arrayfun(@(k) A{k}(:,1), 1:length(A), 'UniformOutput', false);
> J = arrayfun(@(k) k+zeros(size(A{k},1),1), 1:length(A), 'UniformOutput', false);
> data = arrayfun(@(k) A{k}(:,2), 1:length(A), 'UniformOutput', false);
>
> id= cat(1,id{:});
> [id, ~, I] = unique(id);
> J = cat(1,J{:});
> data = cat(1,data{:});
> B = [id accumarray([I J], data)]
>
> % Bruno

Using the inputs generated by Bruno:

% Label the input with a colum of 1s for A1, 2s for A2 etc...
A1 = [1*ones(size(A1,1),1), A1];
A2 = [2*ones(size(A2,1),1), A2];
A3 = [3*ones(size(A3,1),1), A3];
A4 = [4*ones(size(A4,1),1), A4];
A5 = [5*ones(size(A5,1),1), A5];
A6 = [6*ones(size(A6,1),1), A6];

% Use the pivot function (http://www.mathworks.com/matlabcentral/fileexchange/26119)
Pivot([A1;A2;A3;A4;A5;A6],[],[],0)

The output is almost identical to Bruno's. The first row is a header row.

Oleg

Subject: combining arrays

From: Bruno Luong

Date: 2 Oct, 2010 22:26:03

Message: 5 of 17

"Roger Stafford"
> What is needed in my opinion is a method of using the already sorted first columns in some kind of repeated pairwise merge operation that therefore takes advantage of already being well along on a sorting operation.

Presumably something along this line does have a best complexity:

% Strictly sorted arrays
 A1 = unique(ceil(10*rand(1,5)))
 A2 = unique(ceil(10*rand(1,7)))

% Merge them
m=max(A1(end),A2(end))
A12 = find(accumarray(A1(:),1,[m 1],[],[],true)+accumarray(A2(:),1,[m 1],[],[],true))

Bruno

Subject: combining arrays

From: Bruno Luong

Date: 2 Oct, 2010 22:33:04

Message: 6 of 17

PS to post#4, Using directly SPARSE is more readable:

m=max(A1(end),A2(end))
A12 = find(sparse(A1,1,true,m,1)+ sparse(A2,1,true,m,1))

Bruno

Subject: combining arrays

From: Bruno Luong

Date: 2 Oct, 2010 22:41:04

Message: 7 of 17

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <i88bv0$po9$1@fred.mathworks.com>...
> PS to post#4, Using directly SPARSE is more readable:
>
> m=max(A1(end),A2(end))
> A12 = find(sparse(A1,1,true,m,1)+ sparse(A2,1,true,m,1))
>
> Bruno

PS again: actually I don't know what kind of complexity i.e., O(n) or O(n*log(n), when SPARSE is invoked with the input indexes that are already sorted.

Bruno

Subject: combining arrays

From: james bejon

Date: 2 Oct, 2010 23:09:03

Message: 8 of 17

Alternatively, (using Bruno's data but ensuring strictly increasing ids),

% Data
n = 20;
A1 = unique(ceil(n*rand(12,1)));
A2 = unique(ceil(n*rand(5,1)));
A3 = unique(ceil(n*rand(22,1)));
A4 = unique(ceil(n*rand(11,1)));
A5 = unique(ceil(n*rand(7,1)));
A6 = unique(ceil(n*rand(2,1)));
A1(:,2) = rand(size(A1));
A2(:,2) = rand(size(A2));
A3(:,2) = rand(size(A3));
A4(:,2) = rand(size(A4));
A5(:,2) = rand(size(A5));
A6(:,2) = rand(size(A6));
dat = cat(1, A1, A2, A3, A4, A5, A6);

% Engine
ind = dat(:, 1) + cumsum([0; (diff(dat(:, 1)) <= 0)] * n);
res = zeros(n, 6);
res(ind) = dat(:, 2);
res = [(1:n).', res]

Subject: combining arrays

From: james bejon

Date: 2 Oct, 2010 23:23:06

Message: 9 of 17

oops...should have had

A3 = unique(ceil(n*rand(22,1)));

as

A3 = unique(ceil(n*rand(20,1)));

to be consistent, but you get the idea...

Subject: combining arrays

From: Roger Stafford

Date: 3 Oct, 2010 01:27:03

Message: 10 of 17

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <i88cdv$pll$1@fred.mathworks.com>...
> "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <i88bv0$po9$1@fred.mathworks.com>...
> > m=max(A1(end),A2(end))
> > A12 = find(sparse(A1,1,true,m,1)+ sparse(A2,1,true,m,1))
> > Bruno
> PS again: actually I don't know what kind of complexity i.e., O(n) or O(n*log(n), when SPARSE is invoked with the input indexes that are already sorted.
> Bruno
- - - - - - - - -
  Hi Bruno. That is a very ingenious method of merging. I too have some doubts about the what order algorithm it is operating at. I can imagine that laying down all the data in sparse form would probably be order(n), but putting it in order so that the 'find' action will be operating in successive index values sounds a little difficult and mysterious to me. It could well be order(n*log(n)) to get it all unwound properly.

  I can dream of functions, hopefully written by MathWorks, which would augment 'sort', 'unique', and perhaps a few other functions requiring sorting, having for example an added argument vector, [p1,p2,...,pm] which informs the function that the segments of the input vector are already sorted from 1 to p1, from p1+1 to p1+p2, ..., etc. Then the function is supposed to take advantage of this additional information and produce a very fast merging/sorting of all the data into a single vector (or a new collection of unique values, etc.) That should only be an order(n*log(m)) problem, rather than order(n*log(n)), for n the total length.

Roger Stafford

Subject: combining arrays

From: David Epstein

Date: 3 Oct, 2010 06:34:38

Message: 11 of 17

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <i88m57$djc$1@fred.mathworks.com>...
> Hi Bruno. That is a very ingenious method of merging. I too have some doubts about the what order algorithm it is operating at. I can imagine that laying down all the data in sparse form would probably be order(n), but putting it in order so that the 'find' action will be operating in successive index values sounds a little difficult and mysterious to me. It could well be order(n*log(n)) to get it all unwound properly.

Thanks to everyone for their responses---very educational and interesting.

One of the annoying things about using commercial software is that vital information is kept secret. However, one can guess that the indices in sparse matrices are kept in order wherever possible, as this will affect dramatically the speed of an arithmetic operation. There is probably an "isordered" Boolean in the C-code, keeping track. When a sparse function is created using ordered arrays, as in Bruno's code, I would imagine that "sparse" notices this and sets the "isordered" flag to TRUE. So my guess is O(n)-complexity with a small constant coefficient---so, as good as possible.

David

Subject: combining arrays

From: Bruno Luong

Date: 3 Oct, 2010 07:10:07

Message: 12 of 17

Hi all,

Few remarks

- Sparse matrix keeps the index internally sorted. This behavior is documented. So create a SPARSE matrix needs a sorting (and in the opposite way, we can use SPARSE command to sort something as I did).

- What I don't know is what kind of sorting algorithm SPARSE command uses. Most known algorithms will be O(n*(log(n)) in average on *unsorted* input, but it can be either O(n), O(n*log(n)) or O(n^2) on sorted input (the classical example of quicksort with bad pivot selection). My claims above work only when it's O(n).

- Is there a fundamental methodology difference between the implementation of ACCUMARRAY(..., TRUE) and SPARSE()? I don't know.

- Now the undocumented behavior that I use is not where you expect: it's FIND command. Yes. No where it's written it should returns the SORTED non-zero index. I queried Loren from Mathworks once about this very subject, and she told that: yes, it's undocumented because if the they ever find a better way to implement FIND command in the future they may use it and the output is not SORTED. IMHO, her answer is not very satisfying: Firstly: many codes out there rely on this undocumented behavior. Change it will causes big problem of backward compatibility. Secondly, it can be proved that there is no better FIND algorithm (in term of complexity) than just scanning the array. So to me there is practically no chance that the behavior of FIND will change, and it's pity that Mathworks won't document such behavior, so fundamental for programmers and algorithm developers who use Matlab.

- @Roger: It's not a difficult to write a MEX merging, but it will be compete against Matlab SORT/UNIQUE which are speedy and suitable in most of use in practices. If I have time I will do it and submit on FEX, just for the sake of completeness.

Bruno

Subject: combining arrays

From: Oleg Komarov

Date: 3 Oct, 2010 10:48:03

Message: 13 of 17

> - What I don't know is what kind of sorting algorithm SPARSE command uses. Most known algorithms will be O(n*(log(n)) in average on *unsorted* input, but it can be either O(n), O(n*log(n)) or O(n^2) on sorted input (the classical example of quicksort with bad pivot selection). My claims above work only when it's O(n).

I have a question about the complexity of an algorithm...since I don't have an IT background [and I don't understand the meaning of O(n), O(n*log(n)) or O(n^2)...] do you have any insightful link to share or any other reference?

Thanks

Oleg

Subject: combining arrays

From: Bruno Luong

Date: 3 Oct, 2010 11:20:06

Message: 14 of 17

"Oleg Komarov" <oleg.komarovRemove.this@hotmail.it> wrote in message <i89n13$1o5$1@fred.mathworks.com>...
> > - What I don't know is what kind of sorting algorithm SPARSE command uses. Most known algorithms will be O(n*(log(n)) in average on *unsorted* input, but it can be either O(n), O(n*log(n)) or O(n^2) on sorted input (the classical example of quicksort with bad pivot selection). My claims above work only when it's O(n).
>
> I have a question about the complexity of an algorithm...since I don't have an IT background [and I don't understand the meaning of O(n), O(n*log(n)) or O(n^2)...] do you have any insightful link to share or any other reference?
>

As always, the Wikipedia is has a short consist introduction:
http://en.wikipedia.org/wiki/Computational_complexity_theory

The complexity notion is very easy to grab. What is more difficult is to show the complexity of a specific algorithm, or a specific kind of problem has a certain lower limit complexity.

The strict framework for complexity theory is "Turing's machine", an ideal formal definition of modern computer. The subject more mathematics and logic than "computer science" than most people would think.

One of the unsolved problem is "N = NP?" is one of the 10 Millenium Mathematical Problems: and it belongs to http://en.wikipedia.org/wiki/P_%3D_NP_problem

Bruno

Subject: combining arrays

From: Bruno Luong

Date: 4 Oct, 2010 16:31:22

Message: 15 of 17

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <i89a8f$n3p$1@fred.mathworks.com>...

> It's not a difficult to write a MEX merging, but it will be compete against Matlab SORT/UNIQUE which are speedy and suitable in most of use in practices. If I have time I will do it and submit on FEX, just for the sake of completeness.
>

Here we go, here is the code (Mex required):
http://www.mathworks.com/matlabcentral/fileexchange/28930-merge-sorted-arrays

Bruno

Subject: combining arrays

From: David Epstein

Date: 4 Oct, 2010 17:36:20

Message: 16 of 17

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <i8cvgq$due$1@fred.mathworks.com>...
> "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <i89a8f$n3p$1@fred.mathworks.com>...
>
> > It's not a difficult to write a MEX merging, but it will be compete against Matlab SORT/UNIQUE which are speedy and suitable in most of use in practices. If I have time I will do it and submit on FEX, just for the sake of completeness.
> >
>
> Here we go, here is the code (Mex required):
> http://www.mathworks.com/matlabcentral/fileexchange/28930-merge-sorted-arrays
>
> Bruno

Thanks for this, Bruno. However, the mex function needed for this thread should eliminate duplicates, like unique, and this function seems to maintain the duplicates, if I understand the code correctly. This is like sort, but not like unique.

Incidentally, you said that keeping indices sorted in sparse matrices was documented. I have looked around, and can find no mention of this in any documentation. Could you give a reference?

After looking at Bruno's trick for merging using sparse matrices, it seems to me that the whole original task I specified could be carried out using sparse. But I didn't pursue this because Bruno's suggested code using unique is very short so I doubt whether using sparse could do a better job (though it might have better asymptotic properties).

David

Subject: combining arrays

From: Bruno Luong

Date: 4 Oct, 2010 18:01:24

Message: 17 of 17

"David Epstein" <David.Epstein.spam@remove.warwick.ac.uk> wrote in message <i8d3ak$smn$1@fred.mathworks.com>...

>
> Thanks for this, Bruno. However, the mex function needed for this thread should eliminate duplicates, like unique, and this function seems to maintain the duplicates, if I understand the code correctly. This is like sort, but not like unique.

Easy to adapt:
1) just use DIFF to check duplicated elements then remove them
2) Use accumarray on sorted index

>
> Incidentally, you said that keeping indices sorted in sparse matrices was documented. I have looked around, and can find no mention of this in any documentation. Could you give a reference?

The internal representation of sparse matrix is documented in MEX documentation, and if you dig a little bit, you'll see the non-zeros elements are stored in a sorted manner by linear index.

>
> After looking at Bruno's trick for merging using sparse matrices, it seems to me that the whole original task I specified could be carried out using sparse. But I didn't pursue this because Bruno's suggested code using unique is very short so I doubt whether using sparse could do a better job (though it might have better asymptotic properties).

I have no specific comment beside that you should try profiling the code and see how much time UNIQUE takes. Roger is the only person I know who notices and expressed the lack of optimal algorithm in Matlab, other users seem to be pretty happy as it is....

Bruno

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us