http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975
MATLAB Central Newsreader  combining arrays
Feed for thread: combining arrays
enus
©19942015 by MathWorks, Inc.
webmaster@mathworks.com
MATLAB Central Newsreader
http://blogs.law.harvard.edu/tech/rss
60
MathWorks
http://www.mathworks.com/images/membrane_icon.gif

Sat, 02 Oct 2010 19:58:03 +0000
combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784691
David Epstein
I have six arrays, A1, ... ,A6, each with about 50K rows and exactly two columns. The first column of each array is a strictly increasing sequence of nonnegative integers. The second column consists of doubles between 0 and 1.<br>
<br>
I want as output an array with 7 columns. The set of first column entries is equal to the union of the six sets of first column entries, that is, NO repetition, for example as given by the Matlab function unique. In position (k,i+1), we have 0 if k does not appear in the first column of Ai, and Ai(k,2) if k does appear. Here i = 1,2,3,4,5 or 6.<br>
<br>
What is a good way to carry out this computation?<br>
I can do it using accumarray six times, but maybe in the future I will have to deal with more than 6 arrays, so I'm not very happy with this "solution".<br>
<br>
thanks for any help.<br>
David

Sat, 02 Oct 2010 21:12:04 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784699
Roger Stafford
"David Epstein" <David.Epstein.spam@remove.warwick.ac.uk> wrote in message <i882sb$hfj$1@fred.mathworks.com>...<br>
> I have six arrays, A1, ... ,A6, each with about 50K rows and exactly two columns. The first column of each array is a strictly increasing sequence of nonnegative integers. The second column consists of doubles between 0 and 1.<br>
> <br>
> I want as output an array with 7 columns. The set of first column entries is equal to the union of the six sets of first column entries, that is, NO repetition, for example as given by the Matlab function unique. In position (k,i+1), we have 0 if k does not appear in the first column of Ai, and Ai(k,2) if k does appear. Here i = 1,2,3,4,5 or 6.<br>
> <br>
> What is a good way to carry out this computation?<br>
> I can do it using accumarray six times, but maybe in the future I will have to deal with more than 6 arrays, so I'm not very happy with this "solution".<br>
> <br>
> thanks for any help.<br>
> David<br>
         <br>
Instead of trying to give you a best solution, David, I will be content with making comments on algorithmic efficiency. If the sum, N, of the various column lengths in your Ai's is very large, the greatest time consumer will be determining that first column in the result. If 'unique' or 'sort' are used on the concatenation of the first columns, you will have an order(N*log(N)) algorithm on your hands which overshadows all other subsequent computations with the second columns, these presumably being only order(N). What is needed in my opinion is a method of using the already sorted first columns in some kind of repeated pairwise merge operation that therefore takes advantage of already being well along on a sorting operation. Unfortunately, I don't know of any matlab functions that do this efficiently, which suggests that perhaps you ought to be thinking along the lines of doing the <br>
merging using a good mex function. In your particular problem you would merge the first pair of columns, then the second pair, and then the third pair. Finally it will take two more merges to finish the operation and you would have something like an order(3*N) algorithm instead of order(N*log(N)). <br>
<br>
Roger Stafford

Sat, 02 Oct 2010 21:47:03 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784707
Bruno Luong
"David Epstein" <David.Epstein.spam@remove.warwick.ac.uk> wrote in message <i882sb$hfj$1@fred.mathworks.com>...<br>
> I have six arrays, A1, ... ,A6, each with about 50K rows and exactly two columns. The first column of each array is a strictly increasing sequence of nonnegative integers. The second column consists of doubles between 0 and 1.<br>
> <br>
> I want as output an array with 7 columns. The set of first column entries is equal to the union of the six sets of first column entries, that is, NO repetition, for example as given by the Matlab function unique. In position (k,i+1), we have 0 if k does not appear in the first column of Ai, and Ai(k,2) if k does appear. Here i = 1,2,3,4,5 or 6.<br>
> <br>
> What is a good way to carry out this computation?<br>
> I can do it using accumarray six times, but maybe in the future I will have to deal with more than 6 arrays, so I'm not very happy with this "solution".<br>
> <br>
> thanks for any help.<br>
> David<br>
<br>
A1 = ceil(20*rand(12,1));<br>
A2 = ceil(20*rand(5,1));<br>
A3 = ceil(20*rand(22,1));<br>
A4 = ceil(20*rand(11,1));<br>
A5 = ceil(20*rand(7,1));<br>
A6 = ceil(20*rand(2,1));<br>
A1(:,2) = rand(size(A1));<br>
A2(:,2) = rand(size(A2));<br>
A3(:,2) = rand(size(A3));<br>
A4(:,2) = rand(size(A4));<br>
A5(:,2) = rand(size(A5));<br>
A6(:,2) = rand(size(A6));<br>
<br>
% Engine<br>
A = {A1 A2 A3 A4 A5 A6};<br>
<br>
id = arrayfun(@(k) A{k}(:,1), 1:length(A), 'UniformOutput', false);<br>
J = arrayfun(@(k) k+zeros(size(A{k},1),1), 1:length(A), 'UniformOutput', false);<br>
data = arrayfun(@(k) A{k}(:,2), 1:length(A), 'UniformOutput', false);<br>
<br>
id= cat(1,id{:});<br>
[id, ~, I] = unique(id);<br>
J = cat(1,J{:});<br>
data = cat(1,data{:});<br>
B = [id accumarray([I J], data)]<br>
<br>
% Bruno

Sat, 02 Oct 2010 22:22:05 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784718
Oleg Komarov
> A1 = ceil(20*rand(12,1));<br>
> A2 = ceil(20*rand(5,1));<br>
> A3 = ceil(20*rand(22,1));<br>
> A4 = ceil(20*rand(11,1));<br>
> A5 = ceil(20*rand(7,1));<br>
> A6 = ceil(20*rand(2,1));<br>
> A1(:,2) = rand(size(A1));<br>
> A2(:,2) = rand(size(A2));<br>
> A3(:,2) = rand(size(A3));<br>
> A4(:,2) = rand(size(A4));<br>
> A5(:,2) = rand(size(A5));<br>
> A6(:,2) = rand(size(A6));<br>
> <br>
> % Engine<br>
> A = {A1 A2 A3 A4 A5 A6};<br>
> <br>
> id = arrayfun(@(k) A{k}(:,1), 1:length(A), 'UniformOutput', false);<br>
> J = arrayfun(@(k) k+zeros(size(A{k},1),1), 1:length(A), 'UniformOutput', false);<br>
> data = arrayfun(@(k) A{k}(:,2), 1:length(A), 'UniformOutput', false);<br>
> <br>
> id= cat(1,id{:});<br>
> [id, ~, I] = unique(id);<br>
> J = cat(1,J{:});<br>
> data = cat(1,data{:});<br>
> B = [id accumarray([I J], data)]<br>
> <br>
> % Bruno<br>
<br>
Using the inputs generated by Bruno:<br>
<br>
% Label the input with a colum of 1s for A1, 2s for A2 etc...<br>
A1 = [1*ones(size(A1,1),1), A1];<br>
A2 = [2*ones(size(A2,1),1), A2];<br>
A3 = [3*ones(size(A3,1),1), A3];<br>
A4 = [4*ones(size(A4,1),1), A4];<br>
A5 = [5*ones(size(A5,1),1), A5];<br>
A6 = [6*ones(size(A6,1),1), A6];<br>
<br>
% Use the pivot function (<a href="http://www.mathworks.com/matlabcentral/fileexchange/26119">http://www.mathworks.com/matlabcentral/fileexchange/26119</a>)<br>
Pivot([A1;A2;A3;A4;A5;A6],[],[],0)<br>
<br>
The output is almost identical to Bruno's. The first row is a header row.<br>
<br>
Oleg

Sat, 02 Oct 2010 22:26:03 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784719
Bruno Luong
"Roger Stafford"<br>
> What is needed in my opinion is a method of using the already sorted first columns in some kind of repeated pairwise merge operation that therefore takes advantage of already being well along on a sorting operation. <br>
<br>
Presumably something along this line does have a best complexity:<br>
<br>
% Strictly sorted arrays<br>
A1 = unique(ceil(10*rand(1,5)))<br>
A2 = unique(ceil(10*rand(1,7)))<br>
<br>
% Merge them<br>
m=max(A1(end),A2(end))<br>
A12 = find(accumarray(A1(:),1,[m 1],[],[],true)+accumarray(A2(:),1,[m 1],[],[],true))<br>
<br>
Bruno

Sat, 02 Oct 2010 22:33:04 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784720
Bruno Luong
PS to post#4, Using directly SPARSE is more readable:<br>
<br>
m=max(A1(end),A2(end))<br>
A12 = find(sparse(A1,1,true,m,1)+ sparse(A2,1,true,m,1))<br>
<br>
Bruno

Sat, 02 Oct 2010 22:41:04 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784721
Bruno Luong
"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <i88bv0$po9$1@fred.mathworks.com>...<br>
> PS to post#4, Using directly SPARSE is more readable:<br>
> <br>
> m=max(A1(end),A2(end))<br>
> A12 = find(sparse(A1,1,true,m,1)+ sparse(A2,1,true,m,1))<br>
> <br>
> Bruno<br>
<br>
PS again: actually I don't know what kind of complexity i.e., O(n) or O(n*log(n), when SPARSE is invoked with the input indexes that are already sorted. <br>
<br>
Bruno

Sat, 02 Oct 2010 23:09:03 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784722
james bejon
Alternatively, (using Bruno's data but ensuring strictly increasing ids),<br>
<br>
% Data<br>
n = 20;<br>
A1 = unique(ceil(n*rand(12,1)));<br>
A2 = unique(ceil(n*rand(5,1)));<br>
A3 = unique(ceil(n*rand(22,1)));<br>
A4 = unique(ceil(n*rand(11,1)));<br>
A5 = unique(ceil(n*rand(7,1)));<br>
A6 = unique(ceil(n*rand(2,1)));<br>
A1(:,2) = rand(size(A1));<br>
A2(:,2) = rand(size(A2));<br>
A3(:,2) = rand(size(A3));<br>
A4(:,2) = rand(size(A4));<br>
A5(:,2) = rand(size(A5));<br>
A6(:,2) = rand(size(A6));<br>
dat = cat(1, A1, A2, A3, A4, A5, A6);<br>
<br>
% Engine<br>
ind = dat(:, 1) + cumsum([0; (diff(dat(:, 1)) <= 0)] * n);<br>
res = zeros(n, 6);<br>
res(ind) = dat(:, 2);<br>
res = [(1:n).', res]

Sat, 02 Oct 2010 23:23:06 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784723
james bejon
oops...should have had<br>
<br>
A3 = unique(ceil(n*rand(22,1)));<br>
<br>
as<br>
<br>
A3 = unique(ceil(n*rand(20,1)));<br>
<br>
to be consistent, but you get the idea...

Sun, 03 Oct 2010 01:27:03 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784726
Roger Stafford
"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <i88cdv$pll$1@fred.mathworks.com>...<br>
> "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <i88bv0$po9$1@fred.mathworks.com>...<br>
> > m=max(A1(end),A2(end))<br>
> > A12 = find(sparse(A1,1,true,m,1)+ sparse(A2,1,true,m,1))<br>
> > Bruno<br>
> PS again: actually I don't know what kind of complexity i.e., O(n) or O(n*log(n), when SPARSE is invoked with the input indexes that are already sorted. <br>
> Bruno<br>
        <br>
Hi Bruno. That is a very ingenious method of merging. I too have some doubts about the what order algorithm it is operating at. I can imagine that laying down all the data in sparse form would probably be order(n), but putting it in order so that the 'find' action will be operating in successive index values sounds a little difficult and mysterious to me. It could well be order(n*log(n)) to get it all unwound properly.<br>
<br>
I can dream of functions, hopefully written by MathWorks, which would augment 'sort', 'unique', and perhaps a few other functions requiring sorting, having for example an added argument vector, [p1,p2,...,pm] which informs the function that the segments of the input vector are already sorted from 1 to p1, from p1+1 to p1+p2, ..., etc. Then the function is supposed to take advantage of this additional information and produce a very fast merging/sorting of all the data into a single vector (or a new collection of unique values, etc.) That should only be an order(n*log(m)) problem, rather than order(n*log(n)), for n the total length.<br>
<br>
Roger Stafford

Sun, 03 Oct 2010 06:34:38 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784749
David Epstein
"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <i88m57$djc$1@fred.mathworks.com>...<br>
> Hi Bruno. That is a very ingenious method of merging. I too have some doubts about the what order algorithm it is operating at. I can imagine that laying down all the data in sparse form would probably be order(n), but putting it in order so that the 'find' action will be operating in successive index values sounds a little difficult and mysterious to me. It could well be order(n*log(n)) to get it all unwound properly.<br>
<br>
Thanks to everyone for their responsesvery educational and interesting.<br>
<br>
One of the annoying things about using commercial software is that vital information is kept secret. However, one can guess that the indices in sparse matrices are kept in order wherever possible, as this will affect dramatically the speed of an arithmetic operation. There is probably an "isordered" Boolean in the Ccode, keeping track. When a sparse function is created using ordered arrays, as in Bruno's code, I would imagine that "sparse" notices this and sets the "isordered" flag to TRUE. So my guess is O(n)complexity with a small constant coefficientso, as good as possible.<br>
<br>
David

Sun, 03 Oct 2010 07:10:07 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784754
Bruno Luong
Hi all, <br>
<br>
Few remarks<br>
<br>
 Sparse matrix keeps the index internally sorted. This behavior is documented. So create a SPARSE matrix needs a sorting (and in the opposite way, we can use SPARSE command to sort something as I did).<br>
<br>
 What I don't know is what kind of sorting algorithm SPARSE command uses. Most known algorithms will be O(n*(log(n)) in average on *unsorted* input, but it can be either O(n), O(n*log(n)) or O(n^2) on sorted input (the classical example of quicksort with bad pivot selection). My claims above work only when it's O(n).<br>
<br>
 Is there a fundamental methodology difference between the implementation of ACCUMARRAY(..., TRUE) and SPARSE()? I don't know.<br>
<br>
 Now the undocumented behavior that I use is not where you expect: it's FIND command. Yes. No where it's written it should returns the SORTED nonzero index. I queried Loren from Mathworks once about this very subject, and she told that: yes, it's undocumented because if the they ever find a better way to implement FIND command in the future they may use it and the output is not SORTED. IMHO, her answer is not very satisfying: Firstly: many codes out there rely on this undocumented behavior. Change it will causes big problem of backward compatibility. Secondly, it can be proved that there is no better FIND algorithm (in term of complexity) than just scanning the array. So to me there is practically no chance that the behavior of FIND will change, and it's pity that Mathworks won't document such behavior, so fundamental for programmers and algorithm developers who use Matlab. <br>
<br>
 @Roger: It's not a difficult to write a MEX merging, but it will be compete against Matlab SORT/UNIQUE which are speedy and suitable in most of use in practices. If I have time I will do it and submit on FEX, just for the sake of completeness.<br>
<br>
Bruno

Sun, 03 Oct 2010 10:48:03 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784772
Oleg Komarov
>  What I don't know is what kind of sorting algorithm SPARSE command uses. Most known algorithms will be O(n*(log(n)) in average on *unsorted* input, but it can be either O(n), O(n*log(n)) or O(n^2) on sorted input (the classical example of quicksort with bad pivot selection). My claims above work only when it's O(n).<br>
<br>
I have a question about the complexity of an algorithm...since I don't have an IT background [and I don't understand the meaning of O(n), O(n*log(n)) or O(n^2)...] do you have any insightful link to share or any other reference?<br>
<br>
Thanks <br>
<br>
Oleg

Sun, 03 Oct 2010 11:20:06 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#784773
Bruno Luong
"Oleg Komarov" <oleg.komarovRemove.this@hotmail.it> wrote in message <i89n13$1o5$1@fred.mathworks.com>...<br>
> >  What I don't know is what kind of sorting algorithm SPARSE command uses. Most known algorithms will be O(n*(log(n)) in average on *unsorted* input, but it can be either O(n), O(n*log(n)) or O(n^2) on sorted input (the classical example of quicksort with bad pivot selection). My claims above work only when it's O(n).<br>
> <br>
> I have a question about the complexity of an algorithm...since I don't have an IT background [and I don't understand the meaning of O(n), O(n*log(n)) or O(n^2)...] do you have any insightful link to share or any other reference?<br>
> <br>
<br>
As always, the Wikipedia is has a short consist introduction: <br>
<a href="http://en.wikipedia.org/wiki/Computational_complexity_theory">http://en.wikipedia.org/wiki/Computational_complexity_theory</a><br>
<br>
The complexity notion is very easy to grab. What is more difficult is to show the complexity of a specific algorithm, or a specific kind of problem has a certain lower limit complexity.<br>
<br>
The strict framework for complexity theory is "Turing's machine", an ideal formal definition of modern computer. The subject more mathematics and logic than "computer science" than most people would think.<br>
<br>
One of the unsolved problem is "N = NP?" is one of the 10 Millenium Mathematical Problems: and it belongs to <a href="http://en.wikipedia.org/wiki/P_%3D_NP_problem">http://en.wikipedia.org/wiki/P_%3D_NP_problem</a><br>
<br>
Bruno

Mon, 04 Oct 2010 16:31:22 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#785081
Bruno Luong
"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <i89a8f$n3p$1@fred.mathworks.com>...<br>
<br>
> It's not a difficult to write a MEX merging, but it will be compete against Matlab SORT/UNIQUE which are speedy and suitable in most of use in practices. If I have time I will do it and submit on FEX, just for the sake of completeness.<br>
> <br>
<br>
Here we go, here is the code (Mex required):<br>
<a href="http://www.mathworks.com/matlabcentral/fileexchange/28930mergesortedarrays">http://www.mathworks.com/matlabcentral/fileexchange/28930mergesortedarrays</a><br>
<br>
Bruno

Mon, 04 Oct 2010 17:36:20 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#785100
David Epstein
"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <i8cvgq$due$1@fred.mathworks.com>...<br>
> "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <i89a8f$n3p$1@fred.mathworks.com>...<br>
> <br>
> > It's not a difficult to write a MEX merging, but it will be compete against Matlab SORT/UNIQUE which are speedy and suitable in most of use in practices. If I have time I will do it and submit on FEX, just for the sake of completeness.<br>
> > <br>
> <br>
> Here we go, here is the code (Mex required):<br>
> <a href="http://www.mathworks.com/matlabcentral/fileexchange/28930mergesortedarrays">http://www.mathworks.com/matlabcentral/fileexchange/28930mergesortedarrays</a><br>
> <br>
> Bruno<br>
<br>
Thanks for this, Bruno. However, the mex function needed for this thread should eliminate duplicates, like unique, and this function seems to maintain the duplicates, if I understand the code correctly. This is like sort, but not like unique.<br>
<br>
Incidentally, you said that keeping indices sorted in sparse matrices was documented. I have looked around, and can find no mention of this in any documentation. Could you give a reference?<br>
<br>
After looking at Bruno's trick for merging using sparse matrices, it seems to me that the whole original task I specified could be carried out using sparse. But I didn't pursue this because Bruno's suggested code using unique is very short so I doubt whether using sparse could do a better job (though it might have better asymptotic properties).<br>
<br>
David

Mon, 04 Oct 2010 18:01:24 +0000
Re: combining arrays
http://www.mathworks.com/matlabcentral/newsreader/view_thread/292975#785106
Bruno Luong
"David Epstein" <David.Epstein.spam@remove.warwick.ac.uk> wrote in message <i8d3ak$smn$1@fred.mathworks.com>...<br>
<br>
> <br>
> Thanks for this, Bruno. However, the mex function needed for this thread should eliminate duplicates, like unique, and this function seems to maintain the duplicates, if I understand the code correctly. This is like sort, but not like unique.<br>
<br>
Easy to adapt: <br>
1) just use DIFF to check duplicated elements then remove them<br>
2) Use accumarray on sorted index<br>
<br>
> <br>
> Incidentally, you said that keeping indices sorted in sparse matrices was documented. I have looked around, and can find no mention of this in any documentation. Could you give a reference?<br>
<br>
The internal representation of sparse matrix is documented in MEX documentation, and if you dig a little bit, you'll see the nonzeros elements are stored in a sorted manner by linear index. <br>
<br>
> <br>
> After looking at Bruno's trick for merging using sparse matrices, it seems to me that the whole original task I specified could be carried out using sparse. But I didn't pursue this because Bruno's suggested code using unique is very short so I doubt whether using sparse could do a better job (though it might have better asymptotic properties).<br>
<br>
I have no specific comment beside that you should try profiling the code and see how much time UNIQUE takes. Roger is the only person I know who notices and expressed the lack of optimal algorithm in Matlab, other users seem to be pretty happy as it is....<br>
<br>
Bruno