Asked by olivier espeli
on 28 Jan 2015

I need to fill a large matrix with data coming from a smaller matrix. Both matrices contain an index in the first column and the results of an experiment in the second column. I need to combine the results according to their index. For example, I have a matrix A:

1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1

And a matrix B:

1 10
2 10
4 10
7 10

I would like to combine them to obtain the matrix C with 3 columns (column 1 for the index, column 2 for the results from A, column 3 for the results from B):

1 1 10
2 1 10
3 1 NaN
4 1 10
5 1 NaN
6 1 NaN
7 1 10
8 1 NaN
9 1 NaN

I have to do this on matrices with millions of rows, so methods that keep the computation time low would be welcome. Thank you very much for any help.

Answer by Matz Johansson Bergström
on 28 Jan 2015

Edited by Matz Johansson Bergström
on 28 Jan 2015

Accepted Answer

Without knowing the size of the matrix, it might be a good idea to use a sparse matrix instead of filling the nonexistent elements with NaN. This is not exactly what you asked for, but it should give you a good idea of what you can accomplish with a sparse structure.

I did this fairly quickly, because it is late where I live, but it should work fine:

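% Create the example matrices
A = [(1:9)',ones(9,1)];
B = [1,2,4,7]';
B = [B, 10+0*B];        % second column: the value 10 for every index in B

% Put the data into a sparse matrix
M = sparse([]);
M(A(:,1), 1) = A(:,2);  % column 1 of M holds the results from A
M(B(:,1), 2) = B(:,2);  % column 2 of M holds the results from B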

Note that I use the first column of A as the row index into M, and the same goes for B. The values from A go into the first column of M and the values from B into the second column, which keeps track of where each value comes from.

I hope this helps.
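For very large inputs, the same sparse layout can also be built in a single call to the built-in `sparse(i,j,v,m,n)` constructor rather than by indexed assignment, which may be slower when many elements are inserted. The following is a minimal sketch under stated assumptions, not a tested solution for the original data: the names `rows`, `cols`, `vals`, `present`, and `D` are made up for illustration, and each index is assumed to appear at most once per matrix (the constructor sums values that share the same subscripts). A separate presence mask is used so that a genuine result of 0 is not mistaken for a missing entry when converting to the NaN-filled layout.

```matlab
% Same example data as in the answer above.
A = [(1:9)', ones(9,1)];
B = [[1;2;4;7], 10*ones(4,1)];

nRows = max([A(:,1); B(:,1)]);   % largest index present in either input

% Row subscripts are the indices; column 1 is for A, column 2 is for B.
rows = [A(:,1); B(:,1)];
cols = [ones(size(A,1),1); 2*ones(size(B,1),1)];
vals = [A(:,2); B(:,2)];

% Build the sparse matrix in one call (duplicate subscripts would be summed).
M = sparse(rows, cols, vals, nRows, 2);

% Mask of the entries that were actually supplied, so stored zeros and
% missing entries can be told apart.
present = sparse(rows, cols, 1, nRows, 2) > 0;

% Optional: expand to the dense three-column layout with NaN for gaps.
D = full(M);
D(~full(present)) = NaN;
C = [(1:nRows)', D];
```

Keeping `M` sparse only pays off when most indices are missing; if nearly every index has a value, the dense NaN-filled matrix is likely the simpler choice.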

Answer by Shoaibur Rahman
on 28 Jan 2015

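A = [(1:9)' ones(9,1)];
B = [1 10; 2 10; 4 10; 7 10];
C = NaN(size(A,1),1);   % start with NaN for every row of A
C(B(:,1)) = B(:,2);     % fill in the rows whose index appears in B
C = [A C]               % append the result as the third column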
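The code above uses the index values of B directly as row numbers, which works when the first column of A is exactly 1, 2, 3, ... in order. If that does not hold for the real data, one possible variation is to look up each index of B in A with the built-in `ismember`. This is a sketch under that assumption only; the shifted example indices and the names `found`, `pos`, and `col3` are illustrative, not part of the original answer.

```matlab
% Example where the indices do not start at 1 (illustrative data).
A = [(11:19)', ones(9,1)];
B = [11 10; 12 10; 14 10; 17 10];

% found(k) is true if B(k,1) occurs in A(:,1); pos(k) is the matching row of A.
[found, pos] = ismember(B(:,1), A(:,1));

col3 = NaN(size(A,1), 1);          % NaN wherever B has no matching index
col3(pos(found)) = B(found, 2);    % copy the B results to the matching rows

C = [A, col3];
```

Indices of B that do not occur in A are simply skipped here; whether they should instead be appended as extra rows depends on the data.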

Answer by Image Analyst
on 28 Jan 2015

Try this:

A = [...
1 1
2 1
3 1
4 5
5 1
6 1
7 9
8 1
9 1]
B = [...
1 10
2 20
4 70
7 90]
% Find out what the max index could possibly be.
maxRow = max([A(:,1); B(:,1)])
C = zeros(maxRow, 3); % Preallocate C.
% Assign all possible indices to column 1 of C.
C(:,1) = 1:maxRow;
% Assign column 2 of A to column 2 of C.
C(A(:,1), 2) = A(:, 2);
% Assign column 2 of B to column 3 of C.
C(B(:,1), 3) = B(:, 2)

In the command window:

C =

1 1 10
2 1 20
3 1 0
4 5 70
5 1 0
6 1 0
7 9 90
8 1 0
9 1 0
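The output above marks missing entries with 0, whereas the question asked for NaN. A small variation that might match the requested layout more closely is to preallocate with `NaN` instead of `zeros`; the rest of the assignments stay the same. This is only a sketch using the same example data.

```matlab
A = [1 1; 2 1; 3 1; 4 5; 5 1; 6 1; 7 9; 8 1; 9 1];
B = [1 10; 2 20; 4 70; 7 90];

% Find out what the max index could possibly be.
maxRow = max([A(:,1); B(:,1)]);

% Preallocate with NaN so that any index missing from A or B stays NaN.
C = NaN(maxRow, 3);

% Column 1: all possible indices; columns 2 and 3: results from A and B.
C(:,1) = 1:maxRow;
C(A(:,1), 2) = A(:,2);
C(B(:,1), 3) = B(:,2);
```

With NaN as the filler, a measured result of 0 remains distinguishable from a missing measurement.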

The code is vectorized, so it should be pretty fast. Anyway, a million rows is not that many. Here it is with a million input rows, and it took 0.03 seconds:

numberOfRows = 1000000;
A = int32(randi(99, [numberOfRows, 2]));
B = int32(randi(99, [numberOfRows, 2]));
tic;
% Find out what the max index could possibly be.
maxRow = max([A(:,1); B(:,1)]);
C = zeros(maxRow, 3); % Preallocate C.
% Assign all possible indices to column 1 of C.
C(:,1) = 1:maxRow;
% Assign column 2 of A to column 2 of C.
C(A(:,1), 2) = A(:, 2);
% Assign column 2 of B to column 3 of C.
C(B(:,1), 3) = B(:, 2);
toc;
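One caveat about the timing run above: `randi(99, ...)` draws every index from 1 to 99, so the inputs have a million rows each but `C` ends up with at most 99 rows. To time a case closer to the one in the question, the indices could be made unique with `randperm`. The sketch below assumes that A contains every index from 1 to `numberOfRows` exactly once and that B holds a random subset of them; the size `nB` and the random result values are hypothetical, and no timing figure is claimed because it depends on the machine.

```matlab
numberOfRows = 1000000;
nB = 250000;   % hypothetical size of the smaller matrix

% Unique indices: A covers 1..numberOfRows in shuffled order,
% B holds nB distinct indices drawn from the same range.
A = [randperm(numberOfRows)', rand(numberOfRows, 1)];
B = [randperm(numberOfRows, nB)', rand(nB, 1)];

tic;
maxRow = max([A(:,1); B(:,1)]);
C = NaN(maxRow, 3);        % NaN marks indices without a result
C(:,1) = 1:maxRow;
C(A(:,1), 2) = A(:,2);
C(B(:,1), 3) = B(:,2);
elapsed = toc;

fprintf('Filled %d rows in %.3f s\n', maxRow, elapsed);
```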
