Arrange Matrix - Same Values of a Row to one Column

Question

0 votes

Hello, I have a huge database that needs to be re-arranged:

System_Values ; System_Name
; ABC1
; ABC2
; ABC1
; ABC3
; ABC1
; ABC3
; ABC2
; ABC3
; ABC2
; ABC2

So as you can see there are three different system names ABC1..3. ABC1 for example has several values in the first column: 1,8 and 0. I want matlab to find the systems and adding the values of each system in one column:

ABC1 ; ABC2 ; ABC3
1 ; 3 ; 7
8 ; 8 ; 5
0 ; 7 ; 4
empty ; 6 ; empty

I have absolutely no idea how to do this in a matlab script. I hope you can help me, please let me know if you don`t understand my problem.

Thanks

4 Comments
Show 2 older comments Hide 2 older comments

Guillaume on 26 Jan 2015

Edited: Guillaume on 26 Jan 2015

A 2d cell array has to have the same number of rows for all columns. Hence if one column is made up of 130 rows, so are all the other. So what do you put in there after the first 5 rows for the shorter column? NaN? 0? empty matrix?

The other option is to have a 1d row cell array made up of 1d column cell arrays. The inner cell arrays can then have different number of rows. It's lot more unwieldy though.

Hence my question.

Marv on 26 Jan 2015

All other rows which are smaller than 130 for another system shall be empty

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Cedric on 26 Jan 2015

Edited: Cedric on 27 Jan 2015

Open in MATLAB Online

4 votes

It is unclear how you want to export the result, but here is how you can extract the data by system name:

 isMatch   = strcmpi( 'ABC1', System_Name ) ;
 data_ABC1 = System_Values(isMatch) ;

and you can do the same for ABC2 and ABC3, or build a loop like:

 nSys   = 3 ;
 names  = cell( 1, nSys ) ;
 values = cell( 1, nSys ) ;
 for sysId = 1 : nSys
    names{sysId}  = sprintf( 'ABC%d', sysId ) ;
    isMatch       = strcmpi( names{sysId}, System_Name ) ;
    values{sysId} = System_Values(isMatch) ;
 end

EDIT 01/26/2015@7:38pm: taking in account your last comments, here is a solution.

 sourceData = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; ...
               5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'} ;
 [names, ~, cId] = unique( sourceData(:, 2) ) ;
 buf      = bsxfun( @eq, 1:max( cId ), cId ) ;
 rId      = sum( cumsum( buf ) .* buf, 2 ) ;
 destData = [names.'; cell( max( rId ), numel( names ))] ;
 destData(sub2ind( size(destData), 1+rId, cId )) = sourceData(:, 1) ;

which leads to:

 >> destData
 destData = 
    'ABC1'    'ABC2'    'ABC3'
    [   1]    [   3]    [   7]
    [   8]    [   8]    [   5]
    [   0]    [   7]    [   4]
        []    [   6]        []

Both this and Guillaume's solutions have the disadvantage that you won't understand them if you are not familiar with MATLAB though. If I were you, I would go for a much simpler solution based on STRCMPI and a loop, and the call to UNIQUE that Star Strider proposed first.

Let us know if you are interested. It would involve a bit more code, but it would have the advantage that you would understand it better (especially if you come back to the code a few months/years from now).

EDIT 01/27/2015@9:30am: here is a solution that involves only basic MATLAB operations. My advice is that you try to fully understand it, even if you end up implementing Guillaume or my solution above, because they are more concise but significantly more complicated.

 % - Init: extract unique names, initialize data structures for storing 
 %  both cell data and statistics (as a numeric array).
 names  = unique( sourceData(:,2) ) ;
 nNames = numel( names ) ;
 data   = cell( 1, nNames ) ;
 stat   = zeros( 4, nNames ) ;
 % - Extract data and compute stat.
 for nId = 1 : nNames
    % Extract data for current name, store as cell array and convert
    % to num for computing statistics.
    isMatch      = strcmpi( names{nId}, sourceData(:,2) ) ;
    data{nId}    = sourceData(isMatch,1) ;
    dataNum      = cell2mat( data{nId} ) ;
    % Compute statistics, add extra row with data count (useful for
    % initializing destination array later).
    stat(1, nId) = mean( dataNum ) ;
    stat(2, nId) = var( dataNum ) ;
    stat(3, nId) = std( dataNum ) ;
    stat(4, nId) = numel( dataNum ) ;
 end
 % - Build output cell array.
 destData = cell( max( stat(4,:) ), nNames ) ;
 % Copy data cell arrays to destination cell array.
 for nId = 1 : nNames
    destData(1:stat(4,nId), nId) = data{nId} ;
 end
 % Frame with names and statistics. Being a numeric array, the latter
 % is converted to cell array first.
 destData = [names.'; destData; num2cell( stat(1:3,:) )] ;
 % - Export, e.g. to XLSX file.
 xlswrite( 'Output.xlsx', destData ) ;

2 Comments
Show None Hide None

Guillaume on 27 Jan 2015

I think Cedric's latest edition is probably the best way for a beginner to process and export the data indeed.

The downside is that the format of the output destData is not very good for visualisation within matlab. If you want to visualise it in matlab, I'd then convert the cells/matrices into tables. I'd also keep the stats and data separate.

Cedric on 27 Jan 2015

Edited: Cedric on 27 Jan 2015

.. and then maybe learning how to use Tables; but our solutions based on BSXFUN, ACCUMARRAY, ARRAY/CELLFUN, will require a lot of work if you want to understand what we do.

Note that I could update this solution (based on explicit loops) so it uses a numeric array of data. The only information that I/we still need is what kind of export you need exactly. You showed the content and said "text file", but is it really ASCII/plain text, or wouldn't you prefer an Excel file?

Sign in to comment.

Answer 2

Star Strider on 26 Jan 2015

Edited: Star Strider on 26 Jan 2015

Open in MATLAB Online

1 vote

Edited to use the new (20:00 GMT) version of ‘M’:

This works:

M = {1  'ABC1'
    3  'ABC2'
    8  'ABC1'
    7  'ABC3'
    0  'ABC1'
    5  'ABC3'
    8  'ABC2'
    4  'ABC3'
    7  'ABC2'
    6  'ABC2'};
[ABC, ia, ic] = unique(M(:,2));
for k1 = 1:size(ABC,1)
    ABCv{k1} = [ABC{k1} M(ic == k1, 1)']';
end
C1 = ABCv{1}                                    % Display Output
C2 = ABCv{2}
C3 = ABCv{3}

produces:

C1 = 
      'ABC1'       
      [1.0000e+000]
      [8.0000e+000]
      [0.0000e+000]
C2 = 
      'ABC2'       
      [3.0000e+000]
      [8.0000e+000]
      [7.0000e+000]
      [6.0000e+000]
C3 = 
      'ABC3'       
      [7.0000e+000]
      [5.0000e+000]
      [4.0000e+000]

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Answer 3

Guillaume on 26 Jan 2015

Open in MATLAB Online

1 vote

accumarray is quite good for this. The difficulty is filling shorter columns with empty:

C = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; 5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'}; %demo data
[names, ~, in] = unique(C(:, 2)); %get unique names and their position
namesdata = accumarray(in, cell2mat(C(:, 1)), [], @(x) {x}); %redistribute name data into cell arrays
%now what's left is to expand all the cell arrays to the same size
maxsize = max(cellfun(@numel, namesdata)); %get max length
namesdata = cellfun(@(c) [num2cell(c); cell(maxsize-numel(c), 1)], namesdata, 'UniformOutput', false); %fill shorter columns with empty cells
result = [names'; horzcat(namesdata{:})] %and concatenate the lot

1 Comment
Show -1 older comments Hide -1 older comments

Guillaume on 27 Jan 2015

Open in MATLAB Online

Following your comments, what you really need are tables. Here is one way to do it:

C = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; 5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'}; %demo data
[names, ~, in] = unique(C(:, 2)); %get unique names and their position
namesdata = accumarray(in, cell2mat(C(:, 1)), [], @(x) {x}); %redistribute name data into cell arrays
maxsize = max(cellfun(@numel, namesdata)); %get max length
namesdata = cell2mat(cellfun(@(c) [c; nan(maxsize-numel(c), 1)], namesdata, 'UniformOutput', false)');

At this point namesdata is now a matrix of your data, with NaN for empty elements. You can make a table with:

tdata = array2table(namesdata, 'VariableNames', names)

To create the statistics table:

stats = [nanmean(namesdata); nanvar(namesdata, 1); nanstd(namesdata, 1)]; %requires stat toolbox
tstats = array2table(stats, 'VariableNames', names, 'RowNames', {'mean', 'variance', 'standard_deviation'})

To write it to a text file:

writetable([tdata;tstats], 'somefile.txt', 'WriteRowNames', true);

Sign in to comment.

Answer 4

Marv on 26 Jan 2015

0 votes

The system names are also not defined like a numeration ABC1,2,3,4 ... The names are random but there are several identical system names which have values like in the example above. Matlab should find these same names and put its values together into a vector or matrix.

So I think the solutions here dont work so far :(

7 Comments
Show 5 older comments Hide 5 older comments

Star Strider on 27 Jan 2015

My pleasure, Cedric! Your significant expertise is always welcome. I always learn from your Answers and Comments, many sending me back to the Documentation to learn something new about a particular function. I hope your absence was simply an essentially voluntary hiatus. Guillaume is keeping the bar high in your absence.

The spammers still exist, but the (albeit imperfect) spam filter actually traps a lot of them. (I doubt ‘cracked’ triggered it, but if it did, it will catch this one, too. Testing...)

The spam filter seems to be heuristic, so some spam still show up, but most get quarantined in Potentially Spam Questions (‘Flagged Content’ -> ‘Potentially Spam Questions’ in the upper right). Those with 3K privileges can get others’ erroneously-flagged Questions, Answers, and Comments out of ’spam purgatory’, or you can by clicking on the ‘Contact’ link in the spam banner above your Answer or Comment and telling the powers-that-be that your contribution is not spam. I had to do that a couple times, and I’ve rescued a few (such as yours) that I found.

The new default Profile page is causing some controversy (I’ve complained), but if you click on ‘My Profile’ on the upper left above ‘Ask a Question’, you’re back to the previous version, at least for your own Profile. It fortunately seems to be a work in progress, but why it got ‘fixed’ when it wasn’t ‘broken’ mystifies me.

And then there’s HG2...

Cedric on 27 Jan 2015

Edited: Cedric on 27 Jan 2015

Thank you Star Strider! It is a voluntary hiatus driven by the fact that I have too much work AND because I know too well how "too much" time I can spend answering questions if I allow me to interact on the forum.

Thank you for the link and for the information about these new features. I am not too fond of the new design for profiles either. Hopefully, as you say, it is a work in progress!

Star Strider on 27 Jan 2015

My pleasure! I’m glad it is a voluntary hiatus. There are worse things in life than too much work, providing you’re getting paid appropriately. (Harking back to my Internal Medicine residency when my workload seemed infinite but our pay on an hourly basis was about one third the minimum wage.) ‘Answers’ can definitely expand to fill all available time, so I’ll likely reduce my participation in a few months.

I had to ask about getting access to the ‘old’ Answers features, since it was very much less than obvious. It really hinders spam-deletion efforts. I am assured that this is ‘Version 1.0’ of the ‘new’ Answers, but then the ‘old’ Answers wasn’t broken, so the need to ‘fix’ it still mystifies.

Sign in to comment.

Answer 5

Marv on 27 Jan 2015

Open in MATLAB Online

0 votes

Ok I tried the solution by Cedric:

   sourceData = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; ...
                 5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'} ;
   [names, ~, cId] = unique( sourceData(:, 2) ) ;
   buf      = bsxfun( @eq, 1:max( cId ), cId ) ;
   rId      = sum( cumsum( buf ) .* buf, 2 ) ;
   destData = [names.'; cell( max( rId ), numel( names ))] ;
   destData(sub2ind( size(destData), 1+rId, cId )) = sourceData(:, 1) ;

which leads to:

 >> destData
   destData = 
      'ABC1'    'ABC2'    'ABC3'
      [   1]    [   3]    [   7]
      [   8]    [   8]    [   5]
      [   0]    [   7]    [   4]
          []    [   6]        []

This works, but I still have 3 problems:

The empty values are "[]"
How can I export this cell array in a text file ?
I would like to calculate the standard deviation of each columm and outputting this at the end of each column. So it should look like:

      'ABC1'    'ABC2'    'ABC3'
      [   1]    [   3]    [   7]
      [   8]    [   8]    [   5]
      [   0]    [   7]    [   4]
          []    [   6]        []
      --------------------------
      [   3]    [   6]    [5.34] // arithmetic average
     [12.67]    [ 3.5]    [1.55] // variance
     [ 3.56]    [ 3.5]    [1.55] // standard deviation

4 Comments
Show 2 older comments Hide 2 older comments

Guillaume on 27 Jan 2015

1. The empty values are "[]"

This is what you asked. [] is the empty matrix. How else would you represent empty in matlab?

2. How can I export this cell array in a text file ?

In that format, not easily.

3. I would like to calculate the standard deviation of each columm and outputting this at the end of each column

Again, the data format is a hindrance. Furthermore, you're overloading the data in each column. In addition to a column header, and the data, you now want statistics stored in the column. With a cell array, it's only your own convention that tells you what each row is. Matlab won't track this for you.

In the end a cell array with empty elements is probably not what you want (hence, my original question about what output you expect). For a start, I would use NaNs instead of empty, so your data can be a matrix. Statistics on a matrix is very straightforward. On a cell array not so much. If you want headers you can store the matrix in a table with column headers. The stats, I would store in another table with both column and row headers. Tables are the easiest thing to export to file in matlab.

See comments to my answer on how to do this.

Guillaume on 27 Jan 2015

I would say in your particular case, which involves redistributing single column data into multicolumn data, it's much easier in matlab.

You can of course do it in VBA, if you're more familiar with it. I doubt you could just do it with Excel formulas.

In any case, I've added code as a comment to my own answer on how to do what you want using tables. The same would be significantly more code in VBA.

Cedric on 27 Jan 2015

See the edit in my post. There is nothing complicated overall, but you have to understand data manipulation in MATLAB, and in particular the difference between numeric arrays and cell arrays.

Sign in to comment.

Answer 6

Andrei Bobrov on 27 Jan 2015

Edited: Andrei Bobrov on 27 Jan 2015

Open in MATLAB Online

0 votes

sourceData = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; ...
              5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'} ;
[a,~,c] = unique(sourceData(:,2));
[v,ii] = sort(c);
t = [1;diff(v)]>0;
k = ones(size(t));
n = (1:numel(v))';
xx = n(t);
k(xx(2:end)) = 1-diff(find(t));
out = [a(:)';accumarray([cumsum(k),v],ii,[],@(x)sourceData(x,1))];

or used Cedric's idea

[a,~,c] = unique(sourceData(:,2));
b = bsxfun( @eq, 1:max( c ), c );
[ii,jj] = find(b);
d = cumsum(b).*b;
out2 = [a(:)';accumarray([d(d>0),jj],ii,[],@(x)sourceData(x,1))];
d2 = out2(2:end,:);
d2(cellfun('isempty',d2)) = {nan};
d2 = cell2mat(d2);
out3 = [out2;num2cell([nanmean(d2);nanstd(d2);nanvar(d2)])];

2 Comments
Show None Hide None

Marv on 27 Jan 2015

Thanks, could you guys maybe add a few comments in the code ? I dont find it hard to understand the commands. Its harder to get the principle / technique you use to arrange the data. Could you explain please ? Thank you very much

Cedric on 28 Jan 2015

Edited: Cedric on 28 Jan 2015

As written in my answer Marv (EDIT 01/27/2015@9:30am), you should understand the basic approach first, because it would take a lot of time for us to explain in depth what Andrei, Guillaume, or I implemented using BSXFUN, ACCUMARRAY, etc., and you would not be able to re-use this material in another context.

If you focus on the basic approach, however, you'll have to get more familiar with e.g. numeric and cell types of arrays, and this won't be lost because they are used everywhere in MATLAB.

I am speaking for myself here, but I guess that it can be extended to Andrei and Guillaume (correct me if I am wrong): I/we usually start writing these concise solutions because we find them elegant, despite not knowing at first how proficient the original poster (OP) is with MATLAB. But, as far as I am concerned, these aren't good solutions (unless the OP is very proficient already), because they are too complex for an economy of just a few lines of code. In addition, more lines of code and explicit loops, don't mean less efficient, as the basic approach is the second fastest code according to the profiler (in this context, after repeating sourceData 1000 times for working with a larger data set).

Sign in to comment.

Arrange Matrix - Same Values of a Row to one Column

4 Comments
Show 2 older comments Hide 2 older comments

Answers (6)

2 Comments
Show None Hide None

0 Comments
Show -2 older comments Hide -2 older comments

1 Comment
Show -1 older comments Hide -1 older comments

7 Comments
Show 5 older comments Hide 5 older comments

4 Comments
Show 2 older comments Hide 2 older comments

2 Comments
Show None Hide None

Categories

Tags

Community Treasure Hunt

Arrange Matrix - Same Values of a Row to one Column

4 Comments Show 2 older comments Hide 2 older comments

Answers (6)

2 Comments Show None Hide None

0 Comments Show -2 older comments Hide -2 older comments

1 Comment Show -1 older comments Hide -1 older comments

7 Comments Show 5 older comments Hide 5 older comments

4 Comments Show 2 older comments Hide 2 older comments

2 Comments Show None Hide None

Categories

Tags

See Also

Community Treasure Hunt

4 Comments
Show 2 older comments Hide 2 older comments

2 Comments
Show None Hide None

0 Comments
Show -2 older comments Hide -2 older comments

1 Comment
Show -1 older comments Hide -1 older comments

7 Comments
Show 5 older comments Hide 5 older comments

4 Comments
Show 2 older comments Hide 2 older comments

2 Comments
Show None Hide None