Arrange Matrix - Same Values of a Row to one Column

5 views (last 30 days)
Marv
Marv on 26 Jan 2015
Edited: Cedric on 28 Jan 2015
Hello, I have a huge database that needs to be re-arranged:
System_Values ; System_Name
1 ; ABC1
3 ; ABC2
8 ; ABC1
7 ; ABC3
0 ; ABC1
5 ; ABC3
8 ; ABC2
4 ; ABC3
7 ; ABC2
6 ; ABC2
So as you can see there are three different system names ABC1..3. ABC1 for example has several values in the first column: 1,8 and 0. I want matlab to find the systems and adding the values of each system in one column:
ABC1 ; ABC2 ; ABC3
1 ; 3 ; 7
8 ; 8 ; 5
0 ; 7 ; 4
empty ; 6 ; empty
I have absolutely no idea how to do this in a matlab script. I hope you can help me, please let me know if you don`t understand my problem.
Thanks
  4 Comments
Guillaume
Guillaume on 26 Jan 2015
Edited: Guillaume on 26 Jan 2015
A 2d cell array has to have the same number of rows for all columns. Hence if one column is made up of 130 rows, so are all the other. So what do you put in there after the first 5 rows for the shorter column? NaN? 0? empty matrix?
The other option is to have a 1d row cell array made up of 1d column cell arrays. The inner cell arrays can then have different number of rows. It's lot more unwieldy though.
Hence my question.
Marv
Marv on 26 Jan 2015
All other rows which are smaller than 130 for another system shall be empty

Sign in to comment.

Answers (6)

Cedric
Cedric on 26 Jan 2015
Edited: Cedric on 27 Jan 2015
It is unclear how you want to export the result, but here is how you can extract the data by system name:
isMatch = strcmpi( 'ABC1', System_Name ) ;
data_ABC1 = System_Values(isMatch) ;
and you can do the same for ABC2 and ABC3, or build a loop like:
nSys = 3 ;
names = cell( 1, nSys ) ;
values = cell( 1, nSys ) ;
for sysId = 1 : nSys
names{sysId} = sprintf( 'ABC%d', sysId ) ;
isMatch = strcmpi( names{sysId}, System_Name ) ;
values{sysId} = System_Values(isMatch) ;
end
EDIT 01/26/2015@7:38pm: taking in account your last comments, here is a solution.
sourceData = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; ...
5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'} ;
[names, ~, cId] = unique( sourceData(:, 2) ) ;
buf = bsxfun( @eq, 1:max( cId ), cId ) ;
rId = sum( cumsum( buf ) .* buf, 2 ) ;
destData = [names.'; cell( max( rId ), numel( names ))] ;
destData(sub2ind( size(destData), 1+rId, cId )) = sourceData(:, 1) ;
which leads to:
>> destData
destData =
'ABC1' 'ABC2' 'ABC3'
[ 1] [ 3] [ 7]
[ 8] [ 8] [ 5]
[ 0] [ 7] [ 4]
[] [ 6] []
Both this and Guillaume's solutions have the disadvantage that you won't understand them if you are not familiar with MATLAB though. If I were you, I would go for a much simpler solution based on STRCMPI and a loop, and the call to UNIQUE that Star Strider proposed first.
Let us know if you are interested. It would involve a bit more code, but it would have the advantage that you would understand it better (especially if you come back to the code a few months/years from now).
EDIT 01/27/2015@9:30am: here is a solution that involves only basic MATLAB operations. My advice is that you try to fully understand it, even if you end up implementing Guillaume or my solution above, because they are more concise but significantly more complicated.
% - Init: extract unique names, initialize data structures for storing
% both cell data and statistics (as a numeric array).
names = unique( sourceData(:,2) ) ;
nNames = numel( names ) ;
data = cell( 1, nNames ) ;
stat = zeros( 4, nNames ) ;
% - Extract data and compute stat.
for nId = 1 : nNames
% Extract data for current name, store as cell array and convert
% to num for computing statistics.
isMatch = strcmpi( names{nId}, sourceData(:,2) ) ;
data{nId} = sourceData(isMatch,1) ;
dataNum = cell2mat( data{nId} ) ;
% Compute statistics, add extra row with data count (useful for
% initializing destination array later).
stat(1, nId) = mean( dataNum ) ;
stat(2, nId) = var( dataNum ) ;
stat(3, nId) = std( dataNum ) ;
stat(4, nId) = numel( dataNum ) ;
end
% - Build output cell array.
destData = cell( max( stat(4,:) ), nNames ) ;
% Copy data cell arrays to destination cell array.
for nId = 1 : nNames
destData(1:stat(4,nId), nId) = data{nId} ;
end
% Frame with names and statistics. Being a numeric array, the latter
% is converted to cell array first.
destData = [names.'; destData; num2cell( stat(1:3,:) )] ;
% - Export, e.g. to XLSX file.
xlswrite( 'Output.xlsx', destData ) ;
  2 Comments
Guillaume
Guillaume on 27 Jan 2015
I think Cedric's latest edition is probably the best way for a beginner to process and export the data indeed.
The downside is that the format of the output destData is not very good for visualisation within matlab. If you want to visualise it in matlab, I'd then convert the cells/matrices into tables. I'd also keep the stats and data separate.
Cedric
Cedric on 27 Jan 2015
Edited: Cedric on 27 Jan 2015
.. and then maybe learning how to use Tables; but our solutions based on BSXFUN, ACCUMARRAY, ARRAY/CELLFUN, will require a lot of work if you want to understand what we do.
Note that I could update this solution (based on explicit loops) so it uses a numeric array of data. The only information that I/we still need is what kind of export you need exactly. You showed the content and said "text file", but is it really ASCII/plain text, or wouldn't you prefer an Excel file?

Sign in to comment.


Star Strider
Star Strider on 26 Jan 2015
Edited: Star Strider on 26 Jan 2015
Edited to use the new (20:00 GMT) version of ‘M’:
This works:
M = {1 'ABC1'
3 'ABC2'
8 'ABC1'
7 'ABC3'
0 'ABC1'
5 'ABC3'
8 'ABC2'
4 'ABC3'
7 'ABC2'
6 'ABC2'};
[ABC, ia, ic] = unique(M(:,2));
for k1 = 1:size(ABC,1)
ABCv{k1} = [ABC{k1} M(ic == k1, 1)']';
end
C1 = ABCv{1} % Display Output
C2 = ABCv{2}
C3 = ABCv{3}
produces:
C1 =
'ABC1'
[1.0000e+000]
[8.0000e+000]
[0.0000e+000]
C2 =
'ABC2'
[3.0000e+000]
[8.0000e+000]
[7.0000e+000]
[6.0000e+000]
C3 =
'ABC3'
[7.0000e+000]
[5.0000e+000]
[4.0000e+000]

Guillaume
Guillaume on 26 Jan 2015
accumarray is quite good for this. The difficulty is filling shorter columns with empty:
C = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; 5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'}; %demo data
[names, ~, in] = unique(C(:, 2)); %get unique names and their position
namesdata = accumarray(in, cell2mat(C(:, 1)), [], @(x) {x}); %redistribute name data into cell arrays
%now what's left is to expand all the cell arrays to the same size
maxsize = max(cellfun(@numel, namesdata)); %get max length
namesdata = cellfun(@(c) [num2cell(c); cell(maxsize-numel(c), 1)], namesdata, 'UniformOutput', false); %fill shorter columns with empty cells
result = [names'; horzcat(namesdata{:})] %and concatenate the lot
  1 Comment
Guillaume
Guillaume on 27 Jan 2015
Following your comments, what you really need are tables. Here is one way to do it:
C = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; 5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'}; %demo data
[names, ~, in] = unique(C(:, 2)); %get unique names and their position
namesdata = accumarray(in, cell2mat(C(:, 1)), [], @(x) {x}); %redistribute name data into cell arrays
maxsize = max(cellfun(@numel, namesdata)); %get max length
namesdata = cell2mat(cellfun(@(c) [c; nan(maxsize-numel(c), 1)], namesdata, 'UniformOutput', false)');
At this point namesdata is now a matrix of your data, with NaN for empty elements. You can make a table with:
tdata = array2table(namesdata, 'VariableNames', names)
To create the statistics table:
stats = [nanmean(namesdata); nanvar(namesdata, 1); nanstd(namesdata, 1)]; %requires stat toolbox
tstats = array2table(stats, 'VariableNames', names, 'RowNames', {'mean', 'variance', 'standard_deviation'})
To write it to a text file:
writetable([tdata;tstats], 'somefile.txt', 'WriteRowNames', true);

Sign in to comment.


Marv
Marv on 26 Jan 2015
The system names are also not defined like a numeration ABC1,2,3,4 ... The names are random but there are several identical system names which have values like in the example above. Matlab should find these same names and put its values together into a vector or matrix.
So I think the solutions here dont work so far :(
  7 Comments
Cedric
Cedric on 27 Jan 2015
Edited: Cedric on 27 Jan 2015
Thank you Star Strider! It is a voluntary hiatus driven by the fact that I have too much work AND because I know too well how "too much" time I can spend answering questions if I allow me to interact on the forum.
Thank you for the link and for the information about these new features. I am not too fond of the new design for profiles either. Hopefully, as you say, it is a work in progress!
Star Strider
Star Strider on 27 Jan 2015
My pleasure! I’m glad it is a voluntary hiatus. There are worse things in life than too much work, providing you’re getting paid appropriately. (Harking back to my Internal Medicine residency when my workload seemed infinite but our pay on an hourly basis was about one third the minimum wage.) ‘Answers’ can definitely expand to fill all available time, so I’ll likely reduce my participation in a few months.
I had to ask about getting access to the ‘old’ Answers features, since it was very much less than obvious. It really hinders spam-deletion efforts. I am assured that this is ‘Version 1.0’ of the ‘new’ Answers, but then the ‘old’ Answers wasn’t broken, so the need to ‘fix’ it still mystifies.

Sign in to comment.


Marv
Marv on 27 Jan 2015
Ok I tried the solution by Cedric:
sourceData = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; ...
5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'} ;
[names, ~, cId] = unique( sourceData(:, 2) ) ;
buf = bsxfun( @eq, 1:max( cId ), cId ) ;
rId = sum( cumsum( buf ) .* buf, 2 ) ;
destData = [names.'; cell( max( rId ), numel( names ))] ;
destData(sub2ind( size(destData), 1+rId, cId )) = sourceData(:, 1) ;
which leads to:
>> destData
destData =
'ABC1' 'ABC2' 'ABC3'
[ 1] [ 3] [ 7]
[ 8] [ 8] [ 5]
[ 0] [ 7] [ 4]
[] [ 6] []
This works, but I still have 3 problems:
  1. The empty values are "[]"
  2. How can I export this cell array in a text file ?
  3. I would like to calculate the standard deviation of each columm and outputting this at the end of each column. So it should look like:
'ABC1' 'ABC2' 'ABC3'
[ 1] [ 3] [ 7]
[ 8] [ 8] [ 5]
[ 0] [ 7] [ 4]
[] [ 6] []
--------------------------
[ 3] [ 6] [5.34] // arithmetic average
[12.67] [ 3.5] [1.55] // variance
[ 3.56] [ 3.5] [1.55] // standard deviation
  4 Comments
Guillaume
Guillaume on 27 Jan 2015
I would say in your particular case, which involves redistributing single column data into multicolumn data, it's much easier in matlab.
You can of course do it in VBA, if you're more familiar with it. I doubt you could just do it with Excel formulas.
In any case, I've added code as a comment to my own answer on how to do what you want using tables. The same would be significantly more code in VBA.
Cedric
Cedric on 27 Jan 2015
See the edit in my post. There is nothing complicated overall, but you have to understand data manipulation in MATLAB, and in particular the difference between numeric arrays and cell arrays.

Sign in to comment.


Andrei Bobrov
Andrei Bobrov on 27 Jan 2015
Edited: Andrei Bobrov on 27 Jan 2015
sourceData = {1 'ABC1';3 'ABC2'; 8 'ABC1'; 7 'ABC3'; 0 'ABC1'; ...
5 'ABC3'; 8 'ABC2'; 4 'ABC3'; 7 'ABC2'; 6 'ABC2'} ;
[a,~,c] = unique(sourceData(:,2));
[v,ii] = sort(c);
t = [1;diff(v)]>0;
k = ones(size(t));
n = (1:numel(v))';
xx = n(t);
k(xx(2:end)) = 1-diff(find(t));
out = [a(:)';accumarray([cumsum(k),v],ii,[],@(x)sourceData(x,1))];
or used Cedric's idea
[a,~,c] = unique(sourceData(:,2));
b = bsxfun( @eq, 1:max( c ), c );
[ii,jj] = find(b);
d = cumsum(b).*b;
out2 = [a(:)';accumarray([d(d>0),jj],ii,[],@(x)sourceData(x,1))];
d2 = out2(2:end,:);
d2(cellfun('isempty',d2)) = {nan};
d2 = cell2mat(d2);
out3 = [out2;num2cell([nanmean(d2);nanstd(d2);nanvar(d2)])];
  2 Comments
Marv
Marv on 27 Jan 2015
Thanks, could you guys maybe add a few comments in the code ? I dont find it hard to understand the commands. Its harder to get the principle / technique you use to arrange the data. Could you explain please ? Thank you very much
Cedric
Cedric on 28 Jan 2015
Edited: Cedric on 28 Jan 2015
As written in my answer Marv (EDIT 01/27/2015@9:30am), you should understand the basic approach first, because it would take a lot of time for us to explain in depth what Andrei, Guillaume, or I implemented using BSXFUN, ACCUMARRAY, etc., and you would not be able to re-use this material in another context.
If you focus on the basic approach, however, you'll have to get more familiar with e.g. numeric and cell types of arrays, and this won't be lost because they are used everywhere in MATLAB.
I am speaking for myself here, but I guess that it can be extended to Andrei and Guillaume (correct me if I am wrong): I/we usually start writing these concise solutions because we find them elegant, despite not knowing at first how proficient the original poster (OP) is with MATLAB. But, as far as I am concerned, these aren't good solutions (unless the OP is very proficient already), because they are too complex for an economy of just a few lines of code. In addition, more lines of code and explicit loops, don't mean less efficient, as the basic approach is the second fastest code according to the profiler (in this context, after repeating sourceData 1000 times for working with a larger data set).

Sign in to comment.

Categories

Find more on Mathematics in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!