Sort cell strings according to specific subsets of those cell strings

1 view (last 30 days)
Let's say I have a cell string with values:
filename = {'2009.272.17.57.23.8445.AZ.SMER..BHE.R.SAC';...
'2009.272.17.57.24.5500.AZ.FRD..BHN.R.SAC';...
'2009.272.17.57.27.5445.AZ.SMER..BHN.R.SAC';...
'2009.272.17.57.27.8000.AZ.SND..BHZ.R.SAC';...
'2009.272.17.57.27.9445.AZ.BZN..BHE.R.SAC';...
'2009.272.17.57.28.7000.AZ.SND..BHN.R.SAC';...
'2009.272.17.57.29.1250.AZ.FRD..BHZ.R.SAC';...
'2009.272.17.57.29.2250.AZ.PFO..BHE.R.SAC';...
'2009.272.17.57.29.3695.AZ.SMER..BHZ.R.SAC';...
'2009.272.17.57.29.9445.AZ.BZN..BHN.R.SAC';...
'2009.272.17.57.30.0000.AZ.RDM..BHN.R.SAC';...
'2009.272.17.57.30.8000.AZ.RDM..BHZ.R.SAC';...
'2009.272.17.57.31.8250.AZ.LVA2..BHZ.R.SAC';...
'2009.272.17.57.31.8500.AZ.LVA2..BHE.R.SAC';...
'2009.272.17.57.31.9195.AZ.BZN..BHZ.R.SAC';...
'2009.272.17.57.32.0000.AZ.WMC..BHZ.R.SAC';...
'2009.272.17.57.32.6750.AZ.WMC..BHN.R.SAC';...
'2009.272.17.57.33.3195.AZ.KNW..BHZ.R.SAC';...
'2009.272.17.57.33.4750.AZ.TRO..BHN.R.SAC';...
'2009.272.17.57.33.7750.AZ.PFO..BHN.R.SAC';...
'2009.272.17.57.33.9000.AZ.PFO..BHZ.R.SAC';...
'2009.272.17.57.34.1750.AZ.LVA2..BHN.R.SAC';...
'2009.272.17.57.34.8000.AZ.TRO..BHZ.R.SAC';...
'2009.272.17.57.35.0000.AZ.WMC..BHE.R.SAC';...
'2009.272.17.57.35.0750.AZ.RDM..BHE.R.SAC';...
'2009.272.17.57.35.8945.AZ.KNW..BHE.R.SAC';...
'2009.272.17.57.36.0250.AZ.FRD..BHE.R.SAC';...
'2009.272.17.57.36.2250.AZ.CRY..BHZ.R.SAC';...
'2009.272.17.57.36.3500.AZ.CRY..BHN.R.SAC';...
'2009.272.17.57.36.4500.AZ.SND..BHE.R.SAC';...
'2009.272.17.57.36.5000.AZ.TRO..BHE.R.SAC';...
'2009.272.17.57.36.5195.AZ.KNW..BHN.R.SAC';...
'2009.272.17.57.36.5750.AZ.CRY..BHE.R.SAC'};
I want to be able to assume that I do not know what character the station name (e.g., CRY) or component name (e.g., BHE) starts and ends on. Though, the number of periods (".") will be consistent.
I have something fairly clunky to do this, but I am wondering if anyone can suggest a quick one/two-liner that would assume a string format of the general form:
YYYY.DDD.HH.MM.SS.ssss.$1.$2..$3.R.SAC
where:
$1 = Array name $2 = Station name $3 = Component name
And then sort the list with the primary and secondary sort order according to $2 and $3, respectively, so that the first 6 rows in the cell string would be:
2009.272.17.57.27.9445.AZ.BZN..BHE.R.SAC
2009.272.17.57.29.9445.AZ.BZN..BHN.R.SAC
2009.272.17.57.31.9195.AZ.BZN..BHZ.R.SAC
2009.272.17.57.36.5750.AZ.CRY..BHE.R.SAC
2009.272.17.57.36.3500.AZ.CRY..BHN.R.SAC
2009.272.17.57.36.2250.AZ.CRY..BHZ.R.SAC
...
  4 Comments
Jan
Jan on 22 Jan 2012
It looks like the parts do *not* have the same length:
'2009.272.17.57.33.9000.AZ.PFO..BHZ.R.SAC'
'2009.272.17.57.34.1750.AZ.LVA2..BHN.R.SAC'
Dr. Seis
Dr. Seis on 22 Jan 2012
Oh, his question was related to the "component" name, which are all the same number of characters (i.e., 3). The "station" names are not the same - they range from 3 to 4 characters.

Sign in to comment.

Accepted Answer

Oleg Komarov
Oleg Komarov on 22 Jan 2012
% Split using |'.'| as the delimiter
splt = regexpi(filename,'\.','split');
% Sort according to the 8th and 10th column
[sorted,idx] = sortrows(cat(1,splt{:}),[8,10])
Now you can use the sorted split array or apply idx to filename

More Answers (2)

Jan
Jan on 22 Jan 2012
filename = {'2009.272.17.57.23.8445.AZ.SMER..BHE.R.SAC';...
'2009.272.17.57.24.5500.AZ.FRD..BHN.R.SAC';...
'2009.272.17.57.27.5445.AZ.SMER..BHN.R.SAC';...
'2009.272.17.57.27.8000.AZ.SND..BHZ.R.SAC';...
'2009.272.17.57.27.9445.AZ.BZN..BHE.R.SAC';...
'2009.272.17.57.28.7000.AZ.SND..BHN.R.SAC';...
'2009.272.17.57.29.1250.AZ.FRD..BHZ.R.SAC';...
'2009.272.17.57.29.2250.AZ.PFO..BHE.R.SAC'};
n = numel(filename);
C2 = cell(1, n);
C3 = cell(1, n);
for iC = 1:n
D = textscan(filename{iC}(27:end), '%s', 'Delimiter', '.');
C2{iC} = D{1}{1};
C3{iC} = D{1}{3};
end
% A kind of SORTROWS:
[dummy, ind3] = sort(C3);
[dummy, ind2] = sort(C2(ind3));
index = ind3(ind2);
filename = filename(index);
  3 Comments
Jan
Jan on 23 Jan 2012
While Oleg's REGEXP is much nicer than calling TEXTSCAN in a loop, SORTROWS does exactly the same as my sorting method, but with a lot of overhead.

Sign in to comment.


Dr. Seis
Dr. Seis on 22 Jan 2012
Here is the clunky version I have been using:
numFiles = numel(filename);
sortcell = {''};
sortind = zeros(numFiles,4);
for i = 1 : numFiles
sortind(i,2)=strfind(filename{i},'..')-1;
for j = sortind(i,2):-1:1
if isequal(filename{i}(j),'.')
break;
end
sortind(i,1)=j;
end
sortind(i,3)=sortind(i,2)+3;
for j = sortind(i,3):length(filename{i})
if isequal(filename{i}(j),'.')
break;
end
sortind(i,4)=j;
end
sortcell(i,1)=cellstr(filename{i}(sortind(i,1):sortind(i,2)));
sortcell(i,2)=cellstr(filename{i}(sortind(i,3):sortind(i,4)));
end
[tempcell,tempind1]=sort(sortcell(:,2));
[tempcell,tempind2]=sort(sortcell(tempind1,1));
filename = filename(tempind1(tempind2));

Categories

Find more on Shifting and Sorting Matrices in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!