strcmp and rows of dataset table

Bina on 27 Dec 2011
I have a text dataset with 4 columns and n rows: cl1 cl2 cl3 cl4. How can I use strcmp() to find which rows share the same cl2 and cl3 values (that is, rows that match each other in both columns, not rows where cl2 equals cl3)? For example, for the dataset below I want to show row 1 and row 4, because they have the same cl2 and cl3:
cl1 cl2 cl3 cl4
---------------------------
a b c d
d j h n
s b v y
q b c g
As I said, the dataset has n rows, so several groups of rows share the same cl2-cl3 pair. I want to make domains, for example Domain1 = {rows with one shared cl2-cl3 pair}, Domain2 = {rows with another shared cl2-cl3 pair}, ...
Please check the code below and give me an idea of what I should do. How do I use strcmp() in this case, and how do I show the target rows?
fid = fopen('Input2.txt','r')
data = textscan(fid,'%s %s %s %s')
fclose(fid)
indices = strcmp(data{2}{1},data{2})&&(data{1})
sum(indices)

Accepted Answer

Matt Tearle on 27 Dec 2011
Sounds like a job for categorical arrays! Huzzah! (Assuming you have Statistics Toolbox.) BTW, you said "dataset" but you're using cell arrays, so I assume you don't mean the dataset array in Stats TB. However, they may be a useful way to package your data. Anyway... why not make a new variable that is the combination of columns 2 and 3, and look for the unique values of that array:
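twoandthree = nominal(strcat(data{2},'-',data{3}))   % combined cl2-cl3 key as a nominal array
data = [data{:}];                                     % n-by-4 cell array of strings
domains = getlabels(twoandthree)                      % one label per distinct cl2-cl3 pair
for k=1:length(domains)
    foo = data(twoandthree==domains{k},:)             % rows belonging to domain k
end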
If you don't have Stats TB, you can achieve the same result with unique and strcmp:
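twoandthree = strcat(data{2},'-',data{3})             % combined cl2-cl3 key as a cell array of strings
data = [data{:}];                                     % n-by-4 cell array of strings
domains = unique(twoandthree)                         % one entry per distinct cl2-cl3 pair
for k=1:length(domains)
    foo = data(strcmp(twoandthree,domains{k}),:)      % rows belonging to domain k
end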
Also, note I'm using [data{:}] to extract the four columns (each being a cell array) and concatenate them together into a single four-column table (i.e., a single n-by-4 cell array containing strings). If you're going to be accessing by rows, that's a nicer arrangement of data.
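As a small illustration of that concatenation, here is a sketch that builds the textscan-style output by hand from the four sample rows in the question. The literal values stand in for reading Input2.txt, and the names tbl, sz and row4 are made up for the example:

```matlab
% Hand-built stand-in for the textscan output: a 1-by-4 cell array,
% each cell holding an n-by-1 cell array of strings (one per column).
data = {{'a';'d';'s';'q'}, {'b';'j';'b';'b'}, {'c';'h';'v';'c'}, {'d';'n';'y';'g'}};

% Horizontal concatenation of the four n-by-1 columns gives one n-by-4 cell array.
tbl = [data{:}];

sz = size(tbl);       % number of rows and columns
row4 = tbl(4,:);      % an entire row can now be pulled out with one index
```

With the columns kept separate, the same row would have to be assembled from data{1}{4}, data{2}{4}, and so on.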
But, as I mentioned, dataset arrays may also make life nice, depending on what you're going to do with the subsets.
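% Here data is the original 1-by-4 textscan output (before any [data{:}] concatenation)
data = dataset(data{:},'VarNames',strcat('cl',cellstr(num2str((1:4)'))))
twoandthree = nominal(strcat(data.cl2,'-',data.cl3))  % combined cl2-cl3 key
domains = getlabels(twoandthree)                      % one label per distinct cl2-cl3 pair
for k=1:length(domains)
    foo = data(twoandthree==domains{k},:)             % dataset rows belonging to domain k
end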
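To connect this back to the example in the question, a sketch along the following lines lists only the domains that contain more than one row. It uses base MATLAB only, types the sample rows in rather than reading Input2.txt, and the names tbl, key, counts and shared are made up for illustration:

```matlab
% Sample rows from the question, typed in as an n-by-4 cell array of strings.
tbl = {'a','b','c','d'; 'd','j','h','n'; 's','b','v','y'; 'q','b','c','g'};

% Combined cl2-cl3 key for every row.
key = strcat(tbl(:,2), '-', tbl(:,3));

% Unique keys; idx(i) is the position of row i's key within domains.
[domains, ~, idx] = unique(key);

% Count rows per domain and keep the domains shared by two or more rows.
counts = accumarray(idx(:), 1);
shared = find(counts > 1);

for k = shared'
    rows = find(idx == k)';            % row numbers belonging to this domain
    fprintf('Domain %s: rows %s\n', domains{k}, mat2str(rows));
    disp(tbl(rows,:))
end
```

For the sample data the only shared domain is b-c, made up of rows 1 and 4. The third output of unique avoids calling strcmp once per domain, which may matter when n is large.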
  3 Comments
Matt Tearle on 27 Dec 2011
[Strikes heroic pose] Don't thank me. Thank logical indexing. [Rides off into sunset]
Walter Roberson on 27 Dec 2011
Indexing! Indexing! Get your red-hot Logical Indexing here!
Authorized! Signed! Get your red-hot Logical Indexing!
Vectorized! Multidimensional! Endorsed by "Shane" Tearle!
Get your red-hot Logical Indexing!
