Find unique rows in cell array with mixed data types

First I'm used to working in database structures where you have a table with a primary key linked to a table with foreign key. In matlab, I need to read an xls file where all the data is in one file. For example,
raw =
ID Type Customer Allergies SomeNumber Topping
1 pizza John NaN 9 Pepperoni
1 pizza John NaN 9 Mushrooms
1 pizza John NaN 9 Peppers
2 pizza Jill NaN NaN Olives
2 pizza Jill NaN NaN Canadian Bacon
I would then like to put this in a struct where the toppings are an array and the unique row values are represented one time. I tried using unique() but couldn't perform on a mixed cell array. Any ideas how to extract the unique information and then attach the non-unique elements? I ultimately want to construct a class with each entry so that I can loop through and output data to a different format.

 Accepted Answer

I cannot recreate your problem. Here is what I did:
AA = {'A','B','A','C'};
BB = [1 2 1 1];
CC= [true,true,true,false];
Z=table(AA',BB',CC')
Var1 Var2 Var3
____ ____ _____
'A' 1 true
'B' 2 true
'A' 1 true
'C' 1 false
then simply:
unique(Z)
ans =
AA BB CC
___ __ _____
'A' 1 true
'B' 2 true
'C' 1 false
Correctly gives unique rows. IT is also possible I didn't understand your problem.

4 Comments

Thank you for your response, but I don't think it is that simple. The file I'm working with has NaN's, numbers, strings, so unique treats each NaN as unique. I get an error processing the numbers saying input needs to be a cell array, etc. Other methods, I can process the individual results as a column but I need the unique values grouped as a row. This is what I have thus far but get the error Input A must be a cell array of strings.
unk = unique(raw(cellfun(@x all(~isnan(x)),raw(:,1:n))));
If you work out an example, it might be more clear what exactly you are trying to achieve.
Why are NaNs important? Just replace them with something dummy:
AA = {'A','B','A','C'};
BB = [1 2 1 1];
CC= [true,true,true,false];
DD = [ NaN,NaN,NaN,NaN];
Z=table(AA',BB',CC',DD')
>> unique(Z)
ans =
Var1 Var2 Var3 Var4
____ ____ _____ ____
'A' 1 true NaN
'A' 1 true NaN
'B' 2 true NaN
'C' 1 false NaN
Z.Var4(isnan(Z.Var4))=0;
>> unique(Z)
ans =
Var1 Var2 Var3 Var4
____ ____ _____ ____
'A' 1 true 0
'B' 2 true 0
'C' 1 false 0
Also make sure you are using the table format.
Thank you for your time to comment. To clarify, I am trying to eliminate duplicates and find all the unique rows and the file I'm reading from has mixed values strings, values, empty cells. So when I eliminate the nans and convert cell2table, I still get the error message "Input A must be a cell array of strings". My first value is a number but I don't know what difference that makes but obviously, Unique() won't process both numeric and string values in either a table or cell array.
I converted all the double data types to char using the following:
charArr = cellfun(@num2str, raw, 'Un', 0 );
Then
unique (charArr, 'rows');
and get the correction 'rows' is not supported for cell array inputs.
So then I converted my cell array to Table
T = cell2Table(charArr);
unk = unique(T, 'rows');
and that worked I got
1 pizza John "" 9
2 pizza Jill "" ""
Now I can loop through the toppings using the ID number and put into a struct. So the issue was converting to all one data type then using a Table for finding unique rows.

Sign in to comment.

More Answers (0)

Categories

Tags

Asked:

on 9 May 2016

Commented:

on 13 May 2016

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!