In a table I have a column made of strings which are often repeated (sort of categories), I would like to substitute them with numbers which represents the categories

11 views (last 30 days)
Hi all, I have a table with a column filled with strings which represents some predefined attributes or categories whith some empty spots e.g. ('black', 'black', 'red', 'red', blue', 'red', ", 'blue', ", 'black'). I would like to substitute to them some numbers which represents the categories considering the empty spots as categories too e.g. (1, 1, 2, 2, 3, 2, 4, 3, 4,1).
'black' => 1 'red'=> 2 'blue'=>3 "=>4
I have tried the following, where data is the table:
for i=1:size(data,2)
if class(data{1,i})== "cell" | class(data{1,i})== "string"
column_to_change=double(categorical(data{:,i}));
column_to_change(ismissing(column_to_change))=max(column_to_change)+1;
for j=1:size(data,1)
data{j,i}={column_to_change(j)};
end
end
end
It works but with some problems:
1) the numbers do not start from 1 and are not consequential (I am probably missing something about how the categorical function works)
2) the final assignment in the table with the for cycle it is way too slow with big tables.
I am sure there are some better ways to do it, someone can help me?
Thank you in advance

Accepted Answer

Cris LaPierre
Cris LaPierre on 12 Apr 2022
I would first consider keeping it a categorical. You have not explained why you need it to be a numeric value. However, if you need a numeric representation, consider using findgroups. It will number the groups in alphabetical order, starting with the empty string.
A = ["black", "black", "red", "red", "blue", "red", "", "blue", "", "black"];
G = findgroups(A)
G = 1×10
2 2 4 4 3 4 1 3 1 2
  7 Comments
Cris LaPierre
Cris LaPierre on 13 Apr 2022
Edited: Cris LaPierre on 13 Apr 2022
Use the T.(expression) syntax. This is the third row of the table on the linked paged (Type of Output = Array, extracting data from one variable)
A = {'black', 'black', 'red', 'red', 'blue', 'red', '', 'blue', '', 'black'}';
data = table(A);
% overwrite table variable
data.(1) = findgroups(data.(1))
data = 10×1 table
A ___ 1 1 3 3 2 3 NaN 2 NaN 1
Square
Square on 14 Apr 2022
WoW! Awesome! I did not know about that!
Thank you so much Cris for all the precious insights and your time!

Sign in to comment.

More Answers (0)

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!