In a table I have a column made of strings which are often repeated (sort of categories), I would like to substitute them with numbers which represents the categories
11 views (last 30 days)
Show older comments
Hi all, I have a table with a column filled with strings which represents some predefined attributes or categories whith some empty spots e.g. ('black', 'black', 'red', 'red', blue', 'red', ", 'blue', ", 'black'). I would like to substitute to them some numbers which represents the categories considering the empty spots as categories too e.g. (1, 1, 2, 2, 3, 2, 4, 3, 4,1).
'black' => 1 'red'=> 2 'blue'=>3 "=>4
I have tried the following, where data is the table:
for i=1:size(data,2)
if class(data{1,i})== "cell" | class(data{1,i})== "string"
column_to_change=double(categorical(data{:,i}));
column_to_change(ismissing(column_to_change))=max(column_to_change)+1;
for j=1:size(data,1)
data{j,i}={column_to_change(j)};
end
end
end
It works but with some problems:
1) the numbers do not start from 1 and are not consequential (I am probably missing something about how the categorical function works)
2) the final assignment in the table with the for cycle it is way too slow with big tables.
I am sure there are some better ways to do it, someone can help me?
Thank you in advance
0 Comments
Accepted Answer
Cris LaPierre
on 12 Apr 2022
I would first consider keeping it a categorical. You have not explained why you need it to be a numeric value. However, if you need a numeric representation, consider using findgroups. It will number the groups in alphabetical order, starting with the empty string.
A = ["black", "black", "red", "red", "blue", "red", "", "blue", "", "black"];
G = findgroups(A)
7 Comments
Cris LaPierre
on 13 Apr 2022
Edited: Cris LaPierre
on 13 Apr 2022
Use the T.(expression) syntax. This is the third row of the table on the linked paged (Type of Output = Array, extracting data from one variable)
A = {'black', 'black', 'red', 'red', 'blue', 'red', '', 'blue', '', 'black'}';
data = table(A);
% overwrite table variable
data.(1) = findgroups(data.(1))
More Answers (0)
See Also
Categories
Find more on Logical in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!