Categorical to Numeric problem

Hi
I have a table that has numeric and categorical items in it. I have converted the catergorical items to numeric using the unique() function which works very well and I can then feed the matrix into an NN for training. The problem is when I feed new data to get results, I don't know how to make sure the converted categirical data in the new table matches ther numbers in the training data. i.e. if a categorical field in the training data is converted to the number 5, how do I make sure if that categorical data is in the new data, that it gets assigned the same number? I'm begining to think it may be a manual thing
SPG

 Accepted Answer

% Example Training Data (Categorical)
training_categorical_data = {'cat', 'dog', 'fish', 'dog', 'cat'};
% Convert Categorical Data to Numeric for Training
[unique_categories, ~, numeric_categories] = unique(training_categorical_data);
category_to_number_map = containers.Map(unique_categories, num2cell(1:length(unique_categories)));
numeric_training_data = cell2mat(values(category_to_number_map, num2cell(training_categorical_data)));
% Training Process with numeric_training_data
% [Your neural network training code goes here]
% Example New Data (Categorical)
new_categorical_data = {'dog', 'cat', 'bird'};
% Convert New Categorical Data to Numeric Using Training Mapping
numeric_new_data = zeros(size(new_categorical_data));
for i = 1:length(new_categorical_data)
if isKey(category_to_number_map, new_categorical_data{i})
numeric_new_data(i) = category_to_number_map(new_categorical_data{i});
else
% Handle unseen categories, e.g., assign a special number or ignore
numeric_new_data(i) = NaN; % Assign NaN for unseen categories
end
end
% Now, numeric_new_data is ready for use with the trained model
% [Your prediction or evaluation code goes here]
  • The training data training_categorical_data is a cell array of categorical strings. This is converted to numeric_training_data using a mapping (category_to_number_map).
  • The new data new_categorical_data is then converted using the same mapping. Unseen categories (like 'bird' in this example) are handled separately; here, I've assigned NaN to them, but you can choose another method as appropriate.
  • You'll need to insert your specific neural network training and prediction code where indicated. The numeric_training_data and numeric_new_data arrays are what you'd use for training and prediction, respectively.
------------------------------------------------------------------------------------------------------------------------------------------------
If you find the solution helpful and it resolves your issue, it would be greatly appreciated if you could accept the answer. Also, leaving an upvote and a comment are also wonderful ways to provide feedback.
Professional Interests
  • Technical Services and Consulting
  • Embedded Systems | Firmware Developement | Simulations
  • Electrical and Electronics Engineering

4 Comments

I shall give it a go thanks. It looks really good.
WHen I run your code as a test, I get
Error using containers.Map/values
Specified key type does not match the type expected for this container.
Error in untitled1 (line 7)
numeric_training_data = cell2mat(values(category_to_number_map, num2cell(training_categorical_data)));
SPG
OK, using dictionary instead and it's working so far.
OK. I've got it to work now using dictionaries. Both this answer and the next one helped me get it working. AS yours includes how to use new data to I'll mark it as the answer. Thanks both for answering.

Sign in to comment.

More Answers (1)

Could you provide more details about your NN? I would think you should be able to pass categorical data into your network without having to convert it to numeric first.
If not, then I'd look into creating a dictionary, where you pass in the categorical value, and it returns the numberic value.
A = categorical({'medium' 'large' 'small' 'medium' 'large' 'small'});
names = unique(A)
names = 1×3 categorical array
large medium small
values = (1:length(names));
d = dictionary(names,values)
d = dictionary (categorical --> double) with 3 entries: large --> 1 medium --> 2 small --> 3
A(4)
ans = categorical
medium
x = d(A(4))
x = 2

4 Comments

It's a feedforwardnet type and you need to feed a matrix into it and not a table unfotunately but your idea looks good thanks.
They also accept cell arrays. What happens if you use table2cell to convert your input table to a cell array? Does it work then?
Unfortunately not. The code part is
InpsM = table2cell(Inps);
OutsM =table2cell(Outs);
InpsM=InpsM';
OutsM=OutsM';
net=feedforwardnet([96,48,24]);
net.trainFcn = 'trainlm';
net.inputs{1}.processFcns = {'mapstd'};
net=train(net,InpsM,OutsM,'useParallel','yes');
The error I get is
Error using nntraining.setup>setupPerWorker
Inputs X{1,1} is not numeric or logical.
Error in nntraining.setup (line 77)
[net,data,tr,err] = setupPerWorker(net,trainFcn,X,Xi,Ai,T,EW,enableConfigure);
Error in network/train (line 336)
[net,data,tr,err] = nntraining.setup(net,net.trainFcn,X,Xi,Ai,T,EW,enableConfigure,isComposite);
Error in untitled (line 52)
net=train(net,InpsM,OutsM,'useParallel','yes');
SPG
Found this, albeit on the trainnetwork page and not train, but it appears to still be applicable.
"To train a network using categorical features, you must first convert the categorical features to numeric."

Sign in to comment.

Categories

Find more on Deep Learning Toolbox in Help Center and File Exchange

Products

Release

R2023b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!