Hello,
Example data attached.
When selecting and assigning a portion of this table to another variable, MATLAB seems to remember that there were additional categories not included in the new table.
load("exampletbl.mat")
categories(a.name) % there are 15 categories
selection = a(a.type == "i",:) % select a portion of original table
categories(selection.name) % shows there to still be 15 categories
unique(selection.name) % when there should be 9....
Why are there still 15 categories?
I then wish to reorder the new categories (the order of which I determine from data in other columns):
selection.name = reordercats(selection.name, {'K','I','H','G','F','E','D','C','B'})
Returns the following:
% Error using categorical/reordercats (line 38)
% NEWORDER must be a permutation of the existing categories.
It is a permutation of the existing categories.
However,
% if I include ALL 15 categories in "a"
selection.name = reordercats(selection.name,{'K','I','H','G','F','E','D','C','B','A','J','L','M','N','O'})
% this works
categories(selection.name) % but there are still 15
unique(selection.name) % 9 - in the correct order.
In the dataset that I am actually using (more columns, more categories and more rows than the example given here), I often create a "selection" table based on multiple criteria. I then wish to reorder the categories based on another column, or two, so that they appear in the desired order when plotted on a categorical axis. Should I in fact be sorting the complete table (here, "a") based on my criteria first, then reordering the categories, and only then creating a "selection" table?
If anyone can shed light on this, I would be grateful.
I hope the above makes sense.
Thank you,

 Accepted Answer

Each categorical() creates a new enumerated datatype, and the order and associated values for the names exist as metadata that lives separately from the information about a particular array of values that uses the categories.
When you index into a categorical array, you get back a subset of values but the values keep the same datatype information.
Suppose categories worked the way you want, and you ran
A = categorical({'yes', 'no'})
A(1)==A(2)
Under your proposal, the result of the comparison would have to be true. A(1) would collapse the categorical down to just the single value represented, a single value with internal code 1 (categorical values are stored according to the offset of their name in the list of category names). A(2) would likewise have to collapse down to a single value with internal code 1. Comparing the codes, both 1, you would conclude that the values must be equal.
Or consider a categorical whose category order is not alphabetical, and take [A(1),A(2)]. If the categories collapsed automatically as you propose, the information about relative ordering would have to be discarded. Building the larger categorical in [A(1),A(2)] would then have to go by the names alone, which means falling back to the default alphabetical order and losing the original ordering. The result would be that [A(1),A(2)] is not the same as A(1:2), which is an obvious problem.
So the way you want categorical to behave is not a good way to have categorical behave.
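To see the actual behavior described above, here is a minimal sketch using made-up data (the array A and its category order are only for illustration). It shows that indexing keeps the full category list, and that putting the pieces back together matches the original slice.

```matlab
% Illustrative data: category order 'yes','no' is deliberately not alphabetical.
A = categorical({'yes','no'}, {'yes','no'});

categories(A(1))      % still lists both 'yes' and 'no'
A(1) == A(2)          % false: 'yes' is not 'no'

% Concatenating the single elements reproduces the original slice,
% including the category list and its order.
B = [A(1), A(2)];
isequal(B, A(1:2))                        % true
isequal(categories(B), categories(A))     % true
```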

4 Comments

Understood (I think).
Thank you.
Peter Perkins on 27 Jul 2021
Edited: Peter Perkins on 28 Jul 2021
[I edited this part of my original response because it was confusing the real implementation with the straw man that Walter described.] Actually, what Walter says is not quite true. categorical comparison is conceptually based on the category names, not the internal codes. The comparison is implemented using the internal codes, but only after combining the category name lists from the two arrays being compared. So in the real implementation, A(1)'s category is 'yes', A(2)'s is 'no', corresponding to 1 and 2 (because the categories are preserved in those two temporary subarrays), and so not equal.
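As a small illustration of comparison going by category names rather than codes, the sketch below builds two non-ordinal categoricals (x and y are made-up names) that hold the same value but list their categories in a different order, so the internal codes differ.

```matlab
% Same value 'yes', but the category lists are ordered differently.
x = categorical({'yes'}, {'yes','no'});   % 'yes' is the 1st category
y = categorical({'yes'}, {'no','yes'});   % 'yes' is the 2nd category

double(x)    % 1, the internal code of 'yes' in x
double(y)    % 2, the internal code of 'yes' in y
x == y       % true: the comparison goes by category name
```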
Don't think about all that.
"MATLAB seems to remember that there were additional categories not included": Think about categorical this way: it lets you define the entire universe of possible values. That complete universe is there even if no elements of the categorical array contain some of the possible values. So, e.g., you can count up elements, and find out that while you have 35 smalls and 26 larges, you have no mediums. That's what sets categorical apart from an array of strings.
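A quick sketch of that idea, with a made-up array named sizes: the category 'medium' is defined but never used, and countcats still reports it with a count of zero.

```matlab
% 'medium' is part of the defined universe but no element uses it.
sizes = categorical({'small'; 'large'; 'small'}, {'small','medium','large'});

categories(sizes)    % {'small'; 'medium'; 'large'}
countcats(sizes)     % [2; 0; 1]
```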
If you want unused categories to go away (and in many cases, you don't) use removecats.
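Applied to the original question, a possible workflow is to select rows, call removecats, and only then call reordercats with the remaining names. The sketch below uses a small made-up table in place of exampletbl.mat, so the variable contents are assumptions; only the function calls mirror the ones in the question.

```matlab
% Hypothetical stand-in for the table "a" loaded from exampletbl.mat.
name = categorical({'A'; 'B'; 'C'; 'D'});
type = categorical({'j'; 'i'; 'i'; 'j'});
a = table(name, type);

selection = a(a.type == "i", :);       % keeps rows B and C
numel(categories(selection.name))      % 4: unused 'A' and 'D' are still defined

% Drop the categories that no longer occur in the selection.
selection.name = removecats(selection.name);
categories(selection.name)             % {'B'; 'C'}

% Now the new order only has to permute the categories that remain.
selection.name = reordercats(selection.name, {'C','B'});
categories(selection.name)             % {'C'; 'B'}
```

If the unused categories should stay defined, the alternative already shown in the question also works: pass reordercats a new order that lists every existing category.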
Thanks, Peter, that is a useful clarification.
I hadn't come across removecats, and now I have. Thank you @Peter Perkins.


Release

R2020b
