reordercats() confusion.

Question

exampletbl.mat

Hello,

Example data attached.

When selecting and assigning a portion of this table to another variable, MATLAB seems to remember that there were additional categories not included in the new table.

load("exampletbl.mat")
categories(a.name) % there are 15 categories
selection = a(a.type == "i",:) % select a portion of the original table
categories(selection.name) % still shows 15 categories
unique(selection.name) % when there should be 9

Why are there still 15 categories?

I then wish to reorder the new categories (the order of which I determine from data in other columns):

selection.name = reordercats(selection.name, {'K','I','H','G','F','E','D','C','B'})

Returns the following:

% Error using categorical/reordercats (line 38)
% NEWORDER must be a permutation of the existing categories.

As far as I can tell, it is a permutation of the categories that actually appear in selection.

However,

% if I include ALL 15 categories from "a"
selection.name = reordercats(selection.name,{'K','I','H','G','F','E','D','C','B','A','J','L','M','N','O'})
% this works
categories(selection.name) % but there are still 15
unique(selection.name) % 9 - in the correct order.

In the dataset that I am using (more columns, more categories and more rows than the example given here), I often create a "selection" table based on multiple criteria. I then wish to reorder the categories based on another column or two, so that when plotting on a categorical axis they appear in the desired order. Should I in fact be sorting the complete table (here, "a") based on my criteria first, then reordering the categories, and only then creating a "selection" table?

If anyone can shed light on this, I would appreciate it. I hope the above makes sense.

Thank you,

Answer 1

Walter Roberson on 5 Jul 2021

Each categorical() creates a new enumerated datatype, and the order and associated values for the names exist as metadata that lives separately from the information about a particular array of values that uses the categories.

When you index into a categorical array, you get back a subset of values but the values keep the same datatype information.
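A minimal sketch of this behavior, using a made-up five-category array rather than the attached table (the names c and sub are illustrative only):

```matlab
% Hypothetical data, not the attached exampletbl.mat
c = categorical({'A','B','C','D','E'});
sub = c([1 3]);                 % keep only two of the five elements

cats_all  = categories(sub);    % still lists all five category names
cats_used = unique(sub);        % only the values actually present
n_all  = numel(cats_all);
n_used = numel(cats_used);
```

Here n_all counts the categories the subset still carries, while n_used counts the distinct values it actually contains, which mirrors the 15 versus 9 seen in the question.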

Suppose categoricals worked the way you want, and you did

A = categorical({'yes', 'no'}) 
A(1)==A(2)

then under your proposal the result of the comparison would have to be true. A(1) would, in your proposal, collapse the categorical down to just the single value represented, a single value with internal code 1 (categorical values are stored according to the offset of their name in the list of category names). A(2) would likewise have to collapse down to a single value with internal code 1. Then compare the codes, both 1, and you would say that the values must be equal.

Or create a categorical whose category order is not alphabetical, then take [A(1),A(2)]. If the categories automatically collapsed as per your proposal, the information about relative ordering would have to be discarded. When [A(1),A(2)] then built a larger categorical from the names alone, it would fall back to the default alphabetical order and lose the original ordering. That would lead to the situation where [A(1),A(2)] was not the same as A(1:2), which is an obvious problem.
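As a small illustration of why keeping the category metadata matters, the sketch below uses a hypothetical, deliberately non-alphabetical category order (low, medium, high). Because each single-element subset keeps the full category list, concatenating the pieces should give the same array as indexing the range directly.

```matlab
% Hypothetical example: category order is deliberately non-alphabetical
A = categorical({'low','high','medium'}, {'low','medium','high'});

pieces = [A(1), A(2)];      % concatenate two single-element subsets
range  = A(1:2);            % index the same two elements directly

same_values = isequal(pieces, range);
same_order  = isequal(categories(pieces), categories(range));
```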

So the way you want categorical to behave is not a good way to have categorical behave.

4 Comments
Peter Perkins on 27 Jul 2021

Edited: Peter Perkins on 28 Jul 2021

[I edited this part of my original response because it was confusing the real implementation with the straw man that Walter described.] Actually, what Walter says is not quite true. categorical comparison is conceptually based on the category names, not the internal codes. The comparison is implemented using the internal codes, but only after combining the category name lists from the two arrays being compared. So in the real implementation, A(1)'s category is 'yes', A(2)'s is 'no', corresponding to 1 and 2 (because the categories are preserved in those two temporary subarrays), and so not equal.
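A short sketch of the point about names versus codes, with two hypothetical arrays x and y whose category lists are ordered differently. It assumes that double() on a categorical returns the internal category codes and that == is allowed between non-ordinal categoricals with different category lists:

```matlab
% The same name gets a different internal code in each array
x = categorical({'no'}, {'yes','no'});   % 'no' is the 2nd category here
y = categorical({'no'}, {'no','yes'});   % 'no' is the 1st category here

code_x = double(x);          % internal code of 'no' in x
code_y = double(y);          % internal code of 'no' in y
names_match = (x == y);      % comparison goes by category name

% The example from the answer above: two different names are not equal
A = categorical({'yes', 'no'});
yes_equals_no = (A(1) == A(2));
```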

Don't think about all that.

"MATLAB seems to remember that there were additional categories not included": Think about categorical this way: it lets you define the entire universe of possible values. That complete universe is there even if no elements of the categorical array contain some of the possible values. So, e.g., you can count up elements, and find out that while you have 35 smalls and 26 larges, you have no mediums. That's what sets categorical apart from an array of strings.

If you want unused categories to go away (and in many cases, you don't) use removecats.
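Applied to the original question, a minimal sketch might look like the following. The small table is a hypothetical stand-in for the attached data (four names instead of fifteen). It first shows that the unused category is still counted, with a count of zero, and then drops it with removecats so that reordercats only needs the categories that remain.

```matlab
% Hypothetical stand-in for the asker's table "a"
name = categorical({'A';'B';'C';'D'});
type = ["i";"x";"i";"i"];
a = table(name, type);

selection = a(a.type == "i", :);               % keeps rows A, C, D
counts_before = countcats(selection.name);     % 'B' is still a category, with count 0

selection.name = removecats(selection.name);   % drop categories that no element uses

% The new order now only has to cover the categories that remain
selection.name = reordercats(selection.name, {'D','C','A'});
remaining = categories(selection.name);
```

Whether to drop unused categories at all depends on whether the empty ones are meaningful for the plot. This sketch drops them after subsetting, so the full table does not need to be sorted first.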

Walter Roberson on 27 Jul 2021

Thanks, Peter, that is a useful clarification.

Edward Holt on 28 Jul 2021

I hadn't come across removecats, and now I have. Thank you @Peter Perkins.
