How to separate a table into sub-tables by unique categories?

Hi all,
I have a table with the following categories:
(1) Date (as Datetime)
(2) Location (as string)
(3) Value (as integer)
I want to separate the table into multiple tables based on location. So if I have location = 'Utah', 'California', 'Washington', then I want three tables where the only location is the one it was filtered by.
So, basically, I want tables containing all of the information for the individual locations.
Let's say I have a table T with 10 rows and 3 locations where location 1 has 2 rows, location 2 has 4 rows, and location 3 has 4 rows. How would I write code to get 3 tables (2x3, 4x3, 4x3)?
Thank you!

9 Comments

EDIT:
Assume I didn't know how many different locations I have and I didn't know exactly what the strings were. How could I separate by unique strings?
This is generally a bad idea. Why do you want to do that? If it's to calculate some statistic according to the location then separating the data will make it much harder, not easier.
I need to compute statistics based on different locations, not cumulatively. I need information specifically only for individual locations.
I don't understand why this would be harder. I need to group individual locations anyway.
It's not that it's harder; it's dirtier and more prone to error. It's generally bad practice to separate datasets merely to apply grouped statistics or analysis.
Instead, create a logical column vector that selects the pertinent rows of your table and use that to select the data you'd like to analyze. If your table is named 'T',
selectedCityIdx = strcmp(T.Location, 'Utah');
' selectedCityIdx' will be a logical vector that selects the rows where Utah is the location. Use that to do your analysis.
utahMean = mean(T.Value(selectedCityIdx));
This looks like exactly what I want. Thank you!
Actually, that's only returning an individual logical (always just a 1x1 logical = 0). Do I need a loop?
Just kidding, answered my own question. It was categorical strings, not strings so I fixed it using char. Thanks!
I fixed it using char
Which is the wrong way to fix it (slow). Comparison with categorical is done with ==, not strcmp.
T.Location == 'Utah'
However, the whole approach is wrong. As I've demonstrated in my answer, Matlab has much better tools to perform calculation per location, for all locations in one go. Details of what exactly you want to do would allow me to refine my answer.

Sign in to comment.

Answers (1)

As said, it is much easier to calculate statistics by location if you don't split the table. You don't need to filter the table according to a given city either. For example, to calculate the mean of value for each location, this is trivially done:
%demo table according to your description
T = table(repelem({'Utah'; 'California'; 'Washington'}, [2 4 4]), randi(20, 10, 1), 'VariableNames', {'Location', 'Value'})
LocationMean = varfun(@mean, T, 'InputVariables', 'Value', 'GroupingVariables', 'Location')
Try doing that in one line, if you've split the table in multiple tables.

Categories

Asked:

on 17 Sep 2018

Commented:

on 17 Sep 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!