How to separate a table into sub-tables by unique categories?
Show older comments
Hi all,
I have a table with the following categories:
(1) Date (as Datetime)
(2) Location (as string)
(3) Value (as integer)
I want to separate the table into multiple tables based on location. So if I have location = 'Utah', 'California', 'Washington', then I want three tables where the only location is the one it was filtered by.
So, basically, I want tables containing all of the information for the individual locations.
Let's say I have a table T with 10 rows and 3 locations where location 1 has 2 rows, location 2 has 4 rows, and location 3 has 4 rows. How would I write code to get 3 tables (2x3, 4x3, 4x3)?
Thank you!
9 Comments
Erin Winkler
on 17 Sep 2018
Guillaume
on 17 Sep 2018
This is generally a bad idea. Why do you want to do that? If it's to calculate some statistic according to the location then separating the data will make it much harder, not easier.
Erin Winkler
on 17 Sep 2018
Erin Winkler
on 17 Sep 2018
Adam Danz
on 17 Sep 2018
It's not that it's harder; it's dirtier and more prone to error. It's generally bad practice to separate datasets merely to apply grouped statistics or analysis.
Instead, create a logical column vector that selects the pertinent rows of your table and use that to select the data you'd like to analyze. If your table is named 'T',
selectedCityIdx = strcmp(T.Location, 'Utah');
' selectedCityIdx' will be a logical vector that selects the rows where Utah is the location. Use that to do your analysis.
utahMean = mean(T.Value(selectedCityIdx));
Erin Winkler
on 17 Sep 2018
Erin Winkler
on 17 Sep 2018
Erin Winkler
on 17 Sep 2018
Guillaume
on 17 Sep 2018
I fixed it using char
Which is the wrong way to fix it (slow). Comparison with categorical is done with ==, not strcmp.
T.Location == 'Utah'
However, the whole approach is wrong. As I've demonstrated in my answer, Matlab has much better tools to perform calculation per location, for all locations in one go. Details of what exactly you want to do would allow me to refine my answer.
Answers (1)
Guillaume
on 17 Sep 2018
As said, it is much easier to calculate statistics by location if you don't split the table. You don't need to filter the table according to a given city either. For example, to calculate the mean of value for each location, this is trivially done:
%demo table according to your description
T = table(repelem({'Utah'; 'California'; 'Washington'}, [2 4 4]), randi(20, 10, 1), 'VariableNames', {'Location', 'Value'})
LocationMean = varfun(@mean, T, 'InputVariables', 'Value', 'GroupingVariables', 'Location')
Try doing that in one line, if you've split the table in multiple tables.
Categories
Find more on Matrices and Arrays in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!