How can I group rows based on the value of a column?

I know some programming languages, but I'm completely unfamiliar with MATLAB; I only started learning it yesterday.
Suppose I have a data set in which each row contains a time, a location, and a temperature. Data for all locations are stored in a single file named 'temperature.csv'. I'd like to measure how similar the temperature time series are between every pair of locations by calculating their correlation.
I read the data using readtable:
Temp = readtable('temperature.csv');
I'm stuck here, because I could not find out how to group the data by location. What is a good way to do this?
Thanks in advance.

Answers (1)

Walter Roberson on 24 May 2015
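You can do it like this, assuming the column names include "location" and "temperature":
locations = Temp.location;
uniquelocs = unique(locations);
mean_temp = zeros(numel(uniquelocs), 1); %preallocate the example result
for K = 1 : numel(uniquelocs)
  %ismember() works whether the locations are numeric or a cell array of strings
  belongs_to_loc = ismember(locations, uniquelocs(K));
  Temp_for_loc = Temp(belongs_to_loc, :); %a table needs both row and column subscripts
  %now do something with the information you extracted, which is in table() form
  mean_temp(K) = mean(Temp_for_loc.temperature); %example
end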
The above can be used for general calculations. Some of the specific calculations can be done much more efficiently:
locations = Temp.location;
[uniquelocs, ua, uc] = unique(locations);
temperatures = Temp.temperature;
mean_temperatures = accumarray(uc, temperatures, [], @mean);
accumarray() is a very useful routine that you should read about.
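As a rough sketch of how the same idea might be carried through to the correlation asked about in the question: accumarray() also accepts two-column subscripts, so the long table can be pivoted into a time-by-location matrix and handed to corrcoef(). The sample values below are invented for illustration, and the column name `time` is an assumption (only `location` and `temperature` are assumed above).

```matlab
% Hypothetical sample data standing in for readtable('temperature.csv')
time        = [1; 2; 3; 1; 2; 3; 1; 2; 3];
location    = {'A';'A';'A';'B';'B';'B';'C';'C';'C'};
temperature = [10; 12; 14; 20; 24; 28; 5; 4; 3];
Temp = table(time, location, temperature);

% Map each row to a location number and a time number
[uniquelocs,  ~, locidx]  = unique(Temp.location);
[uniquetimes, ~, timeidx] = unique(Temp.time);

% Mean temperature per location, as in the answer above
mean_temperatures = accumarray(locidx, Temp.temperature, [], @mean);

% Pivot to a (time x location) matrix; NaN marks missing combinations
series = accumarray([timeidx, locidx], Temp.temperature, ...
    [numel(uniquetimes), numel(uniquelocs)], @mean, NaN);

% Pairwise correlation between the location columns
R = corrcoef(series);
```

Column j of `series` and row/column j of `R` correspond to `uniquelocs(j)`. If some location is missing a time step, the NaN fill value will propagate into `R`; the 'rows','pairwise' option of corrcoef() is one possible way to handle that.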
  2 Comments
Yasuhiko Watanabe on 24 May 2015
Thank you very much!
I'll play around with unique and accumarray. The latter approach is what I was looking for, because the mathematical idea can be expressed as a function rather than as a procedure.
Walter Roberson on 24 May 2015
Note that a lot of the time you need to use the three-output form of unique(), in order to transform what might potentially be floating point and arbitrary range (possibly including negative) into category numbers. In some cases you might know that the data is positive integer valued and not very spread out; in such cases, unique() might not be needed.
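A small illustration of that point, using made-up values: the keys below are non-integer and partly negative, so they cannot be used as subscripts directly, but the third output of unique() can.

```matlab
% Made-up group keys: non-integer and negative, so unusable as indices
codes = [-2.5; 7.25; -2.5; 100; 7.25; -2.5];
vals  = [ 1;   2;    3;    4;   5;    6  ];

% uc(k) is the position of codes(k) within the sorted unique values
[ucodes, ~, uc] = unique(codes);

% uc contains only 1..numel(ucodes), so it is a valid subscript
sums = accumarray(uc, vals);
```

Here `sums(j)` is the total of `vals` for the group whose key is `ucodes(j)`.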
In situations where data needs to be quantized, the two-output form of histc() is very useful for finding category numbers.
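A minimal sketch of that use of histc(), with invented temperatures and hypothetical bin edges. The second output gives the bin number of each element, which can then serve as the category number for accumarray().

```matlab
% Invented readings and hypothetical bin edges
temperatures = [3.2; 11.8; 25.1; 9.9; 18.0];
edges = [0 10 20 30];

% bin(k) is the index of the interval edges(j) <= x < edges(j+1)
% that contains temperatures(k); values outside the edges get bin 0
[counts, bin] = histc(temperatures, edges);

% Mean reading per bin (valid here because no element fell in bin 0)
mean_per_bin = accumarray(bin, temperatures, [], @mean);
```

Elements with bin 0 would need to be filtered out before calling accumarray(), since subscripts must be positive. Note also that newer MATLAB releases discourage histc() in favor of histcounts() and discretize().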
