Structure array or Dataset

Nuno Martins on 9 Mar 2012
Hi everyone,
My main objective is to organize data measured by 2 sonic anemometers. The data are split into 10-minute files, each with 6000 lines and 4 columns per sensor, and are generated continuously. As a result, the amount of data becomes quite large when analysing longer periods.
The raw time-series information doesn't need to be accessible all the time; only certain statistical information obtained after some processing does. So my idea is to organize the data in a structure array or dataset holding the path name and file name that point to the physical location of the original file, together with the statistical information obtained from the data processing. Each data file would then be represented by one row with the file location, time stamp, mean speeds, mean directions and other properties.
It is very important for me to be able to easily identify data files with certain characteristics, for example: every file measured in the month of March with mean wind speed above 5 m/s; or every file where wind speed is between 3 and 4 m/s and direction between 90° and 180°.
I've been using structures for some time and know that this kind of conditional selection can be a hard task. On the other hand, I've never used datasets, but I noticed that they take significantly more space and cannot be opened in a regular text editor.
I would like to ask which is the best option for my specific case. Should I use a structure array or a dataset?
Thank you all in advance,
Nuno
  1 Comment
Laurens Bakker on 10 Mar 2012
Hi Nuno,
In what context did you find datasets using more memory than structures? In my experience, datasets are usually more memory-efficient...
Cheers,
Laurens

Accepted Answer

Oleg Komarov on 10 Mar 2012
Try this example and tell me if it is viable.
Suppose you have n files with these features:
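n = 100;
% Cell array of n file paths (1-by-n)
directory = arrayfun(@(x) sprintf('C:\\mydata\\something%03d.dat',x),1:n,'UniformOutput',false);
windspeed = rand(n,1)*20;             % mean wind speed, 0 to 20 m/s
timestamp = now-abs(randn(n,1)*500);  % serial date numbers in the past
direction = randi([0 360],n,1);       % mean direction in degrees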
This gives you an n-by-1 structure array
s = struct('directory',directory.','windspeed',num2cell(windspeed),...
'timestamp',num2cell(timestamp),'direction',num2cell(direction));
Or a dataset
dt = dataset({directory','directory'},{windspeed,'windspeed'},...
{timestamp,'timestamp'},{direction,'direction'});
Comparing sizes (with n = 100)
dt : 14.66 KB
s : 31.11 KB
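Sizes like the ones above can be read with whos. The following is only a sketch on a small stand-in structure array (the variable names info and sizeKB are illustrative); the same call applied to 'dt' would report the dataset:

```matlab
n = 100;
s = struct('windspeed', num2cell(rand(n,1)*20));  % stand-in n-by-1 struct array
info = whos('s');                 % whos returns a struct with a bytes field
sizeKB = info.bytes / 1024;       % convert bytes to kilobytes
fprintf('s : %.2f KB\n', sizeKB);
```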
Query 1: timestamp in March, windspeed > 5
% Structure array
idx = [s.windspeed] > 5 & month([s.timestamp]) == 3;
{s(idx).directory}'
% Dataset
idx = dt.windspeed > 5 & month(dt.timestamp) == 3;
dt.directory(idx)
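If the Financial Toolbox is not available, the month number used in Query 1 can be extracted with datevec from base MATLAB. This is a minimal sketch on made-up sample dates; the variable names t, m and isMarch are only illustrative:

```matlab
% Stand-in for month(): datevec splits serial date numbers into
% [year month day hour minute second], one row per date.
t = [datenum(2012,3,9); datenum(2012,4,1); datenum(2011,3,15)];  % sample dates
v = datevec(t(:));                 % numel(t)-by-6 matrix
m = reshape(v(:,2), size(t));      % month numbers, same shape as t
isMarch = (m == 3);                % logical index of the March entries
```

The reshape keeps the result the same shape as the input, so it can be combined with a row such as [s.timestamp] or a column such as dt.timestamp.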
Query 2: 3 < windspeed < 4 and 90 < direction < 180
% Structure array
idx = [s.windspeed] > 3 & [s.windspeed] < 4 & [s.direction] > 90 & [s.direction] < 180;
{s(idx).directory}'
% Dataset (no brackets needed, each variable is already a column)
idx = dt.windspeed > 3 & dt.windspeed < 4 & dt.direction > 90 & dt.direction < 180;
dt.directory(idx)
Note that month is a Financial Toolbox function, but you can easily write one yourself. Also, range ("between") comparisons are cumbersome to write out, but you can wrap them in an ad hoc function:
function idx = between(structOrDataset, fieldname, range)
where you could allow for multiple fieldname and range selections.
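One possible body for such a function is sketched below. It is an assumption about how the helper could look, not a tested library routine: it handles a single field and a single [lo hi] range with strict inequalities, and would be saved as between.m.

```matlab
function idx = between(structOrDataset, fieldname, range)
% BETWEEN  Logical index of records whose field lies strictly inside a range.
%   Sketch only: one fieldname (char) and one range [lo hi].
if isstruct(structOrDataset)
    values = [structOrDataset.(fieldname)];  % gather scalars from struct array
else
    values = structOrDataset.(fieldname);    % dataset variable
end
values = values(:);                          % always work on a column
idx = values > range(1) & values < range(2);
end
```

Query 2 would then read idx = between(s,'windspeed',[3 4]) & between(s,'direction',[90 180]); and the same call works with dt in place of s.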
  2 Comments
Nuno Martins on 10 Mar 2012
Thank you for your answer.
So I see that I can access data in a similar fashion with both structures and datasets.
Is there any advantage in adopting one or the other (apart from the memory efficiency of datasets)?
Oleg Komarov on 10 Mar 2012
Datasets come with many specific functions that may come in handy. For example, the JOIN functionality (basically a wrapper for ismember calls and subsequent indexing) lets you merge different datasets or perform ad hoc selections.
However, as you have seen, manipulating structures requires only marginal additional effort, so I cannot say with full confidence that the functions that come with datasets are really the deciding factor.
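To illustrate the "ismember plus indexing" idea in base MATLAB, here is a minimal sketch of a key lookup between two sets of records. All data and names (fileSensor, sensorID, sensorHeight) are made up for the example, and this is not the actual implementation of JOIN:

```matlab
% Hypothetical data: each file records which sensor produced it, and a
% separate lookup table holds one entry per sensor.
fileSensor   = [2; 1; 2; 3];     % sensor ID stored for each file
sensorID     = [1; 2; 3];        % key column of the lookup table
sensorHeight = [10; 40; 80];     % mounting height in metres (invented values)

% For each file, ismember reports whether its key exists in the lookup
% table and in which row.
[found, loc] = ismember(fileSensor, sensorID);

% Use those rows to attach the lookup value to each file.
fileHeight = nan(size(fileSensor));
fileHeight(found) = sensorHeight(loc(found));
```

Files whose key is missing from the lookup table keep NaN, which makes unmatched records easy to spot.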
