Appending to a saved dataset

Anathea Pepperl on 20 Jun 2011
I'm trying to read data from a text file, do some data analysis, save the results in a dataset, and export my dataset into a .dat file using the export function.
The problem is that with several text files I end up with well over 100,000 observations of about 200 parameters. My current approach: for each text file, I read the data, store the analysis results in an interim dataset, and concatenate that onto the complete dataset; only at the very end do I call export. So my code looks something like:
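complete_ds = [];
for i = 1:numel(textfiles)               % textfiles: cell array of file names (assumed)
    current_file = textfiles{i};
    fid = fopen(current_file);
    if fid == -1
        error('Could not open %s', current_file);
    end
    data = ReadFile(fid);                % my own reader
    fclose(fid);
    interim_ds = AnalyzeData(data);      % my own analysis
    complete_ds = vertcat(complete_ds, interim_ds);   % grows on every iteration
end
export(complete_ds, 'file', 'Allmydata.dat');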
This is taking a lot of time and I'd like to be able to append to the exported dataset instead. Any suggestions? Also, I know that preallocating may help, but it is difficult to predict how much memory I want to set aside for the dataset since each text file may have a different number of observations.
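One way to avoid both the ever-growing in-memory dataset and the single large export is to write the header once and then append each file's results as soon as they are computed, opening the output file in append mode ('at') on every iteration. The sketch below uses plain fopen/fprintf on a numeric matrix rather than dataset and export; the column names, the three-iteration loop and the random interim matrices are hypothetical placeholders for textfiles and AnalyzeData output. Missing values stored as NaN are written out as the text NaN.

```matlab
% Minimal sketch: append each file's results to one text file as they are
% computed, instead of growing complete_ds and exporting once at the end.
% The random "interim" matrices are placeholders for AnalyzeData output.
outFile  = fullfile(tempdir, 'Allmydata_sketch.dat');
varNames = {'x', 'y', 'z'};                 % hypothetical column names

% Write the header once, overwriting any previous file ('wt').
fid = fopen(outFile, 'wt');
assert(fid ~= -1, 'Could not open %s for writing', outFile);
fprintf(fid, '%s\t%s\t%s\n', varNames{:});
fclose(fid);

nFiles = 3;                                 % stands in for numel(textfiles)
for k = 1:nFiles
    interim = rand(k + 1, 3);               % placeholder for AnalyzeData(data)
    interim(1, 2) = NaN;                    % a missing value, written as "NaN"

    % Append mode ('at') keeps everything written so far.
    fid = fopen(outFile, 'at');
    assert(fid ~= -1, 'Could not open %s for appending', outFile);
    % fprintf walks the columns of interim.', i.e. the rows of interim.
    fprintf(fid, '%g\t%g\t%g\n', interim.');
    fclose(fid);
end
```

Because each iteration only writes its own rows, memory use stays at roughly one file's worth of results, and the complete dataset never has to be preallocated.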
Image Analyst on 21 Jun 2011
How many text files? How much time: minutes, hours? What is the difference between observations and parameters (if that matters)? You can make a guess for preallocation from the file size. If the file size suggests about 50,000 lines (file size divided by the typical number of bytes per line), then preallocating, say, 40 or 50 thousand rows would be faster than not preallocating at all, even if you then have to extend the array by a few rows or truncate the rows you didn't use. Inside AnalyzeData(), can you estimate the number of rows that interim_ds will need?
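A rough sketch of the file-size estimate: read the size with dir, divide by an assumed average number of bytes per row, preallocate that many rows, double the buffer if the guess turns out too small, and truncate the unused rows at the end. The sample file and the 8-bytes-per-row figure are made up for illustration; in practice you would measure a typical line of your own files (with about 200 parameters per line it will be far longer).

```matlab
% Sketch of preallocating from the file size. The sample file and the
% bytes-per-row figure are hypothetical; measure your own files instead.
fname = fullfile(tempdir, 'prealloc_sketch.txt');
fid = fopen(fname, 'wt');
fprintf(fid, '%d\t%d\n', [1:500; (1:500).^2]);   % 500 rows, 2 columns
fclose(fid);

info = dir(fname);                      % info.bytes is the file size
bytesPerRow = 8;                        % assumed average line length
estRows = ceil(info.bytes / bytesPerRow);
nCols = 2;

buffer = zeros(estRows, nCols);         % preallocate the estimated size
nUsed = 0;
fid = fopen(fname, 'rt');
tline = fgetl(fid);
while ischar(tline)
    nUsed = nUsed + 1;
    if nUsed > size(buffer, 1)
        % Estimate too small: grow by doubling rather than row by row.
        buffer(max(2 * size(buffer, 1), 1), nCols) = 0;
    end
    buffer(nUsed, :) = sscanf(tline, '%f').';
    tline = fgetl(fid);
end
fclose(fid);
buffer = buffer(1:nUsed, :);            % drop unused preallocated rows
```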
Anathea Pepperl on 21 Jun 2011
Jan, ReadFile is my own function that reads the text file and converts it into a MATLAB matrix for easier "digestion" by AnalyzeData. If the data could simply be written to a regular text file, it wouldn't be so bad; however, my data has missing values, which are not handled well when written to a plain text file. Hence the need for the dataset array (part of the Statistics Toolbox).
Image Analyst, thanks for reminding me that I can look at the file size! This is probably going to be the best option for me.


Answers (1)

Matt Tearle on 21 Jun 2011
If it just comes down to "I'd like to be able to append to the exported dataset instead", then here's one way to do it, but it's a bit of a nasty hack...
  1. Find the directory $MATLAB\toolbox\shared\statslib\@dataset (where $MATLAB is your installation directory, e.g. C:\Program Files\MATLAB\R2011a).
  2. Copy the entire @dataset directory to somewhere local.
  3. Inside @dataset, make a copy of export.m and call it export_app.m (or whatever).
  4. Edit export_app.m. On line 1, change the function name export to export_app. Then change line 169 (in R2011a, at least; it might be slightly different in other releases) from fid = fopen(filename,'wt'); to fid = fopen(filename,'at'); so that the file is opened for appending instead of being overwritten. Save the file.
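Steps 1 to 3 can be scripted roughly as below. srcDir is the R2011a location given above and may differ in other releases, hence the existence check; parentDir is a hypothetical local folder, so pick your own. Step 4 (renaming the function and changing the fopen mode) is still done by hand in the editor.

```matlab
% Sketch of steps 1-3. srcDir is the location named in the answer (R2011a);
% parentDir is a hypothetical local folder for the copied class.
srcDir    = fullfile(matlabroot, 'toolbox', 'shared', 'statslib', '@dataset');
parentDir = fullfile(tempdir, 'dataset_append_hack');
destDir   = fullfile(parentDir, '@dataset');

if exist(fullfile(srcDir, 'export.m'), 'file') == 2
    if exist(parentDir, 'dir') ~= 7
        mkdir(parentDir);
    end
    if exist(destDir, 'dir') ~= 7
        mkdir(destDir);
    end
    copyfile(srcDir, destDir);                      % step 2: copy the class contents
    copyfile(fullfile(destDir, 'export.m'), ...
             fullfile(destDir, 'export_app.m'));    % step 3: duplicate export.m
    fprintf('Now edit %s by hand (step 4).\n', fullfile(destDir, 'export_app.m'));
else
    fprintf('export.m not found under %s in this release.\n', srcDir);
end
```

Afterwards, as the answer suggests, change into parentDir before calling export_app so that the local @dataset copy is the one MATLAB picks up.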
Then
>> export(x1,'file','testappend.dat')
>> export_app(x2,'file','testappend.dat','WriteVarNames',false)
should work for you. Setting 'WriteVarNames' to false keeps the variable-name header from being written a second time.
Note, though, that you're now using a local version of the dataset class, so funky instabilities may ensue... Use with caution! Probably best to hide it away in a directory somewhere and go into that directory only for this purpose!
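If modifying a copy of the class feels too risky, a similar effect may be possible with the unmodified export: write each new batch to a temporary file with 'WriteVarNames',false, then copy that file's bytes onto the end of the main file. The sketch assumes both export calls use the same delimiter settings; to keep it runnable without the Statistics Toolbox, two small hand-written files stand in for the exported ones, and the file names are hypothetical.

```matlab
% Sketch: append a batch to the main file without a modified @dataset class.
% Assumption: each batch would first be written to a temporary file with
%   export(interim_ds, 'file', tmpFile, 'WriteVarNames', false)
% Here hand-written files stand in for the exported ones.
mainFile = fullfile(tempdir, 'Allmydata_append.dat');
tmpFile  = [tempname '.dat'];

fid = fopen(mainFile, 'wt');            % stands in for the first export (with header)
fprintf(fid, 'a\tb\n1\t2\n');
fclose(fid);

fid = fopen(tmpFile, 'wt');             % stands in for a later export (no header)
fprintf(fid, '3\t4\n5\tNaN\n');
fclose(fid);

% Copy the temporary file's bytes onto the end of the main file.
batch = fileread(tmpFile);
fid = fopen(mainFile, 'a');             % binary append: bytes copied unchanged
assert(fid ~= -1, 'Could not open %s for appending', mainFile);
fwrite(fid, batch, 'char');
fclose(fid);
delete(tmpFile);
```

This keeps the toolbox class untouched, at the cost of writing each batch twice (once to the temporary file, once to the main file).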