Thread Subject:
Working with large data files

Subject: Working with large data files

From: crazee

Date: 5 Feb, 2010 16:16:09

Message: 1 of 5

Hello,

I am working with huge ASCII files in MATLAB (500MB-2GB per text file). The sampling frequency of my data is 900Hz, and each ASCII file has 10 columns.

I have made a GUI that plots either 10 seconds (=9000 samples) or 30 seconds (=27000 samples) of data (first two columns) at a time, and the user presses the left or right button to see the previous or next window of data.

Because the files are so large, I cannot load the full data into MATLAB up front and then plot the part I need. Currently I load the data using the code below:

fetchtime = 10;
fid = fopen('C:\xxx.txt','r');
fs = 900;
eeg = zeros(1:fetchtime*fs,1:2);
for i = 1:fetchtime*fs
    tline = fgetl(fid);
    if ~ischar(tline)
        flag_complete = 1;
        break;
    end
    tempdata = sscanf(tline, '%f%*c');
    eeg(i,1:2) = tempdata(1:2);
end
fclose(fid);

However, this code takes ~1 sec to run for a fetchtime of 10 sec and ~7 sec for a fetchtime of 30 sec. That is far too slow: the plot takes a long time to refresh whenever the user presses the left/right button, which is really frustrating when the user wants to scan through the data quickly.

One idea I had was to convert the ASCII files to MAT files beforehand, but is it possible to load only part of a variable from a MAT file?

Any other suggestions/ideas would be really appreciated.

Looking forward to your replies.

Thanks.

Subject: Working with large data files

From: crazee

Date: 5 Feb, 2010 16:29:04

Message: 2 of 5

I am adding two more lines from my original code to explain why I am using fgetl:

fetchtime = 10;
fs = 900;
fid = fopen('C:\xxx.txt','r');
fseek(fid, filepositions(end), 'bof');
eeg = zeros(1:fetchtime*fs,1:2);
for i = 1:fetchtime*fs
    tline = fgetl(fid);
    if ~ischar(tline)
        flag_complete = 1;
        break;
    end
    tempdata = sscanf(tline, '%f%*c');
    eeg(i,1:2) = tempdata(1:2);
end
filepositions = [filepositions; ftell(fid)];
fclose(fid);

Subject: Working with large data files

From: Andres

Date: 5 Feb, 2010 17:19:04

Message: 3 of 5

"crazee " <crazee@mathworks.com> wrote in message <hkhh0g$j84$1@fred.mathworks.com>...
> [..]
> for i = 1:fetchtime*fs
> tline = fgetl(fid);
> if ~ischar(tline)
> [..]

Taking several seconds to read a few thousand lines is very slow indeed. The cause is the fgetl call inside the loop (you can confirm this in the profiler). Try to avoid the loop and use textscan or txt2mat (from the File Exchange) for a fast data import. The latter has a built-in option to import only a specific range of rows.
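
The textscan suggestion above can be sketched as follows. This is only an illustration under stated assumptions: the file is taken to hold 10 whitespace-delimited numeric columns, and the demo file, `fname`, `demo`, and `nextpos` are hypothetical names introduced here so the snippet runs on its own. It is not the code the original poster ended up using.

```matlab
% Sketch: read one block of rows with a single textscan call instead of a
% per-line fgetl loop. Assumes 10 whitespace-delimited numeric columns.
fs = 900;                 % sampling frequency in Hz (from the original post)
fetchtime = 10;           % seconds of data per screen
nRows = fetchtime * fs;   % rows to read per call

% Hypothetical demo file so the sketch is self-contained.
fname = [tempname '.txt'];
demo = rand(3 * nRows, 10);
fid = fopen(fname, 'w');
fprintf(fid, [repmat('%.6f\t', 1, 9) '%.6f\n'], demo.');  % one data row per line
fclose(fid);

fid = fopen(fname, 'r');
% Keep columns 1-2; '%*f' parses and discards columns 3-10.
fmt = ['%f %f' repmat(' %*f', 1, 8)];
C = textscan(fid, fmt, nRows);   % third argument: apply the format nRows times
nextpos = ftell(fid);            % offset to fseek to before reading the next block
fclose(fid);
delete(fname);

eeg = [C{1}, C{2}];              % nRows-by-2 matrix of the first two columns
```

Storing `nextpos` after each call plays the same role as the `filepositions` vector in the original code, so stepping backwards only needs an fseek to an earlier stored offset.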

Subject: Working with large data files

From: Walter Roberson

Date: 5 Feb, 2010 19:08:55

Message: 4 of 5

crazee wrote:
> I am adding two more lines from my original code to explain why I am
> using fgetl:
>
> fetchtime = 10;
> fs = 900;
> fid = fopen('C:\xxx.txt','r');
> fseek(fid, filepositions(end), 'bof');
> eeg = zeros(1:fetchtime*fs,1:2);

zeros does not accept vectors as its arguments. Probably you meant

eeg = zeros(fetchtime*fs, 2);

> for i = 1:fetchtime*fs
> tline = fgetl(fid);
> if ~ischar(tline)
> flag_complete = 1;

You do not initialize this flag_complete elsewhere so it will not exist if you
reach the end of the for loop before reaching the end of file (or error.)

> break;
> end
> tempdata = sscanf(tline, '%f%*c');
> eeg(i,1:2) = tempdata(1:2);

If you are only going to use the first two values of tline, then why not use

tempdata = sscanf(tline, '%f%*c%f', 2);

That would avoid scanning and converting the rest of the elements on the line.
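
As a small illustration of limiting the conversion: the optional third argument of sscanf caps how many values are read, so a count of 2 returns only the first two columns. The sample line `tline` below is a hypothetical tab-separated row, not data from the poster's file.

```matlab
% Hypothetical 10-column, tab-separated line for illustration.
tline = sprintf('1.5\t-2.25\t3\t4\t5\t6\t7\t8\t9\t10');

all10  = sscanf(tline, '%f');      % converts every value on the line: 10-by-1
first2 = sscanf(tline, '%f', 2);   % stops after two values: [1.5; -2.25]
```

Because '%f' skips leading whitespace, this form works for space- or tab-delimited lines without an explicit '%*c' to consume the separator.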

> end
> filepositions = [filepositions; ftell(fid)];
> fclose(fid);

Subject: Working with large data files

From: crazee

Date: 5 Feb, 2010 20:18:04

Message: 5 of 5

Thanks, everybody. I have solved the problem by using fscanf. For some reason, I had earlier thought that fscanf could not start reading from the middle of a file.

Thanks again.
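
The poster did not share the final code, so the following is only a guess at what an fscanf version might look like. It assumes 10 whitespace-delimited numeric columns; the demo file and the names `fname`, `demo`, `nCols`, and `block` are hypothetical. It combines fseek and ftell (as in the earlier code) with one fscanf call per screen.

```matlab
fs = 900; fetchtime = 10; nCols = 10;
nRows = fetchtime * fs;            % rows per screen

% Hypothetical demo file holding two screens of data.
fname = [tempname '.txt'];
demo = rand(2 * nRows, nCols);
fid = fopen(fname, 'w');
fprintf(fid, [repmat('%.6f\t', 1, nCols - 1) '%.6f\n'], demo.');
fclose(fid);

filepositions = 0;                 % byte offset where each screen starts
fid = fopen(fname, 'r');
for k = 1:2                        % read two consecutive screens
    fseek(fid, filepositions(end), 'bof');          % jump to the stored offset
    % fscanf fills its output column by column, so read nCols-by-nRows
    % and transpose to get one data row per matrix row.
    block = fscanf(fid, '%f', [nCols, nRows]).';
    eeg = block(:, 1:2);                            % keep the first two columns
    filepositions = [filepositions; ftell(fid)];    % remember where the next screen starts
end
fclose(fid);
delete(fname);
```

After the loop, `eeg` holds the second screen, and `filepositions` has one entry per screen boundary, so a "previous" button can seek back to an earlier entry.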


Walter Roberson <roberson@hushmail.com> wrote in message <hkhqr6$4ao$1@canopus.cc.umanitoba.ca>...
> crazee wrote:
> > I am adding two more lines from my original code to explain why I am
> > using fgetl:
> >
> > fetchtime = 10;
> > fs = 900;
> > fid = fopen('C:\xxx.txt','r');
> > fseek(fid, filepositions(end), 'bof');
> > eeg = zeros(1:fetchtime*fs,1:2);
>
> zeros does not accept vectors as its arguments. Probably you meant
>
> eeg = zeros(fetchtime*fs, 2);

Thanks. I have corrected this.

>
> > for i = 1:fetchtime*fs
> > tline = fgetl(fid);
> > if ~ischar(tline)
> > flag_complete = 1;
>
> You do not initialize this flag_complete elsewhere so it will not exist if you
> reach the end of the for loop before reaching the end of file (or error.)

I had initialized it in my main program. Since the data extraction is a separate function in my MATLAB code, I forgot to copy the initialization into my post.

>
> > break;
> > end
> > tempdata = sscanf(tline, '%f%*c');
> > eeg(i,1:2) = tempdata(1:2);
>
> If you are only going to use the first two values of tline, then why not use
>
> tempdata = sscanf(tline, '%f%*c%f', 2);
>
> That would avoid scanning and converting the rest of the elements on the line.
>
> > end
> > filepositions = [filepositions; ftell(fid)];
> > fclose(fid);
