Thread Subject:
importing data

Subject: importing data

From: Jessica

Date: 29 May, 2013 19:03:14

Message: 1 of 4

I am trying to use textscan to import a very large file:

fid = fopen('05-09-13_No41.csv');
M=textscan(fid, '%s %s %f %f %f %f','delimiter', '\t');

It crashes in Matlab 2010a. In Matlab 2013a, it didn't crash but it ran for 8+ hr and the data still didn't load.

Does anyone have any suggestions for speeding up the import of the file?

Subject: importing data

From: dpb

Date: 29 May, 2013 19:16:08

Message: 2 of 4

On 5/29/2013 2:03 PM, Jessica wrote:
> I am trying to use textscan to import a very large file:
>
> fid = fopen('05-09-13_No41.csv');
> M=textscan(fid, '%s %s %f %f %f %f','delimiter', '\t');
>
> It crashes in Matlab 2010a. In Matlab 2013a, it didn't crash but it ran
> for 8+ hr and the data still didn't load.
>
> Does anyone have any suggestions for speeding up the import of the file?

How large is the file, for heaven's sake????

If you really must load that much data, and the size itself is the
problem rather than some other bottleneck like a very slow network
drive or somesuch, the only real ideas I have would be

a) write the data as an unformatted (binary) stream file instead of formatted text,
b) reduce the amount you try to read/process at a time to something
reasonable, or
c) if the machine is limited, more memory and faster processor...

A non-MATLAB solution or intermediary, if you can't recreate the data for
a) above, _might_ be to write a Fortran routine that reads the data;
perhaps w/o all the extra complexity of textscan() in MATLAB it might
work. You could then rewrite it as a stream file and try again.

Basically, though, I need some idea of just how big the data file is to
see if that really explains the problem.

Can you use the 'N' optional argument and read in a few entries
successfully?
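
A minimal sketch of that test, assuming the file is tab-delimited with two text columns followed by four numeric ones, as the format string in the original post implies. The small sample file written here is made up so the snippet runs on its own; the real file name would go in its place.

```matlab
% Made-up sample file standing in for the real data (layout is an assumption)
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
for k = 1:20
    fprintf(fid, '08/04/2013\t12:00:%02d\t%g\t%g\t%g\t%g\n', k, k, 2*k, 3*k, 4*k);
end
fclose(fid);

% Read only the first 5 rows: the third argument is N, the number of
% times the format string is applied
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
M = textscan(fid, '%s %s %f %f %f %f', 5, 'Delimiter', '\t');
fclose(fid);

firstDate = M{1}{1};     % text of the first date field
nRead     = numel(M{3}); % number of rows actually read
```

If the few rows come back with the expected text and numbers, the format string and delimiter are probably right and the trouble is more likely sheer size.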

--

Subject: importing data

From: Jessica

Date: 29 May, 2013 20:52:24

Message: 3 of 4

I am able to import my data within ~30 min using:

Data=importdata('05-09-13_No41.csv');

and it shows that the data is 30186766x6.

However, this import option is not ideal because the first two columns only report the first couple of letters from the file (that is, the first cell should be "08/04/2013", but this import option only shows an "8" in the first cell). For this reason, I tried using textscan.

How do I stream my data unformatted, as suggested in (a) above?

Subject: importing data

From: dpb

Date: 29 May, 2013 21:13:01

Message: 4 of 4

On 5/29/2013 3:52 PM, Jessica wrote:
> dpb <none@non.net> wrote in message <ko5k5k$st0$1@speranza.aioe.org>...
...[snipped for brevity]...

>
> I am able to import my data within ~30 min using:
>
> Data=importdata('05-09-13_No41.csv');
>
> and it shows that the data is 30186766x6.

That would translate to almost 1400 MB -- yeah, that's a sizable file. Do
you really, really, really have to work on all of these data in memory
at one time??? Are you sure you can't process it a portion at a time?
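
The size estimate can be reproduced with a short calculation, assuming every one of the 30186766x6 entries is held as an 8-byte double (which is an assumption about how the import stores them):

```matlab
nRows = 30186766;        % row count reported by importdata in the thread
nCols = 6;
bytesPerDouble = 8;      % assumption: every entry stored as a double

totalBytes = nRows * nCols * bytesPerDouble;
sizeMiB    = totalBytes / 2^20;

fprintf('%d bytes, about %.0f MiB\n', totalBytes, sizeMiB);
```

This is a lower bound for the textscan call in the original post: text columns returned as cell arrays of strings cost considerably more than 8 bytes per entry.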

> However, this import option is not ideal because the first two columns
> only report the first couple of letters from the file (that is, the
> first cell should be "08/04/2013", but this import option only shows an
> "8" in the first cell). For this reason, I tried using textscan.

That's because importdata() presumes a rectangular array of all numeric
data and the date format of the first entry terminates the scan for that
field.

> How do I stream my data unformatted, as suggested in (a) above?

That would require recreating the data from the source.
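
As a rough sketch of what "unformatted" means here: whatever produces the data would write the numeric values as raw binary with fwrite, and they would be read back with fread, so no text parsing happens at all. The matrix and file name below are made up for illustration, and only numeric columns are shown; the date and text columns would first need a numeric encoding, which is not covered.

```matlab
% Made-up numeric data: 5 records of 4 values each
data = [1:5; 2:2:10; 3:3:15; 4:4:20].';

% Write as raw doubles. fwrite stores column-major, so transposing first
% puts the 4 values of each record next to each other in the file.
binName = [tempname '.bin'];
fid = fopen(binName, 'w');
fwrite(fid, data.', 'double');
fclose(fid);

% Read it back: 4 values per record, as many records as the file holds
fid = fopen(binName, 'r');
back = fread(fid, [4 Inf], 'double').';
fclose(fid);

roundTripOk = isequal(back, data);
```

fread also accepts a finite size such as [4 100000], so a binary file can be consumed in pieces in the same way.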

You could also try fscanf() and see if it's any better at handling the
humongous chunk in one swell foop or not.

You might also find that using textscan w/ the optional 'N' argument,
where N is some large but reasonable size (say 100k or so), would "only"
read 5-6 MB at a time. Doing it in pieces that way _might_ be faster than
the whole thing.
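
A sketch of that piecewise approach, under the same assumptions about the file layout as the format string in the original post. The sample file and the running column sums are made up for illustration; the point is only that each pass holds one chunk in memory and the file position carries over between textscan calls.

```matlab
% Made-up sample file: 25 tab-delimited rows
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
for k = 1:25
    fprintf(fid, '08/04/2013\t12:00:%02d\t%g\t%g\t%g\t%g\n', k, k, 2*k, 3*k, 4*k);
end
fclose(fid);

chunkSize = 10;          % something like 100e3 for the real file
nRows  = 0;
colSum = zeros(1, 4);    % example per-chunk processing: running sums

fid = fopen(fname, 'r');
while ~feof(fid)
    % Each call continues from where the previous one stopped
    C = textscan(fid, '%s %s %f %f %f %f', chunkSize, 'Delimiter', '\t');
    if isempty(C{1})
        break;           % nothing more could be read
    end
    nRows  = nRows + numel(C{1});
    colSum = colSum + [sum(C{3}) sum(C{4}) sum(C{5}) sum(C{6})];
end
fclose(fid);
```

A malformed line could make textscan stop early without advancing, so a real version would want a guard against a chunk that makes no progress.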

I suspect strongly that what's happening is that you're thrashing the
disk as the system is paging memory continuously. Have you looked at
the system monitor while doing this to see?

Which OS and MATLAB versions are you running, and how much physical
memory is installed? How much free memory does the 'memory' command
report at the command line?

--
