MATLAB Answers


Slow readtable and low disk activity

Asked by Mohammad Abouali on 3 Jan 2019
Latest activity: commented on by Mohammad Abouali on 7 Jan 2019
Hi,
I am running readtable() on a big file, but I don't see much disk activity. The file loads, but at a much slower speed than I am expecting. The disk activity shown by the Windows 10 Task Manager stays around 1%, at a rate of 4.4 MB/s.
Note that this is an SSD drive (according to the Windows Task Manager), so I am expecting much higher disk activity than this.
It seems like something is limiting MATLAB's reads from disk.
Any ideas?

  2 Comments

Sorry, not enough information here. What format is the file? What are the size and shape of the resulting table? How long does it actually take?
I would start by profiling the time used by readtable and checking where most of it is going. It may be that your file is in xls format and that COM interaction with Excel is the slow part.
Hi Philip,
The file was a CSV file, and it was fairly big too (about 4 GB, with millions of rows of data). I was expecting it to load in a few minutes on an SSD drive with 64 GB of RAM, but it was taking about 2 hours, and the disk activity readout wasn't showing much of anything.
The problem turned out to be that some of the rows had "null" written instead of a number, so readtable was treating those columns as cell arrays of characters. Once I removed those lines (using the grep command in a Linux bash shell, via the Ubuntu subsystem of Windows), the load time dropped to around 2 minutes (138.075703 seconds, to be exact, measured with tic/toc).
During these 138 seconds I still don't see any disk activity in the Windows Task Manager; MATLAB's disk activity is reported as 0.1 MB/s or even 0 MB/s. So something must be wrong with how Windows measures MATLAB's disk activity, because there is no way to load a 4 GB CSV file in 138 seconds if the disk activity were really 0.1 MB/s or, most of the time, zero. (138 seconds for 4 GB of data averages to about 30 MB/s, ignoring other work such as parsing, which sounds like a reasonable fraction of an SSD's bandwidth.)
Thank you for your response; it was helpful in finding out what was causing the slowness.
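For anyone hitting this later: the mis-detected column types can be spotted before committing to a full import. A minimal sketch, assuming R2016b or newer and a placeholder file name mydata.csv:

```matlab
% detectImportOptions scans only part of the file, so this is fast.
opts = detectImportOptions('mydata.csv');   % 'mydata.csv' is a placeholder name

% Columns that should be numeric but were detected as 'char' are a
% strong hint that stray text such as "null" is mixed into the data.
disp([opts.VariableNames', opts.VariableTypes']);
```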



Release

R2018b

1 Answer

Answer by Philip Borghesani on 7 Jan 2019
 Accepted Answer

With 64 GB of RAM, Windows can easily cache a 4 GB file. The first time you run the code, or after another program accesses the file, may be the only time the disk is actually read; after that, reads are served from the file cache and show almost no disk activity.
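The caching effect is easy to see by timing the same read twice in one session; a rough sketch, again with a placeholder file name:

```matlab
% First read: the data may actually come from the SSD.
tic; T1 = readtable('mydata.csv'); toc   % 'mydata.csv' is a placeholder

% Second read: Windows typically serves the file from its RAM cache,
% so Task Manager shows almost no disk activity even though the
% import still takes time for parsing.
tic; T2 = readtable('mydata.csv'); toc
```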

  1 Comment

Yes indeed. Apparently, because of those few lines containing "null", everything was processed as text instead of double, and that was killing it. Memory usage was also much higher, because it was storing 1 byte per digit, and I had around 18 digits including decimals.
Is there any plan to change readtable to treat "null" as NaN in CSV files?
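You can already get this behavior through import options (R2016b and later); a sketch, where the file name and the assumption that every column should be numeric are specific to this example:

```matlab
opts = detectImportOptions('mydata.csv');           % placeholder file name
opts = setvartype(opts, 'double');                  % assume every column should be numeric
opts = setvaropts(opts, 'TreatAsMissing', 'null');  % parse the literal text "null" as NaN
T = readtable('mydata.csv', opts);
```

This avoids the grep preprocessing step entirely: the "null" entries load as NaN instead of forcing the whole column to a cell array of characters.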
