Read/Write large CSV file

Atif Shah on 9 Jul 2018
Commented: Jan on 11 Jul 2018
I wrote a matrix of size 1721x196609 to a CSV file using csvwrite. When I read the file back with csvread, it gives me a 338364089x1 array, but I need the original size of 1721x196609. However, when I reduce the matrix to 1721x96000, which is roughly half the size, it works perfectly. My question is: how can I get the original matrix size back when I read the CSV file?
Thank you in advance.
Atif Shah on 9 Jul 2018
I am using the following code to write and read the matrix M.
csvwrite('csvLargeFile.csv', M);
read_matrix = csvread('csvLargeFile.csv');


Accepted Answer

OCDER on 9 Jul 2018
Edited: OCDER on 9 Jul 2018
Instead of saving as CSV, saving as a binary file would probably be better for transporting the data, unless a human is going to read it manually.
Try this:
M = zeros(1721, 196609);
FileName = 'csvLargeFile.dat';
% To write
FID = fopen(FileName, 'w');
fwrite(FID, M, 'double');
fclose(FID);   % close the file so the data is flushed before reading it back
% To load
FID = fopen(FileName, 'r');
A = fread(FID, [1721, Inf], 'double');
fclose(FID);
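The load step above has to hard-code the row count (1721). A minimal sketch of one way around that, assuming a 2-D matrix and a made-up file layout (a two-value uint64 header holding the size, followed by the data), is shown below. The file name and the header layout are illustrative assumptions, not part of the original answer, and a small matrix stands in for the real one.

```matlab
% Small stand-in for the 1721x196609 matrix so the sketch runs quickly.
M = rand(4, 7);
fileName = fullfile(tempdir, 'largeMatrix.dat');   % hypothetical file name

% Write: two uint64 values (rows, columns), then the data itself.
fid = fopen(fileName, 'w');
assert(fid ~= -1, 'Could not open %s for writing.', fileName);
fwrite(fid, size(M), 'uint64');
fwrite(fid, M, 'double');      % elements are written in column order
fclose(fid);

% Read: recover the size from the header, then read the data in that shape.
fid = fopen(fileName, 'r');
assert(fid ~= -1, 'Could not open %s for reading.', fileName);
dims = fread(fid, [1, 2], 'uint64');   % fread returns double by default
A = fread(fid, dims, 'double');
fclose(fid);
```

With this layout the reader needs no prior knowledge of the matrix size, at the cost of a file format that only this pair of write/read snippets understands.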
Atif Shah on 10 Jul 2018
Edited: Atif Shah on 10 Jul 2018
Thank you! Yes, it's better to save as mat files.
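For the MAT-file route mentioned here, one thing to watch is size: 1721 x 196609 doubles is 338364089 x 8 bytes, roughly 2.7 GB, which is more than the 2 GB per-variable limit of the version 7 MAT-file format, so the '-v7.3' flag would be needed. A minimal sketch, using a small stand-in matrix and a made-up file name:

```matlab
% Small stand-in for the 1721x196609 matrix so the sketch runs quickly.
M = rand(4, 7);
matName = fullfile(tempdir, 'largeMatrix.mat');   % hypothetical file name

% '-v7.3' selects the HDF5-based format, which allows variables over 2 GB.
save(matName, 'M', '-v7.3');

% Load the whole variable back; the shape is preserved automatically.
S = load(matName, 'M');
A = S.M;

% matfile allows reading part of a variable without loading all of it.
m = matfile(matName);
firstRows = m.M(1:2, :);
```

The matfile object is handy when only a block of rows or columns is needed at a time and the full matrix does not fit comfortably in memory.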


More Answers (1)

dpb on 9 Jul 2018
Edited: dpb on 9 Jul 2018
Confirmed with R2017b. It appears to be an issue with record length in textscan; I didn't explore exactly where it breaks.
csvread simply calls dlmread with a comma delimiter, and dlmread calls textscan internally with the default empty format string, which normally returns an array with the same shape as the data in the file.
Looks like it's time for a bug report: the internal logic apparently has some limit on record length.
xlsread returns the right data for the section it reads, but only a 2x16384 subset. That limit is in the COM engine, so it's not a reportable bug to TMW. I don't know what the limits are in modern Excel; I thought they had been moved up to 32-bit, but I didn't test directly whether that really works.
The venerable textread is trying, but hasn't yet returned to the command prompt after a couple of minutes...
One could try an explicit format string in textscan and see if that's a workaround; of course, that presumes one knows the number of fields per record a priori. One could determine it by reading the first record with fgetl, counting the delimiters with sum(fgetl(fid)==','), and reshaping based on the result.
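The count-and-reshape idea can be sketched as follows. Note that 338364089 = 1721 x 196609, so the flattened result in the question holds every element and only the shape is lost. The sketch below assumes the file is purely numeric with no header line, reads all values as one column with a single '%f', and restores the shape from the field count of the first record. The file name is made up and a small matrix stands in for the real one.

```matlab
% Small stand-in matrix; integer values keep the CSV round trip exact.
M = reshape(1:12, 3, 4);
csvName = fullfile(tempdir, 'smallDemo.csv');   % hypothetical file name
csvwrite(csvName, M);

fid = fopen(csvName, 'r');
assert(fid ~= -1, 'Could not open %s.', csvName);

% Fields per record = number of commas in the first line + 1.
firstLine = fgetl(fid);
nCols = sum(firstLine == ',') + 1;

% Go back to the start and read every value as one long column.
frewind(fid);
C = textscan(fid, '%f', 'Delimiter', ',');
fclose(fid);

% Values arrive in file order (row by row), so reshape with nCols as the
% leading dimension and then transpose to get the original orientation.
A = reshape(C{1}, nCols, []).';
```

For a matrix of the size in the question, the transpose makes a second copy of roughly 2.7 GB, so this needs a fair amount of free memory.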
ADDENDUM: I had to force-close ML to terminate textread. Explicit use of textscan shows:
fmt=repmat('%f',1,length(x)); % x=rand(2,196609);
whos y
  Name      Size            Bytes  Class     Attributes
  y         2x196609      3145744  double
The explicit format string works as expected. Without it, the same data comes back as a single column:
whos y
  Name      Size            Bytes  Class     Attributes
  y         393218x1      3145744  double
The problem is in the internal default behavior with no explicit format string, which is supposed to return the shape of the input file (afaict this is currently undocumented, although it used to appear in an example). It breaks at some undetermined record length.
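The textscan call itself is not shown above, so the following is only a guess at what an explicit-format read could look like, not a reconstruction of the exact code used. It assumes a purely numeric, comma-delimited file whose column count is known, uses a made-up file name, and uses a small matrix in place of rand(2,196609).

```matlab
% Small stand-in for x = rand(2,196609).
x = rand(2, 9);
csvName = fullfile(tempdir, 'explicitFormatDemo.csv');   % hypothetical name

% Write with 17 significant digits so little precision is lost in the text.
dlmwrite(csvName, x, 'precision', '%.17g');

% One '%f' per column makes the record layout explicit.
fmt = repmat('%f', 1, size(x, 2));

fid = fopen(csvName, 'r');
assert(fid ~= -1, 'Could not open %s.', csvName);
% 'CollectOutput' merges the per-column cells into one numeric matrix.
C = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', true);
fclose(fid);

y = C{1};
```

If the column count is not known in advance, it can be taken from the first line of the file before building fmt.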
Jan on 11 Jul 2018
Atif Shah wrote:
Thank you for nice explanation.
@Atif Shah: Please use flags only to inform admins and editors about inappropriate content like spam or rudeness.
