Read/Write large CSV file
I write a matrix of size 1721x196609 to a CSV file using csvwrite. When I read this file back using csvread, it gives me an array of size 338364089x1 (that is, all 1721*196609 values in a single column), while I need the original size of 1721x196609. However, when I reduce the matrix size to 1721x96000, which is almost half, it works perfectly. My question is: how can I get the original matrix size when I read the CSV file?
Thank you in advance.
2 Comments
OCDER
on 9 Jul 2018
Can you show us the code? It's odd that csvwrite & csvread will work differently based on the matrix size.
Accepted Answer
OCDER
on 9 Jul 2018
Edited: OCDER
on 9 Jul 2018
Instead of saving as CSV, perhaps saving it as a binary file would be better for transporting data, unless a human is going to read this data manually...
Try this:
M = zeros(1721, 196609);
FileName = 'csvLargeFile.dat';
%To write
FID = fopen(FileName, 'w');
fwrite(FID, M, 'double');
fclose(FID);
%To load
FID = fopen(FileName, 'r');
A = fread(FID, [1721, Inf], 'double');
fclose(FID);
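As a quick sanity check of the approach above, here is a minimal sketch that round-trips a small stand-in matrix instead of the full 1721x196609 one. The scratch file name from tempname and the variable roundTripOK are assumptions for illustration only. Note that the row count has to be known when reading, because the raw binary file stores only the values (column by column), not the shape.

```matlab
% Small stand-in for the large matrix so the check runs quickly.
M = reshape(1:12, 3, 4);
FileName = [tempname '.dat'];   % hypothetical scratch file

% Write: fwrite stores the values in column-major order.
FID = fopen(FileName, 'w');
fwrite(FID, M, 'double');
fclose(FID);

% Read: the row count must be supplied; Inf lets fread infer the columns.
FID = fopen(FileName, 'r');
A = fread(FID, [size(M, 1), Inf], 'double');
fclose(FID);
delete(FileName);

roundTripOK = isequal(A, M);    % true if shape and values survived
```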
3 Comments
OCDER
on 9 Jul 2018
Another way is to make a custom file format that stores the matrix size as the first two doubles in the file.
FID = fopen(FileName, 'w');
fwrite(FID, size(M), 'double'); %First 2 double is the size of the matrix
fwrite(FID, M, 'double');
fclose(FID);
FID = fopen(FileName, 'r');
Size = fread(FID, 2, 'double'); %Get the first 2 double and assume it's the size
A = fread(FID, [Size(1), Size(2)], 'double');
fclose(FID);
But yes, a .mat file would be best for transporting data across MATLAB sessions. You would need the '-v7.3' option in this case because the matrix is larger than 2 GB.
save('myLargeMatrix.mat', 'M', '-v7.3') %Variable name is passed as a character vector
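A small round trip through a version 7.3 MAT-file might look like the sketch below. The scratch file name and the small random matrix are stand-ins, not part of the original question. The matfile call at the end is an optional extra that, with v7.3 files, should allow reading only part of the variable without loading the whole matrix.

```matlab
% Small stand-in for the large matrix.
M = rand(5, 8);
matName = [tempname '.mat'];          % hypothetical scratch file

% The variable name is given as text, not as the variable itself.
save(matName, 'M', '-v7.3');

% Load it back into a struct and compare.
S = load(matName, 'M');
sameData = isequal(S.M, M);

% Optional: read only a sub-block through matfile (v7.3 files).
mf = matfile(matName);
block = mf.M(1:2, 3:4);
clear mf
delete(matName);
```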
More Answers (1)
dpb
on 9 Jul 2018
Edited: dpb
on 9 Jul 2018
Confirmed behavior with R2017b; it appears to be an issue with record length in textscan. I didn't explore exactly where it breaks.
csvread simply calls dlmread with the comma delimiter, and dlmread uses textscan internally with the default empty format string, which normally returns the array in the shape found in the file.
Looks like it's time for a bug report; apparently the internal logic has some limit on record length.
xlsread returns the right data for the subsection it reads, but only a 2x16384 subset. That's in the COM engine, so it's not a reportable bug to TMW. I don't know what the limits are in modern Excel; I thought they had been moved up to 32-bit, but I didn't test directly whether that really works.
The venerable textread is trying but hasn't yet returned to the command prompt after a couple of minutes...
One could try a specific format string in textscan and see if that's a workaround; of course, that presumes one knows the number of fields per record a priori. One could determine that by reading the first record with fgetl, counting delimiters with sum(fgetl(fid)==',') (the field count is one more than that), and reshaping based on the result.
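The delimiter-counting idea can be sketched as follows, using a small scratch CSV as a stand-in for the large file. The file name and the variables firstLine and nCols are assumptions for illustration. The field count is the number of commas in the first record plus one, and that count drives an explicit format string.

```matlab
% Write a small CSV to a scratch file (stand-in for the large file).
x = magic(4);
csvName = [tempname '.csv'];          % hypothetical scratch file
csvwrite(csvName, x);

fid = fopen(csvName, 'r');

% Count the delimiters in the first record to get the field count.
firstLine = fgetl(fid);
nCols = sum(firstLine == ',') + 1;    % fields = delimiters + 1
frewind(fid);                         % go back to the start of the file

% Build an explicit format string with one %f per field.
fmt = repmat('%f', 1, nCols);
y = cell2mat(textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', 1));

fclose(fid);
delete(csvName);
```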
ADDENDUM: Had to force-close MATLAB to terminate textread. After that, explicit use of textscan shows:
fmt=repmat('%f',1,length(x)); % x=rand(2,196609);
fid=fopen('atif.csv');
y=cell2mat(textscan(fid,fmt,'delimiter',',','collectoutput',1));
whos y
  Name      Size              Bytes  Class     Attributes
  y         2x196609        3145744  double
Explicit format string works as expected
frewind(fid)
y=cell2mat(textscan(fid,'','delimiter',',','collectoutput',1));
whos y
  Name      Size              Bytes  Class     Attributes
  y         393218x1        3145744  double
The problem is in the internal default behavior when no explicit format string is given (as far as I can tell this is currently undocumented, although it used to appear in an example): textscan is supposed to return the shape of the input file, but that breaks at some undetermined record length.
fid=fclose(fid);
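If the data has already come back as a single column, it can probably be put back into shape with reshape, provided the row or column count is known. The sketch below assumes the flattened values arrive in file order, that is, row by row; this ordering is an assumption that was not verified in this thread, so check a few entries against the file before relying on it. The variable names are hypothetical.

```matlab
% Known dimensions of the matrix that was written to the file.
nRows = 3;
nCols = 4;
original = reshape(1:nRows*nCols, nRows, nCols);

% Simulate the flattened result: values in file order (row by row).
flat = reshape(original.', [], 1);

% Fill columns of length nCols, then transpose to recover the rows.
restored = reshape(flat, nCols, []).';
```

For the matrix in the question this would correspond to reshape(A, 196609, []).' under the same assumption about ordering.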
1 Comment
Jan
on 11 Jul 2018
Atif Shah wrote:
Thank you for nice explanation.
@Atif Shah: Please use flags only to inform admins and editors about inappropriate content like spam or rudeness.