How do I read a csv file that has text and numbers?

Bill on 1 Jun 2017
Edited: dpb on 2 Jun 2017
I have a large CSV file that contains rows of numbers and rows of text in a known, repeating pattern (6 rows of text, then 2000 rows of numbers). I want to read the numbers and discard the text. The file is too large to process as an Excel file, since it has over 1.5 million lines (xlsread might easily separate the numbers and text were it not for the file size). csvread expects files with only numbers, and fgetl reads one line at a time, so it may take a while. Does anyone have an idea about how to do this?

Answers (1)

dpb on 2 Jun 2017
Edited: dpb on 2 Jun 2017
doc textscan
Was busy last night, sorry...
"... 6 rows of text then 2000 rows of numbers ... want to read the numbers"
If you know the number of columns, it's pretty simple:
fmt=repmat('%f',1,nCol);      % format for the numeric rows; nCol is the number of columns
dat=zeros(VERRYBigNo, nCol);  % allocate some room; VERRYBigNo is an upper bound on total rows
nRow=2000;                    % numeric rows per group
fid=fopen('yourfile.csv','r');
ix1=1; ix2=nRow;              % pointers to output array
while ~feof(fid)
  dat(ix1:ix2,:)=cell2mat(textscan(fid,fmt,nRow,'headerlines',6, ...
                                   'delimiter',',', ...
                                   'collectoutput',1));  % read the numeric group
  ix1=ix2+1; ix2=ix2+nRow;    % update counters
end
fid=fclose(fid);
You'll still need error checking for running past the bounds of the preallocated array, etc.
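One way the bounds checking might look is sketched below. This is only an illustration under stated assumptions: the demo file, the tiny sizes (2 text rows, 3 numeric rows, 2 columns) and the variable names are made up so the snippet runs on its own. It also swaps the 'headerlines' option for an explicit fgetl skip of non-blank lines, so the sketch does not depend on exactly where textscan leaves the file position between calls; that assumes the text rows themselves are never blank.

```matlab
% Sketch only: the demo file and all sizes are made-up stand-ins.
nHdr = 2; nRow = 3; nCol = 2;          % real file: 6 text rows, 2000 numeric rows
fname = [tempname '.csv'];             % throwaway demo file
fid = fopen(fname,'w');
for b = 1:2                            % two text+number groups
    fprintf(fid,'header text\nmore text\n');
    fprintf(fid,'%d,%d\n',(10*b + [1 2; 3 4; 5 6]).');
end
fclose(fid);

fmt   = repmat('%f',1,nCol);
dat   = zeros(4,nCol);                 % deliberately too small, to exercise the check
nRead = 0;                             % rows stored so far

fid = fopen(fname,'r');
if fid < 0, error('Could not open %s',fname); end
done = false;
while ~done
    k = 0;                             % skip nHdr non-blank text lines
    while k < nHdr
        t = fgetl(fid);
        if ~ischar(t), done = true; break, end   % end of file
        if ~isempty(strtrim(t)), k = k + 1; end
    end
    if done, break, end
    c = textscan(fid,fmt,nRow,'Delimiter',',','CollectOutput',1);
    blk = c{1};
    n = size(blk,1);
    if n == 0, break, end              % nothing numeric parsed: stop
    if nRead + n > size(dat,1)         % would overrun: grow the array
        dat = [dat; zeros(max(size(dat,1),n),nCol)]; %#ok<AGROW>
    end
    dat(nRead+1:nRead+n,:) = blk;
    nRead = nRead + n;
end
fclose(fid);
dat = dat(1:nRead,:);                  % trim unused preallocated rows
delete(fname);
```

Counting the rows actually returned (n) rather than assuming a full group means a short final group will not index out of bounds, and the final trim drops whatever preallocated rows went unused.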
Also look at processing the data in chunks as you read it rather than holding the whole thing in memory at once, or at least at creating a .mat or stream file as you go that can be used with memmapfile or matfile. Later releases also include support for tall arrays.
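As a rough illustration of processing each chunk as it is read, the sketch below keeps only running column sums and a row count, so no more than one group is ever in memory. The demo file, the tiny sizes, and the choice of a column mean as the per-chunk computation are all assumptions made for the example; the header rows are skipped with fgetl (assuming they are never blank) rather than with 'headerlines'.

```matlab
% Sketch only: demo file, sizes and the statistic computed are made up.
nHdr = 2; nRow = 3; nCol = 2;          % real file: 6 text rows, 2000 numeric rows
fname = [tempname '.csv'];             % throwaway demo file
fid = fopen(fname,'w');
for b = 1:2                            % two text+number groups
    fprintf(fid,'header text\nmore text\n');
    fprintf(fid,'%d,%d\n',(10*b + [1 2; 3 4; 5 6]).');
end
fclose(fid);

fmt    = repmat('%f',1,nCol);
colSum = zeros(1,nCol);                % running sum of each column
nTot   = 0;                            % numeric rows seen so far

fid = fopen(fname,'r');
if fid < 0, error('Could not open %s',fname); end
done = false;
while ~done
    k = 0;                             % skip nHdr non-blank text lines
    while k < nHdr
        t = fgetl(fid);
        if ~ischar(t), done = true; break, end   % end of file
        if ~isempty(strtrim(t)), k = k + 1; end
    end
    if done, break, end
    c = textscan(fid,fmt,nRow,'Delimiter',',','CollectOutput',1);
    blk = c{1};                        % one group of numeric rows
    if isempty(blk), break, end
    colSum = colSum + sum(blk,1);      % process the chunk, then let it go
    nTot   = nTot + size(blk,1);
end
fclose(fid);
colMean = colSum / nTot;               % per-column mean over the whole file
delete(fname);
```

The same loop shape would apply if each chunk were instead appended to a file on disk for later use with matfile or memmapfile; only the two lines that update colSum and nTot would change.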
