Reading a CSV file into a 2D cell array of strings

I'm looking for a computationally efficient way of reading a CSV file containing numbers and strings, with a variable number of header lines before the column titles and a varying set of columns. Once you get down to the column data, these files usually have a couple hundred columns and several hundred thousand lines. I'd like to return everything as a two-dimensional cell array of strings (both the text and the numbers). I can read the entire file pretty quickly into a 1D char array delimited by commas and EOL characters, and I can use strsplit to split it on the EOLs into a 1D cell array where each element is one line. I can, in turn, loop and use strsplit to split each line on the commas and build up a two-dimensional cell array of strings indexed {row,col}. But this last step is amazingly slow.
Is there a 2D equivalent of strsplit that takes two different delimiters and splits a 1D char array on one delimiter for rows and on the other for columns?
  3 Comments
Stephen23 on 5 May 2022
Use READCELL. If required first open the file and read lines to find the end of the header, then close and READCELL.
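A minimal sketch of that suggestion, under stated assumptions: the sample file contents and the `marker` text used to recognise the column-title line are hypothetical, and READCELL itself requires R2019a or later. Note that READCELL returns numeric fields as numbers rather than as text, so a further conversion step would be needed to get a cell array of strings only.

```matlab
% Write a tiny sample file so the sketch runs on its own (hypothetical data).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Logger v1.2\nSite: A\nTime,Temp,Label\n0,20.5,ok\n1,21.0,warm\n');
fclose(fid);

marker = 'Time,';   % assumed: the column-title line starts with this text

% Count the lines that precede the title line.
fid = fopen(fname, 'r');
nHeader = 0;
tline = fgetl(fid);
while ischar(tline) && ~strncmp(tline, marker, numel(marker))
    nHeader = nHeader + 1;
    tline = fgetl(fid);
end
fclose(fid);
assert(ischar(tline), 'Title line not found.');

% READCELL requires R2019a or later. Skipping nHeader lines makes the
% column titles the first row of C.
C = readcell(fname, 'NumHeaderLines', nHeader);
delete(fname);
```

Scanning with FGETL only touches the preamble, so its cost should be negligible next to reading several hundred thousand data lines.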
John Feiereisen on 6 May 2022
Splitting on commas and EOLs and then reshaping works, thanks. Never thought of that. But it turns out it's about as slow as the two-step splitting I was doing. And sadly, readcell came out in R2019a, and the newest MATLAB version I have access to is R2018b.
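For reference, the split-then-reshape idea can be sketched as below. This is an illustration, not a faster method: the sample text is hypothetical, and the sketch assumes every row has the same number of fields and that no field contains a quoted comma or an embedded newline.

```matlab
% Hypothetical sample text: two rows, three columns, LF line endings.
txt = sprintf('a,1,x\nb,2,y\n');

txt = regexprep(txt, '\r?\n$', '');         % drop one trailing newline, if any
tokens = regexp(txt, ',|\r?\n', 'split');   % 1-by-(nRows*nCols) cell of char

% Take the column count from the first line.
firstLine = regexp(txt, '^[^\r\n]*', 'match', 'once');
nCols = numel(strfind(firstLine, ',')) + 1;

assert(mod(numel(tokens), nCols) == 0, 'Rows have unequal field counts.');

% Tokens arrive row by row, but RESHAPE fills column by column, so build
% an nCols-by-nRows array and transpose it to get {row,col} indexing.
C = reshape(tokens, nCols, []).';
```

The `regexp` split keeps empty fields, whereas `strsplit` collapses consecutive delimiters by default, which would shift columns whenever a field is blank.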
I just cobbled together something that reads the header lines and the line with all the column titles (many of the titles are not valid MATLAB variable names), closes the file, then uses readtable to read the column data into a table. Then, given the title of a column I'm looking for, I pull that column out of the table and pass it through table2array. I only need about a dozen columns out of a couple hundred, and I never know which order they'll be in, but I do know the titles are always the same. This seems to work pretty quickly on files 1.1 million lines long. There's always a way.
Thanks for the ideas.
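The workaround described above might look roughly like this. It is a sketch rather than the actual code: the sample file, the value of `nHeader`, and the column title in `wanted` are all hypothetical, and the header count is assumed to be known already. It uses only functions available in R2018b.

```matlab
% Write a tiny sample file so the sketch runs on its own (hypothetical data).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Logger v1.2\nSite: A\nTime (s),Temp [C],Label\n0,20.5,ok\n1,21.0,warm\n');
fclose(fid);

nHeader = 2;   % assumed known here: lines before the column-title line

% Skip the preamble, then read the raw titles. They are kept as plain text
% because many of them are not valid MATLAB variable names.
fid = fopen(fname, 'r');
for k = 1:nHeader
    fgetl(fid);
end
titles = strsplit(fgetl(fid), ',', 'CollapseDelimiters', false);
fclose(fid);

% Read only the data rows; columns get default names Var1, Var2, ...
T = readtable(fname, 'Delimiter', ',', ...
    'HeaderLines', nHeader + 1, 'ReadVariableNames', false);
delete(fname);

% Look a column up by its original title and pull it out by position.
wanted = 'Temp [C]';
idx = find(strcmp(titles, wanted), 1);
assert(~isempty(idx), 'Column "%s" not found.', wanted);
temp = T{:, idx};   % same result as table2array(T(:, idx))
```

Looking columns up by position against the raw title list means the column order in the file does not matter, as long as the titles themselves stay the same.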

Answers (0)

Release

R2017a