MATLAB Answers

Import mixed-format text file of unknown number of columns

Hi, how do I import a text file that has an unknown number of columns and contains mixed formats? For example:
header1, header2, header3, header4,...headerN
number, number, number, string,...number
number, number, number, string,...number
number, number, number, string,...number
...
I think the best bet is to use fgetl and textscan; however, I don't know how to deal with the unknown number of columns and therefore the unknown format pattern. I only know that the first three columns and the last column are numeric.
Thank you very much
  2 Comments
Jason Yang on 26 Apr 2017
Hi Walter, thanks for your reply. The rest of the columns will be either numeric or string/text; a couple of those are in date/time format, but the date/time columns will be discarded anyway. The challenge I am facing is that I cannot predict which columns are numeric and which are strings, because the number of columns inserted between the first few and the last few is unknown.
From the header I will know the data format for each specific column, though, and there are patterns in the headers. However, I still don't know how to read column by column for an unknown number of columns. Thank you.


Accepted Answer

Walter Roberson on 26 Apr 2017
"You can also use the enclosed csv2table() that I wrote a few months ago. I designed it when a user had a UTF-16 encoded csv file that MATLAB was not automatically reading properly, so it can be used with most UTF-* encoded files as well as plain ASCII files. The code figures out which columns are numbers and which are dates, and for dates, it figures out whether you have datetime objects available and uses them if so, otherwise using serial date numbers. If the code detects that you do not even have tables (R2013b or later) then it will return a cell array instead of a table object."
  2 Comments
Walter Roberson on 26 Apr 2017
Which line of my code did the error occur on?
It is a bit difficult to debug without a sample file to work with, but if you could show the traceback I might be able to get somewhere, especially if you are able to show the output of
cellfun(@size, WhatEverVariableIsTheProblem, 'uniform', 0)
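For reference, here is a small sketch of what that call reports. The cell array C below is made up to stand in for the real problem variable, and the names C and sz are placeholders; the point is only that columns of unequal length show up as different sizes.

```matlab
% Made-up stand-in for the problem variable: three "columns",
% the last one a row shorter than the others.
C = {(1:3).', {'a';'b';'c'}, (1:2).'};

% One size vector per cell; 'UniformOutput' is the full name of the
% option abbreviated as 'uniform' above.
sz = cellfun(@size, C, 'UniformOutput', false);

% Display the sizes side by side so a mismatched column is easy to spot.
disp(vertcat(sz{:}));
```

A column whose row count differs from the rest usually means textscan stopped early or the format did not match a line in the file.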


More Answers (1)

Stephen on 26 Apr 2017
Edited: Stephen on 26 Apr 2017
You can identify the column contents (numeric vs. string) with a few fgetl calls, before importing the data with one textscan call:
  1. open the file.
  2. read the header using fgetl.
  3. place a file marker using ftell.
  4. read the first data line using fgetl.
  5. split the line, identify the numbers using str2double.
  6. build the format string based on the identified numbers (the others can be string).
  7. rewind using fseek.
  8. call textscan using the format string.
  9. close the file.
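The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the file is comma-delimited with a single header row, the first data row is representative of every row, and the sample file, its contents, and all variable names (fname, names, tok, isNum, spec) are made up so that the sketch runs on its own.

```matlab
% Made-up sample file so the sketch is self-contained (not the poster's data).
fname = fullfile(tempdir, 'mixed_demo.csv');
fid = fopen(fname, 'wt');
fprintf(fid, 'h1,h2,h3,h4,h5\n');
fprintf(fid, '1,2,3,abc,4\n');
fprintf(fid, '5,6,7,def,8\n');
fclose(fid);

fid = fopen(fname, 'rt');                       % 1. open the file
assert(fid ~= -1, 'Could not open %s', fname);
hdr = fgetl(fid);                               % 2. read the header line
names = strtrim(regexp(hdr, ',', 'split'));     %    column names, if needed later
pos = ftell(fid);                               % 3. mark where the data starts
row = fgetl(fid);                               % 4. read the first data line
tok = strtrim(regexp(row, ',', 'split'));       % 5. split it into fields ...
isNum = ~isnan(str2double(tok));                %    ... and flag those that parse as numbers
spec = repmat({'%s'}, 1, numel(tok));           % 6. default every column to string,
spec(isNum) = {'%f'};                           %    then mark the numeric ones
fmt = [spec{:}];
fseek(fid, pos, 'bof');                         % 7. rewind to the first data line
C = textscan(fid, fmt, 'Delimiter', ',');       % 8. import all rows in one call
fclose(fid);                                    % 9. close the file
```

Note that a numeric column whose first entry is empty or NaN would be classified as a string here, so it may be worth checking more than one row if the data is messy.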
If you just want the middle columns as strings, then you won't need the second fgetl call, because the header alone tells you the number of columns. Here is a working example that reads every column as a string:
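opt = {'Delimiter',','};                 % textscan options: comma-separated fields
fid = fopen('test.csv','rt');
hdr = fgetl(fid);                        % header line (consumed, so textscan starts at the data)
num = numel(regexp(hdr,',','split'));    % number of columns = number of header fields
fmt = repmat('%s',1,num);                % read every column as a string
C = textscan(fid,fmt,opt{:});            % C{k} holds the strings of column k
fclose(fid);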
Tested on this file:
  2 Comments
Jason Yang on 26 Apr 2017
Hi Stephen, thank you very much for your reply. Your code worked nicely for reading the headers, but because my data file is so messy (as described in my reply to Walter), the final cell array C has some problems. Your suggestion is still very useful, and I much appreciate your help.

