Importing data without knowing number of columns

Hello,
I have a data file (not a csv or xls) with columns that resemble this: date & time, data 1, data 2, data 3, ..., data n. The top row contains text column labels, the first column holds the date & time information, and the remaining columns are numbers. I've been able to import everything successfully with MATLAB's import tool (I used it to generate an import function). However, there's a chance the columns' order and/or the number of columns will change in the future, meaning new import functions would have to be made, which I would like to avoid. Can someone please tell me if there's a way to load the column data without knowing beforehand how many columns there are? I would also like to load the column text labels, again without knowing the column count beforehand. Looking around, it seems that the number of columns to load is a required input, but I'm wondering whether this is always true.
Thank you in advance!

Accepted Answer

Walter Roberson on 23 May 2016
With R2014b or later I recommend you parse times using the textscan %D format.
To determine the number of columns, I suggest that you fopen() the file, fgetl() the first line into a variable, strsplit() that on the delimiter, and count the number of results. Then you can use repmat() to build a format string that you can use with textscan().
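A minimal sketch of that recipe (the file name, comma delimiter, and date format are assumptions, not details from this thread):
% Sketch of the fgetl/strsplit/repmat recipe; 'data.txt', the comma
% delimiter, and the date format are assumed, not known.
fid = fopen('data.txt','rt');
hdr = strsplit(fgetl(fid),',');                  % header labels
ncol = numel(hdr);                               % column count
fmt = ['%{MM-dd-yyyy HH:mm:ss.SSS}D',repmat('%f',1,ncol-1)];
dat = textscan(fid,fmt,'Delimiter',',');
fclose(fid);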
  5 Comments
Walter Roberson on 24 May 2016
You know, readtable() would probably do a fine job on these kinds of files.
You can also use the enclosed csv2table() that I wrote a few months ago. I designed it when a user had a UTF-16 encoded csv file that MATLAB was not reading properly on its own, so it can be used with most UTF-* encoded files as well as plain ASCII files. The code figures out which columns are numbers and which are dates; for dates, it checks whether you have datetime objects available and uses them if so, otherwise falling back to serial date numbers. If the code detects that you do not even have tables (introduced in R2013b), then it returns a cell array instead of a table object.
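A minimal sketch of the readtable route (the file name and delimiter are assumptions):
% readtable infers the number of columns and takes the variable names
% from the header row; 'data.txt' and the comma delimiter are assumed.
T = readtable('data.txt','Delimiter',',','ReadVariableNames',true);
T.Properties.VariableNames    % the column labels read from the header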
dpb on 25 May 2016
I recall that thread, Walter--that was a very nice piece of work.


More Answers (3)

dpb on 20 May 2016
There is, IF (the proverbial "big if") the data are all numeric: textscan with an empty format string will automagically return the data in the shape it is found in the file.
Failing that (and your file format does fail it, unfortunately), there simply is no way to generically return arbitrary file content without knowing a priori what the field types and the number of fields per record are.
The exception to the above rule is, of course, to read each record as a character string and write a parsing routine that determines the type of each column in the record and then translates that into machine form from the ASCII representation.
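For the all-numeric case described above, a minimal sketch (the file name is made up):
% An empty format string lets textscan infer the column count itself;
% 'numbers.txt' is a hypothetical all-numeric file.
fid = fopen('numbers.txt','rt');
dat = textscan(fid,'','CollectOutput',true);
fclose(fid);
M = dat{1};                   % numeric matrix, shaped as in the file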
  2 Comments
D on 23 May 2016
Hi dpb, thank you very much for your helpful answer. I had 2 more thoughts:
  • Currently my date/time info is in the format "MM-DD-YYYY HH:MM:SS.FFF". If I split this column into separate "MM-DD-YYYY" and "HH:MM:SS.FFF" columns, would the textscan method recognize these as numeric data?
  • Would it be possible for me to specify a number of columns greater than the max number I'd ever expect, and then just detect and delete the empty columns after all of the data is imported?
Thank you again!
dpb on 24 May 2016
Edited: dpb on 24 May 2016
  1. No, because they're not valid numeric data, and
  2. No, because an explicit format must match every element one-to-one in each record.
Walter pointed out the recent date-parsing format string %D; again, though, you need to know which column contains the date/time information to use it. It is, of course, possible to parse the date/time fields specifically into their numeric components.
I'd not really thought of doing it, but in theory you could also include the dash '-' and colon ':' in the 'delimiter' field and return those pieces as numeric output too (see the sketch after the footnote below). Then the issue would be to pick out which columns contain those data, which could be fairly simple or completely impossible(*) depending on what other data were in the record, if the positions were random from one file to another.
(*) Of course, there's always the recourse of parsing one record as character to determine where the tell-tale markers are, the problem comes in trying to fully detach the read from the a priori knowledge of the file structure.
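A rough sketch of the extra-delimiter idea (the file name and the position of the date/time fields are assumptions): with ',', '-' and ':' all declared as delimiters, a stamp such as "05-24-2016 13:05:07.250" splits into the numeric pieces 5, 24, 2016, 13, 5, 7.250.
% Sketch only; 'data.txt' and the column layout are assumed.
fid = fopen('data.txt','rt');
fgetl(fid);                                      % skip the header line
dat = textscan(fid,'','Delimiter',{',','-',':'},'CollectOutput',true);
fclose(fid);
M = dat{1};                                      % all-numeric matrix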
The traditional manner in which the conundrum is solved is to incorporate a header into the file which describes the structure that follows; the difficulty is that this has to have been built into the tools that write the file, which often aren't available... so you're back to the same conundrum.
ADDENDUM
Of course, in your case, if you can make just one a priori requirement, namely that the data outside the date/time column are all numeric, then you can, with minimal grief, "have your cake and eat it, too!" As Walter and Stephen have shown, it's relatively simple to count the number of fields per record; if you can live with fixing the header content for the date and time fields to some specific string in order to locate them, then it's simple enough to extend those solutions to handle the date/time column in any arbitrary location (see the sketch below). Or, with those, I suspect the above idea of adding ':-' to the delimiter string could be made to work.
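A sketch of that extension (the 'DateTime' header label, file name, and date format are all assumptions):
% Locate the date/time column by its header string, then build the
% format around it; 'data.txt' and 'DateTime' are made-up names.
fid = fopen('data.txt','rt');
hdr = strsplit(fgetl(fid),',');
fmt = repmat({'%f'},1,numel(hdr));
fmt(strcmpi(hdr,'DateTime')) = {'%{MM-dd-yyyy HH:mm:ss.SSS}D'};
dat = textscan(fid,[fmt{:}],'Delimiter',',');
fclose(fid);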



Stephen23 on 24 May 2016
Edited: Stephen23 on 24 May 2016
Here is some code that reads a text file with any number of columns. It works by reading the header line and using it to define the format string. This code assumes two things:
  • the first row is a header, and each string in it is a valid fieldname.
  • the first column contains dates (imported as strings); the rest of the columns are numeric (double).
The last two lines optionally convert the output to a structure: this makes the column order irrelevant.
dlm = ',';
% Open the file for reading in text mode.
[fid,msg] = fopen('test.csv','rt');
if fid<0
    error('FOPEN could not read the file because:\n%s',msg)
end
% Read the header line and split it on the delimiter -> 1xN cellstr.
hdr = cell2mat(textscan(fgetl(fid),'%s','Delimiter',dlm))';
% One %s for the date column, %f for every remaining column.
fmt = ['%s',repmat('%f',1,numel(hdr)-1)];
dat = textscan(fid,fmt,'Delimiter',dlm);
fclose(fid);
% Interleave header names with the data -> non-scalar structure.
vec = [hdr;dat(1),cellfun(@num2cell,dat(2:end),'UniformOutput',false)];
S = struct(vec{:});
Note that by using the structure, the order of the columns becomes totally unimportant. Use fieldnames to check which data are in the file:
>> fieldnames(S)
ans =
'Datestamp'
'Data1'
'Data2'
'Data3'
The rest of the data can be accessed using the standard methods for accessing data in a non-scalar structure (you can also define the fieldnames dynamically):
>> {S.Datestamp}
ans =
'2016-05-24'
'2016-05-23'
'2016-05-22'
>> S(2).Datestamp
ans = 2016-05-23
>> [S.Data1]
ans =
1 4 7
>> S(2).Data1
ans = 4
The example data file is attached to this answer.

D on 31 May 2016
dpb, Walter, and Stephen, thank you very much for your help, and thank you especially for taking the time to make such thoughtful suggestions. I ended up using Walter's suggestion of the "readtable" function to read the data file into a table. I then used a "for" loop to convert my date/time string table column to numerical data:
n = height(datatable);
time = zeros(n,1);   % preallocate the output
for i = 1:n
    time(i) = datenum(datatable{i,1},'mm-dd-yyyy HH:MM:SS.FFF');
end
This seems to have done the trick. Thank you very much again!
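For reference, datenum also accepts a cell array of date strings directly, so the same conversion can be written without the loop (assuming the table layout above):
% Vectorized alternative; the same table layout is assumed as in the loop.
time = datenum(datatable{:,1},'mm-dd-yyyy HH:MM:SS.FFF');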
