How to use setvaropts for strangely-formatted text file
Show older comments
Hello All,
Long time reader, first time asker - appreciate the great wealth of knowledge here!
I'm trying to use the "readtable" funciton to import data from a large (~600 MB) CSV file. The file contains separate columns for Date, Time, and a bunch of instrument channels. I've successfully used "detectImportOptions" to get the channel listing, eliminate unwanted columns using "SelectedVariableNames", and set the "VaribleTypes" for all of the desired columns. Note: by default, the function correclty flags the "Time" column as a duration, but the "Date" and individual data columns get flagged as "char" by default.
Here's an example line from the file, showing the header row, units row, and subsequent columns. I'm wanting to import "Date" (as a date/time), "Time" (as the default duration), and each of hte "CHU..." columns as a data channel.
Number,Date,Time,us,CHU1-1,CHU1-2,CHU1-3,CHU1-4,CHU1-5...
NO.,Date,Time,us,"uE","uE","uE","uE","uE",...
1,2014/10/30,17:56:41, 0,+ 194 ,+ 210 ,+ 196 ,- 180 ,+ 17 ,...
For the "Date" variable, I'm attempting to use the following code:
opts=setvaropts(opts,'Date','DatetimeFormat','yyyy/MM/dd');
However, despite the above, all of the dates end up importing as NaT. For the "CHU..." channels, I have changed the VariableTypes to double:
idx=find(contains(vnms,'CHU'));
ivars=[ivars idx];
opts.VariableTypes(idx)={'double'};
However, this also is not working, as I'm getting NaN for all values.
Can someone help me with the correct format/inputs to the "setvaropts" function to import the following?
- "2014/10/30" as Date/Time
- "+ 194 " as Signed Double.
Thanks much for any help!
Mark
------------------
Edit: attached a copy of a smaller-sized file (cannot upload the full 600 MB CSV that I'm actually trying to read, but I have verified that this one is formatted the same and returns similar results).
Here's the current version of my import code ('fpath' and 'fname' variables are identified from "uigetfile" call earlier in the script):
%determine import options and adjust
warning off;
opts=detectImportOptions(fullfile(fpath,fname));
ivars=[];
vnms=opts.VariableNames;
idx=find(ismember(vnms,'Date'));
ivars=[ivars idx];
opts.VariableTypes{idx}='datetime';
opts=setvaropts(opts,'Date','DatetimeFormat','yyyy/MM/dd');
idx=find(ismember(vnms,'Time'));
ivars=[ivars idx];
opts.VariableTypes{idx}='duration';
idx=find(contains(vnms,'CHU'));
ivars=[ivars idx];
opts.VariableTypes(idx)={'double'};
opts.SelectedVariableNames=vnms(ivars);
%read from CSV table
tt=readtable(fullfile(fpath,fname),opts);
4 Comments
"Can someone help me with the correct format/inputs to the "setvaropts" function to import the following?"
Yes, once you upload a sample data file by clicking the paperclip button.
As a "Long time reader" you should know that not providing a representative data file to work with just delays you getting a working solution. Showing part of a file here is not sufficient: real files have strange newline characters, unexpected hidden characters, special characters, trailing stuff, formatting that changes halfway through the file...
If I had ten cents for every time someone wrote "my file looks exactly like this..."
Mark Jaeger
on 9 Feb 2024
Just to confirm: the block of data you wish to import starts on line 120 (header) & 122 (data) to the EOF ?
Those are... rather strangely formatted numbers. Out of personal curiosity: what produced that number format? Do you know what language or formatting commands were used?
Mark Jaeger
on 9 Feb 2024
Accepted Answer
More Answers (1)
You need to set the 'InputFormat' instead of the 'DatetimeFormat'.
opts = detectImportOptions('chuSampleFile.csv');
opts = setvartype(opts,'Date','datetime');
opts = setvaropts(opts,'Date','InputFormat','yyyy/MM/dd');
As for converting your numeric data to doubles, I think that is easiest done after the table has been imported. To capture the sign and number, read these in first as strings.
idx=contains(opts.VariableNames,'CHU');
opts.VariableTypes(idx)={'string'};
data = readtable('chuSampleFile.csv',opts)
data(:,idx) = varfun(@(x)erase(x,' '),data,"InputVariables",idx)
data = convertvars(data,idx,'double')
3 Comments
Testing on the sample file provided.
opts = detectImportOptions('DEFAULT_CP3.CSV');
opts = setvartype(opts,'Date','datetime');
opts = setvaropts(opts,'Date','InputFormat','yyyy/MM/dd');
idx=contains(opts.VariableNames,'CHU');
opts.VariableTypes(idx)={'string'};
data = readtable('DEFAULT_CP3.CSV',opts);
data(:,idx) = varfun(@(x)erase(x,' '),data,"InputVariables",idx);
data = convertvars(data,idx,'double')
Mark Jaeger
on 9 Feb 2024
Cris LaPierre
on 9 Feb 2024
Edited: Cris LaPierre
on 9 Feb 2024
The 'can't convert cell to double' happens if you do not set the numbers to import as strings.
idx=contains(opts.VariableNames,'CHU');
opts.VariableTypes(idx)={'string'};
You could instead just set the text type to be string
opts = detectImportOptions('DEFAULT_CP3.CSV','TextType','string');
opts = setvartype(opts,'Date','datetime');
opts = setvaropts(opts,'Date','InputFormat','yyyy/MM/dd');
data = readtable('DEFAULT_CP3.CSV',opts);
idx = contains(data.Properties.VariableNames,'CHU');
data(:,idx) = varfun(@(x)erase(x,' '),data,"InputVariables",idx);
data = convertvars(data,idx,'double')
Categories
Find more on Data Import and Analysis in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!