Trouble using textread to ignore certain elements in a .csv file (new to matlab!)
Hi all,
I am very new to MATLAB and need some help using textread to read a .csv file while ignoring certain elements. Below is a small example of the data.
timestamp,time completed,task,set_no,x,y,time trained,stars,guess,guess (display),trials complete,trials remaining
1375988040,"Thu, 08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi,10,3,0,255,4,6,6 syls,11,0
1375988312,"Thu, 08 Aug 2013 18:58:32 GMT",aud_spatial_match_crbi,10,3,0,262,5,6,6 syls,6,5
1375989376,"Thu, 08 Aug 2013 19:16:16 GMT",digit_span_crbi,0,0,0,768,2,5,5 objects,14,0
My goal is to have textread ignore GMT" in the date/time field, since MATLAB does not seem to recognize 08 Aug 2013 18:54:00 GMT" as a date/time. There is also the preceding "Thu, which MATLAB treats as its own field because of the extra comma; it would be great to get rid of that as well. I'm also trying to skip the first (header) line.
Here is what I've attempted so far:
[timestamp, thu, timecompl, task, setnum, x, y, timetrained, stars, guess, guessdisp, trialscomp, trialsrem] = textread('BrainTEST.csv','%d %s GMT"%s %s %d %d %d %d %d %d %s %d %d','delimiter',',','headerlines',1,'whitespace','')
The variable "thu" is me trying to accommodate the extra comma.
I have been confronted with the following errors:
Error using dataread
Trouble reading literal string from file (row 1, field 3) ==> 08 Aug 2013 18:54:00 GMT",aud_spatial_
Error in textread (line 175)
[varargout{1:nlhs}]=dataread('file',varargin{:}); %#ok<REMFF1>
Any suggestions would be greatly appreciated!
Cheers, Sean C.
1 Comment
per isakson
on 10 Aug 2013
Edited: per isakson
on 10 Aug 2013
Use textscan rather than textread, since the documentation recommends it.
Why not read the full date string with %q and remove the leading day and trailing GMT in a separate step?
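A minimal sketch of that two-step idea, assuming the file layout shown in the question. The sample text is inlined so the snippet runs on its own, and the variable names (txt, rawDates, cleaned, serial) are only illustrative. %q reads the whole double-quoted field, embedded comma included, and drops the quotes; the day name and GMT are then stripped in a separate step.

```matlab
% Sample content mirroring the question's file: header row plus two records.
txt = sprintf('%s\n%s\n%s', ...
    'timestamp,time completed,task,set_no,x,y,time trained,stars,guess,guess (display),trials complete,trials remaining', ...
    '1375988040,"Thu, 08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi,10,3,0,255,4,6,6 syls,11,0', ...
    '1375988312,"Thu, 08 Aug 2013 18:58:32 GMT",aud_spatial_match_crbi,10,3,0,262,5,6,6 syls,6,5');

% One specifier per column; %q keeps the quoted date field in one piece.
data = textscan(txt, '%d %q %s %d %d %d %d %d %d %s %d %d', ...
    'Delimiter', ',', 'HeaderLines', 1);

rawDates = data{2};                                     % e.g. 'Thu, 08 Aug 2013 18:54:00 GMT'
cleaned  = regexprep(rawDates, '^\w+,\s*|\s*GMT$', ''); % drop leading day name and trailing GMT
serial   = datenum(cleaned, 'dd mmm yyyy HH:MM:SS');    % serial date numbers
```

For a file on disk, the same format string could be passed to textscan with a file identifier from fopen instead of the inlined text.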
Accepted Answer
Cedric
on 10 Aug 2013
Edited: Cedric
on 10 Aug 2013
You could go for
buffer = fileread('BrainTEST.csv') ;
data = textscan(buffer,'%d %s %s %s %d %d %d %d %d %d %s %d %d', ...
'delimiter', ',', 'headerlines', 1, 'whitespace', '') ;
and then instead of processing e.g.
data{3}{k}
which would be ' 08 Aug 2013 18:54:00 GMT"' in your example for k=1, you process
data{3}{k}(2:end-5)
which is '08 Aug 2013 18:54:00'.
EDIT: just as an example, here is one way to tackle this with named regexp.
buffer = fileread('BrainTEST.csv') ;
data = regexp(buffer, '(?<timestamp>\d+),.{6}(?<datetime>[^G]+)GMT",(?<task>[^,]+),(?<setnum>\d+),(?<x>\d+),(?<y>\d+),(?<timeTrained>\d+),(?<stars>\d+),(?<guess>\d+),(?<guessDisplay>[^,]+),(?<trialsComplete>\d+),(?<trialsRemaining>\d+)', 'names') ;
which generates a struct array:
>> data(1)
ans =
timestamp: '1375988040'
datetime: '08 Aug 2013 18:54:00 '
task: 'aud_spatial_match_crbi'
setnum: '10'
x: '3'
y: '0'
timeTrained: '255'
stars: '4'
guess: '6'
guessDisplay: '6 syls'
trialsComplete: '11'
trialsRemaining: '0'
>> data(2)
ans =
timestamp: '1375988312'
datetime: '08 Aug 2013 18:58:32 '
task: 'aud_spatial_match_crbi'
setnum: '10'
x: '3'
y: '0'
timeTrained: '262'
stars: '5'
guess: '6'
guessDisplay: '6 syls'
trialsComplete: '6'
trialsRemaining: '5'
etc ...
But don't use this solution here: it is far less efficient than the first one for such a well-structured file. It would be a good choice if your file were a mix of structured and unstructured data.
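If you do end up using named tokens (for a less regular file, say), note that every field comes back as a char array, as the struct display shows. A small sketch of the follow-up conversion, using a shortened version of the pattern on a single record; the names rowText, rec, timestampNum and so on are only illustrative:

```matlab
% One record from the question, matched with a shortened pattern.
rowText = '1375988040,"Thu, 08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi,10,3,0,255,4,6,6 syls,11,0';
rec = regexp(rowText, ...
    '(?<timestamp>\d+),.{6}(?<datetime>[^G]+)GMT",(?<task>[^,]+),(?<setnum>\d+)', 'names');

% Named tokens are returned as text, so convert the numeric ones explicitly.
timestampNum = str2double(rec.timestamp);
setnumNum    = str2double(rec.setnum);

% The datetime token keeps the space that preceded GMT; trim it.
datetimeStr  = strtrim(rec.datetime);
```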
3 Comments
Cedric
on 13 Aug 2013
Edited: Cedric
on 13 Aug 2013
You should actually use functions like SIZE and CLASS to investigate the nature of the objects that you are working with. Just as a reminder, cell arrays are arrays of cells and if C is a cell array, C(1) is "block indexing" cell 1 of C, which means returning a cell, whereas C{1} is indexing the cell and returning its content. I will discuss an example based on the following file content:
A header
1375988040,"Thu, 08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi,10,3,0,255,4,6,6 syls,11,0
1375988312,"Thu, 08 Aug 2013 18:58:32 GMT",aud_spatial_match_crbi,10,3,0,262,5,6,6 syls,6,5
1375989376,"Thu, 08 Aug 2013 19:16:16 GMT",digit_span_crbi,0,0,0,768,2,5,5 objects,14,0
Executing
buffer = fileread('BrainTEST.csv') ;
data = textscan(buffer,'%d %s %s %s %d %d %d %d %d %d %s %d %d', ...
'delimiter', ',', 'headerlines', 1, 'whitespace', '') ;
builds the cell array data with the following content, which we will examine:
>> data
data =
Columns 1 through 6
[3x1 int32] {3x1 cell} {3x1 cell} {3x1 cell} [3x1 int32] [3x1 int32]
Columns 7 through 12
[3x1 int32] [3x1 int32] [3x1 int32] [3x1 int32] {3x1 cell} [3x1 int32]
Column 13
[3x1 int32]
>> class(data)
ans =
cell
>> size(data)
ans =
1 13
So data is a row cell array made of 13 cells.
>> class(data(1)) % () indexing => return cell.
ans =
cell
>> class(data{1}) % {} indexing => return cell content.
ans =
int32
>> size(data{1})
ans =
3 1
So the content of cell 1 is a 3x1 numeric array of int32.
>> data{1}
ans =
1375988040
1375988312
1375989376
Note that as cell arrays can be nested, sometimes the content of a cell is itself a cell array, as illustrated with the time/date string column:
>> class(data{3}) % Content of cell 3 of data is a cell array.
ans =
cell
>> class(data{3}(1)) % Cell 1 of this cell array is a cell.
ans =
cell
>> class(data{3}{1}) % Content of this cell is a string.
ans =
char
The best that you can do when indexing becomes complex is to create intermediary variables, e.g.
timeDateCellArray = data{3} ;
for k = 1 : length(timeDateCellArray)
timeDate = timeDateCellArray{k} ;
timeDate_cleaned = timeDate(2:end-5) ;
fprintf('[%d] %s\n', k, timeDate_cleaned) ;
end
Running this outputs:
[1] 08 Aug 2013 18:54:00
[2] 08 Aug 2013 18:58:32
[3] 08 Aug 2013 19:16:16
Then, when indexing is clear to you, you can eliminate intermediary steps and write more concise code. With that, you should be able to manage your indexing properly. Let me know if you need more information.
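As one possible concise form of the loop above (a sketch, not the only way to write it), cellfun can apply the same (2:end-5) slice to every cell at once. The cell array is written out literally here so the snippet runs on its own; in practice it would be data{3}.

```matlab
% Stand-in for data{3}: leading space and trailing ' GMT"' still present.
timeDateCellArray = {' 08 Aug 2013 18:54:00 GMT"'; ...
                     ' 08 Aug 2013 18:58:32 GMT"'; ...
                     ' 08 Aug 2013 19:16:16 GMT"'};

% Same slice as in the loop, applied to each cell; the result is a cell array.
cleaned = cellfun(@(s) s(2:end-5), timeDateCellArray, 'UniformOutput', false);

for k = 1 : numel(cleaned)
    fprintf('[%d] %s\n', k, cleaned{k});
end
```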
Cheers,
Cedric
More Answers (1)
Walter Roberson
on 10 Aug 2013
'%s GMT' does not cause the '%s' to "back up" as needed to leave room for the literal GMT. %s reads as far as possible given the whitespace and delimiter settings, so by the time the literal is checked, the GMT has already been consumed.
% 1375989376,"Thu, 08 Aug 2013 19:16:16 GMT",digit_span_crbi,0,0,0,768,2,5,5 objects,14,0
fmt = '%d%s%d%s%d%s%s%s%d%d%d%d%d%d%d%s%d%d';
fid = fopen('BrainTEST.csv', 'rt');
datacell = textscan(fid, fmt, 'delimiter', ',', 'headerlines', 1);
fclose(fid);
timestamps = datacell{1};
timecompleteds = strcat(datacell{3}, {' '}, datacell{4}, {' '}, datacell{5}, {' '}, datacell{6});
tasks = datacell{8};
% and so on
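To illustrate the point about %s reading as far as it can: in the small sketch below (sample text inlined, variable names illustrative), the only delimiter is the comma, so %s takes everything up to it, GMT" included. The unwanted suffix therefore has to be removed afterwards rather than matched as a literal in the format.

```matlab
% Tail of the date field followed by the next column.
sample = '08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi';

% With ',' as the delimiter, the first %s stops only at the comma.
fields = textscan(sample, '%s %s', 'Delimiter', ',');
firstField = fields{1}{1};

% Strip the trailing GMT" in a separate step.
trimmed = regexprep(firstField, '\s*GMT"$', '');
```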