Trouble using textread to ignore certain elements in a .csv file (new to matlab!)

Hi all,
I am very new to MATLAB and need some help using textread to read a .csv file while ignoring certain elements. Below is a small example of the data.
timestamp,time completed,task,set_no,x,y,time trained,stars,guess,guess (display),trials complete,trials remaining
1375988040,"Thu, 08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi,10,3,0,255,4,6,6 syls,11,0
1375988312,"Thu, 08 Aug 2013 18:58:32 GMT",aud_spatial_match_crbi,10,3,0,262,5,6,6 syls,6,5
1375989376,"Thu, 08 Aug 2013 19:16:16 GMT",digit_span_crbi,0,0,0,768,2,5,5 objects,14,0
My goal is to have textread ignore GMT" in the date/time field, since MATLAB does not seem to recognize 08 Aug 2013 18:54:00 GMT" as a date/time. There is also the preceding "Thu, which MATLAB treats as its own field because of the extra comma; it would be great to get rid of that too. In addition, I am trying to skip the first (header) line.
Here is what I've attempted so far:
[timestamp, thu, timecompl, task, setnum, x, y, timetrained, stars, guess, guessdisp, trialscomp, trialsrem] = ...
textread('BrainTEST.csv','%d %s GMT"%s %s %d %d %d %d %d %d %s %d %d','delimiter',',','headerlines',1,'whitespace','')
The variable "thu" is me trying to accommodate the extra comma.
I have been confronted with the following errors:
Error using dataread
Trouble reading literal string from file (row 1, field 3) ==> 08 Aug 2013 18:54:00 GMT",aud_spatial_
Error in textread (line 175)
[varargout{1:nlhs}]=dataread('file',varargin{:}); %#ok<REMFF1>
Any suggestions would be greatly appreciated!
Cheers, Sean C.
  1 Comment
per isakson on 10 Aug 2013
Edited: per isakson on 10 Aug 2013
Use textscan rather than textread; the documentation recommends textscan over textread.
Why not read the full date string with %q and remove the leading day and trailing GMT in a separate step?
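The two-step idea above could be sketched roughly as follows. This is only an illustration, not per isakson's own code: the sample rows are copied from the question, the header line is left out so that no 'headerlines' option is needed, and the variable names (txt, rawDates, cleanDates, dateNums) are made up for the example. The regular expression also tolerates surrounding quotes in case they are still present after reading.

```matlab
% Two sample rows from the question (header line omitted in this sketch).
txt = sprintf('%s\n', ...
    '1375988040,"Thu, 08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi,10,3,0,255,4,6,6 syls,11,0', ...
    '1375988312,"Thu, 08 Aug 2013 18:58:32 GMT",aud_spatial_match_crbi,10,3,0,262,5,6,6 syls,6,5');

% Step 1: %q reads the double-quoted field as a single string, so the comma
% after "Thu" is not treated as a field delimiter.
fmt  = '%d %q %s %d %d %d %d %d %d %s %d %d';
data = textscan(txt, fmt, 'Delimiter', ',');
rawDates = data{2};

% Step 2: strip the leading day name and the trailing GMT (and any quotes).
cleanDates = regexprep(rawDates, '^"?\w+,\s*|\s*GMT"?$', '');

% Convert to serial date numbers now that the strings have a plain format.
dateNums = datenum(cleanDates, 'dd mmm yyyy HH:MM:SS');
```

With a file instead of a string, the same format would be passed to textscan together with a file identifier from fopen and 'HeaderLines', 1.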


Accepted Answer

Cedric on 10 Aug 2013
Edited: Cedric on 10 Aug 2013
You could go for
buffer = fileread('BrainTEST.csv') ;
data = textscan(buffer,'%d %s %s %s %d %d %d %d %d %d %s %d %d', ...
'delimiter', ',', 'headerlines', 1, 'whitespace', '') ;
and then instead of processing e.g.
data{3}{k}
which would be ' 08 Aug 2013 18:54:00 GMT"' in your example for k=1, you process
data{3}{k}(2:end-5)
which is '08 Aug 2013 18:54:00'.
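As a small follow-up sketch (not part of the original answer), once the string is trimmed this way it can be handed to datenum with an explicit format. The variable names raw, cleaned, dn and back are chosen here for illustration, and raw simply stands in for data{3}{1}.

```matlab
% Stand-in for data{3}{1}: leading space, trailing ' GMT"'.
raw = ' 08 Aug 2013 18:54:00 GMT"';

% Drop the first character and the last five (space, G, M, T, quote).
cleaned = raw(2:end-5);

% Parse with an explicit format, then print it back in another format
% to check that the date was understood.
dn   = datenum(cleaned, 'dd mmm yyyy HH:MM:SS');
back = datestr(dn, 'yyyy-mm-dd HH:MM:SS');
disp(back);
```

Note that the fixed offsets (2:end-5) assume every row has exactly this layout; a different time zone label or missing quote would need a different trim.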
EDIT: just as an example, here is one way to tackle this with named regexp.
buffer = fileread('BrainTEST.csv') ;
data = regexp(buffer, '(?<timestamp>\d+),.{6}(?<datetime>[^G]+)GMT",(?<task>[^,]+),(?<setnum>\d+),(?<x>\d+),(?<y>\d+),(?<timeTrained>\d+),(?<stars>\d+),(?<guess>\d+),(?<guessDisplay>[^,]+),(?<trialsComplete>\d+),(?<trialsRemaining>\d+)', 'names') ;
which generates a struct array:
>> data(1)
ans =
timestamp: '1375988040'
datetime: '08 Aug 2013 18:54:00 '
task: 'aud_spatial_match_crbi'
setnum: '10'
x: '3'
y: '0'
timeTrained: '255'
stars: '4'
guess: '6'
guessDisplay: '6 syls'
trialsComplete: '11'
trialsRemaining: '0'
>> data(2)
ans =
timestamp: '1375988312'
datetime: '08 Aug 2013 18:58:32 '
task: 'aud_spatial_match_crbi'
setnum: '10'
x: '3'
y: '0'
timeTrained: '262'
stars: '5'
guess: '6'
guessDisplay: '6 syls'
trialsComplete: '6'
trialsRemaining: '5'
etc ...
But don't go for this solution here: it is far less efficient than the first one for a very well-structured file like this. It would be a good solution if your file were a mix of structured and unstructured data.
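One thing to keep in mind with the regexp approach is that every field comes back as a string, as the output above shows. A possible way to convert them, sketched here on two sample rows with the same pattern (the variable names timestamps, stars and datetimes are made up for this example):

```matlab
% Two sample rows from the question, joined by a newline.
buffer = sprintf('%s\n', ...
    '1375988040,"Thu, 08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi,10,3,0,255,4,6,6 syls,11,0', ...
    '1375988312,"Thu, 08 Aug 2013 18:58:32 GMT",aud_spatial_match_crbi,10,3,0,262,5,6,6 syls,6,5');

% Same named-token pattern as in the answer above.
pat = ['(?<timestamp>\d+),.{6}(?<datetime>[^G]+)GMT",(?<task>[^,]+),' ...
       '(?<setnum>\d+),(?<x>\d+),(?<y>\d+),(?<timeTrained>\d+),(?<stars>\d+),' ...
       '(?<guess>\d+),(?<guessDisplay>[^,]+),(?<trialsComplete>\d+),(?<trialsRemaining>\d+)'];
data = regexp(buffer, pat, 'names');

% Collect one field across the struct array and convert it to numbers.
timestamps = str2double({data.timestamp});
stars      = str2double({data.stars});

% Remove the trailing space left in front of GMT.
datetimes  = strtrim({data.datetime});
```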
  3 Comments
Cedric on 13 Aug 2013
Edited: Cedric on 13 Aug 2013
You should actually use functions like SIZE and CLASS to investigate the nature of the objects that you are working with. Just as a reminder, cell arrays are arrays of cells and if C is a cell array, C(1) is "block indexing" cell 1 of C, which means returning a cell, whereas C{1} is indexing the cell and returning its content. I will discuss an example based on the following file content:
A header
1375988040,"Thu, 08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi,10,3,0,255,4,6,6 syls,11,0
1375988312,"Thu, 08 Aug 2013 18:58:32 GMT",aud_spatial_match_crbi,10,3,0,262,5,6,6 syls,6,5
1375989376,"Thu, 08 Aug 2013 19:16:16 GMT",digit_span_crbi,0,0,0,768,2,5,5 objects,14,0
Executing
buffer = fileread('BrainTEST.csv') ;
data = textscan(buffer,'%d %s %s %s %d %d %d %d %d %d %s %d %d', ...
'delimiter', ',', 'headerlines', 1, 'whitespace', '') ;
builds the cell array data with the following content, which we will study:
>> data
data =
Columns 1 through 6
[3x1 int32] {3x1 cell} {3x1 cell} {3x1 cell} [3x1 int32] [3x1 int32]
Columns 7 through 12
[3x1 int32] [3x1 int32] [3x1 int32] [3x1 int32] {3x1 cell} [3x1 int32]
Column 13
[3x1 int32]
>> class(data)
ans =
cell
>> size(data)
ans =
1 13
So data is a row cell array made of 13 cells.
>> class(data(1)) % () indexing => return cell.
ans =
cell
>> class(data{1}) % {} indexing => return cell content.
ans =
int32
>> size(data{1})
ans =
3 1
So the content of cell 1 is a 3x1 numeric array of int32.
>> data{1}
ans =
1375988040
1375988312
1375989376
Note that as cell arrays can be nested, sometimes the content of a cell is itself a cell array, as illustrated with the time/date string column:
>> class(data{3}) % Content of cell 3 of data is a cell array.
ans =
cell
>> class(data{3}(1)) % Cell 1 of this cell array is a cell.
ans =
cell
>> class(data{3}{1}) % Content of this cell is a string.
ans =
char
The best that you can do when indexing becomes complex is to create intermediary variables, e.g.
timeDateCellArray = data{3} ;
for k = 1 : length(timeDateCellArray)
    timeDate = timeDateCellArray{k} ;
    timeDate_cleaned = timeDate(2:end-5) ;
    fprintf('[%d] %s\n', k, timeDate_cleaned) ;
end
Running this outputs:
[1] 08 Aug 2013 18:54:00
[2] 08 Aug 2013 18:58:32
[3] 08 Aug 2013 19:16:16
Then, when indexing is clear to you, you can eliminate intermediary steps and write more concise code. With that, you should be able to manage your indexing properly. Let me know if you need more information.
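As an illustration of what a more concise version might look like (a sketch only, with the cell array typed in by hand so that it runs without the file, and cleaned as an assumed variable name), the loop can be replaced by cellfun:

```matlab
% Stand-in for data{3}: a 3x1 cell array of date/time strings.
timeDateCellArray = {' 08 Aug 2013 18:54:00 GMT"'; ...
                     ' 08 Aug 2013 18:58:32 GMT"'; ...
                     ' 08 Aug 2013 19:16:16 GMT"'};

% Apply the same trimming to every cell; 'UniformOutput', false keeps the
% result as a cell array because the outputs are strings, not scalars.
cleaned = cellfun(@(s) s(2:end-5), timeDateCellArray, 'UniformOutput', false);

% Print the cleaned strings, one per line.
for k = 1 : numel(cleaned)
    fprintf('[%d] %s\n', k, cleaned{k});
end
```

With the real data, timeDateCellArray would simply be data{3}.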
Cheers,
Cedric
Sean on 14 Aug 2013
Thank you so much Cedric! Your answers are great; informative and easy to follow. I think I can figure it out from here. I will definitely let you know if I run into any more complications.
Cheers, Sean


More Answers (1)

Walter Roberson on 10 Aug 2013
'%s GMT' does not cause the '%s' to "back up" as necessary so that the literal GMT can be matched. %s reads as far as possible given the whitespace and delimiter settings.
% 1375989376,"Thu, 08 Aug 2013 19:16:16 GMT",digit_span_crbi,0,0,0,768,2,5,5 objects,14,0
fmt = '%d%s%d%s%d%s%s%s%d%d%d%d%d%d%d%s%d%d'
fid = fopen('BrainTEST.csv', 'rt');
datacell = textscan(fid, fmt, 'delimiter', ',', 'headerlines', 1);
fclose(fid);
timestamps = datacell{1};
timecompleteds = strcat(datacell{3}, {' '}, datacell{4}, {' '}, datacell{5}, {' '}, datacell{6});
tasks = datacell{8};
% and so on
