Trouble using textread to ignore certain elements in a .csv file (new to matlab!)

Question

Sean on 10 Aug 2013

https://www.mathworks.com/matlabcentral/answers/84481-trouble-using-textread-to-ignore-certain-elements-in-a-csv-file-new-to-matlab

Hi all,

I am very new to MATLAB and need some help using textread to read a .csv file while ignoring certain elements. Below is a small example of the data.

 timestamp,time completed,task,set_no,x,y,time trained,stars,guess,guess (display),trials complete,trials remaining
 1375988040,"Thu, 08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi,10,3,0,255,4,6,6 syls,11,0
 1375988312,"Thu, 08 Aug 2013 18:58:32 GMT",aud_spatial_match_crbi,10,3,0,262,5,6,6 syls,6,5
 1375989376,"Thu, 08 Aug 2013 19:16:16 GMT",digit_span_crbi,0,0,0,768,2,5,5 objects,14,0

My goal is to have textread ignore the trailing GMT" in the date/time field, since MATLAB does not seem to recognize 08 Aug 2013 18:54:00 GMT" as a date/time. There is also the leading "Thu, which MATLAB treats as its own field because of the extra comma; it would be great to get rid of that too. I'm also trying to skip the first (header) line.

Here is what I've attempted so far:

 [timestamp, thu, timecompl, task, setnum, x, y, timetrained, stars, guess, ...
     guessdisp, trialscomp, trialsrem] = textread('BrainTEST.csv', ...
     '%d %s GMT"%s %s %d %d %d %d %d %d %s %d %d', ...
     'delimiter', ',', 'headerlines', 1, 'whitespace', '')

The variable thu is my attempt to absorb the extra comma.

I get the following errors:

 Error using dataread
 Trouble reading literal string from file (row 1, field 3) ==> 08 Aug 2013 18:54:00 GMT",aud_spatial_

 Error in textread (line 175)
 [varargout{1:nlhs}]=dataread('file',varargin{:}); %#ok<REMFF1>

Any suggestions would be greatly appreciated!

Cheers, Sean C.

1 Comment

per isakson on 10 Aug 2013

Edited: per isakson on 10 Aug 2013

Use textscan rather than textread; the documentation recommends textscan over textread.

Why not read the full date string with %q and remove the leading day and trailing GMT in a separate step?
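A minimal sketch of the %q suggestion, assuming the three sample rows from the question stand in for BrainTEST.csv (the embedded buffer and the regexprep pattern are illustrative choices, not part of the original comment). %q reads the double-quoted field as a single value and drops the quotes; the day name and GMT are then stripped in a second step.

```matlab
% Assumption: the question's sample rows replace the contents of BrainTEST.csv.
buffer = sprintf(['timestamp,time completed,task,set_no,x,y,time trained,', ...
    'stars,guess,guess (display),trials complete,trials remaining\n', ...
    '1375988040,"Thu, 08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi,10,3,0,255,4,6,6 syls,11,0\n', ...
    '1375988312,"Thu, 08 Aug 2013 18:58:32 GMT",aud_spatial_match_crbi,10,3,0,262,5,6,6 syls,6,5\n', ...
    '1375989376,"Thu, 08 Aug 2013 19:16:16 GMT",digit_span_crbi,0,0,0,768,2,5,5 objects,14,0\n']);

% %q keeps "Thu, 08 Aug 2013 18:54:00 GMT" together (comma included) and
% removes the quotes; Whitespace '' keeps "6 syls" in one %s field.
c = textscan(buffer, '%d%q%s%d%d%d%d%d%d%s%d%d', ...
             'Delimiter', ',', 'Whitespace', '', 'HeaderLines', 1);

% Separate step: remove the leading day name ("Thu, ") and trailing " GMT".
dates = regexprep(c{2}, '^\w+,\s*|\s*GMT$', '');
disp(dates)
```

With a real file, the same textscan call can take a file identifier from fopen instead of the buffer.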


Answer 1 (accepted)

Cedric on 10 Aug 2013

https://www.mathworks.com/matlabcentral/answers/84481-trouble-using-textread-to-ignore-certain-elements-in-a-csv-file-new-to-matlab#answer_94047

Edited: Cedric on 10 Aug 2013

You could go for

 buffer = fileread('BrainTEST.csv') ;
 data = textscan(buffer,'%d %s %s %s %d %d %d %d %d %d %s %d %d', ...
                 'delimiter', ',', 'headerlines', 1, 'whitespace', '') ;

and then, instead of processing e.g.

 data{3}{k}

which would be ' 08 Aug 2013 18:54:00 GMT"' in your example for k=1 (with ',' as the only delimiter, "Thu ends up in data{2}), you process

 data{3}{k}(2:end-5)

which is '08 Aug 2013 18:54:00'.
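Once trimmed, the text can be turned into a MATLAB date with datenum, provided the format is given explicitly. A minimal sketch, assuming the trimmed string looks like the example above; the Unix-time cross-check is an observation about the sample data (1375988040 seconds after 1970-01-01 UTC is 08 Aug 2013 18:54:00), not something stated in the question.

```matlab
% Assumption: dateText is one trimmed value, e.g. data{3}{1}(2:end-5).
dateText = '08 Aug 2013 18:54:00';

% datenum parses the text once the format is spelled out.
dn = datenum(dateText, 'dd mmm yyyy HH:MM:SS');
disp(datestr(dn))   % 08-Aug-2013 18:54:00

% Cross-check: reading the timestamp column as Unix time (seconds since
% 1970-01-01 UTC) gives the same instant for this row.
dnFromTimestamp = datenum(1970, 1, 1) + 1375988040 / 86400;
fprintf('difference: %g s\n', (dn - dnFromTimestamp) * 86400);
```

datenum also accepts a cell array of such strings with the same format argument, so a whole column can be converted in one call.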

EDIT: just as an example, here is one way to tackle this with regexp and named tokens.

 buffer = fileread('BrainTEST.csv') ;
 data = regexp(buffer, '(?<timestamp>\d+),.{6}(?<datetime>[^G]+)GMT",(?<task>[^,]+),(?<setnum>\d+),(?<x>\d+),(?<y>\d+),(?<timeTrained>\d+),(?<stars>\d+),(?<guess>\d+),(?<guessDisplay>[^,]+),(?<trialsComplete>\d+),(?<trialsRemaining>\d+)', 'names') ;

which generates a struct array:

 >> data(1)
 ans = 
          timestamp: '1375988040'
           datetime: '08 Aug 2013 18:54:00 '
               task: 'aud_spatial_match_crbi'
             setnum: '10'
                  x: '3'
                  y: '0'
        timeTrained: '255'
              stars: '4'
              guess: '6'
       guessDisplay: '6 syls'
     trialsComplete: '11'
    trialsRemaining: '0'
 >> data(2)
 ans = 
          timestamp: '1375988312'
           datetime: '08 Aug 2013 18:58:32 '
               task: 'aud_spatial_match_crbi'
             setnum: '10'
                  x: '3'
                  y: '0'
        timeTrained: '262'
              stars: '5'
              guess: '6'
       guessDisplay: '6 syls'
     trialsComplete: '6'
    trialsRemaining: '5'

etc ...

but I wouldn't go for this solution here: for a file as well structured as this one, it is much less efficient than the first. It would be a good choice if your file were a mix of structured and unstructured data.
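If the regexp route is used anyway, note that every field of the struct array is a char, so numeric columns still need converting. A sketch, assuming two sample rows from the question as input; {data.field} gathers one field from all struct elements into a cell array that str2double converts in one call.

```matlab
% Assumption: two sample rows from the question stand in for the file.
buffer = sprintf(['1375988040,"Thu, 08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi,10,3,0,255,4,6,6 syls,11,0\n', ...
                  '1375988312,"Thu, 08 Aug 2013 18:58:32 GMT",aud_spatial_match_crbi,10,3,0,262,5,6,6 syls,6,5\n']);

% The named-token pattern from this answer, split over lines for readability.
pattern = ['(?<timestamp>\d+),.{6}(?<datetime>[^G]+)GMT",(?<task>[^,]+),', ...
           '(?<setnum>\d+),(?<x>\d+),(?<y>\d+),(?<timeTrained>\d+),', ...
           '(?<stars>\d+),(?<guess>\d+),(?<guessDisplay>[^,]+),', ...
           '(?<trialsComplete>\d+),(?<trialsRemaining>\d+)'];
data = regexp(buffer, pattern, 'names');

% Every captured field is char: collect and convert one field at a time.
timestamps  = str2double({data.timestamp});     % 1x2 double
timeTrained = str2double({data.timeTrained});
dateText    = strtrim({data.datetime});         % drops the trailing space
```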

3 Comments

Cedric on 13 Aug 2013

Edited: Cedric on 13 Aug 2013

You should actually use functions like SIZE and CLASS to investigate the nature of the objects you are working with. As a reminder, a cell array is an array of cells: if C is a cell array, C(1) uses parenthesis ("block") indexing and returns cell 1 of C as a 1x1 cell, whereas C{1} uses brace indexing and returns the content of that cell. I will discuss an example based on the following file content:

 A header
 1375988040,"Thu, 08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi,10,3,0,255,4,6,6 syls,11,0
 1375988312,"Thu, 08 Aug 2013 18:58:32 GMT",aud_spatial_match_crbi,10,3,0,262,5,6,6 syls,6,5
 1375989376,"Thu, 08 Aug 2013 19:16:16 GMT",digit_span_crbi,0,0,0,768,2,5,5 objects,14,0

Executing

 buffer = fileread('BrainTEST.csv') ;
 data = textscan(buffer,'%d %s %s %s %d %d %d %d %d %d %s %d %d', ...
                 'delimiter', ',', 'headerlines', 1, 'whitespace', '') ;

builds the cell array data, whose content we will now study:

 >> data
 data = 
  Columns 1 through 6
    [3x1 int32] {3x1 cell} {3x1 cell} {3x1 cell} [3x1 int32] [3x1 int32]
  Columns 7 through 12
    [3x1 int32] [3x1 int32] [3x1 int32] [3x1 int32] {3x1 cell} [3x1 int32]
  Column 13
    [3x1 int32]
 >> class(data)
 ans =
 cell
 >> size(data)
 ans =
     1    13

So data is a row cell array made of 13 cells.

 >> class(data(1))           % () indexing => return cell.
 ans =
 cell
 >> class(data{1})           % {} indexing => return cell content.
 ans =
 int32
 >> size(data{1})
 ans =
     3     1

So the content of cell 1 is a 3x1 numeric array of int32.

 >> data{1}
 ans =
  1375988040
  1375988312
  1375989376

Note that as cell arrays can be nested, sometimes the content of a cell is itself a cell array, as illustrated with the time/date string column:

 >> class(data{3})         % Content of cell 3 of data is a cell array.
 ans =
 cell
 >> class(data{3}(1))      % Cell 1 of this cell array is a cell.
 ans =
 cell
 >> class(data{3}{1})      % Content of this cell is a string (char).
 ans =
 char

When indexing becomes complex, the best thing you can do is create intermediate variables, e.g.

 timeDateCellArray = data{3} ;
 for k = 1 : length(timeDateCellArray)
      timeDate = timeDateCellArray{k} ;
      timeDate_cleaned = timeDate(2:end-5) ;
      fprintf('[%d] %s\n', k, timeDate_cleaned) ;
 end

Running this outputs:

 [1] 08 Aug 2013 18:54:00
 [2] 08 Aug 2013 18:58:32
 [3] 08 Aug 2013 19:16:16

Then, when indexing is clear to you, you can eliminate intermediary steps and write more concise code. With that, you should be able to manage your indexing properly. Let me know if you need more information.
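As an example of that more concise style, the loop above can be collapsed into a cellfun call. A sketch, assuming a hand-written stand-in for data{3} (the literal values below are copied from the example file so the snippet runs on its own):

```matlab
% Hypothetical stand-in for data{3} from the textscan call above.
timeDateCellArray = {' 08 Aug 2013 18:54:00 GMT"'; ...
                     ' 08 Aug 2013 18:58:32 GMT"'; ...
                     ' 08 Aug 2013 19:16:16 GMT"'};

% Same (2:end-5) trim as in the loop, applied to every cell at once.
cleaned = cellfun(@(s) s(2:end-5), timeDateCellArray, 'UniformOutput', false);

% Interleave indices and strings so one fprintf call prints every row.
out = [num2cell(1:numel(cleaned)); cleaned.'];
fprintf('[%d] %s\n', out{:});
```

This prints the same three lines as the loop ([1] 08 Aug 2013 18:54:00, and so on). 'UniformOutput', false is needed because each result is a char vector rather than a scalar.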

Cheers,

Cedric

Sean on 14 Aug 2013

Thank you so much Cedric! Your answers are great; informative and easy to follow. I think I can figure it out from here. I will definitely let you know if I run into any more complications.

Cheers, Sean


Answer 2

Walter Roberson on 10 Aug 2013

https://www.mathworks.com/matlabcentral/answers/84481-trouble-using-textread-to-ignore-certain-elements-in-a-csv-file-new-to-matlab#answer_94048

'%s GMT' does not cause the %s to "back up" as necessary so that the literal GMT can match. %s reads as far as it can given the whitespace and delimiter settings.

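 % 1375989376,"Thu, 08 Aug 2013 19:16:16 GMT",digit_span_crbi,0,0,0,768,2,5,5 objects,14,0
 % With the default whitespace, %s also stops at spaces, so the date splits into
 % "Thu | 08 | Aug | 2013 | 19:16:16 | GMT" and "5 objects" into 5 | objects.
 % Day and year are read with %s (not %d) so that strcat can join them as text.
 fmt = '%d%s%s%s%s%s%s%s%d%d%d%d%d%d%d%s%d%d';
 fid = fopen('BrainTEST.csv', 'rt');
 datacell = textscan(fid, fmt, 'delimiter', ',', 'headerlines', 1);
 fclose(fid);
 timestamps = datacell{1};
 timecompleteds = strcat(datacell{3}, {' '}, datacell{4}, {' '}, datacell{5}, {' '}, datacell{6});
 tasks = datacell{8};
 % and so on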
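To see the point about %s reading as far as it can, here is a small sketch on a single hand-written fragment (an assumption copied from the sample file): with ',' as the only delimiter and Whitespace set to '', the first %s already contains the trailing GMT", so there is nothing left for a literal GMT" to match.

```matlab
% One date field followed by the next field, as in the question's file.
field = ' 08 Aug 2013 18:54:00 GMT",aud_spatial_match_crbi';

% %s consumes everything up to the comma, trailing GMT" included.
c = textscan(field, '%s%s', 'Delimiter', ',', 'Whitespace', '');
disp(c{1}{1})
disp(c{2}{1})
```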

0 Comments