How to eliminate texts from the data file?

Asked by aneps
on 19 Sep 2013
Latest activity Commented on by aneps
on 6 May 2014

I have a .dat file with three columns of numbers (X, Y and Z). An example file is attached here. The file also contains some text lines, which prevent me from loading it into MATLAB. At the beginning of the data it says "Measurement contains X,Y,Z)", and the next line says "Injection #1"; after some rows of values it says "Injection #2", then after some more rows "Injection #3", and so on. I want to eliminate these text lines and load the whole rows of each column as X, Y and Z. Can anyone please tell me how to read the file and eliminate the text in it?

PS: The attached file is just an example. The real files have thousands of rows, so deleting the text manually is impossible.


Does the file contain the first line "This measurement contains x,y and time" or is it something that you added? Do you need to read all data entries together or do you need to keep track of the injection number?

The file contains the text "This measurement contains x,y and time". An example file is attached with the question. Thank you.


1 Answer

Answer by Cedric Wannaz
on 19 Sep 2013
Edited by Cedric Wannaz
on 19 Sep 2013
 Accepted answer

If the first line "This measurement contains x,y and time" is not present (if you added it for us to understand), you can do something as simple as:

 buffer = fileread('Test.txt') ;
 buffer = regexprep( buffer, 'i.*?\n', '\n') ;
 buffer = textscan(buffer, '%f %f %f') ;
 data   = [buffer{:}] ;

Let me know if you have to eliminate the first line. We can replace FILEREAD with a few lines involving FOPEN/FGETL/FREAD, which skip the first line and read the rest of the file.
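
A note on the pattern above: REGEXPREP is case-sensitive by default, so 'i.*?\n' only starts matching at a lowercase i. If the headers in a given file are capitalized (the question writes them as "Injection #1"), a line-anchored pattern that blanks every line starting with a letter may be more robust. The following is only a minimal sketch under that assumption; the sample text is hypothetical and stands in for the contents of the real file:

```matlab
% Hypothetical sample text mimicking the layout described in the question.
sample = sprintf(['This measurement contains x,y and time\n', ...
                  'Injection #1\n', ...
                  '1.0 2.0 0.1\n', ...
                  '1.5 2.5 0.2\n', ...
                  'Injection #2\n', ...
                  '3.0 4.0 0.3\n']) ;
% Blank out every line whose first non-blank character is a letter.
% 'lineanchors' makes ^ and $ match at line boundaries, and
% 'dotexceptnewline' keeps .* from running past the end of the line.
cleaned = regexprep( sample, '^[ \t]*[A-Za-z].*$', '', ...
                     'lineanchors', 'dotexceptnewline' ) ;
% Convert the remaining text to a 3-column numeric array.
cols = textscan( cleaned, '%f %f %f' ) ;
data = [cols{:}] ;
```

Because the description line also starts with a letter, it is removed together with the injection headers, so no separate step for skipping the first line is needed in this variant. A data line that starts with text such as NaN or Inf would be removed too, so this only fits files whose data rows always start with a digit, sign or decimal point.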

EDIT after your comment.

 % - Read file, skip 1st line.
 fid = fopen('Test.txt', 'r') ;
 fgetl(fid) ;                                  % Skip 1st line.
 buffer = fread(fid, [1,Inf], '*char') ;
 fclose(fid) ;
 % - Eliminate 'injection' headers.
 buffer = regexprep( buffer, 'i.*?\n', '\n') ;
 % - Convert to numeric type.
 data = textscan(buffer, '%f %f %f') ;

Then you can extract columns the way you need them. You can build vectors x, y, t as follows:

 x = data{1} ;
 y = data{2} ;
 t = data{3} ;

or, if you prefer, build a 3-column array as follows:

 data = [data{:}] ;


Hello, I am stuck on a few more things related to the same question. May I ask for your help, please? For the file above, how can I get the number of events after each 'injection'? (Each x, y and time is an event, i.e. each row is an event.)

Hi Aneps, if you still have this question, please send me an email, or post a new question and send me a link by email so I see it.

Well, what about the following, actually? Look at data and counts: they contain the data of each injection as a separate block and the number of events per injection.

 content = fileread( 'Test.txt' ) ;
 blocks  = regexp( content, '#:\d+\s*([^i]+)', 'tokens' ) ;
 data    = cellfun( @(c) reshape(sscanf(c{1}, '%f'), 3, []).', blocks, ...
                    'UniformOutput', false ) ;
 counts  = cellfun( @(d) size(d, 1), data ) ;
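
If the regular expression above does not fit the exact header format of a file, the same counts can be obtained with a plain line-by-line loop. This is only a sketch: it assumes each header line starts with the word "injection" (in any case) and each event row holds exactly three numbers, and the sample text is hypothetical:

```matlab
% Hypothetical sample text; in practice use content = fileread('Test.txt').
content = sprintf(['This measurement contains x,y and time\n', ...
                   'Injection #1\n', ...
                   '1.0 2.0 0.1\n', ...
                   '1.5 2.5 0.2\n', ...
                   'Injection #2\n', ...
                   '3.0 4.0 0.3\n']) ;
lines     = regexp( content, '\r?\n', 'split' ) ;
injection = 0 ;               % Index of the current injection block.
rows      = zeros(0, 3) ;     % All events, one per row.
ids       = zeros(0, 1) ;     % Injection index of each event.
for k = 1 : numel(lines)
    line = strtrim( lines{k} ) ;
    if isempty( line ), continue ; end
    if strncmpi( line, 'injection', 9 )       % Header: start a new block.
        injection = injection + 1 ;
        continue ;
    end
    vals = sscanf( line, '%f' ) ;
    if numel( vals ) == 3 && injection > 0    % Keep only 3-number rows.
        rows(end+1, :) = vals.' ;
        ids(end+1, 1)  = injection ;
    end
end
% Number of events per injection (one entry per header seen).
counts = accumarray( ids, 1, [injection, 1] ) ;
```

Here rows(ids == n, :) returns the events of injection n, and counts(n) is how many there are. Growing the arrays inside the loop is simple but slow for very large files; the CELLFUN version above avoids that.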

Thanks a lot. I programmed it as you suggested, but it is not giving me what I am looking for. The first program you gave me works wonderfully; it reads the data the way I want, and I built a long program from there:

 data = [data{:}] ;

I would like to add a few more lines to get more information. I have posted it as a new question. Thanks a lot.

