Thread Subject:
Reading text file with multiple delimiters and filtering data

Subject: Reading text file with multiple delimiters and filtering data

From: Jane

Date: 24 Feb, 2010 20:27:04

Message: 1 of 5

Hi,
I searched the archives yesterday and found a relevant thread, but I can't find it again today. I couldn't get the solutions recommended there to work anyway.

I have a huge text file with approx 30000 rows of data in. The data is experimental and comes from a number of pieces of recording equipment, each of which has a unique code, which is specified at the start of each row in the text file.

There is no pattern to the order of data, so I need to be able to read all the lines and select those with an initial string '$BD3W' for example.

The data for the other pieces of equipment each contain a different number of fields, which is one reason why I don't simply want to read the entire file and then search for the rows I want. The other reason is memory: I am running this code on a not too snazzy laptop.

Finally, most variables are separated by commas, but each row also has one * delimiter. Based on the thread I've since lost, I've tried the following:

[fname, pname, ind] = uigetfile('*.gvl');
fid = fopen([pname,fname]);

start_string = '$GF3DW';

headerlines = 0;
line_no = 1;
while feof(fid) == 0
    tline{line_no} = fgetl(fid);
    line_no = line_no+1;
end
index = strmatch(start_string,tline(headerlines+1:length(tline)))+headerlines;
S = tline(index);
C = textscan (S(:), '%s %f %f %f %f %s', 'delimiter',',');

fclose(fid);

But this causes an error. I originally had
C = textscan (S, '%s %f %f %f %f %s', 'delimiter',',');
but only the line with the first occurrence of the start_string was output.

As you can see, I haven't tackled the multiple-delimiter problem yet.

Any help would be gratefully received. Also, does anyone know if there are plans to increase the functionality of textscan to include a search?

Many thanks to all those who have managed to make their way to the end of this rambling post.

Jane

Subject: Reading text file with multiple delimiters and filtering data

From: Rune Allnor

Date: 24 Feb, 2010 21:21:11

Message: 2 of 5

It seems your problem is irregular enough that the canned
routines might have a hard time.

However, there is a tool for exactly the type of problem
you have: Regular expressions.

Regular expressions take some time and effort to learn,
but once you start to get the hang of them, the kind of problem
you are struggling with is a walk in the park.
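
As a rough illustration of the idea, here is a minimal MATLAB sketch that keeps only the lines starting with a given code and splits them on either delimiter. The sample lines and the field layout (a code, four numbers, a text field, then one field after the asterisk) are invented for the example, since the real file format was not posted.

```matlab
% Sample lines are invented for illustration; the real field layout is unknown.
lines = { ...
    '$BD3W,1.5,2.5,3.5,4.5,A*7F', ...
    '$GF3DW,10,20,OK*1C', ...
    '$BD3W,6.0,7.0,8.0,9.0,B*2A'};

code = '$BD3W';                                  % equipment code to keep

% Anchor the escaped code at the start of the line, followed by a comma,
% so that '$' is matched literally and longer codes are not picked up.
pattern = ['^' regexptranslate('escape', code) ','];
keep = ~cellfun('isempty', regexp(lines, pattern, 'once'));
selected = lines(keep);

% Split each selected line on either delimiter: a comma or an asterisk.
fields = regexp(selected, '[,*]', 'split');

% Convert the four numeric fields (positions 2 to 5 in this made-up layout).
values = cellfun(@(f) str2double(f(2:5)), fields, 'UniformOutput', false);
values = vertcat(values{:});                     % one row per selected line
```

The same pattern can be applied to one line at a time inside an fgetl loop, so that lines from other equipment are discarded as they are read rather than stored.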

If you have an academic bookstore within walking distance, that
serves a computer science department, chances are that they
will have this one in stock:

http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=books&qid=1267046374&sr=8-1

Well worth the expense.

Rune

Subject: Reading text file with multiple delimiters and filtering data

From: Andres

Date: 24 Feb, 2010 22:26:19

Message: 3 of 5

Hi,
have a look at txt2mat (on the File Exchange). It has some line-filtering options that will probably help you read in the lines you want quite quickly (compared to a loop based on fgetl), provided they can be identified by keywords like the string '$BD3W'. You can add post-processing with textscan, and perhaps regular expressions as Rune mentioned, e.g. if some of the asterisks act as delimiters and others do not (check txt2mat's replacement options, too).
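
The textscan post-processing step might look roughly like the sketch below. It assumes the wanted lines have already been selected (by txt2mat or any other filter) and are held in a cell array, and it uses invented sample rows with an assumed layout of one code, four numbers, a text field and one field after the asterisk. Note that textscan takes a single string or a file identifier, not a cell array, which is probably why passing the cell array directly did not work.

```matlab
% Invented sample rows standing in for the already-filtered lines; the
% field layout is an assumption made for this example.
S = { ...
    '$BD3W,1.5,2.5,3.5,4.5,A*7F'; ...
    '$BD3W,6.0,7.0,8.0,9.0,B*2A'};

% textscan expects one string (or a file identifier), not a cell array,
% so join the rows into a single newline-separated string first.
txt = sprintf('%s\n', S{:});

% Treat both ',' and '*' as field delimiters.
C = textscan(txt, '%s %f %f %f %f %s %s', 'Delimiter', {',', '*'});

data = [C{2:5}];     % the four numeric columns as one matrix
```

If only some asterisks act as delimiters, this simple approach will not be enough and a replacement or regular-expression step would be needed first.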

Subject: Reading text file with multiple delimiters and filtering data

From: Walter Roberson

Date: 24 Feb, 2010 23:57:18

Message: 4 of 5

Jane wrote:

> I have a huge text file with approx 30000 rows of data in. The data is
> experimental and comes from a number of pieces of recording equipment,
> each of which has a unique code, which is specified at the start of each
> row in the text file.
> There is no pattern to the order of data, so I need to be able to read
> all the lines and select those with an initial string '$BD3W' for example.

Consider using perl from within Matlab. perl is shipped with Matlab.
The main nuisance about mixing perl and Matlab is that the -e perl option is
not supported, so you will have to write your perl code into a file and invoke
perl with that as the name of the source file.

A sample perl script to return all the lines that begin with '$BD3W' would be:


while (<>) {print if /^\$BD3W/;}


On the other hand, what will get returned to you from the perl call will be a
string of text with newline delimiters between the lines, not a cell array of
lines. But you can easily convert this to a cell array by wrapping the perl
call in

OutputCell = textscan(perl('Perl Parameters Here'), '%s', 'Delimiter', '');
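
Put together, the steps above could be sketched as follows. The script name, the data file name and the sample rows are hypothetical and exist only to make the example self-contained; in practice the data file would be the one chosen with uigetfile. This sketch splits the returned text with regexp instead of textscan, which is a substitution on my part rather than a requirement.

```matlab
% Write the one-line perl filter to a script file (hypothetical file name).
scriptFile = fullfile(tempdir, 'filter_bd3w.pl');
fid = fopen(scriptFile, 'w');
fprintf(fid, '%s\n', 'while (<>) {print if /^\$BD3W/;}');
fclose(fid);

% Create a small data file with invented rows so the example can run.
dataFile = fullfile(tempdir, 'sample.gvl');
fid = fopen(dataFile, 'w');
fprintf(fid, '%s\n', ...
    '$BD3W,1.5,2.5,3.5,4.5,A*7F', ...
    '$GF3DW,10,20,OK*1C', ...
    '$BD3W,6.0,7.0,8.0,9.0,B*2A');
fclose(fid);

% Run the script on the data file; perl returns the matching lines as one
% string with newlines between them.
out = perl(scriptFile, dataFile);

% Drop the trailing newline and split into a cell array of lines.
matched = regexp(strtrim(out), '\r?\n', 'split');
```

Only the matching lines ever reach MATLAB this way, which helps with the memory concern.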

Subject: Reading text file with multiple delimiters and filtering data

From: Jane

Date: 25 Feb, 2010 17:45:22

Message: 5 of 5

Thanks for all your help. I'm disappointed there is no easy solution.

I guess I should try to find out what regular expressions are. In the meantime I'll look at the txt2mat (sp?) file suggested.

Many thanks
