Thread Subject:
Mixed format strings

Subject: Mixed format strings

From: Baalzamon

Date: 12 Apr, 2010 14:36:04

Message: 1 of 8

Hi, I have a tricksome problem (for me anyway).
I am using fgetl to read lines from a text file, but I only want the numeric parts of each line. Is there a way to extract just the numbers before the ASCII chars at the end, and then use str2num to make a matrix?
     .0124 200 900 260 600 .2140 .1586 .3179 80.0000 10.0000 10.0000 .0000 -.0180 .0213 301.0000 .1000 .4100 .3700 .4100 .3700 4.6000 9.4000 100.0000 1.202 .077 405.00 337 99.36 .1176 .0000 100.0000 .0000 ‚¼kà|@@ .5562 296.1225 .0000**********ÙHjtµQA

This is an example of the line from my file. I don't want the bits from the comma (by the quarter symbol). However each line is different so the chars won't be the same.

Subject: Mixed format strings

From: dpb

Date: 12 Apr, 2010 14:50:51

Message: 2 of 8

Baalzamon wrote:
> Hi I have a tricksome problem (for me anyway)
> I am using fgetl to read lines from a text file. However I only want the
> number parts of it. Is there a way to just extract the numbers before
> the ascii chars at the end and then use str2num to make a matrix?
> .0124 200 900 260 600 .2140 .1586 .3179 80.0000
> 10.0000 10.0000 .0000 -.0180 .0213 301.0000 .1000
> .4100 .3700 .4100 .3700 4.6000 9.4000 100.0000 1.202
> .077 405.00 337 99.36 .1176 .0000 100.0000 .0000
> ‚¼kà|@@ .5562 296.1225 .0000**********ÙHjtµQA
>
> This is an example of the line from my file. I don't want the bits from
> the comma (by the quarter symbol). However each line is different so the
> chars won't be the same.

The font you used doesn't render well, so I'm not sure where there is a
comma, but if there is a fixed number of fields (that is, each record has
a fixed number of fields and the numeric fields are in the same
positions), you should be able to use textscan() or textread() and a
format string along the lines of

fmt=[repmat('%f ',1,N) '%*s' repmat('%f ',1,M) '%*s'];

where N is the number of numeric fields before the first string to
ignore and M the second set of numeric fields and another %*s to finish
off the record.
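
A minimal sketch of how such a format string might be used, assuming a made-up record with N = 3 numeric fields, one junk token, M = 2 numeric fields, and a trailing junk token. The record contents and the values of N and M are illustrative assumptions, not the poster's actual layout.

```matlab
% Hypothetical record: 3 numbers, junk, 2 numbers, junk (ASCII stand-ins
% for the binary debris in the real file).
tline = '.0124 200 900 V$09B@ .5562 296.1225 .0000**********QA';

N = 3;   % assumed count of numeric fields before the first junk token
M = 2;   % assumed count of numeric fields before the second junk token

% %f reads a number; %*s reads a whitespace-delimited token and discards it.
fmt = [repmat('%f ', 1, N) '%*s ' repmat('%f ', 1, M) '%*s'];

C = textscan(tline, fmt);   % one cell per %f field
vals = [C{:}];              % 1-by-(N+M) row vector of doubles
disp(vals)
```

The same fmt could be passed to textscan with a file identifier instead of a string, in which case each cell of C would hold one column of the file.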

--

Subject: Mixed format strings

From: Baalzamon

Date: 12 Apr, 2010 18:17:04

Message: 3 of 8

Ah, cheers. However, the number of fields can vary, as sometimes the fields get merged together... If I add some form of loop (with a test statement, etc.), should this get around it?

Subject: Mixed format strings

From: dpb

Date: 12 Apr, 2010 18:36:50

Message: 4 of 8

Baalzamon wrote:
> Ah cheers. However the number of fields can vary as sometimes the fields
> get merged together...If I add some form of loop (test statement loop
> etc) should this get around it?

What does "fields get merged together" mean? If it's a fixed-width
field that doesn't always have room for a blank then simply create the
proper width format field for each.

If there are actually a variable number of fields/record then you'll
have to parse each line individually and determine what is/isn't a valid
number to convert. I'll not delve into that until you confirm that's the
actual problem.
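
For the variable-field case, one possible way to parse a single line is sketched below: split on whitespace, convert every token, and keep only the leading run of tokens that convert cleanly. The sample line is an ASCII stand-in for the poster's data, and the variable names are illustrative.

```matlab
% Hypothetical line: leading numbers followed by non-numeric debris.
tline = '  .0124 200 900 -.0180 100.0000 .0000 V$09B@ .5562 296.1021 .0000**********QA';

% Split on runs of whitespace; str2double returns NaN for any token
% that is not a valid number.
tokens = regexp(strtrim(tline), '\s+', 'split');
vals = str2double(tokens);

% Keep everything before the first token that failed to convert.
firstBad = find(isnan(vals), 1);
if isempty(firstBad)
    nums = vals;
else
    nums = vals(1:firstBad-1);
end
disp(nums)
```

Note that str2double accepts a few non-obvious tokens as numbers (for example 'Inf' or 'i'), so debris that happens to look like those would need an extra check.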

--

Subject: Mixed format strings

From: Branko

Date: 12 Apr, 2010 21:19:05

Message: 5 of 8

"Baalzamon " <baalzamon_moridin@yahoo.com> wrote in message <hpvb4k$hep$1@fred.mathworks.com>...
> Hi I have a tricksome problem (for me anyway)
> I am using fgetl to read lines from a text file. However I only want the number parts of it. Is there a way to just extract the numbers before the ascii chars at the end and then use str2num to make a matrix?
> .0124 200 900 260 600 .2140 .1586 .3179 80.0000 10.0000 10.0000 .0000 -.0180 .0213 301.0000 .1000 .4100 .3700 .4100 .3700 4.6000 9.4000 100.0000 1.202 .077 405.00 337 99.36 .1176 .0000 100.0000 .0000 ‚¼kà|@@ .5562 296.1225 .0000**********ÙHjtµQA
>
> This is an example of the line from my file. I don't want the bits from the comma (by the quarter symbol). However each line is different so the chars won't be the same.

doc regexp

Branko
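
As a rough illustration of how regexp might apply here, the sketch below grabs the leading run of whitespace-separated numbers and stops at the first token that is not purely numeric. The pattern is an assumption made for this example (it does not handle exponent notation), and the sample line uses ASCII stand-ins for the debris.

```matlab
% Hypothetical line: leading numbers followed by non-numeric debris.
tline = '  .0124 200 -.0180 100.0000 .0000 V$09B@ .5562 .0000**********QA';

% One number: optional sign, then digits with an optional fraction, or a
% bare fraction such as .0124.
numPat = '[-+]?(\d+\.?\d*|\.\d+)';

% Anchor at the start and repeat "whitespace + number", requiring each
% number to be followed by whitespace or end of line so that a token
% like 09X is not partially matched.
pat = ['^(\s*' numPat '(?=\s|$))+'];

prefix = regexp(tline, pat, 'match', 'once');   % leading numeric text
nums = sscanf(prefix, '%f').';                  % row vector of doubles
disp(nums)
```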

Subject: Mixed format strings

From: Baalzamon

Date: 12 Apr, 2010 21:25:07

Message: 6 of 8

Ah, sorry. I have another program (a) which stores parameters in tabulated form. However, that program hasn't foreseen that some of the numbers in a field can be larger than the allowed size of the field (much like when you have an 8-digit number in a cell only big enough for, say, 5 digits). As a result it fills the field with stars, as in the example shown previously.
In other instances, let's say the contents of two adjacent cells are 260 and 1000. Then the output should be [260 1000]; however, if, as mentioned before, one of the numbers is too large, the result becomes [2601000], and as such the fields are merged.
In a previous m-file I wrote, this problem was somewhat solved, as any rows that had merged cells also contained stars and these were discarded. Now, after some tinkering with the program (a), these are 'reliable' data sets, so my previous method is no longer valid.
Below are two examples of lines from my file
     .0124 200 900 260 600 .2140 .1586 .3179 80.0000 10.0000 10.0000 .0000 -.0180 .0213 301.0000 .1000 .4100 .3500 .4100 .3500 4.7000 9.4000 100.0000 2.407 .077 811.32 337 100.00 .1181 .0000 100.0000 .0000 V$09Û¤B@ .5562 296.1021 .0000**********ªûßäƯQA

In this case the stars are joined to the .0000 cell.

C@ .5562 296.0874 .0000**********(Ÿ~Ç„¿QA
This is another line which I wish to discard.
For the first example this is a somewhat good set. I would be happy to just extract the cells before the numbers turned into ascii and letters. I'll be happy to show you a more complete text file if this would clarify my need.

Much thanks for the reply btw

Subject: Mixed format strings

From: dpb

Date: 13 Apr, 2010 00:11:18

Message: 7 of 8

Baalzamon wrote:
> Ah sorry. I have another program (a) which stores parameters in
> tabulated form. However the program that does this hasn't foreseen that
> some of the numbers in a field can be larger than the allowed size of
> the field. (Much like when you have a 8 digit number in a cell only big
> enough for say 5 digits). As a result it fills the field with stars
> like in the example shown previously. In other instances lets say the
> contents of two adjacent cells are 260 and 1000. Then the output should
> be [260 1000] however if like mentioned before one of the numbers is
> too large the result becomes [2601000] as such the fields are merged. In
> a previous m file i wrote this problem was somewhat solved as any rows
> that had merged cells also contained stars and these were discarded.
> Now, after some tinkering with the program (a), these are now 'reliable'
> data sets. So my previous method is no longer valid. Below are two
> examples of lines from my file

Well, I'd start w/ the beginning file and use the proper field width
since they are fixed width fields. That solves the problem of valid
data "merged" -- they _aren't_ merged, they just are each filling the field.

But, the case of overflow creates a problem because, I presume, it can
occur in any column on any given record. Therefore, I think you'll have
to first use fgetl() to read a line, find whether there is or isn't an
asterisk and parse each line based on that.

But, if it were at all possible, I'd fix (or cause to be fixed) the
original program to quit making useless data sets.
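
The read-then-check loop described above might look roughly like the sketch below. It writes a tiny scratch file so the example is self-contained; the file contents are made up, and what to do with flagged lines (discard, truncate, or parse differently) depends on the real layout.

```matlab
% Build a small scratch file with one overflowed record (made-up data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ' 1.0 2.0 3.0', ' 4.0 ***** 6.0', ' 7.0 8.0 9.0');
fclose(fid);

% Read line by line and sort lines by whether they contain an asterisk.
fid = fopen(fname, 'r');
clean = {};
flagged = {};
tline = fgetl(fid);
while ischar(tline)            % fgetl returns -1 at end of file
    if any(tline == '*')
        flagged{end+1} = tline;    % overflow marker present
    else
        clean{end+1} = tline;      % safe to parse normally
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

fprintf('%d clean, %d flagged\n', numel(clean), numel(flagged));
```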

--

Subject: Mixed format strings

From: Baalzamon

Date: 13 Apr, 2010 18:33:08

Message: 8 of 8

Ah thanks again people.

Mission solved. I used fgetl again, but with some different conditional tests.
I mechanically found where the ASCII junk came in by using length, then made sure all rows were longer than this (which gets rid of terminated fits) and that the substring just before this point was 0000.
Then, to get rid of rows where the fits had doubled, I made sure the rows were shorter than a max value I chose (slightly longer than good fits).
Then, to keep str2num from crashing, I filtered out rows with stars in specific places by using strcmp.

On some small points... once the code worked, some modifications were made.
Initial file: 42 MB, 150 000 rows; cleaned matrix: 33 000 rows.
1:) Initially: read line by line, convert the substring into a number array, and place certain cell contents in the places I wanted. No preallocation of the array.

 Time taken: 150 s

2:) Now: open the file and count rows. Close it and make an m-by-n array of zeros. Reopen the file and follow the steps above, but write into the preallocated matrix. Then use logical indexing to extract the submatrix of rows with populated entries:
q=output(output(:,1)>1,:);
Time taken: 35 s
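
The preallocate-then-filter pattern from step 2 can be sketched as follows. The records live in a cell array rather than a file to keep the example short, and the test for a "good" record (exactly three numeric tokens) is an assumption for illustration. As in the line above, the final filter assumes valid rows always have a first column greater than 1.

```matlab
% Made-up records; two of them are unusable.
recs = {'2 10 20', 'bad ***', '3 30 40', '***', '5 50 60'};
nCols = 3;

% Preallocate one row per record so the matrix never grows in the loop.
output = zeros(numel(recs), nCols);
for k = 1:numel(recs)
    vals = str2double(regexp(recs{k}, '\s+', 'split'));
    if numel(vals) == nCols && ~any(isnan(vals))
        output(k, :) = vals;   % rows for bad records stay all zeros
    end
end

% Logical indexing keeps only the rows that were actually filled in.
q = output(output(:,1) > 1, :);
disp(q)
```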
