Thread Subject: Text file with numbers and text, want to just read numbers

From: Matt

Date: 3 Jul, 2008 20:18:01

Message: 1 of 5

I have a series of text files I want to read. The files
begin with header information. The number of header rows
varies, there is no comment symbol such as %%, and the
header sometimes contains numbers such as a date or a
time. I want to read in the numeric data and only the
numeric data.

This is an example of a file (again, the header varies:
different numbers of lines, different text):

Vector Signal [Measurement4]: 09/27/2006, 09:05:24
Units: EU
Time Real
            0 0.053732
     9.6e-005 0.11968
     0.000192 0.085483
     0.000288 0.11968
     0.000384 0.18562

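Currently I have MATLAB code something like this:

[filename, pathname, filterindex] = uigetfile( ...
    {'*.txt','Text file (*.txt)'; '*.*','All Files (*.*)'}, ...
    'Pick Time History File');
% uigetfile returns the folder separately; join the two so fopen
% also works when the file is not in the current directory
fid = fopen(fullfile(pathname, filename));
C_text = textscan(fid,'%s');   % one string per whitespace-separated token
fclose(fid);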


I tried to separate the header from C_text with a loop
that searches for numbers, for example by checking whether
the absolute value is greater than one. This method failed
when it ran into dates and times.

Is there a better way? I've been trying to use textscan,
but I have also been looking at fscanf and textread. I
think some formatting of C_text could work, but I
couldn't get it to work.

From: Paul

Date: 4 Jul, 2008 06:55:06

Message: 2 of 5

"Matt " <matthew.r.duncan@boeing.com> wrote in message
<g4jc5p$341$1@fred.mathworks.com>...
> [..]

I would read the header lines with fgetl and check whether
the first character is a letter, which indicates a header
line in your example. Once past the headers, I would read
the data with textscan.
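A minimal sketch of this approach, assuming every header line starts with a letter (or is blank) and every data line starts with a digit, sign or decimal point, as in the example file. The sample file written via tempname is hypothetical and only makes the snippet self-contained. ftell/fseek rewind to the start of the first data line so textscan sees it too, and the '%f %f' format assumes two columns.

```matlab
% Write a hypothetical sample file so the snippet runs on its own
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Vector Signal [Measurement4]: 09/27/2006, 09:05:24\n');
fprintf(fid, 'Units: EU\nTime Real\n');
fprintf(fid, '            0 0.053732\n     9.6e-005 0.11968\n     0.000192 0.085483\n');
fclose(fid);

fid = fopen(fname, 'r');
nHeader = 0;
while true
    pos = ftell(fid);              % remember where this line starts
    tline = fgetl(fid);
    if ~ischar(tline)              % fgetl returns -1 at end of file
        break
    end
    t = strtrim(tline);
    if ~isempty(t) && ~isletter(t(1))
        fseek(fid, pos, 'bof');    % rewind to the start of the first data line
        break
    end
    nHeader = nHeader + 1;         % blank or letter-led line: treat as header
end
C = textscan(fid, '%f %f');        % read the two numeric columns
fclose(fid);
data = [C{:}];                     % N-by-2 numeric matrix
delete(fname);
```

A header line that begins with a digit (a bare date, say), or a data line that begins with NaN or Inf, would be misclassified by this first-character test.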

From: Andres

Date: 4 Jul, 2008 07:24:02

Message: 3 of 5

"Matt " <matthew.r.duncan@boeing.com> wrote in message
<g4jc5p$341$1@fred.mathworks.com>...
> [..]

Hi,
you may try txt2mat from the File Exchange, which relies
heavily on regular expressions to separate the header from
the numeric data (if the user does not supply the number of
header lines). It returns the numeric data as the matrix
"A" and the header lines as the string "hl" if you
type

[A,ffn,nh,SR,hl] = txt2mat('c:\myFile.txt');

You can then parse "hl" separately to extract additional
numbers.
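For the header part, regexp can pick the numbers out of "hl". A sketch, assuming hl is a character row holding the header text; it is hard-coded below for illustration, and I have not checked the exact form in which txt2mat returns it. The first pattern grabs every run of digits; the second assumes the mm/dd/yyyy, hh:mm:ss layout of the example header.

```matlab
% hl is hard-coded here for illustration only
hl = 'Vector Signal [Measurement4]: 09/27/2006, 09:05:24';

% every unsigned integer or decimal number in the header
nums = str2double(regexp(hl, '\d+(\.\d+)?', 'match'));
% nums is [4 9 27 2006 9 5 24]

% date and time, assuming the mm/dd/yyyy, hh:mm:ss layout of the example
tok = regexp(hl, '(\d{2})/(\d{2})/(\d{4}),\s*(\d{2}):(\d{2}):(\d{2})', 'tokens', 'once');
v = str2double(tok);                                   % [month day year hour min sec]
stamp = datenum(v(3), v(1), v(2), v(4), v(5), v(6));   % serial date number
```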

Regards
Andres

From: Jos

Date: 4 Jul, 2008 09:32:02

Message: 4 of 5

"Matt " <matthew.r.duncan@boeing.com> wrote in message
<g4jc5p$341$1@fred.mathworks.com>...
> [..]

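1) read all lines using
   fid = fopen(filename) ; s = textscan(fid,'%s','delimiter','\n') ; fclose(fid) ;
   s = s{1} ;   % textscan needs a fid and returns the lines wrapped in a 1x1 cell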

2) determine at which lines data starts, e.g. line X
   This is the real problem!
   p = regexp(s,'[A-Za-z]') will give a cell array with non-empty
   cells when a line contains text (but how to deal with scientific
   notation, where the 'e' in 9.6e-005 is a letter too ...). See
   help regexp for more options.
   X = find(~cellfun('isempty',p), 1, 'last') + 1

3) read again skipping the first (X-1) lines:
   data = textread(filename,'%f','headerlines',X-1) ;

4)
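The three steps can be put together roughly as follows. This is a sketch, not Jos's exact code: it swaps the letter test for a regular expression (my own assumption) that accepts a line as data only if it consists entirely of numbers, scientific notation included, separated by blanks, commas or semicolons. It also uses textscan with 'HeaderLines' in step 3 instead of textread, and assumes two columns. The sample file is hypothetical.

```matlab
filename = [tempname '.txt'];            % hypothetical sample file
fid = fopen(filename, 'w');
fprintf(fid, 'Vector Signal [Measurement4]: 09/27/2006, 09:05:24\n');
fprintf(fid, 'Units: EU\nTime Real\n');
fprintf(fid, '            0 0.053732\n     9.6e-005 0.11968\n     0.000192 0.085483\n');
fclose(fid);

% 1) read all lines as strings
fid = fopen(filename, 'r');
s = textscan(fid, '%s', 'delimiter', '\n');
fclose(fid);
s = s{1};

% 2) a line counts as data if it is nothing but numbers (scientific
%    notation allowed) separated by blanks, commas or semicolons
num = '[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?';
sep = '(\s*[,;]\s*|\s+)';
pat = ['^\s*' num '(' sep num ')*\s*$'];
isData  = ~cellfun('isempty', regexp(s, pat, 'once'));
isBlank = cellfun('isempty', strtrim(s));
X = find(~isData & ~isBlank, 1, 'last') + 1;   % line after the last text line
if isempty(X), X = 1; end                      % no header at all

% 3) read again, skipping the first X-1 lines
fid = fopen(filename, 'r');
C = textscan(fid, '%f %f', 'HeaderLines', X - 1);
fclose(fid);
data = [C{:}];
delete(filename);
```

Taking the last text line (rather than the first number-only line) means numeric-looking lines inside the header are still skipped, as long as a text line follows them.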

From: Andres

Date: 4 Jul, 2008 10:52:01

Message: 5 of 5

"Jos " <DELjos@jasenDEL.nl> wrote in message
<g4kqmi$ook$1@fred.mathworks.com>...
[..]
> 2) determine at which lines data starts, e.g. line X
> This is the real problem!
> p = regexp(s,'[A-Za-z]') will give a cell array with non-empty
> cells when a line contains text (but how to deal with
> scientific notation ...). See help regexp for more options.
> X = max(find(~cellfun('isempty',p))) + 1
>
[..]

That's the crux! Depending on what you know in advance
about the file, you have to account for scientific notation,
as you mentioned; final header lines that contain only
delimiters, signs, whitespace or other special characters,
or that are simply empty; and possibly NaN and Inf values
in the numeric section.
All in all, this can result in quite a lengthy piece of
code.
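One way to cover several of these cases without a long regular expression is to split each line into tokens and let str2double decide: a line counts as data only if every token converts. A literal NaN token is accepted explicitly, because str2double also returns NaN when conversion fails. This is a sketch under those assumptions; the sample lines are made up, and a purely numeric final header line would still be mistaken for data.

```matlab
% hypothetical sample lines: header, a blank line, then data with NaN/Inf
lines = {'Vector Signal [Measurement4]: 09/27/2006, 09:05:24'; ...
         'Units: EU'; 'Time Real'; ''; ...
         '  0 0.053732'; '  9.6e-005 NaN'; ' -1.5E+02, Inf'};
isData  = false(size(lines));
isBlank = false(size(lines));
for k = 1:numel(lines)
    t = strtrim(lines{k});
    if isempty(t)
        isBlank(k) = true;        % empty line: neither header nor data
        continue
    end
    tok = regexp(t, '\s*[,;]\s*|\s+', 'split');   % split on blanks, commas, semicolons
    v = str2double(tok);          % NaN for anything that is not a number
    % a literal NaN token is a legitimate value, not a conversion failure
    ok = ~isnan(v) | strcmpi(tok, 'nan');
    isData(k) = all(ok);
end
X = find(~isData & ~isBlank, 1, 'last') + 1;   % first line after the header
if isempty(X), X = 1; end                      % no header at all
```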

Andres
