
Thread Subject: Reading data into matlab

Subject: Reading data into matlab

From: Felipe Sediles

Date: 26 Jul, 2007 22:40:27

Message: 1 of 19

Hi, I'm trying to read data from a file that contains more than one set of data, and more than just data. I want to be able to tell MATLAB where to start reading without having to know beforehand how many header lines to skip. For example, my data file looks something like:

*Heading
** Job name: STB_model33 Model name: Model-1
** Geometry: half thickness = 5.800000200, width = 50.799999200, beta = 1/16th, gamma = 0.000000000
*Preprint, echo=No, history=NO, contact=NO
**
**Parts
**
*Part, name=Part-1
*End Part
**
*Part, name=torque_block2
*End Part
**
**
** ASSEMBLY
**
*Assembly, name=Assembly
**
*Instance, name=Part-1-1, part=Part-1
*Node
1, -31.750000000, 0.000000000, 3.174999960
2, -31.750000000, 5.800000200, 3.174999960

***********more data************

96615, 5.352380838, -5.800000200, 2.381249900
*Element, type=C3D20

Where the last line of data is the line starting with 96615. At present I'm going into this file and counting the number of headerlines, and the number of total lines of data. This gets tedious and I will eventually have to read files where these values (the headerlines and total data lines) vary. I basically want to use some kind of "landmark" in the data file to tell matlab where to begin and where to end. Please help. At present I'm using the textread function to read my data, so if possible, a solution using that function would be greatly appreciated!
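A minimal sketch of the "landmark" idea, assuming the data always starts on the line after a *Node keyword line and ends at the next line that begins with '*' (as in the sample above). It reads with fgetl instead of textread, so no header-line count is needed. The demo file, its contents and all variable names are made up for illustration.

```matlab
% Hypothetical demo file standing in for the real input file.
fnam = [tempname '.inp'];
fid = fopen(fnam, 'w');
fprintf(fid, '*Heading\n** Job name: demo\n*Node\n');
fprintf(fid, '1, -31.750000000, 0.000000000, 3.174999960\n');
fprintf(fid, '2, -31.750000000, 5.800000200, 3.174999960\n');
fprintf(fid, '*Element, type=C3D20\n');
fclose(fid);

startMark = '*Node';          % landmark line that precedes the data
nodes = zeros(0, 4);          % one row per data line
inBlock = false;
fid = fopen(fnam, 'r');
while true
    tline = fgetl(fid);
    if ~ischar(tline), break; end             % end of file reached
    if inBlock
        if strncmp(tline, '*', 1), break; end % next keyword line ends the block
        v = sscanf(tline, '%f,');             % comma-separated numbers
        if numel(v) == 4
            nodes(end+1, :) = v.';            %#ok<AGROW>
        end
    elseif strncmpi(tline, startMark, numel(startMark))
        inBlock = true;                       % data starts on the next line
    end
end
fclose(fid);
delete(fnam);
disp(nodes)
```

This stops after the first *Node block; to collect a second block, reset inBlock instead of breaking. Growing nodes one row at a time is simple but slow for very large files.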

Subject: Re: Reading data into matlab

From: roberson@ibd.nrc-cnrc.gc.ca (Walter Roberson)

Date: 26 Jul, 2007 23:21:08

Message: 2 of 19

For your purposes, would it suffice to set '*' to be one
of the comment characters? That would eliminate all of the header lines
you show, and would eliminate the line after the data; on the other
hand, if there are other lines further down in the file that do not
start with * then it would calmly read those.


Myself, in a case like this, I'd just use a bit of Perl or sed.
--
  There are some ideas so wrong that only a very intelligent person
  could believe in them. -- George Orwell
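A sketch of the comment-character idea, done by hand rather than through a reader option: lines whose first character is '*' are dropped and the rest are parsed. The sample lines are hypothetical; the last one stands in for a later line that does not start with '*', and it shows the caveat Walter mentions, since it survives the filter and would be read as data.

```matlab
% Hypothetical lines from an input file; the last one does not start with '*'.
txt = { '*Heading'
        '** Job name: demo'
        '*Node'
        '1, -31.750000000, 0.000000000, 3.174999960'
        '2, -31.750000000, 5.800000200, 3.174999960'
        '*Element, type=C3D20'
        '1, 1, 2, 3, 4' };
% Treat '*' as a comment character: keep only lines that do not start with it.
keep = ~strncmp(txt, '*', 1);
kept = txt(keep);
% Count the numbers on each surviving line.
counts = zeros(numel(kept), 1);
for k = 1:numel(kept)
    counts(k) = numel(sscanf(kept{k}, '%f,'));
end
disp(counts.')
```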

Subject: Re: Reading data into matlab

From: Felipe Sediles

Date: 26 Jul, 2007 23:28:56

Message: 3 of 19

>>....

Yes, there are other lines later in the file that don't start with *, so I guess that won't work. Thanks, though! Wish I knew those languages!

Subject: Reading data into matlab

From: us

Date: 27 Jul, 2007 01:57:30

Message: 4 of 19

Felipe Sediles:
<SNIP wants to import formatted text

> *Heading
> ...
> *Node
> 1, -31.750000000, 0.000000000, 3.174999960
> 2, -31.750000000, 5.800000200, 3.174999960
> ***********more data************
> 96615, 5.352380838, -5.800000200, 2.381249900
> *Element, type=C3D20
> Where the last line of data is the line starting with 96615...

an easy task - as long as you reassure CSSM that THIS is true:
a data line consists of exactly four (4) numbers separated by a <,>...
we just want to make sure that you do not waste our time by later modifying your format, which happens much too often in this NG...

us

Subject: Reading data into matlab

From: Felipe Sediles

Date: 27 Jul, 2007 02:17:02

Message: 5 of 19

Yes, what you ask about is true, with one qualifier: there are two separate blocks of data in this format, and I want to read each of those two blocks separately. In addition, there are two other blocks of data, in a different format, that I'd like to treat in a similar fashion (that is, read these blocks of data separately). I hope I can modify what is recommended for reading this particular form of data to be able to read these other two blocks of data. I'll see about attaching a sample data file.

Subject: Reading data into matlab

From: Felipe Sediles

Date: 27 Jul, 2007 02:19:50

Message: 6 of 19

The answer to your question is yes, with one qualifier: there is another, separate block of data formatted this way within the same file. I'd like to read this block of data as well, but separately.

Subject: Reading data into matlab

From: us

Date: 27 Jul, 2007 15:43:20

Message: 7 of 19

Felipe Sediles:
<SNIP data import evergreen...

> The answer to your question is yes, with one qualifier: there is another, separate block of data formatted this way within the same file. I'd like to read this block of data as well, but separately...

one of the solutions is outlined below

% 1) assume this content in a file <foo.txt>
Heading
Node
1, -31.750000000, 0.000000000, 3.174999960
2, -31.750000000, 5.800000200, 3.174999960
*Heading
*Node
10, -31.750000000, 0.000000000, 3.174999960
20, -31.750000000, 5.800000200, 3.174999960
30, -31.750000000, 5.800000200, 3.174999960
this is a test end
300 40 50 60
400,

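% the engine
     fnam='foo.txt'; % <- your file!
     % read the file line by line, one string per line
     a=textread(fnam,'%s','delimiter','','whitespace','');
     % per line: an{i,1}=numbers found, an{i,2}=their count, an{i,3}=error message
     an=cell(numel(a),3);
     for i=1:numel(a)
          [an{i,1:3}]=sscanf(a{i},'%f,');
     end
     % a data line holds exactly four numbers
     ixs=cellfun(@(x) x==4,an(:,2));
     % a 0->1 step marks the first line of a block, a 1->0 step its last line
     ib=strfind(ixs.',[0,1])+1;
     ie=strfind(ixs.',[1,0]);
     rb={};
     if ~isempty(ib)
          nb=numel(ib);
          rb=cell(nb,1);
          for i=1:nb
               % stack the block's 4-by-1 columns into an nrows-by-4 matrix
               rb{i,1}=reshape(cat(1,an{ib(i):ie(i),1}).',4,[]).';
          end
     end
% the result
     type(fnam);
     rb{:}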

us
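The block detection above looks for 0->1 and 1->0 steps in ixs. Below is a sketch of the same run detection written with diff and find instead of strfind (a swapped-in technique, not the code posted above), on a made-up ixs vector. Padding with a zero at both ends also catches a block that starts on the first line or ends on the last line of the file, which the strfind patterns would miss.

```matlab
% ixs(k) is true when line k held exactly four numbers (made-up values).
ixs = logical([0 1 1 0 1 1 1 0 0 1]);
d  = diff([0, ixs, 0]);   % +1 where a run starts, -1 just after it ends
ib = find(d == 1);        % first line of each run
ie = find(d == -1) - 1;   % last line of each run
disp([ib; ie])
```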

Subject: Reading data into matlab

From: Felipe Sediles

Date: 27 Jul, 2007 16:17:20

Message: 8 of 19

Thanks! I just ran it and I get:

??? Function name must be a string.

Error in ==> data_read at 9
     ixs=cellfun(@(x) x==4,an(:,2));

I'm not familiar with cellfun, so I couldn't figure out how to fix this.

Subject: Reading data into matlab

From: us

Date: 27 Jul, 2007 16:53:57

Message: 9 of 19

Felipe Sediles:
<SNIP down to bad news...

> ??? Function name must be a string.

well, it seems that you do not have the latest ML version...
here (2007a) the output is as expected

     rb{:} % each group in a different cell:
% ans =
% 1 -31.75 0 3.175
% 2 -31.75 5.8 3.175
% ans =
% 10 -31.75 0 3.175
% 20 -31.75 5.8 3.175
% 30 -31.75 5.8 3.175

can you upgrade?
us

Subject: Reading data into matlab

From: Felipe Sediles

Date: 27 Jul, 2007 17:05:19

Message: 10 of 19

No, I'm at a university and version 7.0.4 is what they're running. Can a solution for this version be arrived at?

Subject: Reading data into matlab

From: us

Date: 27 Jul, 2007 17:12:40

Message: 11 of 19

Felipe Sediles:
<SNIP looking for a workaround...

> Can a solution for this version be arrived at?

one possible solution

% try to replace
     ixs=cellfun(@(x) x==4,an(:,2));
% with
     ixs=([an{:,2}]==4).';

us
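For older versions such as 7.0.4, where the anonymous-function form of cellfun failed in this thread, here is a sketch of three equivalent ways to build ixs. The tiny 'an' cell is hand-made and only mimics the layout produced by [an{i,1:3}]=sscanf(...); column 3 is not used. The string-named cellfun form ('length') is the older calling style and is assumed, not confirmed here, to work in 7.0.x.

```matlab
% Hand-made stand-in for an: {numbers, count, error message} per line.
an = { [1; -31.75; 0; 3.175], 4, ''
       [],                    0, ''
       300,                   1, '' };
% (a) concatenate the counts (the fix posted above)
ixs_a = ([an{:,2}] == 4).';
% (b) string-named cellfun on the number vectors
ixs_b = cellfun('length', an(:,1)) == 4;
% (c) explicit loop, no cellfun at all
ixs_c = false(size(an,1), 1);
for k = 1:size(an,1)
    ixs_c(k) = an{k,2} == 4;
end
disp([ixs_a ixs_b ixs_c])
```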

Subject: Reading data into matlab

From: Felipe Sediles

Date: 27 Jul, 2007 17:19:19

Message: 12 of 19

You're awesome, that worked! Thanks!

Subject: Reading data into matlab

From: us

Date: 27 Jul, 2007 17:24:01

Message: 13 of 19

us:
<SNIP incomplete code

a more frugal solution now looks like this

% your data file
     fnam='foo.txt'; % <- your file!
% the engine
     a=textread(fnam,'%s','delimiter','','whitespace','');
     an=cell(numel(a),3);               % {numbers, count, errmsg} per line
     for i=1:numel(a)
          [an{i,1:3}]=sscanf(a{i},'%f,');
     end
     ixs=[an{:,2}]==4;                  % true for lines with exactly 4 numbers
     d=diff([0,ixs,0]);                 % +1 where a block starts, -1 after it ends
     ib=find(d==1);                     % first line of each block
     ie=find(d==-1)-1;                  % last line of each block
     nb=numel(ib);
     rb=cell(nb,1);                     % stays empty if no block is found
     for i=1:nb
          rb{i,1}=[an{ib(i):ie(i),1}].';  % nrows-by-4 matrix
     end
% the result
     type(fnam);
     rb{:}

us
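Felipe also has blocks in other formats. A hedged generalisation of the version above, assuming each format can be recognised by how many comma-separated numbers a line holds: ncol is a hypothetical parameter (4 for the node blocks), and the cell 'a' holds made-up lines standing in for what textread returns.

```matlab
% Made-up lines standing in for a = textread(...).
a = { '*Node'
      '1, -31.75, 0, 3.175'
      '2, -31.75, 5.8, 3.175'
      '*Element, type=C3D20'
      '10, -31.75, 0, 3.175' };
ncol = 4;                    % numbers per data line for this format
n   = numel(a);
cnt = zeros(1, n);           % how many numbers each line holds
val = cell(n, 1);            % the numbers themselves
for i = 1:n
    [val{i}, cnt(i)] = sscanf(a{i}, '%f,');
end
d  = diff([0, cnt == ncol, 0]);
ib = find(d == 1);           % first line of each block
ie = find(d == -1) - 1;      % last line of each block
rb = cell(numel(ib), 1);
for k = 1:numel(ib)
    rb{k} = [val{ib(k):ie(k)}].';   % each block becomes an m-by-ncol matrix
end
```

Any other line that happens to hold exactly ncol numbers would also be picked up as a block, so this relies on each format having a distinctive count.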

Subject: Reading data into matlab

From: Felipe Sediles

Date: 27 Jul, 2007 17:41:37

Message: 14 of 19

That works as well!

Subject: Reading data into matlab

From: Felipe Sediles

Date: 29 Jul, 2007 05:17:44

Message: 15 of 19

Help! So I have this data:

*Heading
** Job name: STB_model9 Model name: Model-1
** Geometry: half thickness = 2.900000100, width = 25.399999600, beta = 1/16th, gamma = 0.000000000
*Preprint, echo=No, history=NO, contact=NO
**
**Parts
**
*Part, name=Part-1
*End Part
**
*Part, name=torque_block2
*End Part
**
**
** ASSEMBLY
**
*Assembly, name=Assembly
**
*Instance, name=Part-1-1, part=Part-1
*Node
1, -31.750000000, 0.000000000, 1.587499980
2, -31.750000000, 2.900000100, 1.587499980
3, -95.250000000, 2.900000100, 1.587499980

...

And I'm running your code, but I get this:

??? Buffer overflow (bufsize = 4095) while reading string from
file (row 1, field 1) ==> 4.450000800, -

How do I fix this? 4.450000800 doesn't even show up as a possible value, let alone as the value in row 1, field 1! What's going on?

Subject: Reading data into matlab

From: Felipe Sediles

Date: 29 Jul, 2007 05:26:04

Message: 16 of 19

>>...

Apparently the file is too large. I cut most of the lines out and the buffer error didn't come up. Is there any way of changing the buffer size?

Subject: Reading data into matlab

From: Felipe Sediles

Date: 29 Jul, 2007 05:28:00

Message: 17 of 19

>>...

Or is there some way to overcome the buffer size limitation by modifying the code? Some of these files have a million lines of data mixed with some string text!
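A sketch of a streaming variant, assuming the same "exactly ncol numbers per data line" rule used above: fgetl reads one line at a time, so there is no bufsize limit and the text of a million-line file is never held in memory all at once. The demo file and its contents are made up. Growing 'cur' row by row keeps the code short but is slow for very large blocks; preallocating in chunks would be faster.

```matlab
% Made-up demo file: two runs of 4-number lines separated by a keyword line.
fnam = [tempname '.txt'];
fid = fopen(fnam, 'w');
fprintf(fid, '*Node\n1, 1.5, 2, 3\n2, 4, 5, 6\n*Element\n3, 7, 8, 9\n');
fclose(fid);

ncol   = 4;              % values per data line (assumption)
blocks = {};             % one matrix per run of data lines
cur    = zeros(0, ncol); % rows of the run being collected
fid = fopen(fnam, 'r');
while true
    tline = fgetl(fid);
    if ~ischar(tline), break; end      % end of file
    [v, n] = sscanf(tline, '%f,');
    if n == ncol
        cur(end+1, :) = v.';           %#ok<AGROW> data line: extend the run
    elseif ~isempty(cur)
        blocks{end+1, 1} = cur;        %#ok<AGROW> the run just ended
        cur = zeros(0, ncol);
    end
end
if ~isempty(cur), blocks{end+1, 1} = cur; end   % run that reaches end of file
fclose(fid);
delete(fnam);
```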

Subject: Reading data into matlab

From: Miroslav Balda

Date: 29 Jul, 2007 14:47:11

Message: 18 of 19

"Felipe Sediles" <felipe.sediles.nospam@mathworks.com> wrote in message <f8h8h0$fsd$1@fred.mathworks.com>...
> >>...
>
> Or is there some way to overcome the buffer size limitation by modifying the code? Some of these files have a million lines of data mixed with some string text!

One million lines means about 40 MB of characters if the file has the structure you have shown. MATLAB should be able to manage that amount of text. Maybe the free-format reading function could help you. You will find it at

http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=9034&objectType=FILE

Of course, it can only be used if the structure of the file is known.
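Rough arithmetic behind the 40 MB estimate, assuming about 45 bytes per line (the sample line 96615, 5.352380838, -5.800000200, 2.381249900 has 45 characters before the newline). It also shows why holding every line as text first costs more memory than the numeric result: MATLAB stores each character in 2 bytes.

```latex
\begin{aligned}
\text{file on disk:}\quad & 10^6 \times 45\ \text{B} \approx 45\ \text{MB}\\
\text{all lines as MATLAB char:}\quad & 10^6 \times 45 \times 2\ \text{B} \approx 90\ \text{MB} + \text{per-cell overhead}\\
\text{node table, } 10^6 \times 4 \text{ doubles:}\quad & 10^6 \times 4 \times 8\ \text{B} = 32\ \text{MB}
\end{aligned}
```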

Subject: Reading data into matlab

From: us

Date: 29 Jul, 2007 22:21:13

Message: 19 of 19

Felipe Sediles:
<SNIP did not read the help...

> Is there anyway of changing the buffersize...

well, yes, did you peruse

     help textread;

at all?
did you, by any chance, <find> the <bufsize> option?
us
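A sketch of the bufsize option of textread: the demo writes a made-up file with one 6000-character line, which exceeds the default limit of 4095 characters reported in the error above, and reads it with a larger buffer. The value 16384 is arbitrary, and 'delimiter','\n' is used here as an assumption rather than the '' from the code earlier in the thread.

```matlab
% Made-up file with one line longer than the default bufsize of 4095.
fnam = [tempname '.txt'];
fid = fopen(fnam, 'w');
fprintf(fid, '*Node\n%s\n', repmat('1, ', 1, 2000));   % ~6000 characters
fclose(fid);
% one string per line, with a buffer large enough for the longest line
a = textread(fnam, '%s', 'delimiter', '\n', 'whitespace', '', 'bufsize', 16384);
delete(fnam);
disp(numel(a{2}))
```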
