Thread Subject: Reading textfile

Subject: Reading textfile

From: Bryan Heit

Date: 17 Sep, 2009 00:01:03

Message: 1 of 5

I am having trouble reading in a text file. What I want is to
generate an array of strings, 1 column wide by as many rows long as
there are lines in the dataset. The dataset is an HTML page saved as
text, containing bioinformatic information. I'm working on a script
that'll pull specific species data out of the dataset, but cannot make
much progress. I've tried several ways of reading the data
(importdata, textscan, etc) to no avail. At best the first 4-5 lines
get read in, then the read process is terminated (there are thousands
of lines). The data itself looks as follows:

--------------------------------------------------------------------------------
NPSA gnl|sp|P0C9I2 (1107L_ASFK5) Protein MGF 110-7L OS=African swine
fever virus (isolate Pig/Kenya/KEN-50/1950) GN=Ken-016 PE=3 SV=1

*****› PATTERN 1
 Site : 56- 64, Identity
   tyvescrfcw_DCEDGVCTS_riwgnnstsi
--------------------------------------------------------------------------------
NPSA gnl|sp|P0C9I3 (1107L_ASFM2) Protein MGF 110-7L OS=African swine
fever virus (isolate Tick/Malawi/Lil 20-1/1983) GN=Mal-013 PE=3 SV=1

*****› PATTERN 1
 Site : 56- 64, Identity
   tyvescrfcw_DCEDGVCTS_rvwgnnstsi
--------------------------------------------------------------------------------
NPSA gnl|sp|P0C9I4 (1107L_ASFP4) Protein MGF 110-7L OS=African swine
fever virus (isolate Tick/South Africa/Pretoriuskop Pr4/1996)
GN=Pret-017 PE=3 SV=1

*****› PATTERN 1
 Site : 56- 64, Identity
   tyvescrfcw_DCEDGICTS_rvwgnnstsi
--------------------------------------------------------------------------------

This goes on and on. I would like to read every line, even the
'-----' ones and blank ones, into the data array.

Any help would be greatly appreciated.

Bryan

Subject: Reading textfile

From: dpb

Date: 17 Sep, 2009 00:12:02

Message: 2 of 5

Bryan Heit wrote:
> I am having trouble reading in a text file. What I want is to
> generate an array of strings, 1 column wide by as many rows long as
> there are lines in the dataset. ...
> This goes on and on. I would like to read every line, even the
> '-----' ones and blank ones, into the data array.
>
> Any help would be greatly appreciated.

doc fgetl
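
For readers who want more than the pointer to the documentation, here is a minimal sketch of the usual fgetl loop. The file name 'sample_dataset.txt' and its contents are made up for illustration (the sketch writes a tiny file first so that it runs on its own); point fname at your own file instead.

```matlab
% Write a tiny sample file so the sketch is self-contained (hypothetical name).
fname = 'sample_dataset.txt';
fid = fopen(fname, 'w');
fprintf(fid, '-----\nNPSA gnl|sp|P0C9I2\n\n Site : 56- 64, Identity\n');
fclose(fid);

% Read every line, including blank ones, into an N-by-1 cell array.
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
data = {};
tline = fgetl(fid);              % returns the numeric value -1 at end of file
while ischar(tline)
    data{end+1, 1} = tline;      % a blank line is stored as an empty string
    tline = fgetl(fid);
end
fclose(fid);
```

fgetl drops the newline character but keeps leading spaces, so indented lines such as ' Site : ...' come through unchanged. Growing the cell array inside the loop is simple but can be slow for very large files.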

--

Subject: Reading textfile

From: Rune Allnor

Date: 17 Sep, 2009 07:45:25

Message: 3 of 5

On 17 Sep, 02:01, Bryan Heit <bryans.spam.t...@gmail.com> wrote:
> I am having trouble reading in a text file.  What I want is to
> generate an array of strings, 1 column wide by as many rows long as
> there are lines in the dataset.  The dataset is an HTML page saved as
> text, containing bioinformatic information.  I'm working on a script
> that'll pull specific species data out of the dataset, but cannot make
> much progress.  I've tried several ways of reading the data
> (importdata, textscan, etc) to no avail.  At best the first 4-5 lines
> get read in, then the read process is terminated (there are thousands
> of lines).  The data itself looks as follows:
...
> Any help would be greatly appreciated.

You will have to write your own parser from scratch.

You should take some time to find out exactly what you
want to use these data for, and how, and come up with a
data structure that fits this use.

Once that's done, scan the file to extract (possibly
multi line) data items. Then scan the lines and extract
whatever data you want. Store the data in structures
or cell arrays.

My point is that this is a somewhat involved task that
might not be easily solved with canned routines. If you
think the above sounds daunting, find/hire somebody that
can help you - it is a standard programming task that any
computer science student can help with. Expect to spend
a bit of time explaining to a helper how to separate the
data, though.
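
As a rough illustration of the approach described above (not a complete parser), the sketch below groups lines into records wherever a row of dashes appears and stores each record in a struct array. The variable name lines_in, the field names id and text, and the rule that a separator is a line of five or more dashes are all assumptions made for this example; the sample lines are shortened from the data in the first message.

```matlab
% 'lines_in' stands in for the cell array of lines already read from the file.
lines_in = { ...
    '--------------------'; ...
    'NPSA gnl|sp|P0C9I2 (1107L_ASFK5) Protein MGF 110-7L OS=African swine'; ...
    'fever virus (isolate Pig/Kenya/KEN-50/1950) GN=Ken-016 PE=3 SV=1'; ...
    ''; ...
    ' Site : 56- 64, Identity'; ...
    '--------------------'; ...
    'NPSA gnl|sp|P0C9I3 (1107L_ASFM2) Protein MGF 110-7L OS=African swine'; ...
    'fever virus (isolate Tick/Malawi/Lil 20-1/1983) GN=Mal-013 PE=3 SV=1'; ...
    '--------------------'};

% Assumption: a separator is a line made up only of dashes (five or more).
isSep  = ~cellfun(@isempty, regexp(lines_in, '^-{5,}\s*$', 'match', 'once'));
sepIdx = find(isSep);

records = struct('id', {}, 'text', {});      % empty struct array
for k = 1:numel(sepIdx) - 1
    block = lines_in(sepIdx(k) + 1 : sepIdx(k + 1) - 1);
    if isempty(block), continue; end
    % The accession ID follows 'gnl|sp|' on the first line of the record.
    tok = regexp(block{1}, 'gnl\|sp\|(\w+)', 'tokens', 'once');
    rec.id = '';
    if ~isempty(tok), rec.id = tok{1}; end
    rec.text = block;                        % all lines of this record
    records(end + 1) = rec;
end
```

After this runs, records(1).id holds the accession of the first entry and records(1).text holds its lines, so multi-line headers of any length stay together.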

Rune

Subject: Reading textfile

From: Lucio Cetto

Date: 17 Sep, 2009 09:39:02

Message: 4 of 5

Bryan:
textscan does it. If the file is too large, you should increase the buffer size. If every record in the file always has the same number of rows, you could reshape the output cell array, or play a little more with the format string, and you will get the data arranged into columns very easily...

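fid = fopen('mcen.txt','r');                 % open the file for reading
strs = textscan(fid,'%s','delimiter','\n');  % read one string per line
fclose(fid);                                 % close the file once it is read
strs{1}                                      % display the cell array of lines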
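
The reshape idea mentioned above could look roughly like the sketch below. It only applies if every record really has the same number of lines; in the sample posted at the top of the thread the third record wraps onto an extra line, so check that first. The variable names strs1, rowsPerRecord and byRecord, and the placeholder strings, are made up for illustration.

```matlab
% Hypothetical stand-in for strs{1}: two records of three lines each.
strs1 = {'rec1 line1'; 'rec1 line2'; 'rec1 line3'; ...
         'rec2 line1'; 'rec2 line2'; 'rec2 line3'};
rowsPerRecord = 3;                           % assumed to be constant

% reshape fills column by column, so each column holds one record;
% transposing puts one record on each row.
byRecord = reshape(strs1, rowsPerRecord, []).';
```

Here byRecord{2,1} is the first line of the second record. reshape raises an error if the number of lines is not a multiple of rowsPerRecord, which is a useful early sign that the records are not uniform.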

HTH
Lucio

Subject: Reading textfile

From: Bryan

Date: 17 Sep, 2009 15:29:09

Message: 5 of 5

On Sep 17, 5:39 am, "Lucio Cetto" <lce...@nospam.mathworks.com> wrote:
> Bryan:
> textscan does it. If the file is too large, you should increase the buffer size. If every record in the file always has the same number of rows, you could reshape the output cell array, or play a little more with the format string, and you will get the data arranged into columns very easily...
>
> fid = fopen('mcen.txt','r');
> strs = textscan(fid,'%s','delimiter','\n')
> strs{1}
> fclose(fid)
>
> HTH
> Lucio

Thanks everyone for your input. Lucio, your method worked perfectly -
the array is huge, but it loads the whole file and I can parse it
easily to extract the data I want.

Once again, thank you everyone.

Bryan
