Thread Subject: read large text files

Subject: read large text files

From: Anandhi

Date: 28 Oct, 2009 01:14:04

Message: 1 of 6

Hi,

I have text files with 6 columns of data and more than 100000 rows. I do not know the exact number of rows.

When I use this program I can read up to 100000 rows. How do I read the rows beyond that, up to the end of the file?

block_size = 100000;
format = '%f %f %f %f %f %f';
file_id = fopen(fno{i});
cnt=0;
segarray = textscan(file_id, format, block_size);

Thanks in advance for the support.

anandhi

Subject: read large text files

From: dpb

Date: 28 Oct, 2009 01:54:03

Message: 2 of 6

Anandhi wrote:
...
> When I use this prog I am able to get upto 100000 rows. How to get
> the rows beyond this till the end of file?
>
> block_size = 100000;
> format = '%f %f %f %f %f %f';
> file_id = fopen(fno{i});
> cnt=0;
> segarray = textscan(file_id, format, block_size);
...

Don't specify N, and textscan() should read to EOF.

Alternatively, see

doc textscan

and note one can call textscan repeatedly on the same fid and continue
from where left off.

Doc doesn't indicate it, but N=-1 in textread() is a flag for "read to
end of file"; one would presume that would have been implemented in
textscan() as well. Also, I'd presume inf would have the same effect.
I can't test these hypotheses as my version predates textscan().
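
For what it's worth, a minimal sketch of the first suggestion, assuming a whitespace-delimited file of six numeric columns. The temporary demo file and its 250 rows are made up for illustration and stand in for fno{i}; the only point is that the call omits the repeat count.

```matlab
% Hypothetical demo file: 250 rows of 6 numeric columns.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%f %f %f %f %f %f\n', rand(6, 250));
fclose(fid);

format = '%f %f %f %f %f %f';
file_id = fopen(fname, 'r');
if file_id == -1
    error('Could not open %s', fname);
end

% No repeat count: textscan applies the format until end of file,
% or until a field fails to convert.
segarray = textscan(file_id, format);
reached_eof = feof(file_id);   % false would suggest the read stopped early
fclose(file_id);
delete(fname);

nrows = numel(segarray{1});
fprintf('Read %d rows; reached EOF: %d\n', nrows, reached_eof);
```

If reached_eof comes back false on a real file, the read probably stopped at a line that does not match the format.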

--

Subject: read large text files

From: anandhi

Date: 28 Oct, 2009 03:39:18

Message: 3 of 6

On Oct 27, 9:54 pm, dpb <n...@non.net> wrote:
> Anandhi wrote:
>
> ...
> > When I use this prog I am able to get upto 100000 rows. How to get
> > the rows beyond this till the end of file?
>
> > block_size = 100000;
> > format = '%f %f %f %f %f %f';
> > file_id = fopen(fno{i});
> > cnt=0;
> > segarray = textscan(file_id, format, block_size);
>
> ...
>
> Don't specify N and textscan() should read to EOF
>
> Alternatively, see
>
> doc textscan
>
> and note one can call textscan repeatedly on the same fid and continue
> from where left off.
>
> Doc doesn't indicate it, but N=-1 in textread() is a flag for "read to
> end of file"; one would presume that would have been implemented in
> textscan() as well.  Also, I'd presume inf would have the same effect.
> I can't test these hypotheses as my version predates textscan().
>
> --

Thanks for the response. However, when I call textscan repeatedly on
the same fid, it reads only up to 100000 lines and then does not
continue.

E.g., the file has 1179919 lines:

segarray = textscan(file_id, format);
segarray1 = textscan(file_id, format);

segarray1 still comes back empty.

Subject: read large text files

From: Praetorian

Date: 28 Oct, 2009 04:04:56

Message: 4 of 6

On Oct 27, 9:39 pm, anandhi <anandhi.san...@gmail.com> wrote:
> On Oct 27, 9:54 pm, dpb <n...@non.net> wrote:
>
>
>
> > Anandhi wrote:
>
> > ...
> > > When I use this prog I am able to get upto 100000 rows. How to get
> > > the rows beyond this till the end of file?
>
> > > block_size = 100000;
> > > format = '%f %f %f %f %f %f';
> > > file_id = fopen(fno{i});
> > > cnt=0;
> > > segarray = textscan(file_id, format, block_size);
>
> > ...
>
> > Don't specify N and textscan() should read to EOF
>
> > Alternatively, see
>
> > doc textscan
>
> > and note one can call textscan repeatedly on the same fid and continue
> > from where left off.
>
> > Doc doesn't indicate it, but N=-1 in textread() is a flag for "read to
> > end of file"; one would presume that would have been implemented in
> > textscan() as well.  Also, I'd presume inf would have the same effect.
> > I can't test these hypotheses as my version predates textscan().
>
> > --
>
> Thanks for the response, however
>
> when i call textscan repeatedly on the same fid and continue
>  it does continue upto 100000 lines only after which it does not
> continue.
>
> eg the file has 1179919 lines
>
> segarray = textscan(file_id, format);
> segarray1 = textscan(file_id, format);
>
> I still get the size of segarray1 empty

You could try using my CSVIMPORT submission from FEX
(http://tinyurl.com/yjctr57).

HTH,
Ashish.

Subject: read large text files

From: Rune Allnor

Date: 28 Oct, 2009 06:32:03

Message: 5 of 6

On 28 Oct, 02:14, "Anandhi " <anan...@mathworks.com> wrote:
> Hi ,
>
> I have text files having 6 columns of data, but the number of rows is greater than 100000. I do not know the exact row number.
>
> When I use this prog I am able to get upto 100000 rows. How to get the rows beyond this till the end of file?

This is a trivial programming exercise on buffered I/O:

1) Decide on a buffer size
2) Clear the block and set the block pointer to the start of the buffer
3) Read until the buffer is full or EOF is found
4) Process the data
5) If EOF has not yet been found, repeat from 2)
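
In MATLAB terms, the steps above might be sketched as follows. This is one possible reading, not the only one: the demo file, the block size of 1000 rows and the "processing" (row count and column sums) are all placeholders, and textscan() does the buffering itself, so step 2 amounts to overwriting the block variable.

```matlab
% Hypothetical demo file: 2500 rows of 6 numeric columns.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%f %f %f %f %f %f\n', rand(6, 2500));
fclose(fid);

block_size = 1000;                 % 1) buffer size, in rows
format = '%f %f %f %f %f %f';
file_id = fopen(fname, 'r');
total_rows = 0;
col_sums = zeros(1, 6);

while ~feof(file_id)
    % 2)-3) fresh block: read until it is full or EOF is reached
    block = textscan(file_id, format, block_size, 'CollectOutput', true);
    data = block{1};               % n-by-6 numeric matrix
    if isempty(data)
        break;                     % nothing was read: stop rather than loop forever on a bad line
    end
    % 4) process the block (placeholder work)
    total_rows = total_rows + size(data, 1);
    col_sums = col_sums + sum(data, 1);
end                                % 5) repeat until EOF
fclose(file_id);
delete(fname);

fprintf('Processed %d rows in blocks of %d\n', total_rows, block_size);
```

The isempty check matters: if a line cannot be converted, textscan() returns nothing and the file position does not advance, so a bare while ~feof loop would spin.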

Rune

Subject: read large text files

From: dpb

Date: 28 Oct, 2009 13:45:32

Message: 6 of 6

anandhi wrote:
...
> when i call textscan repeatedly on the same fid and continue
> it does continue upto 100000 lines only after which it does not
> continue.
>
> eg the file has 1179919 lines
>
> segarray = textscan(file_id, format);
> segarray1 = textscan(file_id, format);
>
> I still get the size of segarray1 empty

I'd suspect there's a problem in the file at that point then. From
Remarks section in documentation--

"When textscan reads a specified file or string, it attempts to match
the data to the format string. If textscan fails to convert a data
field, it stops reading and returns all fields read before the failure."

Perhaps during your experimenting you accidentally wrote an EOF or some
other data to the file???

I'd suggest using a text-listing/viewing tool to verify the file is,
indeed, still pristine (my hunch is you'll find it isn't).
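
A rough way to do the same check from within MATLAB, assuming the read stops at a line that does not match the format. The demo file below, with its deliberately malformed fourth line, is invented for illustration; on the real file only the part from fopen onward would apply.

```matlab
% Hypothetical demo file whose 4th line contains a non-numeric field.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%d %d %d %d %d %d\n', reshape(1:18, 6, 3));
fprintf(fid, '19 20 oops 22 23 24\n');
fprintf(fid, '%d %d %d %d %d %d\n', 25:30);
fclose(fid);

format = '%f %f %f %f %f %f';
file_id = fopen(fname, 'r');
segarray = textscan(file_id, format);

% If textscan stopped before EOF, look at where it stopped.
stopped_early = ~feof(file_id);
rest_of_line = '';
if stopped_early
    stop_pos = ftell(file_id);        % byte offset where reading stopped
    rest_of_line = fgetl(file_id);    % remainder of the offending line
    fprintf('Stopped at byte %d after %d values in column 1; next text: "%s"\n', ...
            stop_pos, numel(segarray{1}), rest_of_line);
end
fclose(file_id);
delete(fname);
```

The text printed from fgetl() is what to look for when viewing the file at the reported offset.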

--
