Is it possible to split a large text file into half and subsequently use textscan for both parts?

Question

Atipong on 13 May 2013

https://www.mathworks.com/matlabcentral/answers/75643-is-it-possible-to-split-a-large-text-file-into-half-and-subsequently-use-textscan-for-both-parts

Hi,

This is my first time in this forum.

I am working with a large text file containing 10^5 rows by 600 columns of 16-digit elements. I use textscan to read the text data. I already know the number of columns, so I can generate the format spec beforehand. The main part of my code is shown below:

    array=textscan(fileID,Spec,NumRow,'Delimiter',delim,'MultipleDelimsAsOne',true,'HeaderLines',1,'ReturnOnError',false);

When I set NumRow (the number of rows) to 50,000 or fewer, it works fine and takes only about 1 minute. However, my system seems to crash when I increase NumRow to 100,000. I suspect that my virtual memory has reached its limit.
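A rough estimate supports the memory suspicion. Assuming every value is parsed into a double (8 bytes, e.g. with a `%f` format; the actual Spec is not shown), the full result needs about 480 MB before any cell-array or parsing overhead, and 50,000 rows need half of that. If the values are read as strings (`%s`) instead, each one becomes a separate char array and the cost is considerably higher.

```matlab
% Rough memory estimate for the textscan result, assuming every field is
% parsed into a double (8 bytes). Cell-array and parsing overhead are ignored.
nRows       = 1e5;
nCols       = 600;
bytesPerVal = 8;                             % size of one double
fullBytes   = nRows * nCols * bytesPerVal;   % all 100,000 rows
halfBytes   = fullBytes / 2;                 % 50,000 rows
fprintf('full: %.0f MB, half: %.0f MB\n', fullBytes/1e6, halfBytes/1e6);
```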

Is there a way to split the data into two parts, say rows 1 to 50,000 and rows 50,001 to 100,000?

Thanks! Ati


Atipong on 14 May 2013

Hi,

It's something like this, with 10^5 rows and 600 columns separated by spaces:

    -4.7533250000e-05 -4.8990000000e-05 -3.5166750000e-01
    1.5550000000e-02 -1.5832100000e-09 -4.3949250000e-01
    -1.9371000000e-04 -1.1074875000e-01 -6.1198500000e-01

Cedric on 14 May 2013

So when there is no minus sign, there are two spaces?
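With the options from the original call, an extra space in front of positive values between fields should not matter: `'MultipleDelimsAsOne', true` treats a run of spaces as a single delimiter. A small check on two lines in the sample's layout; the second line is hypothetical and pads its positive value with a doubled space:

```matlab
% Two lines in the sample's layout. The second (hypothetical) line has two
% spaces before the positive value, as Cedric suggests may happen.
str = sprintf(['-4.7533250000e-05 -4.8990000000e-05 -3.5166750000e-01\n' ...
               '-1.5832100000e-09  1.5550000000e-02 -4.3949250000e-01']);
C = textscan(str, '%f%f%f', 'Delimiter', ' ', 'MultipleDelimsAsOne', true);
data = [C{:}];          % one row per line, one column per field
disp(size(data))
```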


Answer 1

per isakson on 13 May 2013

https://www.mathworks.com/matlabcentral/answers/75643-is-it-possible-to-split-a-large-text-file-into-half-and-subsequently-use-textscan-for-both-parts#answer_85292

Edited: per isakson on 13 May 2013

Something like this:

    nRow = 50000;
    fid  = fopen( ... );
    buf1 = textscan( fid, ..., nRow, .... );   % reads only the first nRow rows
    % .... process buf1 ....
    buf2 = textscan( fid, ..., nRow, .... );   % resumes where the first call stopped
    fclose( fid );
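A self-contained version of the two-call pattern that can be run as is. It writes a small hypothetical test file (6 rows, 3 columns, one header line) to a temporary location; in the question's case nRow would be 50000 and the format would have 600 `%f` fields.

```matlab
% Demo of reading a file in two chunks with repeated textscan calls.
% The file and its 6-by-3 size are hypothetical stand-ins for the real data.
fname = [tempname '.txt'];                 % hypothetical temporary file
nCol  = 3;
M     = reshape(1:18, nCol, []).';         % 6-by-3 test data
fid   = fopen(fname, 'w');
fprintf(fid, 'header line\n');             % one header line, as in the question
fprintf(fid, '%g %g %g\n', M.');           % write row by row
fclose(fid);

nRow = 3;                                  % rows per chunk
spec = repmat('%f', 1, nCol);
fid  = fopen(fname, 'r');
% First call: skip the header and read nRow rows.
buf1 = textscan(fid, spec, nRow, 'Delimiter', ' ', ...
                'MultipleDelimsAsOne', true, 'HeaderLines', 1);
part1 = [buf1{:}];                         % process the first half here
clear buf1                                 % free it before reading more
% Second call: continues where the first stopped; no HeaderLines needed.
buf2 = textscan(fid, spec, nRow, 'Delimiter', ' ', ...
                'MultipleDelimsAsOne', true);
part2 = [buf2{:}];
fclose(fid);
delete(fname);
```

`'HeaderLines'` is passed only to the first call, because by the time of the second call the header has already been consumed.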


per isakson on 14 May 2013

Edited: per isakson on 14 May 2013

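You have to process the data in buf1 and then run

    clear buf1

before reading the rest of the file. Or reuse the same variable, so that the second call replaces the first chunk:

    buf = textscan( fid, ..., nRow, .... );
    % .... process buf ....
    buf = textscan( fid, ..., nRow, .... );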

Myself, I would probably have written the data to one or more binary files and used memmapfile to work with it.
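A minimal sketch of the binary-file plus memmapfile idea, under stated assumptions: the file name and the tiny 4-by-3 matrix are hypothetical stand-ins for chunks parsed with textscan. Each chunk is written transposed so that every row is contiguous on disk, which lets later chunks simply be appended; the file is then mapped as an nCol-by-nRow matrix, so a data row is a column of the mapped array.

```matlab
% Sketch: write parsed data to a binary file, then map it with memmapfile.
% File name and sizes are hypothetical; real data would be 1e5-by-600.
nRow  = 4;  nCol = 3;
fname = [tempname '.bin'];                 % hypothetical binary file
A     = magic(4);  A = A(:, 1:nCol);       % stand-in for one parsed chunk
fid   = fopen(fname, 'w');
fwrite(fid, A.', 'double');                % transposed: each row is contiguous
fclose(fid);

% Map the file without loading it all; x is stored as nCol-by-nRow.
m    = memmapfile(fname, 'Format', {'double', [nCol nRow], 'x'});
row3 = m.Data.x(:, 3).';                   % third data row
col2 = m.Data.x(2, :).';                   % second data column
clear m                                    % release the mapping before deleting
delete(fname);
```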

Walter Roberson on 14 May 2013

per is correct.

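To be explicit: when you specify the repeat count, textscan() does not read in the entire file. It stops after that many repetitions of the format, and the next call on the same fid continues from that point.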
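Because each call resumes where the previous one stopped, the two calls generalize to a loop that reads one chunk at a time until the end of the file. A sketch with a hypothetical 7-row test file, where a running column sum stands in for the real processing:

```matlab
% Read a file chunk by chunk until EOF; only one chunk is held in memory.
% The test file and the column-sum "processing" are hypothetical.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'header\n');
fprintf(fid, '%g %g\n', [1:7; 11:17]);     % 7 rows: "1 11", "2 12", ...
fclose(fid);

nRow   = 3;                                % rows per chunk (50,000 in the question)
colSum = zeros(1, 2);
nRead  = 0;
fid = fopen(fname, 'r');
fgetl(fid);                                % skip the header line once
while ~feof(fid)
    buf   = textscan(fid, '%f%f', nRow, 'Delimiter', ' ', ...
                     'MultipleDelimsAsOne', true);
    chunk = [buf{:}];                      % at most nRow-by-2
    if isempty(chunk)
        break;                             % nothing left to read
    end
    colSum = colSum + sum(chunk, 1);
    nRead  = nRead + size(chunk, 1);
end
fclose(fid);
delete(fname);
```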


Answer 2

Yao Li on 14 May 2013

https://www.mathworks.com/matlabcentral/answers/75643-is-it-possible-to-split-a-large-text-file-into-half-and-subsequently-use-textscan-for-both-parts#answer_85341

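You can use for loops to auto-generate the formatSpec for textscan(). For example, you can read two columns at a time by defining one formatSpec per column pair, where each spec reads two columns with %f and skips the other 598 with %*f:

    % formatSpec_array{j} reads columns 2*j-1 and 2*j and skips all others.
    nCol = 600;
    formatSpec_array = cell(1, nCol/2);
    for j = 1:nCol/2
        temp = repmat({'%*f'}, 1, nCol);   % skip every column by default
        temp([2*j-1, 2*j]) = {'%f'};       % ...except the two wanted ones
        formatSpec_array{j} = [temp{:}];   % concatenate into one string
    end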
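One way such per-pair specs could be used, sketched on a hypothetical 3-row, 4-column test file: rewind the file before each pass and keep only the two columns that pass reads. The spec is rebuilt inline so the block runs on its own. Each pass re-parses the whole file, so 300 passes over the real data would likely be slow; the gain is that only two columns are in memory at a time.

```matlab
% Read a small test file one column pair per pass. File and sizes are
% hypothetical (4 columns here instead of 600).
nCol  = 4;
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'header\n');
fprintf(fid, '%g %g %g %g\n', [1:3; 11:13; 21:23; 31:33]);  % 3 rows
fclose(fid);

pairs = cell(1, nCol/2);
fid = fopen(fname, 'r');
for j = 1:nCol/2
    temp = repmat({'%*f'}, 1, nCol);       % skip every column by default
    temp([2*j-1, 2*j]) = {'%f'};           % read columns 2j-1 and 2j
    frewind(fid);                          % start each pass from the top
    C = textscan(fid, [temp{:}], 'Delimiter', ' ', ...
                 'MultipleDelimsAsOne', true, 'HeaderLines', 1);
    pairs{j} = [C{:}];                     % 3-by-2 block for this pair
end
fclose(fid);
delete(fname);
```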
