Thread Subject:
alternative to load file?

Subject: alternative to load file?

From: M K

Date: 29 Sep, 2009 10:36:04

Message: 1 of 7

Hi !

I have a series of large ASCII files which I want to process using MATLAB. I do that by loading each file using the 'load(filename)' function. However, I've noticed that some of the files are incomplete, so the number of columns in the last line is not consistent with the rest of the rows in the file. What I'd like to do is load the incomplete files only up to a certain row (i.e., the last row that is complete).

For example, if the original file size is 1000 rows x 1000 columns (the real numbers are much, much larger), I want to load 999 rows x 1000 columns, since the 1000th row is corrupt. Is there any way I could do this in MATLAB? One option would be to open each corrupt ASCII file and delete the last row manually, but I have >500 files! So I wondered if there is a way of doing this in my script file.

Any help will be much appreciated.

Subject: alternative to load file?

From: Rune Allnor

Date: 29 Sep, 2009 10:46:16

Message: 2 of 7

On 29 Sep, 12:36, "M K" <mah...@mathworks.com> wrote:
> Hi !
>
> I have a series of large ASCII files which I want to process using MATLAB. I do that by loading each file using the 'load(filename)' function. However, I've noticed that some of the files are incomplete, so the number of columns in the last line is not consistent with the rest of the rows in the file. What I'd like to do is load the incomplete files only up to a certain row (i.e., the last row that is complete).
>
> For example, if the original file size is 1000 rows x 1000 columns (the real numbers are much, much larger), I want to load 999 rows x 1000 columns, since the 1000th row is corrupt. Is there any way I could do this in MATLAB? One option would be to open each corrupt ASCII file and delete the last row manually, but I have >500 files! So I wondered if there is a way of doing this in my script file.

If you know the number of lines in advance, TEXTSCAN can be
used to read a specified number of lines.

If you don't know the number of lines in advance, I would
read the file line by line and parse each line to check
whether it is complete.

Rune
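
The line-by-line approach could look roughly like the sketch below. It is only an illustration, not tested on the real data: the demo file, the variable names (fn, numCol, maxRow, nGood) and the assumption that the values are whitespace-separated are all made up for the example. A row is kept only if it parses to exactly numCol numbers.

```matlab
% Build a tiny demo file whose last row is truncated
% (a stand-in for one of the real Rx*.txt files).
fn = [tempname '.txt'];
fid = fopen(fn, 'w');
fprintf(fid, '1 2 3\n4 5 6\n7 8\n');
fclose(fid);

numCol = 3;                  % expected number of columns (assumed known)
maxRow = 10;                 % assumed upper bound on rows, for preallocation
D = zeros(maxRow, numCol);
nGood = 0;                   % number of complete rows found so far

fid = fopen(fn, 'r');
tline = fgetl(fid);
while ischar(tline)          % fgetl returns -1 (not a char) at end of file
    vals = sscanf(tline, '%f');      % parse every number on this line
    if numel(vals) == numCol         % keep only complete rows
        nGood = nGood + 1;
        D(nGood, :) = vals.';
    end
    tline = fgetl(fid);
end
fclose(fid);

D = D(1:nGood, :);           % drop the unused preallocated rows
delete(fn);                  % remove the demo file
```

Parsing every line separately is likely to be slow on files of this size, so it is mainly useful when the number of complete rows is not known beforehand.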

Subject: alternative to load file?

From: M K

Date: 29 Sep, 2009 10:57:02

Message: 3 of 7

Thanks for your reply, Rune.

I do know the number of rows and columns of uncorrupt data (for example, 999 x 1000). So the following should work

u=1:500;

   FID=ID=['Rx' num2str(u) '.txt']
   D=texscan(FID,999,1000);
end

?

Subject: alternative to load file?

From: Leslie McBrayer

Date: 29 Sep, 2009 12:38:47

Message: 4 of 7

> I do know the number of rows and columns of uncorrupt data (say for ex
> 999 x 1000). So the following should work
>
> u=1:500;
>
> FID=ID=['Rx' num2str(u) '.txt']
> D=texscan(FID,999,1000);
> end
>
> ?

Not quite. With textscan, you need to:
* Open the file with fopen to get fid.
* Describe the columns with format specifiers such as %d or %f.
* Specify only a single repetition factor (not rows and columns).
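
The three points above can be sketched as follows. This is a minimal illustration under assumptions: the demo file and the names fn, numRow and numCol are invented for the example, and the data is assumed to be whitespace-separated. The repetition count passed to textscan is the number of times the whole format (one row) is applied.

```matlab
% Demo file with two complete rows and one truncated row
% (a stand-in for a real data file).
fn = [tempname '.txt'];
fid = fopen(fn, 'w');
fprintf(fid, '1 2 3\n4 5 6\n7 8\n');
fclose(fid);

numRow = 2;                          % number of complete rows (assumed known)
numCol = 3;                          % number of columns (assumed known)

fid = fopen(fn, 'r');                % 1) open the file to get fid
fmt = repmat('%f', 1, numCol);       % 2) one %f specifier per column
C = textscan(fid, fmt, numRow, ...   % 3) a single repetition count: rows to read
    'CollectOutput', true);          %    gather all columns into one matrix
fclose(fid);

D = C{1};                            % numRow-by-numCol numeric matrix
delete(fn);                          % remove the demo file
```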

For your case, I would recommend the dlmread function. For example:

for u = 1:500

     filename = sprintf('Rx%d.txt', u);
     % The range is zero-based, [R1 C1 R2 C2]: this reads 999 rows x 1000 columns.
     D = dlmread(filename, '', [0 0 998 999]);

end

For more info, type "doc dlmread" at the command prompt.

Subject: alternative to load file?

From: M K

Date: 29 Sep, 2009 14:49:01

Message: 5 of 7

Thanks for the replies. The dlmread function isn't happy with my input files. I think the issue is that each of the files is approximately 1.2 GB!

I tried the fscanf function; strangely, it doesn't extract the numbers from the ASCII file. My code is below.

for u=1:500
  filename=['Rx' num2str(u) '.txt'];
  fid = fopen(filename);
  a = fscanf(fid, '%g %g', [14484 5557]); % It has 14484 rows x 5557 columns.
  fclose(fid)

end

The above is what I would use instead of D=load(filename), which works for uncorrupt data but sadly not for the corrupt files, because their last row is incomplete.

It seems strange that I can't use the standard functions. I would appreciate any thoughts or help on this. Thanks in advance.
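
One thing worth checking, offered as an assumption since the files themselves are not shown: fscanf fills its output in column order, so a row-oriented text file has to be read into a [columns rows] array and then transposed. The sketch below uses an invented demo file and the names fn, numRow and numCol; it also assumes the values are separated by whitespace. If the real files use commas or another delimiter, the format string would have to account for that.

```matlab
% Demo file with two complete rows and one truncated row
% (a stand-in for a real data file).
fn = [tempname '.txt'];
fid = fopen(fn, 'w');
fprintf(fid, '1 2 3\n4 5 6\n7 8\n');
fclose(fid);

numRow = 2;                  % number of complete rows (assumed known)
numCol = 3;                  % number of columns (assumed known)

fid = fopen(fn, 'r');
% fscanf reads at most numCol*numRow values and fills the array column by
% column, so each column of 'a' holds one row of the file.
a = fscanf(fid, '%f', [numCol numRow]);
fclose(fid);

D = a.';                     % transpose to get numRow-by-numCol
delete(fn);                  % remove the demo file
```

Because the size argument caps the number of values read, the incomplete last row is never touched.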

Subject: alternative to load file?

From: M K

Date: 29 Sep, 2009 16:07:02

Message: 6 of 7

Could someone please help?

I've tried many things, but I can't seem to get any function to do what load does for me.

Subject: alternative to load file?

From: Andres

Date: 29 Sep, 2009 16:35:19

Message: 7 of 7

"M K" <maha_k@mathworks.com> wrote in message <h9tbb6$rma$1@fred.mathworks.com>...
> could someone please help.
>
>
> I've tried many things but I can't seem to get any function to do what load does for me.


Here is a small script that uses txt2mat from the File Exchange. txt2mat is not necessarily sensitive to incomplete rows, but it has to loop through the file in blocks because of your huge memory demand (it should be quite quick, though).
Try it and vary it as needed (I can't test it here).


% your parameters
fn = 'c:\myhugefile.txt';
numRow = 14484;
numCol = 5557;
rowStep = 1000;

% initializations
D = zeros(numRow,numCol); % phew
fp = 0;
rowStart= 1;

% loop through file
while rowStart <= numRow
    rowEnd = min(rowStart+rowStep-1,numRow);
    [A,ffn,nh,SR,hl,fp] = txt2mat(fn,0,numCol,'%f',...
        'RowRange',[1,rowEnd-rowStart+1],...
        'FilePos',fp, 'ReadMode','block',...
        'InfoLevel',0);
    D(rowStart:rowEnd,:) = A;
    rowStart = rowEnd+1;
end
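
For comparison, the same block-wise idea can be sketched with the built-in textscan instead of txt2mat. This is a swapped-in technique, not the script above, and it has not been run against the real files: the demo file and its contents are invented, and whitespace-separated values are assumed. It relies on textscan resuming at the current file position when it is called again with the same fid.

```matlab
% Demo file: five complete rows of three values, then a truncated row
% (a stand-in for the real data).
fn = [tempname '.txt'];
fid = fopen(fn, 'w');
fprintf(fid, '%d %d %d\n', 1:15);    % format is reused for each group of three
fprintf(fid, '16 17\n');             % incomplete last row
fclose(fid);

numRow  = 5;                         % number of complete rows (assumed known)
numCol  = 3;                         % number of columns (assumed known)
rowStep = 2;                         % rows read per block

D   = zeros(numRow, numCol);         % preallocate the full result
fmt = repmat('%f', 1, numCol);       % one %f specifier per column

fid = fopen(fn, 'r');
rowStart = 1;
while rowStart <= numRow
    rowEnd = min(rowStart + rowStep - 1, numRow);
    % Read the next block of rows; the read continues where the last one ended.
    C = textscan(fid, fmt, rowEnd - rowStart + 1, 'CollectOutput', true);
    D(rowStart:rowEnd, :) = C{1};
    rowStart = rowEnd + 1;
end
fclose(fid);
delete(fn);                          % remove the demo file
```

Reading in blocks keeps the temporary textscan output small, so the peak memory is roughly the preallocated matrix plus one block.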
