Pre-determining the number of lines in a text file

Asked by Matt J on 4 Jul 2013
Latest activity Answered by Ken Atwell on 30 Oct 2014 at 3:06

Is there any programmatic way of determining in advance the number of lines in a text file, for use with dlmread, textscan, etc...? I mean other than some brute force way like reading line by line in a while loop until EOF is hit.

5 Comments

Matt J on 4 Jul 2013

Yes, but there are still independent reasons why it would be beneficial to know the number of lines in the file. DLMREAD would provide a very easy way to read column-by-column or blocks of columns if one knows the number of lines in advance.

Suppose I want to use the syntax

 M = dlmread(filename, delimiter, range)

where I want the "range" to designate all rows in the file, but only a subset of the columns.
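For illustration, here is a minimal sketch of that call when the line count is already known. The sample file, its contents, and the variable nLines are assumptions made up for the example; the range vector has the form [R1 C1 R2 C2] and is zero-based.

```matlab
% Write a 3-line, 3-column sample file (hypothetical data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%d,%d,%d\n', [1 2 3; 4 5 6; 7 8 9]');
fclose(fid);

nLines = 3;  % assumed to be known in advance

% Range is [R1 C1 R2 C2], zero-based: all rows, columns 2 through 3.
M = dlmread(fname, ',', [0 1 nLines-1 2]);
delete(fname);
```

Without nLines there is no way to fill in R2, which is the motivation for the question.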

Guru on 4 Jul 2013

Honestly, that type of behavior is exactly what TEXTSCAN should be used for instead of DLMREAD. The documentation of TEXTSCAN shows nice examples of ignoring various columns, and by default it will read all rows of the file.
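A small sketch of what that could look like, assuming a comma-delimited file with three numeric columns (the file and its contents are invented for the example). The '%*f' specifier parses a field and discards it, and no row count is given anywhere.

```matlab
% Write a small comma-delimited sample file (hypothetical data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%d,%d,%d\n', [1 2 3; 4 5 6; 7 8 9]');
fclose(fid);

% Keep columns 1 and 3; '%*f' reads column 2 and throws it away.
fid = fopen(fname, 'r');
C = textscan(fid, '%f %*f %f', 'Delimiter', ',');
fclose(fid);
delete(fname);

% textscan returns one cell per kept column; join them into a matrix.
M = [C{1}, C{2}];
```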

Matt J on 5 Jul 2013

Well, okay, maybe that was a bad example. But surely, in general, it helps to know in advance how much data there is to read so you can plan, pre-allocate, etc...

4 Answers

Answer by Walter Roberson on 4 Jul 2013
Accepted answer

The only operating system that MATLAB has ever run on that supported that ability was DEC's VMS, and for technical reasons VMS's facility for that could not be used with MATLAB.

The modern treatment of "lines" as being delimited by a particular character or character pair (e.g., LF or CR+LF) does not offer any way to count the lines short of reading through the file and counting the delimiters.
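As a rough sketch of "reading through the file and counting the delimiters", the loop below reads fixed-size blocks of bytes and counts LF (ASCII 10) in each, so the whole file never has to be in memory at once. The sample file and the block size chunkSize are assumptions chosen for the example, and the count is the number of LF characters, not necessarily the number of lines if the last line is unterminated.

```matlab
% Write a sample file with three LF-terminated lines (hypothetical data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'alpha\nbeta\ngamma\n');
fclose(fid);

chunkSize = 1e6;   % bytes per read; an arbitrary choice
nLF = 0;
fid = fopen(fname, 'r');
while true
    chunk = fread(fid, chunkSize, '*uint8');  % raw bytes, no conversion
    if isempty(chunk)
        break;                                % nothing left to read
    end
    nLF = nLF + sum(chunk == 10);             % count line feeds in block
end
fclose(fid);
delete(fname);
```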

3 Comments

Matt J on 4 Jul 2013

I see. But I still wonder why MATLAB doesn't provide a single command that will do that. It seems like it could be a useful bit of info to extract.

Guru on 4 Jul 2013

Well on that note, it isn't hard for you to write a simple function that can do that...

Matt J on 4 Jul 2013

Or, I think it should be possible to allow dlmread to specify Infs in its range argument. That could trigger the file reading to stop when the limits of the file were reached.

Answer by Guru on 4 Jul 2013
Edited by Guru on 4 Jul 2013

Just out of boredom, here's a function:

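function n = linecount(fid)
% LINECOUNT Count the lines remaining in the open text file fid.
n = 0;
tline = fgetl(fid);       % fgetl returns -1 (not a char) at end of file
while ischar(tline)
  tline = fgetl(fid);
  n = n+1;
end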

Edited: Thanks for the comment, Walter.

7 Comments

Matt J on 5 Jul 2013

Walter, my version and Guru's return the same result n in tests that I've done. I've also inspected the length of the test file manually and your version runs 1 line too low. However, the documentation coincides with what you are saying about feof, so I cannot immediately reconcile any of this.

In any case, fgetl() apparently does not check nargout before allocating memory as you surmised. Too bad, I guess. It seems like an easy enhancement. Oddly, my version is even slower than Guru's. Overhead in feof()???

Walter Roberson on 5 Jul 2013

In the text file you used to test with, does the last line end with the line terminator, or does it just end with no terminator?

Yes, feof() has overhead.

Matt J on 5 Jul 2013

I cannot scroll any further than the final line of text. I guess that means it ends with no terminator? Both my version and Guru's correctly count the number of lines of actual text, though.
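One way to check the effect of the final terminator is to run an fgetl loop on two files that differ only in whether the last line ends with LF. This is only a sketch with made-up file contents; it is not the feof-based version discussed above.

```matlab
% Same three lines, with and without a terminator on the last line.
contents = {sprintf('a\nb\nc\n'), sprintf('a\nb\nc')};
counts = zeros(1, 2);

for k = 1:2
    fname = [tempname '.txt'];
    fid = fopen(fname, 'w');
    fwrite(fid, contents{k});   % write the characters exactly as given
    fclose(fid);

    fid = fopen(fname, 'r');
    n = 0;
    while ischar(fgetl(fid))    % fgetl returns -1 once only EOF remains
        n = n + 1;
    end
    fclose(fid);
    delete(fname);

    counts(k) = n;
end
```

Both entries of counts should come out as 3, which matches the observation that the fgetl-based versions count the lines of actual text either way.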

Answer by Hyatt on 29 Oct 2014 at 17:20

Another approach is to use the underlying operating system's functionality. Specifically, UNIX-like systems (Linux, and also Mac) include the command-line tool 'wc -l [filename]', which reports the line count of [filename].

To implement in MATLAB you could do something like this

filename = 'filenameOfInterest.txt';
if ~ispc
  % wc -l prints "<count> <filename>"; an exit status of 0 means success
  [status, cmdout] = system(sprintf('wc -l "%s"', filename));
  if status == 0
      scanCell = textscan(cmdout, '%u %s');
      lineCount = scanCell{1};
  else
      fprintf(1, 'Failed to find line count of %s\n', filename);
      lineCount = -1;
  end
else
  fprintf(1, 'Sorry, I don''t know what the equivalent is for a Windows system\n');
  lineCount = -1;
end

0 Comments

Answer by Ken Atwell on 30 Oct 2014 at 3:06

If we can make two assumptions:

  • ASCII #10 is a reliable end-of-line marker
  • The entire file will fit into memory (that is, we're not talking about Big Data)

I would do the following (using plot.m, the source file for the plot command, in this example):

 txt=fileread(fullfile(matlabroot, 'toolbox', 'matlab', 'graph2d', 'plot.m'));
 sum(txt==10)+1

This will be fast... certainly faster than the "fgetl" approach, but maybe not as fast as the "wc" approach Hyatt put forth above (assuming you can live without Windows platform support).
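One caveat: sum(txt==10)+1 counts one line too many when the file ends with a line feed. A possible adjustment, sketched here on in-memory strings that stand in for the output of fileread (the sample text is invented), is to add the extra line only when the last character is not LF.

```matlab
% Stand-ins for fileread output: with and without a trailing LF.
texts = {sprintf('a\nb\nc\n'), sprintf('a\nb\nc')};
counts = zeros(1, 2);

for k = 1:2
    txt = texts{k};
    n = sum(txt == 10);                  % one per LF-terminated line
    if ~isempty(txt) && txt(end) ~= 10
        n = n + 1;                       % final line has no terminator
    end
    counts(k) = n;
end
```

Both cases should give 3 here, whereas the unadjusted expression would give 4 for the first string.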

0 Comments
