How can I read a specific range of lines from a text file without using a for loop?

I need to read data from a lot of text files, and they are formatted as follows:
.
.
.
internalField nonuniform List<scalar>
241920
(
0
0
0
0
0
.
.
.
0
0
0
)
;
.
.
.
The data I'm interested in are the zeros after the "(". The 241920 is the number of lines of data. The numbers are not necessarily 0 (if it matters, their values are constrained as ).
I want to get these numbers into an array. So far, I first read the line containing the number of data points (it is located on line 22 of the text file) to initialize the array, then I used textscan in a for loop to read the text file line by line starting from line 24. The problem is that this process is VERY slow, and I need to read hundreds of these files (the number of rows can vary, but is always specified on line 22).
Here is the for loop I'm using (I copied the textscan function from another post; I'm honestly not sure how it works):
for i=1:fieldSize
    alpha(i) = str2double(string(textscan(fileID,'%s',1,'delimiter','\n', 'headerlines',(linenum+i-1)-1)));
    fseek(fileID, 0, 'bof');
end
% Where fieldSize is the number found on line 22 as previously mentioned
% linenum is where the data starts (which is 24 for these text files). The two -1 offsets applied to linenum (-2 in total) are just to match the code I got it from
% alpha is the array to where the data is being exported to
What I want to do, then, is make this code much more efficient. To do that, I believe I need to eliminate the for loop and use a function that can read a range of lines, not necessarily starting at the beginning of the text file.
EDIT:
I attached a sample text file. The first 22 lines are constant, just file info from the program that produced the text file.

4 Comments

Are those the only times that parentheses come up in the file?
If so, you might be able to get away with some regexp stuff.
text = fileread('mytextfile.txt');
numbers = regexp(text,'[(]\s(.*)\s[)]','tokens');
From there you should end up with a cell that contains the stuff. You can split it into different elements by using regexp again.
numbers = regexp(numbers,'\s','split');
Then you should be able to use some cellfun, str2num, and cell2mat to make things into an array. You may have some troubles with cellfun because regexp has a tendency to bury things in a couple of levels deep in cells, but there are workarounds for that if needed.
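One way to sidestep the nested-cell handling is to capture only the text between the parentheses and pass it to sscanf, which parses whitespace-separated numbers directly. The following is a minimal sketch, assuming the file contains a single parenthesized block that holds nothing but numbers; the sample text is made up for illustration and stands in for the output of fileread.

```matlab
% Hypothetical stand-in for: txt = fileread('mytextfile.txt');
txt = sprintf('internalField nonuniform List<scalar>\n5\n(\n0\n0.25\n0.5\n0.75\n1\n)\n;\n');

% Capture everything between the first "(" and the next ")".
% [^)]* is a negated character class, so it also matches newlines.
tok = regexp(txt, '\(([^)]*)\)', 'tokens', 'once');

% sscanf skips whitespace (including newlines) between values
% and returns a column vector of doubles.
numbers = sscanf(tok{1}, '%f');
```

With 'once', regexp returns the tokens of the first match as a flat cell array, so tok{1} is already a char vector and no cellfun or cell2mat step is needed.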
sscanf is probably going to be the fastest way to read your file. Can you post an example text file so we can give you the correct code?
Certainly, the code you're using is not going to be efficient. For a start, you pointlessly convert a cell array of char vector to a string array, which you then convert to an array of double. You could convert the cell array directly to double, but best is to read the numbers directly as numbers rather than text.
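To illustrate the conversion point, here is a small sketch with made-up values. It assumes the text was read with a %s format, so each entry arrives as a char vector inside a cell array.

```matlab
% Cell array of char vectors, as a %s format would produce (made-up values)
c = {'0'; '0.5'; '1'};

% Round trip through a string array, as in the question's loop
viaString = str2double(string(c));

% str2double accepts the cell array directly, so string() is unnecessary
direct = str2double(c);

% Better still: let textscan parse the values as numbers with %f
parsed = textscan(sprintf('0\n0.5\n1\n'), '%f');
asNumbers = parsed{1};
```

All three variables hold the same column of doubles; only the last approach avoids creating text that then has to be converted.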
Bob Nbob,
Yup, those are the only times that parentheses pop up in the whole text file. I attached a sample text file in case you want to take a look. I will try this and report back.
Guillaume,
I posted the example text file.
Some tests we did about a week and a half ago showed that textscan is faster than fscanf.

 Accepted Answer

Once you have just read the fieldSize, then
alpha = cell2mat(textscan(fileID,'%f',fieldSize, 'HeaderLines',1));

5 Comments

fileID = fopen('alpha.water.txt', 'r');
fieldSize = cell2mat(textscan(fileID, '%f', 1, 'HeaderLines', 21));
fgets(fileID); %flush to end of line
alpha = cell2mat(textscan(fileID,'%f',fieldSize, 'HeaderLines',1));
fclose(fileID);
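Since the question mentions hundreds of files with varying row counts, the same steps can be put in a loop that stores each result in a cell array. This is only a sketch: the temporary folder, the file names, and the sample values are hypothetical, and the files are generated here just so the example runs on its own. It assumes every file has the count on line 22 and the data starting on line 24.

```matlab
% Build two small sample files in a temporary folder (illustration only)
folder = tempname;
mkdir(folder);
samples = {[0; 0.5; 1], [0.25; 0.75]};
for k = 1:numel(samples)
    fid = fopen(fullfile(folder, sprintf('sample%d.txt', k)), 'w');
    fprintf(fid, 'header line %d\n', 1:21);      % lines 1-21: header
    fprintf(fid, '%d\n(\n', numel(samples{k}));  % line 22: count, line 23: "("
    fprintf(fid, '%g\n', samples{k});            % data from line 24 on
    fprintf(fid, ')\n;\n');
    fclose(fid);
end

% Read every file; a cell array is used because row counts differ
files = dir(fullfile(folder, '*.txt'));
alpha = cell(numel(files), 1);
for k = 1:numel(files)
    fileID = fopen(fullfile(folder, files(k).name), 'r');
    fieldSize = cell2mat(textscan(fileID, '%f', 1, 'HeaderLines', 21));
    fgets(fileID);  % flush to end of line 22
    alpha{k} = cell2mat(textscan(fileID, '%f', fieldSize, 'HeaderLines', 1));
    fclose(fileID);
end

rmdir(folder, 's');  % remove the temporary sample files
```

For real data, only the second loop is needed, with folder pointing at the directory that holds the files.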
Awesome! This worked perfectly! Can I ask how it works without needing the for loop? I know that textscan is able to retain the last read line, but how do you make it read the whole range?
textscan starts reading from wherever the file pointer is. It is not specifically keeping track of positions: it just tells the file system to read and the file system takes care of the details.
In textscan right after the format you can pass in the number of times the format will be used, with each use creating a new row in the output cell arrays. This does not exactly mean the same as the number of lines to read, but in rectangular blocks with separated columns and no blank lines and a format that grabs as many entries as are in a standard line, then it works out the same as saying it is the number of lines to read.
After the last time the format is used according to the count, the file position is left right after the last data item read. That would typically be pointing to the end of line before having consumed the line termination.
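The effect of the repeat count can be seen on a small in-memory example. This sketch passes made-up text to textscan as a char vector instead of a file identifier, with one number per line so that the count matches the number of lines read.

```matlab
% Five numbers, one per line (made-up data)
txt = sprintf('1\n2\n3\n4\n5\n');

% The third argument is how many times the format is applied
c = textscan(txt, '%f', 3);
firstThree = c{1};   % column vector with the first three values

% Without a count, the format is reused until the input runs out
c = textscan(txt, '%f');
allValues = c{1};
```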
