How to read and process a text file?

Question

sample.txt

I have attached a sample text file. As you can see, the comment lines starting with $ must be omitted, and only the values must be read and assigned to variables. For the 'ID' set, each column must be named separately, such as a1 to a4 for the second column, b1 to b4 for the third, and so on. Similarly, each column of the 'NEXT' set must be named separately. Please help.

2 Comments
Walter Roberson on 15 Dec 2019

Don't do that. Put all of the data into the same variable. Use a structure with dynamic field names for the main variable and use vectors instead of a1 a2 and so on.
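As a rough illustration of that suggestion, the sketch below stores each block under a dynamic field name and reaches individual values by indexing. The numbers are made up for illustration and are not taken from sample.txt; the field names ID and NEXT are the block labels mentioned in the question.

```matlab
% Hypothetical numeric data standing in for the two blocks of the file
data = struct();
blockName = 'ID';                          % label as it would be read from the file
data.(blockName) = [10 20; 11 21; 12 22; 13 23];
blockName = 'NEXT';
data.(blockName) = [5 6 7; 8 9 10; 11 12 13; 14 15 16];

% First numeric column of the ID block, i.e. what the question calls a1..a4
a = data.ID(:, 1);
% A single value is reached by indexing rather than by a separate name
a3 = a(3);
```

With this layout, one loop over fieldnames(data) can visit every block, whatever the number of blocks or columns.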

Bertilla Raque on 15 Dec 2019

Thank you for your suggestion. I need to perform further calculations with the stored variables, and I don't know how to read the ID set and the NEXT set separately. textscan only reads once and then throws an error. How can I read the text file directly from NEXT?

Answer 1

Walter Roberson on 16 Dec 2019

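% Read the whole file as one character vector
S = fileread('sample.txt');
% Drop comment lines that start with $ and strip carriage returns
S = regexprep(S, {'^\$.*?\n', '\r'}, {'', ''}, 'lineanchors');
% The first token on each line is the block label (ID, NEXT, ...)
all_ids = string(regexp(S, '^\S*', 'match', 'lineanchors'));
% Remaining text with the labels removed
noids = regexprep(S, '^\S*\s*', '', 'lineanchors');
% A block ends wherever the label changes, and at the last line
mask = all_ids(1:end-1) ~= all_ids(2:end);
stops = find([mask true]);
% Split the text at the block boundaries
eol_pos = regexp(noids, '.$', 'lineanchors');
block_lengths = diff([0, eol_pos(stops)]);
blocks = mat2cell(noids, 1, block_lengths);
ids = all_ids(stops);
% Parse each block of text into a numeric array
values = cellfun(@(B) cell2mat(textscan(B, '', 'CollectOutput', true)), blocks', 'uniform', 0);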

At this point, ids is a string vector of the block names, such as NEXT, and values is a cell array of numeric arrays, with each entry holding the numeric data for the corresponding name.
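A minimal sketch of how those two outputs might be used together, assuming ids and values have the shapes described. The contents below are hypothetical stand-ins, not the real data from sample.txt.

```matlab
% Hypothetical stand-ins for the outputs of the code above
ids = string({'ID', 'NEXT'});
values = {[10 20; 11 21; 12 22; 13 23]; [5 6 7; 8 9 10; 11 12 13; 14 15 16]};

% Select the block whose label is NEXT
% (strcmp accepts both string arrays and cell arrays of char vectors)
next_block = values{strcmp(ids, 'NEXT')};

% Second numeric column of that block
col2 = next_block(:, 2);
```

This is one way to get at the NEXT data directly: the file is read once, and the block is then selected by name.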

9 Comments
Walter Roberson on 18 Dec 2019

The "right" procedure for that is not to do things that way.

http://www.mathworks.com/matlabcentral/answers/304528-tutorial-why-variables-should-not-be-named-dynamically-eval

You have a variable number of columns, so you would be dynamically generating the variable names; if you had a fixed number of columns it could potentially make sense to assign columns to fixed variables.

As onlookers, we see an input file with a variable number of columns, and we must assume that your real files might have 9 or more columns. You would then need to assign automatically to variables a, b, c, d, e, f, g, h, i, which would interfere with using i as the loop control variable.

It is also not at all clear to me whether a would refer to all 8 values that your sample file shows in column 2, or only to the first 4 of them, or if you intend there to be some method of triggering going on to the next block, so that variable a would first refer to the 4 in the second column of the first block and then would refer to the 4 in the second column of the second block. What should d refer to when you are processing the first block, since there is no 4th column in the first block but there is a 4th column in the second block?

T(i)=[a(i) b(i) c(i)]

The right hand side there would be a vector of length 3, but the left hand side is a scalar. If we extend the code to

T(i,:) = [a(i) b(i) c(i)];
t(i) = T(i,:)/norm(T(i,:));

then the right hand side would be a vector of length 3 because T(i,:) would be a vector of length 3, but the left hand side names a scalar location. So you would have to extend further to

T(i,:) = [a(i) b(i) c(i)];
t(i,:) = T(i,:)/norm(T(i,:));
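The same row-by-row normalization can also be written without a loop. This is only a sketch on a made-up matrix T, assuming the Euclidean norm is the one wanted; bsxfun is used so that it also runs on releases without implicit expansion.

```matlab
% Hypothetical data: each row is one [a b c] triple
T = [3 4 0; 0 5 12; 1 2 2];

% Euclidean length of each row, as a column vector
row_norms = sqrt(sum(T.^2, 2));

% Divide every row by its own length
t = bsxfun(@rdivide, T, row_norms);
```

A row of all zeros has length 0 and would come out as NaN, so such rows would need separate handling.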

It is not clear from what you write whether you want to normalize only the first 3 numeric columns in a block, or if you want to normalize all numeric columns in a block.

And at the end, are you wanting your t to contain the normalized data of all of the blocks together in one array, or do you want the normalized data block by block?

The code would be easier if you were using a newer MATLAB release...

S = fileread('sample.txt');
S = regexprep(S, {'^\$.*?\n', '\r'}, {'', ''}, 'lineanchors');
all_ids = regexp(S, '^\S*', 'match', 'lineanchors');
noids = regexprep(S, '^\S*\s*', '', 'lineanchors');
mask = ~strcmp(all_ids(1:end-1), all_ids(2:end));  %true where a label differs from the next one
stops = find([mask true]);
eol_pos = regexp(noids, '.$', 'lineanchors');
block_lengths = diff([0, eol_pos(stops)]);
blocks = mat2cell(noids, 1, block_lengths);
ids = all_ids(stops);
values = cellfun(@(B) cell2mat(textscan(B, '', 'CollectOutput', true)), blocks', 'uniform', 0);
normalized_blocks = cellfun(@(B) cell2mat(arrayfun(@(R) B(R,:)./norm(B(R,:)), (1:size(B,1)).', 'uniform', 0)), values, 'uniform', 0);
%now normalized_blocks is a cell array in which each row of the block has
%been normalized independently

Walter Roberson on 20 Dec 2019

Again, why do you need to name the columns when you can just index instead?
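For illustration, a loop can step through the columns of one block by index, so that no per-column names are needed. The matrix B is hypothetical, and the column sum stands in for whatever calculation is actually required.

```matlab
% Hypothetical numeric block: one row per line of the file
B = [10 20 30; 11 21 31; 12 22 32; 13 23 33];

col_sums = zeros(1, size(B, 2));
for k = 1:size(B, 2)
    col = B(:, k);            % plays the role of a, b, c, ...
    col_sums(k) = sum(col);   % placeholder for the real calculation
end
```

The loop bound comes from size(B, 2), so the same code works however many columns a block has.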

Bertilla Raque on 20 Dec 2019

So that I can use them in further calculations in a loop. I think I was having problems with the loop because of the scalar problem. I'll try indexing.
