How to read and process a text file?

I have attached a sample text file. As you can see, the comment lines starting with $ must be omitted, and only the values should be read and assigned to variables. For the 'ID' set, each column must be named separately, like a1 to a4 for the second column, b1 to b4 for the third, and so on. Similarly, each column of the 'NEXT' set must be named separately. Please help.

2 Comments

Don't do that. Put all of the data into the same variable. Use a structure with dynamic field names for the main variable and use vectors instead of a1 a2 and so on.
Thank you for your suggestion. I need to perform further calculations with the stored variables, and I don't know how to read the ID set and the NEXT set separately. textscan only reads once and then throws an error. How can I read the text file starting directly from NEXT?
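As a rough sketch of the suggestion in the first comment (one structure with dynamic field names, and whole columns as vectors instead of a1, a2, ...), something like the following could work. The numbers are made up and only stand in for the sample file; names, blocks, and data are hypothetical variable names, and the block names are assumed to be valid MATLAB identifiers.

```matlab
% Hypothetical data standing in for the two blocks of the sample file.
names  = {'ID', 'NEXT'};
blocks = {[1 10 100; 2 20 200; 3 30 300; 4 40 400], ...
          [5 50 500 5000; 6 60 600 6000; 7 70 700 7000; 8 80 800 8000]};

% Store every block in one structure, using the block name as the field name.
data = struct();
for k = 1:numel(names)
    data.(names{k}) = blocks{k};   % dynamic field name
end

% Whole columns are then available as vectors, with no need for a1..a4.
a = data.ID(:, 1);     % first numeric column of the ID block
d = data.NEXT(:, 4);   % fourth numeric column of the NEXT block
```

Individual values are reached by indexing, for example a(2) rather than a2.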


Answers (1)

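% Read the whole file, then delete comment lines (starting with $) and carriage returns.
S = fileread('sample.txt');
S = regexprep(S, {'^\$.*?\n', '\r'}, {'', ''}, 'lineanchors');
% The first token of every line is the block name (ID, NEXT, ...); string() needs R2016b or later.
all_ids = string(regexp(S, '^\S*', 'match', 'lineanchors'));
% The remaining text, with the leading name stripped from each line.
noids = regexprep(S, '^\S*\s*', '', 'lineanchors');
% A block ends wherever the name changes, and at the last line.
mask = all_ids(1:end-1) ~= all_ids(2:end);
stops = find([mask true]);
% Split the text at the end of each block's last line.
eol_pos = regexp(noids, '.$', 'lineanchors');
block_lengths = diff([0, eol_pos(stops)]);
blocks = mat2cell(noids, 1, block_lengths);
ids = all_ids(stops);
% Convert each block of text to a numeric matrix.
values = cellfun(@(B) cell2mat(textscan(B, '', 'CollectOutput', true)), blocks', 'uniform', 0);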
At this point, ids is a string vector of the block names, such as NEXT, and values is a cell array of numeric arrays, with each entry holding the numeric data of the corresponding block.
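To illustrate how ids and values might be used together, here is a minimal sketch that picks out one block by name. The contents of ids and values are hypothetical placeholders, not the data from the sample file; strcmp is used because it accepts both a string vector and a cell array of character vectors.

```matlab
% Hypothetical results in the same shape as ids and values above.
ids    = {'ID', 'NEXT'};
values = {magic(4); [1 2; 3 4]};

% Find the position of the block called NEXT and pull out its matrix.
k = find(strcmp(ids, 'NEXT'), 1);
nextBlock = values{k};

% First numeric column of that block, as a vector.
firstCol = nextBlock(:, 1);
```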

9 Comments

I'm sorry, I'm new to MATLAB. What do the empty quotes in the second line mean? Also, executing this code gives an error on the third line saying that char is to be used instead of string. Thank you in advance.
The empty quotes in that context mean that the patterns in the first {} are to be replaced with nothing -- that is, the action of the regexprep() is to find some patterns and delete them where it finds them.
I just now noticed your R2012a marking; I coded using functions that need R2016b or later. I will have to have another look at the code to figure out the best way to make it backwards compatible.
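A small sketch of what the empty replacement strings do, using a made-up two-line input rather than the real file: the comment line and the carriage return are both matched and replaced with nothing, so only the data line remains.

```matlab
% Hypothetical input: one comment line and one data line ending in CR LF.
txt = sprintf('$ this is a comment\nID 1 2 3\r\n');

% Each pattern in the first {} is replaced by the matching entry in the
% second {}; an empty replacement simply deletes whatever was matched.
cleaned = regexprep(txt, {'^\$.*?\n', '\r'}, {'', ''}, 'lineanchors');

% Show the characters that survived.
disp(cleaned)
```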
Okay I get it now... Thanks a lot.
all_ids = regexp(S, '^\S*', 'match', 'lineanchors');
noids = regexprep(S, '^\S*\s*', '', 'lineanchors');
mask = ~strcmp(all_ids(1:end-1), all_ids(2:end)); % true where a name differs from the next one
stops = find([mask true]);
With these lines in place of the corresponding ones, ids at the end would be a cell array of character vectors containing the names, such as NEXT.
This works for separating the text and the numeric data, but I am not able to access the numbers. It only returns a 2x1 cell array. I'm sorry if I was not clear in explaining. If the numbers in the first column after ID are named a1 to a4 and those in the next column b(i), then I want to call them for further calculations in a loop. Something like
for i=1:4
T(i)=[a(i) b(i) c(i)]
t(i)=T(i)/norm(T(i))
I know this isn't right; I want to know the right procedure for doing this.
The "right" procedure for that is not to do things that way.
You have a variable number of columns, so you would be dynamically generating the variable names; if you had a fixed number of columns it could potentially make sense to assign columns to fixed variables.
As onlookers, we see an input file with a variable number of columns, and we must assume that your real files might have 9 or more columns. You would then need to assign automatically to variables a, b, c, d, e, f, g, h, i, and the variable i would interfere with using i as the loop control variable in your for loop.
It is also not at all clear to me whether a would refer to all 8 values that your sample file shows in column 2, or only to the first 4 of them, or if you intend there to be some method of triggering going on to the next block, so that variable a would first refer to the 4 in the second column of the first block and then would refer to the 4 in the second column of the second block. What should d refer to when you are processing the first block, since there is no 4th column in the first block but there is a 4th column in the second block?
T(i)=[a(i) b(i) c(i)]
The right hand side there would be a vector of length 3, but the left hand side is a scalar. If we extend the code to
T(i,:) = [a(i) b(i) c(i)];
t(i) = T(i,:)/norm(T(i,:));
then the right hand side would be a vector of length 3 because T(i,:) would be a vector of length 3, but the left hand side names a scalar location. So you would have to extend further to
T(i,:) = [a(i) b(i) c(i)];
t(i,:) = T(i,:)/norm(T(i,:));
It is not clear from what you write whether you want to normalize only the first 3 numeric columns in a block, or if you want to normalize all numeric columns in a block.
And at the end, are you wanting your t to contain the normalized data of all of the blocks together in one array, or do you want the normalized data block by block?
The code would be easier if you were using a newer MATLAB release...
S = fileread('sample.txt');
S = regexprep(S, {'^\$.*?\n', '\r'}, {'', ''}, 'lineanchors');
all_ids = regexp(S, '^\S*', 'match', 'lineanchors');
noids = regexprep(S, '^\S*\s*', '', 'lineanchors');
mask = ~strcmp(all_ids(1:end-1), all_ids(2:end)); % true where a name differs from the next one
stops = find([mask true]);
eol_pos = regexp(noids, '.$', 'lineanchors');
block_lengths = diff([0, eol_pos(stops)]);
blocks = mat2cell(noids, 1, block_lengths);
ids = all_ids(stops);
values = cellfun(@(B) cell2mat(textscan(B, '', 'CollectOutput', true)), blocks', 'uniform', 0);
normalized_blocks = cellfun(@(B) cell2mat(arrayfun(@(R) B(R,:)./norm(B(R,:)), (1:size(B,1)).', 'uniform', 0)), values, 'uniform', 0);
%now normalized_blocks is a cell array in which each row of the block has
%been normalized independently
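As a hedged alternative for a single block, the row-by-row normalization can also be written without a loop or arrayfun. The sketch below uses bsxfun, which should be available on releases as old as R2012; the matrix T is hypothetical data, with each row playing the role of [a(i) b(i) c(i)].

```matlab
% Hypothetical block: each row is one vector [a(i) b(i) c(i)].
T = [3 4 0; 0 0 2; 1 2 2];

% Euclidean norm of every row, as a column vector.
rowNorms = sqrt(sum(T.^2, 2));

% Divide each row by its own norm; bsxfun expands rowNorms across the columns.
t = bsxfun(@rdivide, T, rowNorms);
```

On R2016b or later, T ./ rowNorms gives the same result through implicit expansion.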
First of all thank you so much for your time and effort. I want a to refer to the first block's 4 values in column 2. The 4 values in column 2 of the next block must have a different name. And thank you for the explanation on vectors and scalars.
Again, why do you need to name the columns when you can just index instead?
So that I can use them in further calculations in a loop. I think I was having problems with the loop because of the scalar problem. I'll try indexing.
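For reference, a minimal sketch of the loop from earlier in the thread written with indexing instead of named columns. B is a hypothetical numeric block (one entry of values), and it is an assumption here that only the first three numeric columns are to be normalized.

```matlab
% Hypothetical block; columns 1..3 play the roles of a, b and c.
B = [1 2 2; 3 4 0; 0 0 5; 2 3 6];

nRows = size(B, 1);
t = zeros(nRows, 3);            % preallocate one output row per input row
for r = 1:nRows
    T = B(r, 1:3);              % same as [a(r) b(r) c(r)]
    t(r, :) = T / norm(T);      % unit vector for this row
end
```

Using r rather than i as the loop variable also avoids any clash with a column that might otherwise have been named i.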


Release: R2012b
Asked: 15 Dec 2019
Commented: 20 Dec 2019
