How to read and process a text file?
I have attached a sample text file. As you can see, the comment lines starting with $ must be omitted, and only the values must be read and assigned to variables. For the 'ID' set, each column must be named separately, like a1 to a4 for the second column, b1 to b4 for the third, and so on. Similarly, each column of the 'NEXT' set must be named separately. Please help.
2 Comments
Walter Roberson
on 15 Dec 2019
Don't do that. Put all of the data into the same variable. Use a structure with dynamic field names for the main variable and use vectors instead of a1 a2 and so on.
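A minimal sketch of that suggestion, assuming the blocks have already been parsed into numeric arrays. The names `names`, `blocks`, and `data` and the numbers themselves are hypothetical and only stand in for the real file contents; the block names must be valid MATLAB identifiers for this to work.

```matlab
% Hypothetical stand-ins for two parsed blocks (not taken from the real file)
names  = {'ID', 'NEXT'};
blocks = {[1 2 3; 4 5 6; 7 8 9; 10 11 12], [0.5 1.5 2.5 3.5; 4.5 5.5 6.5 7.5]};

% One structure holds everything; the field name is chosen at run time
data = struct();
for k = 1:numel(names)
    data.(names{k}) = blocks{k};   % dynamic field name
end

% First numeric column of the ID block as one vector, instead of a1..a4
id_col1 = data.ID(:, 1);
disp(id_col1.')
```

Individual values are then reached by indexing, for example `data.ID(3, 1)` in place of a variable named `a3`.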
Bertilla Raque
on 15 Dec 2019
Answers (1)
Walter Roberson
on 16 Dec 2019
S = fileread('sample.txt');
S = regexprep(S, {'^\$.*?\n', '\r'}, {'', ''}, 'lineanchors');
all_ids = string(regexp(S, '^\S*', 'match', 'lineanchors'));
noids = regexprep(S, '^\S*\s*', '', 'lineanchors');
mask = all_ids(1:end-1) ~= all_ids(2:end);
stops = find([mask true]);
eol_pos = regexp(noids, '.$', 'lineanchors');
block_lengths = diff([0, eol_pos(stops)]);
blocks = mat2cell(noids, 1, block_lengths);
ids = all_ids(stops);
values = cellfun(@(B) cell2mat(textscan(B, '', 'CollectOutput', true)), blocks', 'uniform', 0);
At this point, ids contains a string vector of the names such as NEXT, and values is a cell array of numeric arrays, with each entry holding the numeric data for the corresponding name.
9 Comments
Bertilla Raque
on 17 Dec 2019
Walter Roberson
on 17 Dec 2019
The empty quotes in that context mean that the patterns in the first {} are to be replaced with nothing -- that is, the action of the regexprep() is to find some patterns and delete them where it finds them.
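To see the deletion in isolation, here is a small sketch on a made-up three-line string with Windows line endings. The input text is hypothetical; the patterns are the ones from the answer.

```matlab
% Hypothetical input: one $ comment line and two data lines, CRLF line endings
txt = sprintf('$ comment\r\nID 1 2\r\nID 3 4\r\n');

% First pattern deletes each line that starts with $, second deletes every \r
cleaned = regexprep(txt, {'^\$.*?\n', '\r'}, {'', ''}, 'lineanchors');

disp(cleaned)
```

The patterns in the cell array are applied one after the other, so the carriage returns are stripped from whatever is left once the comment lines are gone.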
I just now noticed your R2012a marking; I coded using functions that need R2016b or later. I will have to have another look at the code to figure out the best way to make it backwards compatible.
Bertilla Raque
on 17 Dec 2019
Walter Roberson
on 17 Dec 2019
all_ids = regexp(S, '^\S*', 'match', 'lineanchors');
noids = regexprep(S, '^\S*\s*', '', 'lineanchors');
mask = ~strcmp(all_ids(1:end-1), all_ids(2:end)); % true where an ID differs from the next one
stops = find([mask true]);
At the end, ids would be a cell array of character vectors containing the names such as NEXT.
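As a rough illustration of how the mask finds the block boundaries, here is the same logic on a hypothetical list of per-line IDs (four ID lines followed by two NEXT lines). The variable `rows_per_block` is added here only for illustration.

```matlab
% Hypothetical first tokens of six data lines
all_ids = {'ID', 'ID', 'ID', 'ID', 'NEXT', 'NEXT'};

% true where a line's ID differs from the ID on the following line
mask = ~strcmp(all_ids(1:end-1), all_ids(2:end));

% Index of the last line of each block; the final line always ends a block
stops = find([mask true]);

ids = all_ids(stops);               % one name per block
rows_per_block = diff([0 stops]);   % number of lines in each block
```

This uses only strcmp and cell arrays of character vectors, so it should not depend on the string type introduced in later releases.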
Bertilla Raque
on 18 Dec 2019
Walter Roberson
on 18 Dec 2019
The "right" procedure for that is not to do things that way.
You have a variable number of columns, so you would be dynamically generating the variable names; if you had a fixed number of columns it could potentially make sense to assign columns to fixed variables.
As onlookers, we see an input file with a variable number of columns, and we must assume that your real files might have 9 or more columns, so you would need to assign automatically to variables a, b, c, d, e, f, g, h, i, which would interfere with using i as the loop control variable.
It is also not at all clear to me whether a would refer to all 8 values that your sample file shows in column 2, or only to the first 4 of them, or if you intend there to be some method of triggering going on to the next block, so that variable a would first refer to the 4 in the second column of the first block and then would refer to the 4 in the second column of the second block. What should d refer to when you are processing the first block, since there is no 4th column in the first block but there is a 4th column in the second block?
T(i)=[a(i) b(i) c(i)]
The right hand side there would be a vector of length 3, but the left hand side is a scalar. If we extend the code to
T(i,:) = [a(i) b(i) c(i)];
t(i) = T(i,:)/norm(T(i,:));
then the right hand side would be a vector of length 3 because T(i,:) would be a vector of length 3, but the left hand side names a scalar location. So you would have to extend further to
T(i,:) = [a(i) b(i) c(i)];
t(i,:) = T(i,:)/norm(T(i,:));
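Put together as a runnable sketch, with made-up column vectors a, b, c and with k as the loop variable so that it cannot clash with a column named i:

```matlab
% Hypothetical column data; each row [a(k) b(k) c(k)] is one 3-vector
a = [3; 0; 1];
b = [4; 5; 2];
c = [0; 12; 2];

n = numel(a);
T = zeros(n, 3);   % preallocate the raw rows
t = zeros(n, 3);   % preallocate the normalized rows
for k = 1:n
    T(k,:) = [a(k) b(k) c(k)];
    t(k,:) = T(k,:) / norm(T(k,:));   % whole row on both sides
end

% Loop-free alternative: divide every row by its own Euclidean norm
t2 = bsxfun(@rdivide, T, sqrt(sum(T.^2, 2)));
```

Both versions assume that no row is all zeros, since that would divide by zero.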
It is not clear from what you write whether you want to normalize only the first 3 numeric columns in a block, or if you want to normalize all numeric columns in a block.
And at the end, are you wanting your t to contain the normalized data of all of the blocks together in one array, or do you want the normalized data block by block?
The code would be easier if you were using a newer MATLAB release...
S = fileread('sample.txt');
S = regexprep(S, {'^\$.*?\n', '\r'}, {'', ''}, 'lineanchors');
all_ids = regexp(S, '^\S*', 'match', 'lineanchors');
noids = regexprep(S, '^\S*\s*', '', 'lineanchors');
mask = ~strcmp(all_ids(1:end-1), all_ids(2:end)); % true where an ID differs from the next one
stops = find([mask true]);
eol_pos = regexp(noids, '.$', 'lineanchors');
block_lengths = diff([0, eol_pos(stops)]);
blocks = mat2cell(noids, 1, block_lengths);
ids = all_ids(stops);
values = cellfun(@(B) cell2mat(textscan(B, '', 'CollectOutput', true)), blocks', 'uniform', 0);
normalized_blocks = cellfun(@(B) cell2mat(arrayfun(@(R) B(R,:)./norm(B(R,:)), (1:size(B,1)).', 'uniform', 0)), values, 'uniform', 0);
%now normalized_blocks is a cell array in which each row of the block has
%been normalized independently
Bertilla Raque
on 20 Dec 2019
Walter Roberson
on 20 Dec 2019
Again, why do you need to name the columns when you can just index instead?
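For instance, assuming ids and values have the shapes produced by the code above, any column can be reached by position. The contents below are hypothetical placeholders, and `blk` and `col3` are names introduced only for this sketch.

```matlab
% Hypothetical results in the same shape as ids / values above
ids    = {'ID', 'NEXT'};
values = {[1 2 3; 4 5 6; 7 8 9; 10 11 12], [10 20 30 40; 50 60 70 80]};

% Pick a block by name, then a column by number
blk  = values{strcmp(ids, 'NEXT')};
col3 = blk(:, 3);

% Visit every column without knowing in advance how many there are
for c = 1:size(blk, 2)
    fprintf('column %d has %d values\n', c, numel(blk(:, c)));
end
```

The loop works unchanged whether a block has 3 columns or 30, which is the part that separately named variables cannot give you.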
Bertilla Raque
on 20 Dec 2019