Read data from .txt with regexp

I want to load numeric data from a .txt file that contains both text and numeric values in a non-standard layout. I am attaching an example file for you to see.
The data is divided into sections, but all the numeric data I am interested in has the same structure: '->' + 'some string' + ':' + 'numeric value with format %.4f' (lines in the .txt file that do not contain the desired expression should be ignored).
I tried several approaches, with readfile, textscan etc., but due to the complexity of the input file I could not reach my objective.
I think the most appropriate way to solve the problem is to use regexp and tell MATLAB exactly which expression it has to look for in the .txt file.
I am not familiar with regexp and do not understand how to write the right expression, so I would greatly appreciate any help with that.
Thank you very much!
============================================================================
| DATA VALUES FOR INPUT VARIABLES |
============================================================================
Model: Example
(data file produced by on 29-Mar-2022 17:30:14)
============================================================================
FIRST DATA SECTION:
============================================================================
-> z [val]: 13.0000
-> m [in mm]: 3.0000
-> Type [str]: Ball
-> c_s [valval]: 0 0 0
----------------------------------------------------------------------------
SECOND DATA SECTION:
============================================================================
-> r_b [strandnumber]: C50
-> s0_r [in mm]: 0.0000
-> I_r [val]: 15.0000
-> Only text line
----------------------------------------------------------------------------
THIRD DATA SECTION:
============================================================================
-> Values found = 0.
----------------------------------------------------------------------------
FOURTH DATA SECTION:
============================================================================
-> n_1 [val_1]: 1.0000

5 Comments

Jan on 29 Mar 2022
Edited: Jan on 29 Mar 2022
The person, who has invented this file format, hates programmers. It looks smart with the pile of horizontal lines, but it is as hard to read as mud.
What is the desired output?
Do you have documentation that explains the magic keywords: val, valval, val_1, str, in mm?
I detest violence. Please give the inventor of this format a respectful hint that there are standard output formats such as XML, which avoid the troubles you have.
@Jan I guess you are right, and this .txt file is surely not the most appropriate for programmers. However, I cannot change that, but I will keep in mind the problems it generates when I am the one producing this type of file.
[val, valval, val_1, str, in mm] are just dummy text. Descriptions were deleted for confidentiality reasons.
We can be glad, that Stephen used his artistic power to solve the problem!
Yes that was really really fantastic!!
For basic data like this, even JSON should be preferred. Just about every half-decent programming language can read it. Composing matrices is a bit tricky, but as long as you stick to scalars or vectors everything should be fine.
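As a minimal sketch of that suggestion (not part of the original comment), the round trip below uses MATLAB's built-in jsonencode and jsondecode. The struct contents are hypothetical values that mirror the example file. It also illustrates the point about array shapes: a row vector is written as a flat JSON array and comes back as a column vector.

```matlab
% Hypothetical data mirroring some fields of the example file.
data = struct('z', 13, 'm', 3, 'Type', 'Ball', 'c_s', [0 0 0]);

% Encode the struct as JSON text.
txt = jsonencode(data);

% Decode it again. Scalars and text survive unchanged, but the
% 1-by-3 row vector c_s is returned as a 3-by-1 column vector.
back = jsondecode(txt);
disp(size(back.c_s))
```

If the orientation matters, it has to be restored after decoding, for example with a transpose.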


 Accepted Answer

rgx = '^\s*->\s*(\w+)\s*\[[\w\s]+\]:\s*([^\r\n]+)';
str = fileread('ExampleTextFile.txt');
tkn = regexp(str,rgx,'tokens','lineanchors');
tkn = vertcat(tkn{:})
tkn = 8×2 cell array
    {'z'   }    {'13.0000'}
    {'m'   }    {'3.0000' }
    {'Type'}    {'Ball'   }
    {'c_s' }    {'0 0 0'  }
    {'r_b' }    {'C50'    }
    {'s0_r'}    {'0.0000' }
    {'I_r' }    {'15.0000'}
    {'n_1' }    {'1.0000' }
vec = str2double(tkn(:,2));
idx = ~isnan(vec);
tkn(idx,2) = num2cell(vec(idx))
tkn = 8×2 cell array
    {'z'   }    {[   13]}
    {'m'   }    {[    3]}
    {'Type'}    {'Ball' }
    {'c_s' }    {'0 0 0'}
    {'r_b' }    {'C50'  }
    {'s0_r'}    {[    0]}
    {'I_r' }    {[   15]}
    {'n_1' }    {[    1]}
out = cell2struct(tkn(:,2),tkn(:,1),1)
out = struct with fields:
       z: 13
       m: 3
    Type: 'Ball'
     c_s: '0 0 0'
     r_b: 'C50'
    s0_r: 0
     I_r: 15
     n_1: 1
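The question asked for the numeric values only. One possible variation, offered as a sketch rather than as part of the original answer, is to tighten the value token so that it accepts a single number and nothing else on the line. The sample text and the variable name num are assumptions made for this illustration; lines such as 'Ball', 'C50' or '0 0 0' are then skipped instead of being kept as text.

```matlab
% Shortened, hypothetical copy of the example file.
str = sprintf('%s\n', ...
    '-> z [val]: 13.0000', ...
    '-> Type [str]: Ball', ...
    '-> Only text line', ...
    '-> s0_r [in mm]: 0.0000');

% Pattern, piece by piece:
%   ^\s*->\s*          start of a line, the arrow, optional spaces
%   (\w+)              token 1: the variable name
%   \s*\[[^\]]*\]:     the bracketed description and the colon
%   [ \t]*             spaces or tabs only, so the match stays on one line
%   ([-+]?\d+\.?\d*)   token 2: one number such as 13.0000
%   [ \t]*\r?$         nothing else may follow before the end of the line
rgx = '^\s*->\s*(\w+)\s*\[[^\]]*\]:[ \t]*([-+]?\d+\.?\d*)[ \t]*\r?$';
tkn = regexp(str, rgx, 'tokens', 'lineanchors');
tkn = vertcat(tkn{:});

% Build a struct that contains only the numeric fields.
num = cell2struct(num2cell(str2double(tkn(:,2))), tkn(:,1), 1);
```

With this sample text, num has the two fields z and s0_r. The number pattern is deliberately simple and does not cover exponent notation.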

4 Comments

Very nice, @Stephen! You might want to also convert the c_s field to double like this:
out.c_s = sscanf(out.c_s, '%f')
At least it appears to me like this field was meant to be numeric.
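If more than one field holds space-separated numbers, the same conversion could be applied in a loop. The following is only a sketch under the assumption that any text field which sscanf can read completely as numbers should become numeric; the struct literal stands in for the out variable of the answer.

```matlab
% Hypothetical stand-in for the struct produced by the answer.
out = struct('z', 13, 'Type', 'Ball', 'c_s', '0 0 0', 'r_b', 'C50');

names = fieldnames(out);
for k = 1:numel(names)
    val = out.(names{k});
    if ischar(val)
        % errmsg is empty only if the whole text was read as numbers.
        [vec, ~, errmsg] = sscanf(val, '%f');
        if isempty(errmsg) && ~isempty(vec)
            out.(names{k}) = vec.';   % store as a row vector
        end
    end
end
```

Fields such as Type and r_b stay as text because sscanf reports a matching failure for them.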
Aurea94 on 29 Mar 2022
Edited: Aurea94 on 29 Mar 2022
@Stephen this is just incredible! Thank you so much! It works perfectly for my file.
You saved me hours of work!!!
@Stephen, can I get your email? I need a favour from you.
I expect your chances would be much better if you posted your request as a separate question and post the link here. People tend to be protective of their inbox.


More Answers (0)

Release

R2022a

Asked:

on 29 Mar 2022

Commented:

Rik
on 18 Apr 2022
