How to replace parts of the text using regexprep

Hi all,
I have a very large text file which I imported as a char vector, and which has the following pattern:
text=NEW SCOMPONENT /JAFHB0099
DESC 'FLANGE F7805 SLIP-ON 10K FF 900A'
GTYP FLAN
PARA 900 1095 56 FBIA BWD 13
END
NEW SCOMPONENT /JAFHB00aa
DESC 'FLANGE F7805 SLIP-ON 10K FF 1100A'
GTYP FLAN
PARA 1100 1225 18 FBIA BWD 14
END
I want to replace the parts after DESC and PARA with some of my own values, e.g.
nDESC = {'Description 1'; 'Description 2'} ;
nPARA = {'1500 15300 20 FBDIA BWD 14' ; '1600 1623 20 FBDIA SWM 13'} ;
For this I wrote the following code, with help from the MATLAB community, who pointed me to the regexp functions:
% Match what follows the word PARA on the PARA line and replace it with nPARA
newtext = regexprep(text, 'PARA\s+(\d+\.?\d*\s+\d+\.?\d*\s+\d+\.?\d*\s+\w*\s+\w*\s+\d+\.?\d*)', nPARA) ;
I follow a similar logic for the case of the DESC.
However, two problems occur.
1. The parentheses around the part after PARA do not limit the replacement to the tokens inside them: instead of 1100 1225 18 FBIA BWD 14, I get PARA 1100 1225 18 FBIA BWD 14. I can work around this, so it is not a big deal, but I must be missing something.
2. The result does not replace each matching line with the corresponding string in the cell array. Instead, every matching line is replaced with the last cell of the cell array.
  1 Comment
Stephen23 on 5 Nov 2018
Edited: Stephen23 on 5 Nov 2018
1) regexprep does not replace the tokens, it replaces the matched substring. So what you see is the expected behavior. Tokens are entirely optional, and can be used in dynamic operations. But the entire matched substring is replaced. You could resolve this using a look-around operation.
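A minimal sketch of the two options mentioned above, assuming each command sits on its own line and the keyword is followed by a single space. The sample text and the replacement values here are illustrative only. The first call uses a look-behind so that only the part after PARA is matched; the second keeps the keyword by capturing it as a token and re-inserting it with $1.

```matlab
% Illustrative sample text (not the original file)
text = sprintf('DESC ''FLANGE''\nPARA 900 1095 56 FBIA BWD 13\nEND');

% Look-behind: the match starts after 'PARA ', so the keyword is not replaced.
% [^\r\n]* stops at the end of the line.
out1 = regexprep(text, '(?<=PARA )[^\r\n]*', '1500 15300 20 FBDIA BWD 14');

% Token: capture the keyword and put it back with $1 in the replacement.
% (A replacement that begins with a digit would run into the token number,
% so this form is shown for DESC only.)
out2 = regexprep(out1, '(DESC\s+)[^\r\n]*', '$1Description 1');
```

Both calls replace every matching line with the same string, so this addresses the first problem only.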


Accepted Answer

Stephen23 on 5 Nov 2018
Edited: Stephen23 on 5 Nov 2018
This takes a slightly different approach, using regexp and strncmp, and assumes that each command is on its own line. You did not supply an example file so I created one (attached).
>> nDESC = {'Description 1'; 'Description 2'};
>> nPARA = {'1500 15300 20 FBDIA BWD 14' ; '1600 1623 20 FBDIA SWM 13'};
>> S = fileread('temp1.txt')
S = NEW SCOMPONENT /JAFHB0099
DESC 'FLANGE F7805 SLIP-ON 10K FF 900A'
GTYP FLAN
PARA 900 1095 56 FBIA BWD 13
END
NEW SCOMPONENT /JAFHB00aa
DESC 'FLANGE F7805 SLIP-ON 10K FF 1100A'
GTYP FLAN
PARA 1100 1225 18 FBIA BWD 14
END
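>> % Split every line into two tokens: the leading keyword and the rest of the line
>> C = regexp(S,'^\s*([A-Z]+\s*)(.*)$','tokens','dotexceptnewline','lineanchors');
>> % Rearrange into a 2-by-N cell array: row 1 = keywords (with trailing whitespace), row 2 = rest of each line
>> C = vertcat(C{:}).';
>> % Overwrite the rest of each DESC and PARA line (the line counts must match nDESC and nPARA)
>> C(2,strncmp(C(1,:),'DESC',4)) = nDESC;
>> C(2,strncmp(C(1,:),'PARA',4)) = nPARA;
>> % Join keyword and remainder for every line, then drop the leading newline
>> Z = sprintf('\n%s%s',C{:});
>> Z = Z(2:end)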
Z = NEW SCOMPONENT /JAFHB0099
DESC Description 1
GTYP FLAN
PARA 1500 15300 20 FBDIA BWD 14
END
NEW SCOMPONENT /JAFHB00aa
DESC Description 2
GTYP FLAN
PARA 1600 1623 20 FBDIA SWM 13
END
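The second problem in the question appears to come from how regexprep treats a cell array of replacements with a single expression: as I read the documentation, it applies the replacements one after another to the whole text, so each pass overwrites every match and the last cell wins. A possible regexp-only alternative to the approach above is sketched here, assuming the number of PARA lines equals numel(nPARA). The sample text and the variable names pat, parts and pieces are illustrative. It splits the text at the parts to be replaced and interleaves the new values.

```matlab
% Illustrative sample text (not the original file)
text  = sprintf('PARA 900 1095 56 FBIA BWD 13\nEND\nPARA 1100 1225 18 FBIA BWD 14\nEND');
nPARA = {'1500 15300 20 FBDIA BWD 14'; '1600 1623 20 FBDIA SWM 13'};

% Match only what follows 'PARA ' up to the end of the line
pat = '(?<=PARA )[^\r\n]*';

% 'split' returns the text between the matches: N matches give N+1 parts
parts = regexp(text, pat, 'split');
assert(numel(parts) == numel(nPARA) + 1, 'Number of PARA lines must equal numel(nPARA).');

% Interleave: part 1, new value 1, part 2, new value 2, ..., last part
pieces  = [parts; [nPARA(:).', {''}]];
newtext = [pieces{:}];
```

The same pattern with DESC in place of PARA would handle the description lines.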
