Removing specific characters from string in nested cells
I have a series of strings contained within a nested cell array (because regexp loves to nest cells), and I would like to remove every character that is not numeric or white space, namely asterisks, so that I can convert the strings to doubles.
I'm looking for the least painful way of removing these special characters from all of the strings. I do not have a sample file to attach, sorry, but I have described the shape of a sample array below.
X == 1x1 cell
X{1} == 1x1 cell (because regexp can't help itself apparently)
X{1}{1} = {'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '};
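For reference, the shape described above can be rebuilt and unwrapped with a short loop. This is only a sketch: the variable name `flat` is made up for illustration, and it assumes every extra layer is a 1x1 cell wrapping another cell.

```matlab
% Rebuild the shape from the question: 1x1 cell > 1x1 cell > 5x1 cell of char
X = {{ {'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '} }};

% Peel off singleton cell layers until the cell of strings is reached
flat = X;
while iscell(flat) && isscalar(flat) && iscell(flat{1})
    flat = flat{1};
end
% flat is now the 5x1 cell array of character vectors
```

The loop stops as soon as the current level is not a scalar cell, so a cell that really holds several strings is left alone.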
Stephen23
on 13 Jun 2018
@ Bob Nbob: is this related to your earlier question?:
If so, it would probably be easier to fix the regular expression. Please upload a sample file that you want to get the data from.
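One likely source of the extra nesting is the 'tokens' option, which returns one cell per match, each holding a cell of tokens. The sketch below uses a made-up input string and pattern (not the ones from the earlier question) to show how the 'once' option removes one layer when only the first match is needed.

```matlab
% Hypothetical input and pattern, for illustration only
str = 'COLUMN1= 1.12, 2.23';
expr = '(\w+)=\s*([\d.]+)';

nested = regexp(str, expr, 'tokens');         % 1x1 cell holding a 1x2 cell
flat   = regexp(str, expr, 'tokens', 'once'); % 1x2 cell of char, no outer layer
```

With 'once', `flat{2}` is the character vector '1.12' and can be passed straight to str2double.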
Why not just
x = {'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '};
x = regexprep(x,'[^\d]','');
?
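One caveat with `[^\d]`: it also strips the decimal point, so a value such as '1.12' would become '112'. A possible variant, assuming the strings only ever contain one number each, keeps the period and then converts with str2double. The names `cleaned` and `vals` are illustrative.

```matlab
x = {'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '};

% Remove everything except digits and the period
cleaned = regexprep(x, '[^\d.]', '');

% str2double accepts a cell array of character vectors
vals = str2double(cleaned);

% For comparison, the stricter pattern drops the decimal point as well
noPoint = regexprep('1.12', '[^\d]', '');
```

If a string can contain several numbers, matching the numbers with regexp and the 'match' option is safer than deleting characters, since deletion would fuse them together.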
As mentioned by Stephen, it's probably easier to fix the regex used in your earlier question. I left a comment there.
Bob Thompson
on 13 Jun 2018
Not the prettiest but does the job, try this:
[tokens,matches]=regexp(yourtext,'(COLUMN[13]=\s*)(\d*\.?\d*)(?:\,\s*)(\d*\.\d*)(?:\,\s*)(\d*\.\d*)(?:\,\s*)(\d*\.\d*)(?:\*?\,\s*)(\d*\.\d*)(?:\*?\,\s*)(\d*\.\d*)(?:\*?\,\s)','tokens','match');
tokens{1}:
1×7 cell array
{'COLUMN1= '} {'1.12'} {'2.23'} {'3.34'} {'4.45'} {'5.56'} {'6.67'}
tokens{2}:
1×7 cell array
{'COLUMN3= '} {'1.23'} {'0.34'} {'3.45'} {'5.78'} {'6.54'} {'8.23'}
Bob Thompson
on 13 Jun 2018
OCDER
on 13 Jun 2018
Would something like this work?
Str = 'COLUMN3= 1.23, 0.34, 3.45, 5.78*, 6.54*, 8.23, 2, -3., 24.*';
EqIdx = find(Str == '=', 1);
if ~isempty(EqIdx)
    Num = str2double(regexp(Str(EqIdx+1:end), '\-?\d+\.?\d*', 'match'));
end
Bob Thompson
on 13 Jun 2018
OCDER
on 13 Jun 2018
I might need more information about the start-to-end issue you're having. How are you reading in the text file: with fileread, fgetl, or textscan? If you use fgetl or textscan, you get the text one row at a time and can pick out the row you want. If you're using fileread, it's much harder, because the whole file comes back as a single character vector.
FID = fopen('textfile.txt');
TXT = textscan(FID, '%s', 'Delimiter', '\n');
TXT = TXT{1};
fclose(FID);
Num = cell(size(TXT));
for f = 1:length(TXT)
    Str = TXT{f};
    if contains(Str, 'CONTAINS=') %Specify condition for line you want here
        EqIdx = find(Str == '=', 1); %Example, you want values after "="???
        Num{f} = str2double(regexp(Str(EqIdx+1:end), '\-?\d+\.?\d*', 'match'));
    end
end
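If the file has already been read with fileread, one way around the single-vector problem is to split the text on line breaks first and then reuse the same loop. This is a rough sketch: the sample text built with sprintf stands in for the output of fileread, and the 'COLUMN' prefix is an assumed condition for picking the lines of interest.

```matlab
% Hypothetical text standing in for raw = fileread('textfile.txt')
raw = sprintf('TITLE\nCOLUMN1= 1.12, 2.23*, 3.34\nNOTES\nCOLUMN3= 1.23, 0.34, 5.78*\n');

% Split into lines, tolerating both \n and \r\n line endings
lines = regexp(raw, '\r?\n', 'split');

Num = cell(size(lines));
for k = 1:numel(lines)
    Str = lines{k};
    EqIdx = find(Str == '=', 1);
    % Assumed condition: the line starts with 'COLUMN' and contains '='
    if strncmp(Str, 'COLUMN', 6) && ~isempty(EqIdx)
        Num{k} = str2double(regexp(Str(EqIdx+1:end), '-?\d+\.?\d*', 'match'));
    end
end
```

Lines that do not meet the condition leave an empty entry in `Num`, which keeps the indices aligned with the line numbers in the file.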
Bob Thompson
on 14 Jun 2018
Edited: Bob Thompson
on 14 Jun 2018
"I'm not really sure what the ':' from Paolo's comment is supposed to do, I don't see it anywhere in the regexp documentation..."
Open the documentation, then use ctrl+f to search the webpage for ?:
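In short, `(?:...)` groups part of the pattern without capturing it as a token. A small sketch with a made-up string shows the difference; the variable names are illustrative.

```matlab
str = 'COLUMN1= 1.12, 2.23';

% Capturing group: the label comes back as the first token
withCapture = regexp(str, '(COLUMN\d=\s*)(\d+\.\d+)', 'tokens', 'once');

% Non-capturing group: the label is still matched but not returned
withoutCapture = regexp(str, '(?:COLUMN\d=\s*)(\d+\.\d+)', 'tokens', 'once');
```

Here `withCapture` holds two tokens ('COLUMN1= ' and '1.12'), while `withoutCapture` holds only '1.12'.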
Bob Thompson
on 15 Jun 2018
Stephen23
on 15 Jun 2018
@Bob Nbob: you are right, it does not appear in the Mfile help. I notice that many other useful regular expression features also do not appear in the Mfile help: notably missing are dynamic expressions, lookaround operators, and named capture.
Both the inbuilt help and the page I linked to give a very useful introduction, and explain all features of regular expressions in MATLAB:
doc regexp
doc('Regular Expressions')
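As a quick taste of two of the features mentioned above, here is a sketch of named capture and a lookahead. The input string and the token names `label` and `first` are made up for illustration.

```matlab
str = 'COLUMN3= 1.23, 0.34, 5.78*, 6.54*, 8.23';

% Named capture: tokens are returned as fields of a struct
s = regexp(str, '(?<label>[A-Z]+\d+)=\s*(?<first>\d+\.?\d*)', 'names');

% Lookahead: match only the numbers that are followed by an asterisk,
% without including the asterisk in the match
starred = regexp(str, '\d+\.?\d*(?=\*)', 'match');
```

After this runs, `s.label` is 'COLUMN3', `s.first` is '1.23', and `starred` contains '5.78' and '6.54'.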
More Answers (1)
George Abrahams
on 30 Dec 2022
The others are right to fix the root problem causing the tricky nested cell array. Having said that, for future reference, my deepreplace function on File Exchange / GitHub would have done exactly what you requested.
x = {{{'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '}}};
% Remove any character except for digits (0-9) and period (.)
match = regexpPattern('[^\d.]');
x = deepreplace(x,match,'');
% x = 1×1 cell array
%     {1×1 cell}
% x{1} = 1×1 cell array
%     {5×1 cell}
% x{1}{1} = 5×1 cell array
%     {'1234.'}
%     {'12.'  }
%     {'1234.'}
%     {'123.' }
%     {'321.' }
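For anyone who prefers not to add a dependency, a similar effect can be had with a few lines of built-in functions. The helper below, `stripNested`, is a hypothetical name and is not part of MATLAB or of deepreplace; it is a minimal sketch that assumes the nested cells contain only text and other cells, and that the code runs as a script in a release that allows local functions in scripts (R2016b or later).

```matlab
x = {{{'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '}}};

% Same nesting as x, with everything but digits and periods removed
y = stripNested(x, '[^\d.]');

function c = stripNested(c, expr)
% Hypothetical helper: apply regexprep to every string in a nested cell
    if iscell(c)
        % Recurse into each element, keeping the cell's shape
        c = cellfun(@(e) stripNested(e, expr), c, 'UniformOutput', false);
    elseif ischar(c) || isstring(c)
        c = regexprep(c, expr, '');
    end
    % Anything else (numbers, structs, ...) is returned unchanged
end
```

Because the recursion preserves every level, `y{1}{1}` has the same 5×1 layout as `x{1}{1}`.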