extracting numbers from strings
This question is closed.
I am trying to read my experimental data from an Excel file. The content of one column decides whether a number should be read from another column, across multiple rows and multiple files, and I want to collect the results in a double array. (I will omit the part of the code that loads the files.)
The code below works well for numeric values:
NegativeView=[];
for nl=1:length(logs(:,1))
    if (strfind(logs{nl,5},'negative_view_start'))==1 % row marks a negative-view start
        NegativeView(end+1)=str2num(logs{nl,6});
    end
end
That is, if the entry in column 6 is a number, I get what I want.
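For comparison, a loop-free sketch of the same selection. It assumes, hypothetically, that `logs` is a cell array whose column 5 holds text labels and whose column 6 holds numbers stored as text; the three sample rows are invented for illustration. `strncmp` keeps the rows whose label starts with `negative_view_start`, and `str2double` converts all matching cells in one call.

```matlab
% Hypothetical sample data: column 5 = event label, column 6 = number as text
logs = {'', '', '', '', 'negative_view_start', '1.5';
        '', '', '', '', 'positive_view_start', '3';
        '', '', '', '', 'negative_view_start', '2'};

label = 'negative_view_start';
% Logical index of rows whose column-5 text starts with the label
isStart = strncmp(logs(:, 5), label, numel(label));
% str2double converts a cell array of text in one call; transpose to a row
NegativeView = str2double(logs(isStart, 6)).';
disp(NegativeView)
```

Unlike `str2num`, `str2double` does not evaluate its input as an expression, and it returns NaN for text it cannot parse.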
However, for another variable the value to be read is a mixed string such as output_33 or output_66, and I would like a double array containing just the numbers (33, 66, etc.).
I tried using the regexprep function to transform output_33 into 33 and so on, with no success. Help!
Here is an example of what I tried:
rate={}
output=[]
for nl=1:length(logs(:,1))
    if strcmp(logs{nl,4},'output_')
        rate(end+1)=(regexprep(logs{nl,5},'output_',''))
        output(end+1)=str2num(rate)
    end
end
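A sketch of how the attempt above could work while still using `regexprep`. Three things trip it up: `strcmp` only succeeds on an exact match, so `'output_33'` never equals `'output_'`; `rate(end+1)=...` assigns a char vector into a cell using parentheses instead of braces, which errors; and `str2num(rate)` is given the whole cell array instead of one string. The sketch assumes, hypothetically, that the `output_NN` strings live in column 5 of `logs` (the variable `col` and the sample rows are illustrative); adjust to your file.

```matlab
% Hypothetical sample data: column 5 holds labels such as 'output_33'
logs = {'', '', '', '', 'output_33';
        '', '', '', '', 'start';
        '', '', '', '', 'output_66'};
col = 5;        % assumed column holding the output_NN strings

output = [];
for nl = 1:size(logs, 1)
    label = logs{nl, col};
    % strcmp requires an exact match, so compare only the 7-character prefix
    if ischar(label) && strncmp(label, 'output_', 7)
        numText = regexprep(label, 'output_', '');   % 'output_33' -> '33'
        output(end+1) = str2double(numText);         % '33' -> 33
    end
end
% output is now [33 66]
```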
Answers (2)
Guillaume
on 15 Dec 2016
Using regexprep seems roundabout. Why not use regexp to extract what you need rather than replacing what you don't need?
One possible regex:
output(end+1) = str2double(regexp(logs{nl, 5}, '(?<=output_).*', 'match', 'once'))
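A self-contained sketch that applies the same lookbehind pattern to a whole column in one call, assuming hypothetical sample data with the labels in column 5 and text in every cell. With a cell array input and `'once'`, `regexp` returns one char vector per cell (empty where nothing matches); `str2double` turns the empty ones into NaN, which are then dropped. It uses `.+` rather than `.*`, as suggested later in the thread.

```matlab
% Hypothetical sample data: column 5 holds the labels (text only)
logs = {'', '', '', '', 'output_33';
        '', '', '', '', 'start';
        '', '', '', '', 'output_66'};

% One char vector per row: the text after 'output_', or '' if absent
tokens = regexp(logs(:, 5), '(?<=output_).+', 'match', 'once');
values = str2double(tokens);        % NaN where there was no match
output = values(~isnan(values)).';  % keep the numbers, as a row vector
```

Note that this also silently drops rows where the text after `output_` is not numeric, which may or may not be what you want.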
2 Comments
moniken
on 15 Dec 2016
Guillaume
on 15 Dec 2016
The regular expression language is documented in detail in MATLAB's documentation and, if that's not enough, there are plenty of tutorials on the net.
(?<=...) is a lookbehind. It means that the match must be immediately preceded by the expression in the lookbehind, in this case output_.
. matches any character. * is a quantifier which means match 0 or more of the preceding character. Actually, I should have used + (1 or more).
So the regular expression matches a sequence of 0 or more of any character immediately following output_. There are many other ways you could have written the expression, depending on what you want to accept or reject. E.g.:
regexp(logs{nl, 5}, '\d+', 'match', 'once')
may also work for you if you're only looking for integers (it simply extracts the first sequence of numeric digits).
As per the documentation of regexp, 'match' tells it to return the matched text (by default it just returns the start index of each match), and 'once' tells it to return only the first match, as a char vector rather than a cell array. It's not strictly necessary in your case, since str2double also accepts a cell array.
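A short sketch of where the two patterns differ. The label `'run2_output_66'` is hypothetical, chosen to contain an extra digit: `'\d+'` with `'once'` returns the first run of digits anywhere in the string, while the lookbehind version only accepts digits that directly follow `output_`.

```matlab
% '\d+' takes the first run of digits wherever it appears
plain    = str2double(regexp('output_33', '\d+', 'match', 'once'));
firstRun = str2double(regexp('run2_output_66', '\d+', 'match', 'once'));
% The lookbehind restricts the match to digits right after 'output_'
anchored = str2double(regexp('run2_output_66', '(?<=output_)\d+', 'match', 'once'));
fprintf('%g %g %g\n', plain, firstRun, anchored);   % prints: 33 2 66
```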
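A small sketch of the three output forms described above, using the string `'output_33'` from the question. Without options `regexp` returns start indices; `'match'` returns a cell array of matched text; adding `'once'` returns a single char vector. `str2double` handles both of the last two forms.

```matlab
s = 'output_33';
startIdx   = regexp(s, '\d+');                  % start index of the match: 8
allMatches = regexp(s, '\d+', 'match');         % cell array: {'33'}
firstMatch = regexp(s, '\d+', 'match', 'once'); % char vector: '33'

% str2double accepts either form, which is why 'once' is optional here
n1 = str2double(allMatches);
n2 = str2double(firstMatch);
```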