MATLAB Answers

How to set regexp so that it stops at the first instance?

Hi all,
I need to extract the urls from the following html code and I am using regexp.
a='<option value="http://www.tractordata.com/farm-tractors/003/9/0/3906-massey-ferguson-7465-transmission.html">2004-2007</option><option value="http://www.tractordata.com/farm-tractors/006/7/0/6706-massey-ferguson-7465-transmission.html" selected>2008-2012</option></select></form></td></tr>';
urls=regexp(a,'(?<=option value.*)http.*html','match');
and the result is:
http://www.tractordata.com/farm-tractors/003/9/0/3906-massey-ferguson-7465-transmission.html">2004-2007</option><option value="http://www.tractordata.com/farm-tractors/006/7/0/6706-massey-ferguson-7465-transmission.html
As you can see, the extracted string respects the pattern, but it spans two different URLs. I need the following two results:
http://www.tractordata.com/farm-tractors/003/9/0/3906-massey-ferguson-7465-transmission.html
http://www.tractordata.com/farm-tractors/006/7/0/6706-massey-ferguson-7465-transmission.html
How may I fix this problem?
Thanks
Pietro


Accepted Answer

Stephen Cobeldick on 14 Jun 2017
Edited: Stephen Cobeldick on 14 Jun 2017
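The .* in your pattern is greedy, so it runs on to the last "html" in the string. You could use a lazy quantifier ? (explained in the regular expression documentation), which makes the match stop at the first ".html":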
>> urls = regexp(a,'(?<=option value.*)http.*?\.html','match');
>> urls{:}
ans =
http://www.tractordata.com/farm-tractors/003/9/0/3906-massey-ferguson-7465-transmission.html
ans =
http://www.tractordata.com/farm-tractors/006/7/0/6706-massey-ferguson-7465-transmission.html
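To make the difference between the greedy and lazy forms easier to see, here is a minimal sketch on a short stand-in string. The string `s` and its URLs are hypothetical and are not taken from the question; only `regexp` with the 'match' option is used, as in the answer.

```matlab
% Short stand-in string (hypothetical, not from the question) with two quoted URLs.
s = '<a href="http://x.org/a.html">A</a><a href="http://x.org/b.html">B</a>';

% Greedy: .* consumes as much as possible, so the match ends at the LAST ".html".
greedy = regexp(s, 'http.*\.html', 'match');

% Lazy: .*? consumes as little as possible, so each match ends at the FIRST ".html".
lazy = regexp(s, 'http.*?\.html', 'match');

fprintf('greedy: %d match(es)\n', numel(greedy));  % one match spanning both URLs
fprintf('lazy:   %d match(es)\n', numel(lazy));    % one match per URL
```

The escaped dot in `\.html` matters as well: an unescaped `.` would match any character before "html", not just a literal period.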
A more robust method would be to not match " characters, so that a match can never run past the closing quote of the value attribute:
>> urls = regexp(a,'(?<=option value=")[^"]+\.html','match');
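As a rough illustration of the negated character class, the sketch below runs that pattern on a shortened stand-in for the HTML in the question. The example.com URLs are hypothetical, and the capture-group variant is an alternative I am assuming is acceptable here, not part of the original answer.

```matlab
% Shortened stand-in for the HTML in the question (hypothetical URLs).
a = ['<option value="http://example.com/one.html">2004-2007</option>' ...
     '<option value="http://example.com/two.html" selected>2008-2012</option>'];

% [^"]+ matches any run of characters except the double quote, so a match
% cannot extend beyond the closing quote of the value attribute.
urls = regexp(a, '(?<=option value=")[^"]+\.html', 'match');

% Same idea with a capture group instead of a lookbehind.
tok   = regexp(a, 'option value="([^"]+\.html)"', 'tokens');
urls2 = [tok{:}];   % flatten the nested cell array of tokens

disp(urls{1})
disp(urls{2})
```

Both forms return a 1-by-2 cell array of character vectors here. The lookbehind version keeps the call to a single 'match' output, while the tokens version avoids lookbehind entirely at the cost of one flattening step.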
If you want to experiment with regular expressions, you might like to try my Interactive Regular Expression Tool, which shows the outputs of regexp as you type the parse and match strings. You can download it here:
