I am writing some web-scraping code and am therefore using regular expressions. I need to isolate the words in a string, excluding HTML tags. HTML tags are words enclosed in < > (e.g. <br>). Unfortunately, my code does not work and I am wondering why. Here is an example:
My expected result is 'qu', but instead I get 'qu' and 'q'. The code works with this string: 'quq'. What can I do to solve this issue?
The following code works: regexp('quqa','(?!<)\w*(?!>)','match')
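One likely reason the pattern misbehaves is that a negative lookahead such as (?!<) only inspects the single next character. Once the match starts just after a '<', \w* can still consume the letters of the tag name. Also, \w* can match an empty string. A minimal sketch of a different approach, assuming tags never nest and never contain a '>' inside attribute values, removes the tags first and then collects the remaining words. The input string below is hypothetical, not the one from the question.

```matlab
% Hypothetical input; tags are assumed to be simple <...> runs.
str = '<b>qu</b> qa<br>';

% Replace each tag ('<', any non-'>' characters, '>') with a space
% so that words on either side of a tag are not glued together.
noTags = regexprep(str, '<[^>]*>', ' ');

% Use \w+ rather than \w* so that no empty matches are returned.
words = regexp(noTags, '\w+', 'match');
disp(words)
```

Stripping the tags in a separate step keeps each pattern simple. For real-world HTML (comments, scripts, attributes containing '>'), an HTML parser is usually more reliable than regular expressions.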