Using Regexp to extract complete addresses

1 view (last 30 days)
I'm trying to extract the addresses from a large pdf. Here is a screenshot of the pdf:
Here is the code I'm using:
str = extractFileText("document_name.pdf");
expression = '\d{1,8}\s\w*\s\w*\n';
startIndex = regexp(str,expression,'match');
However, this code only extracts addresses that begin with 1-8 digits, then have a space, then some letters, then a space, then more letters, then a new line.
As you can see in the screenshot, not all addresses are in this format. Some start with numbers, a space, then one word, then a new line, some have numbers then several words, then a new line, etc. How can I extract every full address?

Answers (0)

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!