Using Regexp to extract complete addresses
1 view (last 30 days)
Show older comments
I'm trying to extract the addresses from a large pdf. Here is a screenshot of the pdf:
Here is the code I'm using:
str = extractFileText("document_name.pdf");
expression = '\d{1,8}\s\w*\s\w*\n';
startIndex = regexp(str,expression,'match');
However, this code only extracts addresses that begin with 1-8 digits, then have a space, then some letters, then a space, then more letters, then a new line.
As you can see in the screenshot, not all addresses are in this format. Some start with numbers, a space, then one word, then a new line, some have numbers then several words, then a new line, etc. How can I extract every full address?
0 Comments
Answers (0)
See Also
Categories
Find more on String Parsing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!