How can I extract words with separated letters from a text?

I have some ill-structured files like this:
file1: o n l i n e U N I V E R S I T Y Obtain Diploma.
file2: Howdy There C E R T I F I E D U N I V E R S I T Y D I P L O M A S and d e g r e sWouldn't it be great to get a Masters degree.
I would like to extract only the words whose letters are separated by spaces. In this case I would get:
f1: 'o n l i n e' 'U N I V E R S I T Y'
f2: 'C E R T I F I E D U N I V E R S I T Y D I P L O M A S' 'd e g r e s'
I tried regular expressions, but they didn't work for me.
Thanks for any help.
  2 Comments
Image Analyst on 8 Apr 2014
Edited: Image Analyst on 8 Apr 2014
For f2, why should the final "s" be extracted as part of 'd e g r e s' rather than treated as part of sWouldn't? I would expect it to be excluded, because it is attached to sWouldn't, which is a word without spaces and thus not to be extracted. Please clarify.
Nadjate on 9 Apr 2014
Because if I preprocess the text into correct English words, it will read:
f2: Howdy There certified university diplomas and degrees Wouldn't it be great to get a Masters degree.
So the letter s belongs to the word 'd e g r e s' and is not part of 'sWouldn't'.
Thanks for your comment.

Accepted Answer

Azzi Abdelmalek on 8 Apr 2014
Edited: Azzi Abdelmalek on 8 Apr 2014
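file1 = ' o n l i n e U N I V E R S I T Y Obtain Diploma.';
% Match runs of "single character + whitespace" that start right after whitespace
f1 = regexp(file1,'(?<=\s)(\w\s)+','match');
% Discard matches that are only one letter plus a space (e.g. the article 'a ')
f1(cellfun(@numel,f1)==2) = []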
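Note that for the first file this pattern returns the spaced letters as one run, 'o n l i n e U N I V E R S I T Y ', including a trailing space. A minimal sketch of one way to get the two separate words from the question follows. It assumes that a change from lowercase to uppercase marks a word boundary, which is only a heuristic and is not part of the original answer; the variable names runs and words are made up for illustration.

```matlab
file1 = ' o n l i n e U N I V E R S I T Y Obtain Diploma.';
% (?<=\s) : the match must start right after a whitespace character
% (\w\s)+ : one or more "single word character + whitespace" pairs
runs = regexp(file1, '(?<=\s)(\w\s)+', 'match');
runs(cellfun(@numel, runs) == 2) = [];   % drop lone letters such as 'a '
runs = strtrim(runs);                    % remove the trailing space of each run
% Assumption: a lowercase letter followed by an uppercase letter marks a word boundary
words = regexp(runs{1}, '(?<=[a-z])\s(?=[A-Z])', 'split');
celldisp(words)
```

With this input, words should contain 'o n l i n e' and 'U N I V E R S I T Y'. The heuristic cannot split a run that is entirely uppercase, such as the one in the second file.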
  2 Comments
Nadjate on 9 Apr 2014
Your answer works for the first file. Thank you.
But do you have any suggestion for the second file?
Azzi Abdelmalek on 9 Apr 2014
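This also works for the second file:
file1 = ' Howdy There C E R T I F I E D U N I V E R S I T Y D I P L O M A S and d e g r e sWouldn''t it be great to get a Masters degree.';
f1 = regexp(file1,'(?<=\s)(\w\s)+','match');
f1(cellfun(@numel,f1)==2) = [];   % removes the lone 'a ' before "Masters"
% Note: the 's' attached to "sWouldn't" is not matched, so the second run is 'd e g r e '
celldisp(f1)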
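The pattern above stops before a letter that is fused to the next word, so 'd e g r e sWouldn't' yields 'd e g r e ' without the final s. A possible variation, sketched here under the assumption that a fused letter is always followed directly by an uppercase letter (as in 'sWouldn't'), adds an optional trailing group to the pattern. This is a heuristic for illustration, not something proposed in the thread; pat and runs are hypothetical names.

```matlab
file2 = ' Howdy There C E R T I F I E D U N I V E R S I T Y D I P L O M A S and d e g r e sWouldn''t it be great to get a Masters degree.';
% (\w(?=[A-Z]))? optionally takes one more letter, but only when an uppercase
% letter follows it (assumed to be the start of a fused word such as "sWouldn't")
pat = '(?<=\s)(\w\s)+(\w(?=[A-Z]))?';
runs = regexp(file2, pat, 'match');
runs(cellfun(@numel, runs) == 2) = [];   % drop lone letters such as 'a '
runs = strtrim(runs);                    % remove the trailing space of each run
celldisp(runs)
```

For this input, runs should hold 'C E R T I F I E D U N I V E R S I T Y D I P L O M A S' and 'd e g r e s'. Text in which a single letter precedes an all-caps word could produce false positives, so the result is worth checking on real data.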

More Answers (0)