identify words from a sentence

The image at the link below shows a sentence.
Is it possible to divide the sentence into words? That is, I want to draw a box around each word and display each word separately. Can someone please help me identify the words in an image?

Accepted Answer

Image Analyst on 26 Mar 2013
Edited: Image Analyst on 26 Mar 2013
I posted code for this here: http://www.mathworks.com/matlabcentral/answers/67860#answer_79275.
Threshold the image, then dilate it to connect the letters within each word. Then call regionprops() to get the BoundingBox of each word. That is what the code in that link does, so try to adapt it, and post your code as a comment below this answer if you have trouble with it. That code gives you sub-images that are chunks cropped from the original image. But if you want a string of ASCII text that says "A MOVE to stop Mr. Gaihhell from urinating", then you'll have to use OCR, and that is a lot more complicated, especially for handwriting as bad as what you've shown.
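The threshold, dilate, and regionprops steps can be sketched roughly as follows. This is not the code from the linked answer, only a minimal illustration that needs the Image Processing Toolbox. The synthetic image, the fixed threshold of 128, and the 1-by-9 structuring element are all assumptions; a real scan would be loaded with imread() and would need values tuned to its letter and word spacing.

```matlab
% Synthetic stand-in for the scanned sentence: dark "letters" on light paper.
grayImage = 255 * ones(40, 120, 'uint8');
grayImage(10:30,  5:12) = 0;   % word 1, letter 1
grayImage(10:30, 16:23) = 0;   % word 1, letter 2
grayImage(10:30, 60:67) = 0;   % word 2, letter 1
grayImage(10:30, 71:78) = 0;   % word 2, letter 2

% Threshold so that ink is true and paper is false (128 is an assumed level).
binaryImage = grayImage < 128;

% Dilate horizontally to bridge the gaps between letters of the same word.
% The element must be wider than letter gaps but narrower than word gaps.
se = strel('rectangle', [1, 9]);
wordMask = imdilate(binaryImage, se);

% Each connected blob in the dilated mask is now treated as one word.
props = regionprops(wordMask, 'BoundingBox');
numWords = numel(props);

% Crop each word out of the original image.
wordImages = cell(1, numWords);
for k = 1 : numWords
    wordImages{k} = imcrop(grayImage, props(k).BoundingBox);
end
fprintf('Found %d words.\n', numWords);
```

To draw the boxes, one option is to call imshow(grayImage), then hold on, and then rectangle('Position', props(k).BoundingBox, 'EdgeColor', 'r') inside the loop.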
  4 Comments
Image Analyst on 26 Mar 2013
Are you saying that the thresholding, dilating, and regionprops algorithm does not work for those images? Because I see no reason why it should fail for the images you posted. Did you actually run the code on them?
Walter Roberson on 26 Mar 2013
I think the word might be "nominating", but it is difficult to tell.


More Answers (1)

Walter Roberson on 26 Mar 2013
Use imdilate() to join the letters of each word, then regionprops() to find the resulting bounding boxes.
Alternatively, start with regionprops() to find the bounding boxes of the individual blobs, and merge any areas whose bounding boxes touch or overlap. Then find the distances between neighboring bounding boxes. You will find that they have an uneven distribution: small distances between adjacent letters and larger distances between words. Merge the areas that are only a small distance apart. You might want to use a ratio of the size of the existing bounding boxes to help determine what "small distance" means.
Strings such as
'...'
could give you trouble, though.
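A minimal sketch of the gap-based merging, using only base MATLAB. The letter boxes are made-up numbers in the [x, y, width, height] layout that regionprops() returns, the text is assumed to sit on a single line, and the rule "a gap wider than half the median letter width starts a new word" is just one possible way to apply the size ratio suggested above.

```matlab
% Hypothetical letter bounding boxes, one per row: [x, y, width, height].
letterBoxes = [ 5 10  8 20;
               16 10  8 20;
               27 12  8 18;
               60 10  8 20;
               71 11  8 19];

% Sort left to right so that neighbors in the array are neighbors on the line.
letterBoxes = sortrows(letterBoxes, 1);

% Horizontal gap from the right edge of each box to the left edge of the next.
% Touching or overlapping boxes give a gap of zero or less.
rightEdges = letterBoxes(:, 1) + letterBoxes(:, 3);
gaps = letterBoxes(2:end, 1) - rightEdges(1:end-1);

% Assumed rule: a gap wider than half the median letter width starts a new word.
gapThreshold = 0.5 * median(letterBoxes(:, 3));
wordIndex = cumsum([1; gaps > gapThreshold]);

% Build each word box as the union of its letter boxes.
numWords = wordIndex(end);
wordBoxes = zeros(numWords, 4);
for w = 1 : numWords
    b = letterBoxes(wordIndex == w, :);
    left   = min(b(:, 1));
    top    = min(b(:, 2));
    right  = max(b(:, 1) + b(:, 3));
    bottom = max(b(:, 2) + b(:, 4));
    wordBoxes(w, :) = [left, top, right - left, bottom - top];
end
disp(wordBoxes);
```

With these numbers the gaps are 3, 3, 25, and 3 pixels against a threshold of 4, so the five letters fall into two words.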
  5 Comments
Image Analyst on 27 Mar 2013
Calculate the area and centroid of all blobs. If a blob's area is about the size of a dot, and it's fairly round, then combine the bounding box or mask of that dot with the closest word. In pseudocode:
if area < largestDotArea
    % It's small enough to be a dot.
    if itIsCircular
        for k = 1 : numberOfOtherBlobs
            % Distance between the dot's centroid and the centroid of blob k.
            delta = dotCentroid - otherCentroids(k, :);
            distance = hypot(delta(1), delta(2));
            if distance < mergingDistance
                % Merge bounding boxes (wordBoundingBox is the box of blob k).
                newBoundingBox = f(wordBoundingBox, dotBoundingBox);
                break;
            end
        end
    end
end
Do that in a loop over all blobs to check whether each one is a dot and thus needs to be combined with the closest word. Circularity is Perimeter^2/(4*pi*Area), which is 1 for a perfect circle and gets larger as the shape gets less round.
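A runnable sketch of that loop, in base MATLAB. The measurements are made-up values in the layout regionprops() returns for 'Area', 'Perimeter', 'Centroid', and 'BoundingBox', and largestDotArea, maxCircularity, and mergingDistance are assumed thresholds that would need tuning. Unlike the pseudocode, which stops at the first blob within range, this version picks the nearest word.

```matlab
% Hypothetical blob measurements (word, dot, word); the numbers are made up.
props = struct( ...
    'Area',        {400, 12, 380}, ...
    'Perimeter',   {100, 12.5, 95}, ...
    'Centroid',    {[20 20], [45 28], [80 20]}, ...
    'BoundingBox', {[5 10 30 20], [43 26 4 4], [60 10 40 20]});

largestDotArea  = 20;    % assumed: smaller blobs may be dots
maxCircularity  = 1.5;   % assumed: 1 is a perfect circle, larger is less round
mergingDistance = 30;    % assumed: maximum centroid distance for a merge

areas = [props.Area];
circularity = [props.Perimeter] .^ 2 ./ (4 * pi * areas);
isDot = areas < largestDotArea & circularity < maxCircularity;

wordIdx = find(~isDot);
wordCentroids = vertcat(props(wordIdx).Centroid);
boxes = vertcat(props.BoundingBox);

for d = find(isDot)
    % Distance from this dot's centroid to every word centroid.
    dx = wordCentroids(:, 1) - props(d).Centroid(1);
    dy = wordCentroids(:, 2) - props(d).Centroid(2);
    [minDistance, nearest] = min(hypot(dx, dy));
    if minDistance < mergingDistance
        % Replace the word's box with the union of the word and dot boxes.
        a = boxes(wordIdx(nearest), :);
        b = boxes(d, :);
        left   = min(a(1), b(1));
        top    = min(a(2), b(2));
        right  = max(a(1) + a(3), b(1) + b(3));
        bottom = max(a(2) + a(4), b(2) + b(4));
        boxes(wordIdx(nearest), :) = [left, top, right - left, bottom - top];
    end
end
wordBoxes = boxes(wordIdx, :);
disp(wordBoxes);
```

Here the middle blob is classified as a dot and is absorbed by the first word, whose box grows from a width of 30 to 42.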

