MATLAB Answers


identify words from a sentence

The image in the below link is a sentence.

Is it possible to divide the sentence into words? That is, I want to draw a box around each word and display each word separately. Can someone please help me identify the words in an image?



2 Answers

Answer by Image Analyst
on 26 Mar 2013
Edited by Image Analyst
on 26 Mar 2013
 Accepted Answer

I gave code for that here: Try to adapt it.

Try thresholding, then dilating to connect letters in the same word. Then call regionprops() to get the BoundingBox of each word. That's what I did in that link. Try to adapt that code. Post your code back here as a comment below this answer if you have trouble with it. That code will get you sub-images that are the chunks cropped from the original image. But if you want a string of ASCII text that says " A MOVE to stop Mr. Gaihhell from urinating" then you'll have to use OCR and that is a lot more complicated, especially for handwriting as bad as what you've shown.
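Here is a minimal sketch of the threshold, dilate, regionprops() sequence described above. It is not the code from the link. The synthetic grayImage, the threshold of 128, and the 7-by-7 structuring element are assumptions standing in for the real scanned sentence; the Image Processing Toolbox is required.

```matlab
% Synthetic test image: dark "letters" on a light background
% (an assumed stand-in for the scanned sentence).
grayImage = 255 * ones(40, 120, 'uint8');
% Word 1: three letter-like blocks separated by 3-pixel gaps.
grayImage(10:30,  5:12) = 0;
grayImage(10:30, 16:23) = 0;
grayImage(10:30, 27:34) = 0;
% Word 2: two blocks, starting 25 pixels to the right of word 1.
grayImage(10:30, 60:67) = 0;
grayImage(10:30, 71:78) = 0;

% Threshold: ink is darker than paper, so keep pixels below the threshold.
binaryImage = grayImage < 128;

% Dilate so that letters of the same word fuse into a single blob.
dilatedImage = imdilate(binaryImage, true(7));

% One bounding box per connected blob, i.e. per word.
props = regionprops(dilatedImage, 'BoundingBox');
numWords = numel(props);

% Crop each word out of the original image.
wordImages = cell(1, numWords);
for k = 1 : numWords
    wordImages{k} = imcrop(grayImage, props(k).BoundingBox);
end
```

The size of the structuring element has to be larger than the gaps between letters but smaller than the gaps between words, so expect to tune it for each image.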



Sir, for a few of my images the code in the above link shows extra words. How can I resolve the problem with such images?

Are you saying that the thresholding, dilating, and regionprops algorithm does not work for those images? Because I see no reason why it should fail for the images you posted. Did you actually run the code on them?

I think the word might be "nominating", but it is difficult to tell.


Answer by Walter Roberson
on 26 Mar 2013

Use imdilate(), then regionprops() to find the resulting bounding boxes.

Alternatively, use regionprops() to find the bounding boxes of the individual letters. Merge any areas whose bounding boxes touch or overlap. Now find the distances between bounding boxes. You will find that they have an uneven distribution: small distances between adjacent letters, larger distances between words. Merge the areas that are only a small distance apart. You might want to use a ratio of the size of the existing bounding boxes to help determine what "small distance" means.
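A rough sketch of that gap-based merging, assuming a single line of text. The boxes matrix holds hypothetical letter boxes of the kind regionprops() returns, and the rule "a gap under half the median letter height is inside a word" is only an illustrative way to tie the threshold to the box size.

```matlab
% Letter bounding boxes as [x y width height] rows (hypothetical values,
% e.g. stacked from regionprops(binaryImage, 'BoundingBox')).
boxes = [ 5 10 8 20;
         16 10 8 20;
         27 10 8 20;
         60 10 8 20;
         71 10 8 20];

% Sort left to right, then measure the horizontal gap to the next box.
boxes = sortrows(boxes, 1);
rightEdges = boxes(:, 1) + boxes(:, 3);
gaps = boxes(2:end, 1) - rightEdges(1:end-1);  % <= 0 means touching or overlapping

% Assumption: a gap under half the median letter height is within a word.
gapThreshold = 0.5 * median(boxes(:, 4));

% Start a new word wherever the gap is large.
wordIndex = cumsum([1; gaps > gapThreshold]);
numWords = wordIndex(end);

% Each word box is the smallest box enclosing all of its letters.
wordBoxes = zeros(numWords, 4);
for w = 1 : numWords
    b = boxes(wordIndex == w, :);
    x1 = min(b(:, 1));
    y1 = min(b(:, 2));
    x2 = max(b(:, 1) + b(:, 3));
    y2 = max(b(:, 2) + b(:, 4));
    wordBoxes(w, :) = [x1, y1, x2 - x1, y2 - y1];
end
```

Touching or overlapping boxes have a gap of zero or less, so they fall under the threshold and are merged by the same test.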

Strings such as


could give you trouble, though.


 % Dilate to connect all the letters
 binaryImage = imdilate(binaryImage, true(7));

Sir, here I used 15 instead of 7. All the words now come out correctly, but the dot of the "i" and similar marks come out as separate words. How can I remove those?

Calculate the area and centroid of all blobs. If the area is about the size of a dot, and it's fairly round, then combine the bounding box or mask of that dot with the closest word. In pseudocode

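if area < largestDotArea
   % It's small enough to be a dot
   if itIsCircular
      for k = 1 : numberOfOtherBlobs
         % Distance between the dot's centroid and the centroid of blob k
         distance = hypot(dotCentroid(1) - centroid(k, 1), dotCentroid(2) - centroid(k, 2))
         if distance < mergingDistance
            % Merge bounding boxes
            newBoundingBox = f(wordBoundingBox, dotBoundingBox)
         end
      end
   end
end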

Do that in a loop over all blobs to check whether each one is a dot and thus needs to be combined with the closest word. Circularity is Perimeter^2/(4*pi*Area), so a perfect circle gives 1 and less round shapes give larger values.
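The dot-merging idea above could look something like this. The measurement arrays are hypothetical values standing in for regionprops(binaryImage, 'Area', 'Perimeter', 'Centroid', 'BoundingBox'), and the three thresholds are assumptions that will need tuning for each image. Perimeter estimates for very small blobs are rough, so the circularity limit should be kept loose.

```matlab
% Hypothetical measurements for three blobs; blob 2 is the dot of an "i".
areas      = [900; 30; 700];
perimeters = [200; 20; 180];
centroids  = [40 30; 55 8; 120 30];                  % [x y] per blob
boxes      = [10 15 60 30; 52 5 6 6; 95 15 50 30];   % [x y width height]

% Assumed, image-dependent thresholds.
largestDotArea  = 50;
maxCircularity  = 1.5;   % Perimeter^2/(4*pi*Area) is 1 for a perfect circle
mergingDistance = 40;

circularity = perimeters .^ 2 ./ (4 * pi * areas);
isDot = areas < largestDotArea & circularity < maxCircularity;

keep = true(size(areas));
for k = find(isDot)'
    % Distance from this dot to every blob; other dots are excluded.
    d = hypot(centroids(:, 1) - centroids(k, 1), centroids(:, 2) - centroids(k, 2));
    d(isDot) = inf;
    [minDistance, w] = min(d);
    if minDistance < mergingDistance
        % Merged box: the smallest box enclosing both the word and the dot.
        x1 = min(boxes(w, 1), boxes(k, 1));
        y1 = min(boxes(w, 2), boxes(k, 2));
        x2 = max(boxes(w, 1) + boxes(w, 3), boxes(k, 1) + boxes(k, 3));
        y2 = max(boxes(w, 2) + boxes(w, 4), boxes(k, 2) + boxes(k, 4));
        boxes(w, :) = [x1, y1, x2 - x1, y2 - y1];
        keep(k) = false;   % The dot no longer counts as its own word
    end
end
wordBoxes = boxes(keep, :);
```

Here the dot is absorbed into the first word, whose box grows upward to cover it, while the second word is left unchanged.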

