identify words from a sentence

Asked by Elysi Cochin on 26 Mar 2013

The image at the link below shows a sentence.

http://img201.imageshack.us/img201/2552/96690278.png

Is it possible to divide the sentence into words? That is, I want to draw a box around each word and display each word separately. How can I identify the words in an image? Can someone please help?

2 Answers

Answer by Image Analyst on 26 Mar 2013
Edited by Image Analyst on 26 Mar 2013
Accepted answer

I gave code for that here: http://www.mathworks.com/matlabcentral/answers/67860#answer_79275. Try to adapt it.

Try thresholding, then dilating to connect letters in the same word. Then call regionprops() to get the BoundingBox of each word. That's what I did in that link. If you have trouble with it, post your code back here as a comment below this answer. That code will get you sub-images that are the chunks cropped from the original image. But if you want a string of ASCII text that says "A MOVE to stop Mr. Gaihhell from urinating" then you'll have to use OCR, and that is a lot more complicated, especially for handwriting as bad as what you've shown.
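As a rough illustration of the threshold, dilate, regionprops sequence described above, here is a minimal sketch. It is not the code from the linked answer. It assumes the Image Processing Toolbox is installed, and it runs on a made-up synthetic image (the variable names, the threshold of 128, and the bar-shaped "letters" are all assumptions for the example), so the numbers will need tuning for a real scan.

```matlab
% Sketch only: assumes the Image Processing Toolbox is installed.
% Build a synthetic "sentence": dark letter-like bars on a white page,
% two words of three letters each (letters 4 px apart, words 30 px apart).
grayImage = 255 * ones(40, 120, 'uint8');
letterStarts = [10 20 30, 66 76 86];          % left column of each letter
for c = letterStarts
    grayImage(10:30, c:c+5) = 0;              % one 21x6 "letter"
end

% 1) Threshold: ink is darker than the page, so keep pixels below 128.
%    The value 128 is an assumption that suits this synthetic image.
binaryImage = grayImage < 128;

% 2) Dilate so letters in the same word touch. A 7x7 element grows
%    every blob by 3 px on each side.
dilatedImage = imdilate(binaryImage, true(7));

% 3) One connected blob per word; ask for its bounding box.
props = regionprops(dilatedImage, 'BoundingBox');

% 4) Crop each word out of the original image and draw its box.
imshow(grayImage); hold on;
wordImages = cell(1, numel(props));
for k = 1 : numel(props)
    box = props(k).BoundingBox;               % [x y width height]
    wordImages{k} = imcrop(grayImage, box);
    rectangle('Position', box, 'EdgeColor', 'r');
end
hold off;
fprintf('Found %d words\n', numel(props));
```

The size of the structuring element is the main knob: too small and letters stay separate, too large and neighbouring words fuse together.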

4 Comments

Elysi Cochin on 26 Mar 2013

Sir, for a few of my images the code in the above link finds extra words. How can I resolve the problem for such images?

http://img11.imageshack.us/img11/4058/38318663.png

http://img201.imageshack.us/img201/2552/96690278.png

Image Analyst on 26 Mar 2013

Are you saying that the thresholding, dilating, and regionprops algorithm does not work for those images? Because I see no reason why it should fail for the images you posted. Did you actually run the code on them?

Walter Roberson on 26 Mar 2013

I think the word might be "nominating", but it is difficult to tell.

Answer by Walter Roberson on 26 Mar 2013

Use imdilate() to connect the letters, then regionprops() to find the resulting bounding boxes.

Alternatively, call regionprops() first to find the bounding boxes, and merge any areas whose bounding boxes touch or overlap. Then find the distances between the bounding boxes. You will find that they have an uneven distribution: small distances between adjacent letters, larger distances between words. Merge the areas that are only a small distance apart. You might want to use a ratio of the size of the existing bounding boxes to help determine what "small distance" means.

Strings such as

'...'

could give you trouble, though.
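The gap-based alternative can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the letter boxes are typed in by hand instead of coming from regionprops(), all letters sit on one text line, and "small distance" is taken to mean "less than the median letter width", which is just one possible reading of the size-ratio suggestion. The variable names are made up for the example.

```matlab
% Sketch only: letter boxes are typed in by hand here; in practice they
% would come from regionprops(binaryImage, 'BoundingBox').
% Each row is [x y width height]; all letters sit on a single text line.
letterBoxes = [10 10 6 20;
               20 10 6 20;
               30 10 6 20;
               66 10 6 20;
               76 10 6 20];

% Sort left to right, then measure the horizontal gap from the furthest
% right edge seen so far to the left edge of the next box.
letterBoxes = sortrows(letterBoxes, 1);
rightEdges = cummax(letterBoxes(:, 1) + letterBoxes(:, 3));
gaps = letterBoxes(2:end, 1) - rightEdges(1:end-1);

% Assumed rule: a gap smaller than the median letter width is "small".
smallGap = median(letterBoxes(:, 3));

% Start a new word wherever the gap is not small. Touching or
% overlapping boxes have a gap <= 0, so they merge automatically.
wordIndex = cumsum([1; gaps >= smallGap]);

% Union of the letter boxes in each word.
numWords = wordIndex(end);
wordBoxes = zeros(numWords, 4);
for w = 1 : numWords
    b = letterBoxes(wordIndex == w, :);
    left = min(b(:, 1));
    top = min(b(:, 2));
    right = max(b(:, 1) + b(:, 3));
    bottom = max(b(:, 2) + b(:, 4));
    wordBoxes(w, :) = [left, top, right - left, bottom - top];
end
disp(wordBoxes);
```

Here the gaps are 4, 4, 30 and 4 pixels, so the single large gap splits the five letters into two words. A run of dots, as in the string mentioned above, has gaps comparable to the dot width, which is why such a fixed rule can struggle there.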

5 Comments

Elysi Cochin on 27 Mar 2013
 % Dilate to connect all the letters
 binaryImage = imdilate(binaryImage, true(7));

Sir, here I used 15 instead of 7. All the words now come out correctly, but the dot of each "i" and similar marks come out as separate words. How can I remove those?

Image Analyst on 27 Mar 2013

Calculate the area and centroid of all blobs. If the area is about the size of a dot, and it's fairly round, then combine the bounding box or mask of that dot with the closest word. In pseudocode

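if area < largestDotArea
   % It's small enough to be a dot
   if itIsCircular
      for k = 1 : numberOfOtherBlobs
         % Distance between the dot's centroid and blob k's centroid
         distance = hypot(dotCentroid(1) - centroids(k,1), dotCentroid(2) - centroids(k,2))
         if distance < mergingDistance
            % Merge bounding boxes
            newBoundingBox = f(wordBoundingBox, dotBoundingBox)
            break;
         end
      end
   end
end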

Do that in a loop over all blobs to check whether each one is a dot and thus needs to be combined with the closest word. Circularity here is Perimeter^2/(4*pi*Area), which is 1 for a perfect circle and larger for less round shapes.
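For what it's worth, here is one way the dot-merging idea might look as runnable code. It is a sketch, not a tested solution for real handwriting. It assumes the Image Processing Toolbox, uses a synthetic image with one "word" and one "dot", and the thresholds largestDotArea, maxCircularity and mergingDistance are invented values. It also fills in the undefined merge function f() as the union of the two boxes and picks the closest blob, both of which are assumptions.

```matlab
% Sketch only: assumes the Image Processing Toolbox. The thresholds
% largestDotArea, maxCircularity and mergingDistance are made-up values.
% Synthetic image: one wide "word" blob with a small dot above it.
binaryImage = false(60, 100);
binaryImage(30:45, 20:80) = true;     % the word
binaryImage(18:22, 48:52) = true;     % the dot, 5x5 pixels

props = regionprops(binaryImage, 'Area', 'Centroid', 'Perimeter', 'BoundingBox');
largestDotArea = 40;
maxCircularity = 2;      % Perimeter^2/(4*pi*Area); 1 is a perfect circle
mergingDistance = 30;

boxes = vertcat(props.BoundingBox);   % one [x y width height] row per blob
centroids = vertcat(props.Centroid);  % one [x y] row per blob
isMerged = false(numel(props), 1);
for k = 1 : numel(props)
    circularity = props(k).Perimeter^2 / (4 * pi * props(k).Area);
    if props(k).Area >= largestDotArea || circularity > maxCircularity
        continue;                     % too big or not round: not a dot
    end
    % Distance from this dot to every other blob's centroid.
    d = hypot(centroids(:,1) - centroids(k,1), centroids(:,2) - centroids(k,2));
    d(k) = Inf;                       % ignore the dot itself
    [closestDistance, j] = min(d);
    if closestDistance < mergingDistance
        % Union of the two boxes, stored on the closest blob.
        topLeft = min(boxes([j k], 1:2), [], 1);
        bottomRight = max(boxes([j k], 1:2) + boxes([j k], 3:4), [], 1);
        boxes(j, :) = [topLeft, bottomRight - topLeft];
        isMerged(k) = true;
    end
end
boxes(isMerged, :) = [];              % drop the dots that were absorbed
disp(boxes);
```

Measuring from the dot to the word's centroid is crude for long words, since the centroid can be far from the letter the dot belongs to. Measuring to the nearest edge of each bounding box would be a reasonable refinement.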

Elysi Cochin on 28 Mar 2013

Thank you, sir.
