MATLAB Answers

OCR low-res image from specified set of characters

Asked by Amit
on 8 Feb 2016
Latest activity Answered by Amit
on 21 Feb 2016
Hello all:
Is there a way to give the OCR a superset of characters, so that no character outside the set is ever expected? For example, if I know that my images contain only [U, T, C, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, :, space], can I pass that set to MATLAB's OCR and expect better predictions?
Could this help increase accuracy on a low-resolution image?
Kindly see the attached image, which I am struggling to get good results with.
Thanks much for your attention. Any thoughts will be immensely helpful.
Regards,
Amit


3 Answers

Answer by Image Analyst
on 8 Feb 2016
 Accepted Answer

For something like that you can probably just crop out each character and measure its area. Then, assuming each character has a unique area, build a lookup table in which a given area maps to a given character.

  2 Comments

Ingenious! So you mean using functions like 'regionprops'?
Thanks @Image Analyst. Such an approach might be helpful for many other related things, though I am hoping for a simpler solution for this one particular case.
Please let me know if you have functions other than 'regionprops' in mind.
Regards,
Amit
Right. Something like:
measurements = regionprops(labeledImage, 'Area');
allAreas = [measurements.Area];
% Define character areas (one entry per character in the set).
characterAreas = [300, 410, 130, 500]; % ... and so on, whatever they are.
for k = 1 : length(allAreas)
    % Distance from this blob's area to every known character area.
    differences = abs(allAreas(k) - characterAreas);
    [~, closestCharacterIndex] = min(differences);
    % closestCharacterIndex now tells you which character it is.
end
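
To make the lookup above concrete, here is a minimal sketch that also maps each index back to a character and rejects blobs whose area is too far from every entry. The character list, the areas, the tolerance maxAreaError, and the sample allAreas values are all made-up placeholders for illustration, not measurements from the attached image; in practice allAreas would come from regionprops as shown above.

```matlab
% Hypothetical lookup table: these areas are placeholders, not measured values.
characters     = 'UTC1';
characterAreas = [300, 410, 130, 500];
maxAreaError   = 20;    % assumed tolerance, in pixels

% Stand-in for [measurements.Area] returned by regionprops.
allAreas = [128, 305, 498, 50];

recognized = repmat('?', 1, numel(allAreas));   % '?' marks unmatched blobs
for k = 1 : numel(allAreas)
    [smallestDifference, closestCharacterIndex] = min(abs(allAreas(k) - characterAreas));
    if smallestDifference <= maxAreaError
        recognized(k) = characters(closestCharacterIndex);
    end
end
disp(recognized)   % displays CU1?
```

The rejection step matters on noisy, low-resolution images, where specks and merged characters would otherwise be forced onto whichever table entry happens to be nearest.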



Answer by Anand
on 8 Feb 2016
Edited by Anand
on 8 Feb 2016

This exact functionality is available in the ocr function. Use the 'CharacterSet' Name-Value pair to achieve this. Something like this:
ocrResults = ocr(yourImage,'CharacterSet','UTC1234567890')
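
As a rough sketch of how the result of such a call might be inspected, the helper below restricts the character set and returns the recognized text together with an average per-character confidence. The function name readRestrictedText is hypothetical, and the colon is added to the set as an assumption based on the question; whether a space needs to be listed is not confirmed in this thread, so it is left out.

```matlab
function [recognizedText, meanConfidence] = readRestrictedText(im)
% Hypothetical helper: run ocr with a restricted character set and
% summarize how confident the recognizer was.
    allowedCharacters = 'UTC1234567890:';   % assumed set, taken from the question
    ocrResults = ocr(im, 'CharacterSet', allowedCharacters);

    % Text is a character vector that may end in newlines; trim them.
    recognizedText = strtrim(ocrResults.Text);

    % Confidences can be NaN for whitespace, so skip NaNs when averaging.
    meanConfidence = mean(ocrResults.CharacterConfidences, 'omitnan');
end
```

A low meanConfidence is a hint that the image needs preprocessing rather than a different character set.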
In your case, you may have even more information that you can supply to the ocr function. If it's a valid assumption that the left half of the image contains only letters and the right half only digits, you could make two calls to ocr with different ROIs.
For example (this is pseudo-code),
leftROI = [1 1 floor(size(im,2)/2) size(im,1)-1];
ocrLeftResults = ocr(im, leftROI, 'CharacterSet','UTC');
rightROI = [floor(size(im,2)/2)+1 1 floor(size(im,2)/2) size(im,1)-1];
ocrRightResults = ocr(im, rightROI, 'CharacterSet', '0123456789');
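
One way the two results might then be joined into a single string is sketched below. The function name readLettersThenDigits is hypothetical, and the split at the exact midpoint is the same assumption as in the pseudo-code above; adjust the ROIs if the letters and digits are divided elsewhere.

```matlab
function combinedText = readLettersThenDigits(im)
% Hypothetical helper: OCR the left half with letters only, the right
% half with digits only, then join the two pieces of text.
    imageHeight = size(im, 1);
    halfWidth   = floor(size(im, 2) / 2);

    % ROIs are [x y width height], as in the pseudo-code above.
    leftROI  = [1, 1, halfWidth, imageHeight - 1];
    rightROI = [halfWidth + 1, 1, halfWidth, imageHeight - 1];

    leftResults  = ocr(im, leftROI,  'CharacterSet', 'UTC');
    rightResults = ocr(im, rightROI, 'CharacterSet', '0123456789');

    % Strip the trailing newlines ocr tends to append before joining.
    combinedText = [strtrim(leftResults.Text), ' ', strtrim(rightResults.Text)];
end
```

With thousands of files, a helper like this can be called in a loop over the file list, reading each image with imread first.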

  5 Comments

Great. And special characters such as ':' and '#' are not a problem? How about the French and German accented characters 'àâæçéèêëïîôœùûüÿ' etc.?
Thank you very much.
Amit, that should be perfectly fine. There is a list of supported languages which you can see here.
You need to add the Name-Value pair 'Language'.
Hello Anand, thank you. It's very helpful, though I suspect the low resolution of my image is what is making the predictions so poor.
Any thoughts?
Thanks again indeed.



Answer by Amit
on 21 Feb 2016

Dear all:
Opening the question again. I have 2864 files like the one attached. I have not been able to find anything, MATLAB or otherwise, that reliably gives me the OCR output. That was my Sunday.
Any of your kind suggestions/directions will be immensely helpful.
Thanks much.
Amit
