MATLAB Answers


High speed OCR and parallel processing

Asked by Azura Hashim on 29 Jan 2017
Latest activity Commented on by Azura Hashim on 2 Feb 2017
Hello,
I've written a number of programs to read tabular data in images using MATLAB's ocr function. I clean up the image files before running OCR (binarization, etc.), but it is still taking roughly 4 seconds per 100 rows of single-column data. Unfortunately I have hundreds of thousands of rows to work with, so I need a way to speed this up. Using an ROI, or cropping the image into individual table cells, didn't make much difference. Can someone help by pointing out some options?
  • Is there a way to make OCR run faster?
  • I have seen some documentation on parallel processing and was wondering if that could help. My computer has 4 cores. Should I explore the following?
  1. hyperthreading
  2. increasing the number of workers beyond the number of cores
  3. increasing the number of threads per worker
In essence, I'm looking to split the hundreds of image files so they can be processed separately, and I want to maximise the speed.
Thank you.


1 Answer

Answer by Walter Roberson
on 29 Jan 2017
 Accepted Answer

To get ocr() to maybe run faster you would need to train a custom OCR model. This assumes that fonts and handwriting sloppiness are quite restricted in your situation (e.g., one font at one size); if you have a general handwritten-OCR problem, then training your own model is not likely to speed anything up.
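A cheaper first step than retraining, since the data is numeric tabular text, is to restrict ocr() to the characters you actually expect and skip automatic layout analysis. A minimal sketch — the file name and character set are placeholders for your own data:

```matlab
% Restrict ocr() to digits and common table punctuation so the engine
% does not have to consider the full alphabet. 'TextLayout','Block'
% skips automatic page-layout analysis, which can also save time.
I = imread('table_page.png');              % placeholder file name
if size(I, 3) == 3
    I = rgb2gray(I);                       % ocr/imbinarize want grayscale
end
bw = imbinarize(I);                        % clean up, as in your pipeline
results = ocr(bw, ...
    'CharacterSet', '0123456789.-', ...    % adjust to your actual data
    'TextLayout', 'Block');
disp(results.Text)
```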
The general task of OCR could, I suspect, be done faster with different algorithms; I say that thinking of the speed of automatic mail sorters. On the other hand, those do not have to deal with hundreds of rows at a time.
You need to profile your code. Hyperthreading is an advantage if you are waiting on I/O; if you are busy with computation, hyperthreading can slow things down. Assigning more threads than cores, or more workers than cores, leads to contention for resources unless the workers typically spend a lot of their time waiting on I/O or interrupts.
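Profiling one representative image is enough to see where the time actually goes before deciding on parallelism; a minimal sketch (myOcrPipeline is a placeholder for your own function):

```matlab
% Profile one representative run of the pipeline, then inspect the
% report to see whether ocr() itself or the preprocessing dominates.
profile on
myOcrPipeline('sample_page.png');   % placeholder: your own function
profile off
profile viewer                      % opens the interactive report
```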
parfor and spmd are not always more productive. They are most effective for low-I/O, high-computation work where the matrices involved are small or moderate and you do not perform expensive operations such as eigenvalue decompositions or the \ operator. With larger matrices and vectorized code, especially code that does linear algebra, you would typically get better performance by leaving the code not explicitly parallel, so that it can use the multithreaded high-performance libraries (which have much lower overhead than creating workers).
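If profiling shows the time really is in independent per-image computation, splitting the image files across workers with parfor is straightforward. A sketch assuming one recognized-text string per file (the folder name is a placeholder):

```matlab
% Process independent image files on a pool of workers, one file per
% iteration. The default pool size is one worker per physical core,
% consistent with the advice above about not oversubscribing.
if isempty(gcp('nocreate'))
    parpool;                               % start a pool once
end
files = dir(fullfile('pages', '*.png'));   % placeholder folder
texts = cell(numel(files), 1);
parfor k = 1:numel(files)
    I = imread(fullfile(files(k).folder, files(k).name));
    res = ocr(I, 'TextLayout', 'Block');
    texts{k} = res.Text;
end
```

Since each iteration touches a different file and writes a different cell, the iterations are independent and parfor can distribute them freely.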

  1 Comment

Thanks for your help!
