I will upload my own take on using Tesseract and screen capture for OCR data harvesting very soon.
Amateur work, in both code and concept. Not robust: it can't handle fonts, noise, or files that differ from what the uploader provides. Still, it's great as a concept demo for undergraduate-level and lower classes.