There is a really simple yet robust tool for extracting highlights and notes from your PDF files, available at http://www.sumnotes.net . Not only does it support advanced features like selective extraction and predictive extraction, it also lets you save extracted highlights to TXT or DOC files. All desktop browsers and operating systems are supported. We are in the cloud, so no installation is needed. And yes, it is free. Try it out.
Nice work. It would be better if you could handle the Java warnings. For example, you reuse the "pdfdoc" variable for different tasks; you should use separate variables. Also, you need to close the Java object in your demo.
The author even notes it does not work inside the m-file!
java.lang.Throwable: Warning: You did not close the PDF Document
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Unknown Source)
at java.lang.ref.Finalizer.access$100(Unknown Source)
at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
Apparently, pdfbox creates two objects: one on instantiation and another on loading. I got rid of the warning mentioned in the code by calling "pdfdoc = org.apache.pdfbox.pdmodel.PDDocument.load(filename)" directly (note the added ".apache", which is part of the package name in newer releases of pdfbox).
Also, after the pdfdoc variable is created (inside a try..catch), "pdfdoc.close()" must also be called.
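For anyone hitting the same warning, here is a minimal sketch of the load-then-close pattern described above. It is not the submission's code. It assumes the PDFBox jar is already on MATLAB's Java class path, that the release in use provides a static PDDocument.load accepting a java.io.File (newer PDFBox releases may load documents differently), and 'example.pdf' is a hypothetical file name.

```matlab
% Sketch only: assumes the PDFBox jar is already on the Java class path
% (e.g. added with javaaddpath). 'example.pdf' is a hypothetical file name.
filename = 'example.pdf';
pdfdoc = [];
try
    % Load directly through the static method, rather than constructing a
    % PDDocument first and then loading into the same variable.
    pdfdoc = org.apache.pdfbox.pdmodel.PDDocument.load(java.io.File(filename));
    numPages = pdfdoc.getNumberOfPages();
    fprintf('%s has %d pages\n', filename, numPages);
catch err
    warning('Could not read %s: %s', filename, err.message);
end
% Close the document whether or not reading succeeded, so the finalizer
% has nothing left to complain about.
if ~isempty(pdfdoc)
    pdfdoc.close();
end
```

Keeping the close() call outside the try block means the document is released even when an error occurs partway through reading.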
10 May 2008
I am lucky, I guess. Worked OK, except for the warning that Dimitri mentions.
05 May 2008
See also this submission: