How to Read PDF file in Matlab?
I want to read a PDF file, make some changes to its contents, and then save the result to Excel. I have tried my best but fail every time. I need your help; any effort will be greatly appreciated. Thanks in advance.
20 Comments
Geoff Hayes
on 16 Aug 2014
What kind of changes do you want to make to the PDF that you wish to then save to Excel? What is the code that you have written so far?
azizullah khan
on 16 Aug 2014
azizullah khan
on 25 Aug 2014
Geoff Hayes
on 25 Aug 2014
azizullah - I noticed that you looked at Dimitri Shvorob's extract text from PDF on the MATLAB File Exchange, but you had some problems with it. Did you download the two libraries that are needed for this submission, and modify the pdfParseDemo.m file as per the author's instructions?
One of the comments in the above submission indicates that there is a utility called pdftotext that you may be able to call from within the MATLAB code. Have you looked into this?
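As a rough illustration of that suggestion, here is a minimal sketch of calling pdftotext from MATLAB. It assumes the pdftotext command-line tool (distributed with Xpdf and Poppler) is installed and on the system path; the file names report.pdf and report.txt are hypothetical placeholders.

```matlab
% Hypothetical file names; replace them with your own.
pdfFile = 'report.pdf';
txtFile = 'report.txt';

% Build the command line. The -layout option asks pdftotext to keep the
% physical layout of the page, which helps keep table columns aligned.
cmd = sprintf('pdftotext -layout "%s" "%s"', pdfFile, txtFile);

% system returns a nonzero status when the command fails.
[status, msg] = system(cmd);
if status == 0
    % Read the converted text into one character vector.
    txt = fileread(txtFile);
    fprintf('Extracted %d characters.\n', numel(txt));
else
    warning('pdftotext failed (is it installed and on the path?): %s', msg);
end
```

This only works for PDFs that contain real text; a scanned document would yield little or nothing.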
José-Luis
on 25 Aug 2014
What is your goal with this? It might be that Matlab is not the best tool for this.
azizullah khan
on 25 Aug 2014
azizullah khan
on 25 Aug 2014
Geoff Hayes
on 25 Aug 2014
Is there just one PDF file, or several? What data in particular are you looking for in the pdf - a table of numeric data, some text, or ..?
José-Luis
on 25 Aug 2014
Why go through Matlab at all? Use Excel directly. A quick Google search will tell you how to import PDFs into Excel.
azizullah khan
on 25 Aug 2014
Geoff Hayes
on 25 Aug 2014
Have you considered using pdftotext? Or any other converter, to HTML for example? Supposing that you are able to convert the file to text, what would you be looking in it for? Is there just one page of data that you need or one line from each page or..?
You might want to provide an example of a PDF that you wish to extract data from, and indicate which data in the file you want.
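To make the question concrete, here is a minimal sketch of what the step after conversion might look like, assuming the converted text holds a simple whitespace-separated table with one header row, a label in the first column, and numbers in the rest. The sample text, column names, and output file name are all invented for illustration; a real PDF will need its own parsing rules.

```matlab
% Hypothetical text standing in for the output of a PDF-to-text converter.
txt = sprintf('Sample  Voltage  Current\nA1  1.5  0.20\nA2  2.5  0.35\n');

% Split into lines, tolerating both Unix and Windows line endings.
rows = regexp(strtrim(txt), '\r?\n', 'split');

% The first line is assumed to be the header row.
header = strsplit(strtrim(rows{1}));

% Each remaining line: a text label followed by numeric values.
data = cell(numel(rows) - 1, numel(header));
for k = 2:numel(rows)
    parts = strsplit(strtrim(rows{k}));
    data(k-1, 1) = parts(1);
    data(k-1, 2:end) = num2cell(str2double(parts(2:end)));
end

% Write header and data to a spreadsheet. writecell needs R2019a or later;
% on older releases xlswrite is the usual substitute.
writecell([header; data], 'extracted.xlsx');
```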
Jan
on 26 Aug 2014
@azizullah khan: You wrote "but pdfParsedemo makes a problem with me...". Please explain the problems. Your question is much too vague to be answered efficiently.
azizullah khan
on 26 Aug 2014
Edited: Walter Roberson
on 25 May 2015
azizullah khan
on 26 Aug 2014
Geoff Hayes
on 26 Aug 2014
Azizullah - you did not include an attachment.
As for the error, the AFMParser is part of the FontBox library. Did you add the FontBox jar file path to your Java class path? I looked at the pdfParsedemo.m script, and while it doesn't have a command to do so, you probably should. So if you updated
javaaddpath('M:\My Documents\MATLAB\PDF Exercise\PDFBox-0.7.3\lib\PDFBox-0.7.3.jar')
to the path on your workstation that corresponds to PDFBox-0.7.3.jar (or whatever the jar file is), then you should add an equivalent statement for the FontBox
javaaddpath('whateverYourPathIsTo\FontBox-someVersionIds.jar')
(I don't know what the name of the jar is, so FontBox-someVersionIds.jar is just an example.)
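Put together, the two statements could be guarded as in the sketch below, which adds each jar only if the file exists and then lists the dynamic class path so you can confirm both are present. The C:\libs folder is a hypothetical location, and FontBox-someVersionIds.jar is still only a placeholder name.

```matlab
% Hypothetical locations; point these at the jar files on your machine.
jars = { ...
    'C:\libs\PDFBox-0.7.3.jar', ...
    'C:\libs\FontBox-someVersionIds.jar'};

for k = 1:numel(jars)
    if exist(jars{k}, 'file') == 2
        javaaddpath(jars{k});   % add the jar to the dynamic Java class path
    else
        warning('Jar file not found: %s', jars{k});
    end
end

% Show the dynamic class path to confirm which jars were added.
disp(javaclasspath('-dynamic'));
```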
azizullah khan
on 27 Aug 2014
Geoff Hayes
on 27 Aug 2014
Unfortunately, this is not something that I have considered and so am not aware of any other means of reading the pdf into MATLAB. You could always try the pdftotext program.
I am no expert, but I could not find a way to read a PDF file into Matlab. People here talk about text, but a PDF is usually a series of pictures. I go to the professional version of Adobe Reader and export the pages of the PDF document, either via File/Save As or via Advanced/Export. This produces a PNG or JPEG file for each page of the document. From there it is easy in Matlab: loop over the pages with the imread function.
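A minimal sketch of that loop, assuming the pages were exported as PNG files into a single folder; the folder name exported_pages is hypothetical.

```matlab
% Hypothetical folder holding one exported image per PDF page.
pageDir = 'exported_pages';
files = dir(fullfile(pageDir, '*.png'));

% Read every page image into a cell array (pages may differ in size).
pages = cell(1, numel(files));
for k = 1:numel(files)
    pages{k} = imread(fullfile(pageDir, files(k).name));
end
fprintf('Read %d page image(s).\n', numel(pages));
```

The order of the files follows the directory listing, so zero-padded names (page_01.png, page_02.png, ...) are the safest way to keep the pages in sequence.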
Walter Roberson
on 15 Jun 2016
PDF is effectively a programming language; you need to execute the commands in order to determine what the output is.
Stefanie Schwarz
on 5 Jan 2021
Following up with Naftali's comment, there is also a way to convert a PDF to an image file in MATLAB. See: https://www.mathworks.com/matlabcentral/answers/709623-how-can-i-convert-a-scanned-pdf-to-an-image-using-matlab
Accepted Answer
More Answers (1)
sugga singh
on 28 Feb 2021
0 votes
thanks for it.