How can I extract images from a PDF using MATLAB?
MathWorks Support Team
on 11 Jan 2021
I would like to extract embedded images from a native PDF file using MATLAB. How can I do this?
Accepted Answer
MathWorks Support Team
on 11 Jan 2021
MATLAB ships with the Apache PDFBox Java library, which allows importing and processing PDF files. Use the following MATLAB function extractImagePDF() to extract images from a native PDF and save them as JPG files. The function expects pdfFile to be the name of a PDF in the current folder:
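Before running the function, it can help to confirm that the PDFBox classes are visible in the current MATLAB session. The check below is a minimal sketch that assumes MATLAB was started with the JVM enabled; the variable name hasPdfBox is only illustrative.

```matlab
% exist(..., 'class') returns 8 when the named class can be found.
% For Java classes it returns 0 if MATLAB was started with -nojvm.
hasPdfBox = exist('org.apache.pdfbox.pdmodel.PDDocument', 'class') == 8;
if hasPdfBox
    disp('PDFBox is available on the Java class path.');
else
    disp('PDFBox was not found; check the Java class path.');
end
```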
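function extractImagePDF(pdfFile)
% Extract the embedded images from a native PDF in the current folder and
% save them as Img_1.jpg, Img_2.jpg, ... in the current folder.
import java.io.*
import javax.imageio.ImageIO.*
import org.apache.pdfbox.*

filename = fullfile(pwd,pdfFile);
jFile = File(filename);
document = pdmodel.PDDocument.load(jFile);
catalog = document.getDocumentCatalog();
pages = catalog.getPages();
iter = pages.iterator();
i = 1;  % image counter shared by all pages, so later pages do not overwrite earlier files

% look for image objects on each page of the PDF
while (iter.hasNext())
    page = iter.next();
    resources = page.getResources();
    if isempty(resources)
        continue  % a Java null comes back as [], meaning the page has no resources
    end
    imageIter = resources.getXObjectNames().iterator();
    % extract each image object from the page and write it to the current folder
    while (imageIter.hasNext())
        key = imageIter.next();
        if (resources.isImageXObject(key))
            xObject = resources.getXObject(key);
            img = xObject.getImage();
            % use an absolute path: Java resolves relative paths against the
            % JVM working directory, which can differ from the MATLAB current folder
            outputfile = File(fullfile(pwd, sprintf('Img_%d.jpg', i)));
            write(img, "jpg", outputfile);
            i = i+1;  % count only the XObjects that are images
        end
    end
end
document.close();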
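A call to the function might look like the sketch below. The file name report.pdf is hypothetical, and the listing step assumes the images were written to the current folder.

```matlab
% 'report.pdf' is a hypothetical file name. It must sit in the current
% folder because extractImagePDF builds the path with fullfile(pwd, pdfFile).
pdfName = 'report.pdf';
if isfile(pdfName)
    extractImagePDF(pdfName);
    % List any extracted images found in the current folder.
    extracted = dir(fullfile(pwd, 'Img_*.jpg'));
    disp({extracted.name}');
else
    fprintf('%s was not found in %s\n', pdfName, pwd);
end
```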
Note that the above code will not work for scanned PDF files.
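For scanned PDFs, one possible fallback is to render each page to an image instead of looking for embedded image objects. The sketch below is not part of the original answer: renderPagesPDF is a hypothetical helper, the 150 DPI resolution is an arbitrary choice, and it assumes the PDFBox version bundled with MATLAB is a 2.x release that provides org.apache.pdfbox.rendering.PDFRenderer.

```matlab
function renderPagesPDF(pdfFile)
% renderPagesPDF (hypothetical helper): save every page of a PDF in the
% current folder as Page_1.png, Page_2.png, ...
% Assumes a PDFBox 2.x release that includes PDFRenderer.
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.rendering.PDFRenderer

document = PDDocument.load(java.io.File(fullfile(pwd, pdfFile)));
closeDoc = onCleanup(@() document.close());  % close the file even if rendering fails

renderer = PDFRenderer(document);
dpi = 150;  % assumed resolution; raise it for sharper output
for p = 0:document.getNumberOfPages()-1      % PDFBox page indices start at 0
    img = renderer.renderImageWithDPI(p, dpi);
    outputfile = java.io.File(fullfile(pwd, sprintf('Page_%d.png', p+1)));
    javax.imageio.ImageIO.write(img, 'png', outputfile);
end
```

Unlike extractImagePDF, this produces one image per page that includes any text and graphics on the page, not just the embedded pictures.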