How can I extract images from a PDF using MATLAB?

Question

MathWorks Support Team on 11 Jan 2021

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/714253-how-can-i-extract-images-from-a-pdf-using-matlab

Answered: MathWorks Support Team on 11 Jan 2021

Accepted Answer: MathWorks Support Team

I would like to extract embedded images from a native PDF file using MATLAB. How can I do this?

Answer 1

MathWorks Support Team on 11 Jan 2021

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/714253-how-can-i-extract-images-from-a-pdf-using-matlab#answer_595583

extractImagePDF.m

MATLAB ships with the Apache PDFBox Java library, which can import and process PDF files. The following MATLAB function, extractImagePDF(), uses it to extract the images embedded in a native PDF and save them as JPG files:

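function extractImagePDF(pdfFile)
% extractImagePDF Save the images embedded in a native PDF as JPG files.
%   pdfFile is the name of a PDF in the current folder. Images are written
%   to the current folder as Img_1.jpg, Img_2.jpg, ...
import java.io.File
import javax.imageio.ImageIO

% Java resolves relative paths against the JVM start-up folder, not the
% MATLAB current folder, so build absolute paths explicitly.
filename = fullfile(pwd, pdfFile);
jFile = File(filename);
document = org.apache.pdfbox.pdmodel.PDDocument.load(jFile);
catalog = document.getDocumentCatalog();
pages = catalog.getPages();

iter = pages.iterator();
i = 1;  % running counter across all pages, so earlier files are not overwritten
% look for image objects on each page of the PDF
while (iter.hasNext())
    page = iter.next();
    resources = page.getResources();
    if isempty(resources)
        continue  % page has no resource dictionary
    end
    imageIter = resources.getXObjectNames().iterator();
    % extract each image object from the page and write it to the current folder
    while (imageIter.hasNext())
        key = imageIter.next();
        if (resources.isImageXObject(key))
            xObject = resources.getXObject(key);
            img = xObject.getImage();
            outputfile = File(fullfile(pwd, "Img_" + i + ".jpg"));
            ImageIO.write(img, "jpg", outputfile);
            i = i + 1;  % count only the objects that were written
        end
    end
end
document.close();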

Note that the above code will not work for scanned PDF files.
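
As a minimal usage sketch, assuming the function is saved as extractImagePDF.m on the MATLAB path, the extracted files can be listed and read back with standard MATLAB functions. The file name report.pdf is hypothetical, and outDir is an assumption about where the Img_*.jpg files were written; adjust it if they landed in a different folder.

```matlab
% Hypothetical input file; replace with a PDF in the current folder.
pdfFile = "report.pdf";

% Assumption: the JPG files are written to the current folder.
outDir = pwd;

% Run the extraction only if the PDF is actually present.
if isfile(fullfile(pwd, pdfFile))
    extractImagePDF(pdfFile);
end

% List whatever Img_*.jpg files exist and report their dimensions.
files = dir(fullfile(outDir, "Img_*.jpg"));
for k = 1:numel(files)
    img = imread(fullfile(files(k).folder, files(k).name));
    fprintf("%s: %d x %d pixels\n", files(k).name, size(img, 1), size(img, 2));
end
```

Each image is loaded with imread as an ordinary MATLAB array, so it can be passed on to functions such as imshow or imwrite.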
