Read text from a PDF document
Editor's Note: This file was selected as MATLAB Central Pick of the Week
% PDFREAD reads a PDF file using the iText Java library.
%
% INPUT:
% PDF_LOCATION:
% String specifying the location of the PDF.
%
% OUTPUT:
% PDFTEXT:
% Cell array with one cell per page of the parsed PDF file.
% Only text is extracted, not images.
%
% D. Wood, 7/3/2017
NOTES:
This software uses the open-source iText library.
The source .jar is included in the zip file, but more information can be found here:
https://github.com/ymasory/iText-4.2.0
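Before the included pdfRead() function can be executed, run this command once per MATLAB session:
javaaddpath('iText-4.2.0-com.itextpdf.jar')
The command can be run from the Command Window or from a script. It adds the jar to the dynamic Java class path, which is reset when MATLAB restarts, so it has to be repeated in each new session.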
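A minimal usage sketch of the setup and call described above. The file name 'example.pdf' is a hypothetical placeholder, and the loop assumes each cell of the output holds the text of one page, as stated in the function help.

```matlab
% Add the iText jar to the dynamic Java class path (once per session).
% Assumes the jar sits in the current folder; otherwise pass its full path.
javaaddpath('iText-4.2.0-com.itextpdf.jar');

% 'example.pdf' is a hypothetical file name used for illustration only.
pdfText = pdfRead('example.pdf');

% Print the extracted text page by page.
for k = 1:numel(pdfText)
    fprintf('--- Page %d ---\n%s\n', k, pdfText{k});
end
```

Keeping the javaaddpath call at the top of a script is a simple way to make sure the jar is available in every new session.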
This method is relatively robust. However, it will not always return all the text in the document if the PDF has unusual or complicated formatting (e.g. multiple non-fixed-width columns or excessive image captions).
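Because some text can be missed, a quick sanity check on the result may be useful. The sketch below is not part of the submission; it flags pages that came back empty or whitespace-only. It uses a stand-in cell array in place of real pdfRead() output and assumes each cell holds a char vector.

```matlab
% Stand-in for the output of pdfRead(); in practice: pdfText = pdfRead(...).
% Assumption: each cell is a char vector containing one page of text.
pdfText = {'First page text', '   ', sprintf('Third page\nmore text')};

% Flag pages whose extracted text is empty or whitespace only.
isBlank = cellfun(@(s) isempty(strtrim(s)), pdfText);
blankPages = find(isBlank);

fprintf('Pages with no extracted text: %s\n', mat2str(blankPages));
```

A blank page does not necessarily indicate a failure, since the page may contain only images, but it is a cheap first signal that the layout may have defeated the extraction.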
Cite As
Derek Wood (2026). Read text from a PDF document (https://www.mathworks.com/matlabcentral/fileexchange/63615-read-text-from-a-pdf-document), MATLAB Central File Exchange. Retrieved .
Platform Compatibility
Windows macOS Linux
| Version | Published | Release Notes |
|---|---|---|
| 1.0.0.0 | | (Updated description text slightly) |
