Read text from a PDF document

Read the text from a simple PDF document into MATLAB as a string


Editor's Note: This file was selected as MATLAB Central Pick of the Week

% PDFREAD reads a PDF file using the iText Java library.
%
% INPUT:
% PDF_LOCATION:
% String specifying the location of the PDF.
%
% OUTPUT:
% PDFTEXT:
% Cell array with one cell per page of the parsed PDF file.
% Only text is extracted, not images.
%
% D. Wood, 7/3/2017
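
Because PDFTEXT holds one cell per page, a single string for the whole document has to be assembled from it. The following is a minimal sketch of one way to do that. It uses a hard-coded stand-in for the output of pdfRead so that it runs on its own; the sample text and the variable names pages and allText are illustrative and not part of the submission.

```matlab
% Stand-in for the output of pdfRead: one cell per page.
% In practice this would come from a call such as pdfText = pdfRead(pdfLocation).
pdfText = {'First page text.', 'Second page text.'};

% Convert every page to a char vector (harmless if it already is one)
% and force a row cell array so that strjoin accepts it.
pages = cellfun(@char, pdfText(:)', 'UniformOutput', false);

% Join the pages into one char vector, separated by a line feed (char(10)).
allText = strjoin(pages, char(10));

fprintf('%d page(s), %d characters in total\n', numel(pages), numel(allText));
```

The char conversion is a precaution under the assumption that the cells might hold Java strings rather than MATLAB char vectors; if they already hold char vectors, it changes nothing.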

NOTES:
This software uses the open-source iText library.
The source .jar is included in the zip file, but more information can be found here:
https://github.com/ymasory/iText-4.2.0
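
Before the included pdfRead() function can be executed, run this command once per MATLAB session:
javaaddpath('iText-4.2.0-com.itextpdf.jar')
The command can be run from the Command Window or from a script. Because javaaddpath only changes the dynamic Java class path, it has to be run again after MATLAB is restarted.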
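
The setup step can be wrapped in a small script that adds the jar only when it is not already on the dynamic Java class path. This is a sketch under stated assumptions: the jar is assumed to sit in the same folder as pdfRead.m, the file name example.pdf is hypothetical, and the warning identifier pdfSetup:jarNotFound is made up for illustration.

```matlab
% Assumed location of the jar: the folder that contains pdfRead.m.
% If pdfRead is not on the MATLAB path, this falls back to the current folder.
jarName = 'iText-4.2.0-com.itextpdf.jar';
jarFile = fullfile(fileparts(which('pdfRead')), jarName);

jarReady = false;
if exist(jarFile, 'file') ~= 2
    warning('pdfSetup:jarNotFound', 'Could not find %s', jarFile);
else
    % Add the jar only if this exact path is not on the dynamic class path yet.
    if ~any(strcmp(javaclasspath('-dynamic'), jarFile))
        javaaddpath(jarFile);
    end
    jarReady = true;
end

% Hypothetical input file; replace with the location of a real PDF.
pdfFile = 'example.pdf';
if jarReady && exist('pdfRead', 'file') == 2 && exist(pdfFile, 'file') == 2
    pdfText = pdfRead(pdfFile);
    fprintf('Read %d page(s) from %s\n', numel(pdfText), pdfFile);
end
```

The path comparison is deliberately simple. If the jar was added earlier under a different spelling of the same path, javaaddpath may be called again, which is harmless.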

This method is relatively robust; however, it will not always return all the text in the document if the PDF has unusual or complicated formatting (e.g., multiple non-fixed-width columns or excessive image captions).
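
Since some text may be missed, it can be worth checking which pages came back with no text at all. The snippet below is one possible check, not something the submission provides. It runs on a stand-in cell array, and the variable names isBlank and blankPages are illustrative.

```matlab
% Stand-in for the output of pdfRead: one cell per page.
pdfText = {'Some extracted text', '   ', ''};

% A page counts as blank if nothing is left after trimming whitespace.
isBlank = cellfun(@(p) isempty(strtrim(char(p))), pdfText);
blankPages = find(isBlank);

if ~isempty(blankPages)
    fprintf('No text extracted from page(s): %s\n', mat2str(blankPages));
end
```

This only catches pages that are completely empty. A page with partially missing text still has to be compared against the original PDF by eye.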

Cite As

Derek Wood (2026). Read text from a PDF document (https://www.mathworks.com/matlabcentral/fileexchange/63615-read-text-from-a-pdf-document), MATLAB Central File Exchange. Retrieved .

General Information

MATLAB Release Compatibility

  • Compatible with any release

Platform Compatibility

  • Windows
  • macOS
  • Linux

Version 1.0.0.0 release notes:

  • Updated description text slightly
  • Updated text again
  • Text again
  • I added a title image