Extract text from a PDF document

(if you are lucky)

Dimitri Shvorob

Version 1.0.0.0 (164 KB)

9K Downloads

4 Apr 2016

This submission calls the PDFTextStripper class from Ben Litchfield's PDFBox Java library to extract text from a PDF document. To use it:
1. Download the PDFBox library from http://sourceforge.net/projects/pdfbox/
2. Download the FontBox library from http://sourceforge.net/projects/fontbox/
3. Modify the file paths in pdfParseDemo.m to point to the downloaded libraries and to your PDF file
4. Enable cell mode and step through pdfParseDemo.m
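
For orientation, the steps above might boil down to something like the following. This is a minimal sketch, not the contents of pdfParseDemo.m: the jar locations and the PDF path are hypothetical placeholders, and the class names assume an Apache PDFBox 1.8.x jar (other releases place these classes in different packages, so adjust the names to match the jar you downloaded).

```matlab
% Sketch only: all three paths below are hypothetical placeholders.
pdfboxJar  = 'C:\libs\pdfbox.jar';    % assumption: where you saved PDFBox
fontboxJar = 'C:\libs\fontbox.jar';   % assumption: where you saved FontBox
pdfFile    = 'C:\docs\sample.pdf';    % assumption: the PDF you want to read

% Put both libraries on MATLAB's dynamic Java class path.
javaaddpath(pdfboxJar);
javaaddpath(fontboxJar);

% Load the document via the static PDDocument.load(File) method.
% Assumes PDFBox 1.8.x package names.
doc = javaMethod('load', 'org.apache.pdfbox.pdmodel.PDDocument', ...
                 java.io.File(pdfFile));
closeDoc = onCleanup(@() doc.close());   % close the document even on error

% Encrypted files may not be readable (see the limitation noted below).
if doc.isEncrypted()
    warning('pdfSketch:encrypted', ...
            'Document is encrypted; text extraction may fail.');
end

% Extract all text and convert the java.lang.String to a MATLAB char array.
stripper = javaObject('org.apache.pdfbox.util.PDFTextStripper');
txt = char(stripper.getText(doc));

% Preview the first 200 characters.
disp(txt(1:min(200, numel(txt))));
```

Using `javaMethod` and `javaObject` rather than an `import` statement keeps the class lookup at run time, after `javaaddpath` has taken effect. Depending on the PDFBox release, additional jars may need to be on the class path as well.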

The code does not handle files whose 'Content Copying' permission is protected by a password. Collaboration to remedy the issue is enthusiastically welcomed!

Cite As

Dimitri Shvorob (2026). Extract text from a PDF document (https://www.mathworks.com/matlabcentral/fileexchange/19798-extract-text-from-a-pdf-document), MATLAB Central File Exchange. Retrieved May 2, 2026.

MATLAB Release Compatibility

Compatible with any release

Platform Compatibility

Windows
macOS
Linux

Version	Published	Release Notes
1.0.0.0	4 Apr 2016	BSD
