Extract plain text from a PDF

This is a simple server that extracts text from a PDF using Apache PDFBox.

You can try it from the pdf2txt page on FileFormat.Info.

This is really just a webapp that runs the org.apache.pdfbox.text.PDFTextStripper class.

Running your own copy

The code is deliberately simple to avoid dependencies. All necessary libraries are included.

The easiest way to run it is with the included Dockerfile. The run.sh and docker-run.sh shell scripts show how I run it in development and production.

The code should work on any recent Java web server. There is nothing to compile: all the code is in the .jsp files.

Environment variables:

FORM_URL: the full URL of a form to use instead of the form in index.jsp. When it is set, a request whose referrer does not match this URL also triggers logging.
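
As a rough illustration of the referrer check described for FORM_URL, here is a minimal sketch in plain Java. It is not the code from the .jsp files: the class name FormUrlCheck, the helper shouldLog, and the log message are all hypothetical, and the assumption that an unset FORM_URL disables the check is mine.

```java
public class FormUrlCheck {

    // Hypothetical helper: decides whether a request should be logged.
    // Assumption: when FORM_URL is unset or empty, the built-in form in
    // index.jsp is in use and there is nothing to compare against.
    static boolean shouldLog(String formUrl, String referrer) {
        if (formUrl == null || formUrl.isEmpty()) {
            return false;
        }
        // Log whenever the referrer is missing or differs from FORM_URL.
        return !formUrl.equals(referrer);
    }

    public static void main(String[] args) {
        // FORM_URL comes from the environment, as in the list above.
        String formUrl = System.getenv("FORM_URL");
        // In a servlet or JSP the referrer would come from the "Referer"
        // request header; here it is taken from the command line.
        String referrer = args.length > 0 ? args[0] : null;

        if (shouldLog(formUrl, referrer)) {
            System.err.println("Referrer mismatch: expected " + formUrl
                    + ", got " + referrer);
        }
    }
}
```

In a real deployment the variable would be passed to the container, for example with Docker's -e option, and the comparison would run inside the request handler.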

License

GNU Affero General Public License v3.0
