This is a simple server that extracts text from a PDF using Apache PDFBox.
You can try it from the pdf2txt page on FileFormat.Info.
This is really just a webapp that runs the org.apache.pdfbox.text.PDFTextStripper class.
The code is deliberately simple to avoid dependencies. All necessary libraries are included.
The easiest way to run it is with the included Dockerfile. The run.sh and docker-run.sh shell scripts show how I run it in development and production.
The code should work on any recent Java web server. There is nothing to compile: all the code is in the .jsp files.
Environment variables:
FORM_URL
: the full URL of a form to use instead of the form in index.jsp. When set, requests whose referrer does not match this URL are also logged.
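
As a rough illustration of the FORM_URL behaviour described above, here is a minimal sketch in plain Java. It is not the code from the .jsp files: the class name `FormUrlCheck`, the helper `isMismatch`, and the exact-match comparison are all assumptions made for this example. The real check may compare differently (for example by prefix or host).

```java
import java.util.logging.Logger;

// Hypothetical helper; the actual logic lives in the .jsp files and may differ.
class FormUrlCheck {
    private static final Logger LOG = Logger.getLogger(FormUrlCheck.class.getName());

    /** True when FORM_URL is set and the referrer is missing or different (assumed exact match). */
    static boolean isMismatch(String formUrl, String referrer) {
        if (formUrl == null || formUrl.isEmpty()) {
            return false; // FORM_URL unset: the built-in form is used, nothing to compare
        }
        return referrer == null || !referrer.equals(formUrl);
    }

    /** Logs a warning for a mismatching referrer; does nothing otherwise. */
    static void check(String formUrl, String referrer) {
        if (isMismatch(formUrl, referrer)) {
            LOG.warning("Referrer " + referrer + " does not match FORM_URL " + formUrl);
        }
    }

    public static void main(String[] args) {
        String formUrl = System.getenv("FORM_URL");          // null when the variable is not set
        String referrer = args.length > 0 ? args[0] : null;  // in a JSP: request.getHeader("Referer")
        check(formUrl, referrer);
    }
}
```

Inside a JSP the referrer would come from `request.getHeader("Referer")` rather than a command-line argument. The command-line form is only there so the sketch runs on its own.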