OCR on PDF Files - Some images are not Processed #2360
hpcnetworks
started this conversation in
General
Replies: 1 comment 8 replies
-
Hi Bruno!
-
Hi there, good evening!
I have processed a PDF file with hundreds of pages, containing many scanned documents. OCR worked fine on most of these scanned documents (images). Unfortunately, some images were not processed by the OCR module.
Analysing the log files, I found this:
2024-11-11 18:20:36 [WARN] [parsers.misc.PDFTextParser] Plugin JPEG2000 not found, JPX images will not be decoded from PDFs. You can download it from https://mvnrepository.com/artifact/com.github.jai-imageio/jai-imageio-jpeg2000/1.3.0 and put it in plugins folder. Warn: that plugin is worse to decode JPX outside of PDFs!
ModuleNotFoundError: No module named 'numpy'
ModuleNotFoundError: No module named 'numpy'
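For anyone who wants to pull the same kind of lines out of a long log, here is a rough sketch. It assumes the log line shape seen above (timestamp, `[LEVEL]`, `[logger]`, message); the `find_problems` function and the regular expression are illustrative only and are not part of the tool.

```python
import re

# Assumed line shape: "YYYY-MM-DD HH:MM:SS [LEVEL] [logger] message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>[A-Z]+)\] \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def find_problems(lines):
    """Return (kind, text) pairs for WARN/ERROR entries and missing Python modules."""
    problems = []
    for line in lines:
        line = line.strip()
        m = LOG_LINE.match(line)
        if m and m.group("level") in {"WARN", "ERROR"}:
            # Structured log entry with a level worth looking at
            problems.append((m.group("level"), m.group("msg")))
        elif line.startswith("ModuleNotFoundError:"):
            # Bare Python traceback line, not wrapped in the log format
            problems.append(("PYTHON", line))
    return problems


# Sample lines taken from the log excerpt above (the first one is shortened)
sample = [
    "2024-11-11 18:20:36 [WARN] [parsers.misc.PDFTextParser] Plugin JPEG2000 not found, JPX images will not be decoded from PDFs.",
    "ModuleNotFoundError: No module named 'numpy'",
]

for kind, text in find_problems(sample):
    print(kind, text)
```

To scan a real file, the lines could be passed in with something like `find_problems(open(path, encoding="utf-8"))`, where `path` is wherever the log happens to be stored.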
I downloaded version 1.4.0 of the plugin and copied it into the plugins folder, but without success.
Can you help me understand what is happening?
Thanks in advance,
Bruno Costa