Repository for code underlying the paper 'Assessing the Impact of OCR Quality on Downstream NLP Tasks'
Updated Oct 16, 2024 - Jupyter Notebook
Implementation of a couple of heuristics that estimate OCR quality without relying on ground-truth data, focusing on historical English-language documents.
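For readers unfamiliar with ground-truth-free OCR quality estimation, one common heuristic is the share of tokens found in a reference word list. The sketch below is illustrative only and is not taken from the repository: the function name `lexicon_ratio` and the tiny `SAMPLE_LEXICON` are hypothetical, and a real setup would need a much larger lexicon suited to historical English spelling.

```python
import re

# Hypothetical toy lexicon for illustration; a real one would be far larger.
SAMPLE_LEXICON = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}


def lexicon_ratio(text, lexicon=SAMPLE_LEXICON):
    """Return the fraction of alphabetic tokens that appear in the lexicon.

    Values near 1.0 suggest clean text; lower values suggest OCR noise.
    No ground-truth transcription is needed.
    """
    # Lowercase, then keep runs of ASCII letters; digits and symbols split tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0  # empty or non-alphabetic input: treat as lowest quality
    hits = sum(1 for token in tokens if token in lexicon)
    return hits / len(tokens)


if __name__ == "__main__":
    print(lexicon_ratio("The quick brown fox"))   # clean sample
    print(lexicon_ratio("Tbe qu1ck brown fox"))   # sample with OCR-style errors
```

A score like this is only a proxy: rare words, names, and archaic spellings lower it even when the OCR output is correct, so it is best read as a relative signal across documents.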