OCR Reader Writer

Technologies

This project was created and tested with:

  • Windows 10
  • Python 3.8.2
  • Tesseract v5.0.0-alpha.20200328
  • OCR Space

This project was tested only with .jpg and .png images.

Description

This project automates extracting text from a set of images into text files. It currently supports only the .jpg and .png image extensions.

Example application

This project can be used to extract text from a large number of images using Tesseract or OCR Space. Place a directory tree containing images in the "ocr_reader_writer\input" directory, and the text from every image will be saved to .txt files in an identical directory tree under "ocr_reader_writer\output".
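The input-to-output mirroring described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `mirror_ocr_tree` and the `ocr` callable are hypothetical, and the OCR engine is passed in as a parameter so the sketch does not assume how the project calls Tesseract or OCR Space.

```python
from pathlib import Path
from typing import Callable, List

# Extensions the README states are supported.
SUPPORTED_EXTENSIONS = {".jpg", ".png"}


def mirror_ocr_tree(input_dir: Path, output_dir: Path,
                    ocr: Callable[[Path], str]) -> List[Path]:
    """Write one .txt file per supported image, mirroring the input tree.

    `ocr` is a hypothetical callable that takes an image path and
    returns the recognized text.
    """
    written = []
    for image_path in sorted(input_dir.rglob("*")):
        if not image_path.is_file():
            continue
        if image_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        # Keep the image's relative path, but place it under output_dir
        # and swap the extension for .txt.
        relative = image_path.relative_to(input_dir)
        text_path = (output_dir / relative).with_suffix(".txt")
        text_path.parent.mkdir(parents=True, exist_ok=True)
        text_path.write_text(ocr(image_path), encoding="utf-8")
        written.append(text_path)
    return written
```

With this layout, an image at input\a\b\scan.png would produce output\a\b\scan.txt. Note that in this sketch two images in the same folder that differ only by extension (for example scan.jpg and scan.png) would write to the same .txt file.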

Setup

  • Run the following commands in the ocr_reader_writer\ directory:
python -m virtualenv venv
cd venv
cd Scripts
activate
cd ..
cd ..
pip install -r requirements.txt

Run

Go to the ocr_reader_writer\ directory and run the command:

python ocr_reader_writer.py