OCR Reader is an app for organizing and reading scans of physical Japanese books and manga. Each page is run through OCR (optical character recognition), which makes the text selectable and lets you use pop-up dictionaries such as Yomichan. Reader also integrates with JPDB to automatically parse the text and highlight unknown words. There is also a JPDB vocabulary popup, which can be used to check word definitions and to quickly add words to your decks.
The app also includes a text hooker page, which can likewise parse text and highlight unknown words through the JPDB integration.
- Node.js (v18 LTS recommended).
  - There is no need to install "Tools for Native Modules".
- A Google Cloud account is required to OCR images.
  - It must either have billing activated or be in a trial period.
  - 1000 images per month can be processed for free, then $1.50 for each additional 1000 images; see the Cloud Vision pricing for details.
  - An account is not required if you import OCR data created by someone else or only want to use the text hooker.
- Optionally, a JPDB SRS account for text parsing and highlighting unknown words.
- Create a new project in the GCP console.
- Go to the Cloud Vision API and press "Enable".
- Go to the Service Accounts and press "Create service account".
- Enter a name for the account and press "Create and continue", then press "Done".
- Click on the newly created account and go to the "Keys" tab.
- Press "Add key" then "Create new key".
- Make sure JSON is selected, then press "Create".
- The JSON key file will be downloaded automatically. You will need it in the next steps. Never share this file with anyone.
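Before moving on, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch, not part of OCR Reader: `check_key_file` is a hypothetical helper, and the field list reflects what Google Cloud service account keys normally contain. It only inspects field names and never prints the private key.

```python
import json
from pathlib import Path

# Fields that Google Cloud service account JSON keys normally contain.
# Assumption: this sketch only checks presence, not whether the key is still valid.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found in the key file (empty list means it looks fine)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"file not found: {key_path}"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    # Report every expected field that is absent.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    # A key of another type (for example a user credential) is reported as well.
    if data.get("type") not in (None, "service_account"):
        problems.append(f"unexpected key type: {data['type']}")
    return problems
```

Calling `check_key_file` with the path of the downloaded file returns an empty list when the file looks like a service account key, and a list of human-readable problems otherwise.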
- Download the release ZIP and extract it.
- If you have a JSON key file for your Google Cloud account, rename it to `gcp.json` and place it inside the `data` folder.
- Run the application with `start.bat`.
  - On Linux or macOS, execute `start.sh` from your terminal.
To update to a different version:
- Download the new release ZIP and extract it.
- Move the `data` folder to the new version.
- Start the application as usual.
The default locations of the data directory and the Google Cloud key file can be changed in the `.env` file (it may be hidden by default on Linux and macOS). If you do that, don't forget to copy this file when updating to a different version.
After starting the app, you will see its URL in the terminal window; usually it is http://localhost:3000. Open this address in your browser to access the app.
If you have a JPDB account and want to use it for text parsing and word highlighting:
- Go to the JPDB settings page, scroll to the bottom and copy your API key.
- In OCR Reader go to the "Settings" page.
- Paste your API key into the "JPDB API key" field.
- Optionally, select your mining deck.
- Save settings.
To add books to the reader, place images inside the `data` folder following this structure:
data
├───Author 1
│   ├───Title 1
│   │       001.jpg
│   │       002.jpg
│   │       ....jpg
│   └───Title 2
│           001.jpg
│           002.jpg
│           ....jpg
└───Title 3
        001.jpg
        002.jpg
        ....jpg
- Images can be scans, photos, screenshots, etc. If the text is readable, OCR should handle it.
- Folder and image names can be anything you want. Image names are used for page sorting.
- Only images are supported (JPG, PNG).
- PDFs are not supported, but you can convert PDFs to images using other tools (e.g. ImageMagick, pdfimages).
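To check a folder layout before rescanning, a small script can list which folders contain pages and which files would be skipped. This is a rough sketch only: `list_books` is a hypothetical helper, the extension set is an assumption based on the formats listed above, and the plain alphabetical sort may differ from the sorting OCR Reader actually applies.

```python
import os

# Assumption: these extensions correspond to the "JPG, PNG" formats named above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_books(data_dir):
    """Map each folder that holds images to its sorted image names and other files."""
    books = {}
    for folder, _subdirs, files in os.walk(data_dir):
        # Split the files of this folder into images and everything else.
        images = sorted(f for f in files if os.path.splitext(f)[1].lower() in IMAGE_EXTENSIONS)
        others = sorted(f for f in files if os.path.splitext(f)[1].lower() not in IMAGE_EXTENSIONS)
        if images:
            # Use the path relative to the data directory as the book name.
            title = os.path.relpath(folder, data_dir)
            books[title] = {"pages": images, "other_files": others}
    return books
```

With a plain alphabetical sort, `10.jpg` comes before `2.jpg`, so zero-padded names such as `001.jpg` keep pages in the intended order regardless of how names are compared.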
Warning: images with EXIF rotation in metadata won't be handled correctly. This is mainly a concern when using photos. Make sure all EXIF rotation data is removed and images are rotated correctly before continuing. XnView MP works well for removing EXIF and fixing rotation.
Press `Rescan books` on the home page after changing books.
Press `OCR pending...` to start OCR. Once it is done, you will be able to press the `Read` button.
You can also use the menu to download OCRed text. This is useful for creating JPDB decks (for this you should download text with line breaks removed).
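If you already have text that still contains line breaks, they can be removed afterwards. The sketch below is an illustration rather than the app's own export logic; `remove_line_breaks` is a hypothetical helper, and it joins lines without inserting spaces on the assumption that the text is Japanese.

```python
def remove_line_breaks(text):
    """Join OCR lines into one continuous string, dropping blank lines and edge whitespace."""
    # Japanese does not separate words with spaces, so lines are joined directly.
    return "".join(line.strip() for line in text.splitlines())
```

For example, a sentence that the page layout split over two lines becomes a single line again.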
In reader mode, you can:
- Use your favorite pop-up dictionary as if you were reading normal text.
- Adjust page zoom.
- Analyze the text and highlight unknown words with JPDB.
- Use the reading timer to track elapsed time and reading speed.
- Change the minimum OCR confidence level.
- Change reading direction (right to left by default).
- Change page view mode.
- Change page display from single page to two pages.
- Adjust the overlay:
  - These options are meant for finding issues with the OCR; most of the time there's no need to use them.
  - Show overlaid text and highlight detected paragraphs.
  - Override font size and detected text direction (hold the Alt key while the menu is open to show these options).
- `Left`/`Right` arrow keys - next/previous page.
- `[`/`]` - zoom in/out (in fixed page view mode).
- `Alt` (hold)
  - disable dragging to allow for text selection (in floating page view mode).
  - show additional options in the reader menu.
- `a` - analyze using JPDB. After the page is analyzed, show/hide the analysis result.
- `s` - show/hide overlaid text.
- `d` - show/hide detected text paragraphs.
The text hooker page works with text extractors and apps that provide a WebSocket server. These are:
- Textractor
  - TextractorSender extension required.
- Agent
  - Server must be manually enabled in settings.
- mpv
  - mpv_websocket plugin required. You must either edit the Lua script to use port 9001 or configure OCR Reader to use port 6677.
Other apps might work too but are not tested.
Assuming your text extractor is configured correctly, you should automatically see that the WebSocket is connected after opening the text hooker page.
You can also just paste text directly into the page even when the WebSocket is disconnected.
If you have configured JPDB, click the `Analyze with JPDB` checkbox to enable text parsing and word highlighting.