GitHub - nramkissoon/JMdictParser: Script for parsing Japanese word data from the JMdict project

JMdictParser

JMdictParser is a script that parses JMdict data (http://www.edrdg.org/jmdict/edict_doc.html) for Japanese words that consist entirely of kanji and are longer than one character. The script builds a Python dictionary containing the readings, definitions, and JLPT data for each word and exports it to compound_dict.json.
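
The selection rule described above (all kanji, more than one character) can be sketched as follows. This is a minimal illustration, not the code in JMdictParser.py: the function names are hypothetical, and treating "kanji" as the CJK Unified Ideographs block (U+4E00 to U+9FFF) is an assumption, so the script itself may draw the boundary differently.

```python
def is_kanji(ch: str) -> bool:
    """Return True if ch lies in the CJK Unified Ideographs block.

    Assumption: "kanji" means U+4E00..U+9FFF. Extension blocks and the
    iteration mark are deliberately not covered by this sketch.
    """
    return "\u4e00" <= ch <= "\u9fff"


def is_kanji_compound(word: str) -> bool:
    """Return True for words longer than one character made up only of kanji."""
    return len(word) > 1 and all(is_kanji(ch) for ch in word)


if __name__ == "__main__":
    # Print the filter result for a compound, a single kanji, and a mixed word.
    for candidate in ["日本語", "日", "食べる"]:
        print(candidate, is_kanji_compound(candidate))
```

Under this assumption, a word containing any kana is rejected even if most of its characters are kanji.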

Usage

compound_dict.json is already included along with a copy of the JMdict_e file needed to build it.

To build a new dictionary, run the script from the directory that contains the JMdict file.

About the compound_dict dictionary

The current version of the dictionary contains 96390 entries, each with definition, reading, and JLPT data. Each kanji word is a key whose value is a sub-dictionary; the keys 'reading', 'meaning', and 'jlpt' access the corresponding data.
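
A lookup against this structure might look like the sketch below. The helper names `load_compound_dict` and `describe` are hypothetical and are not part of JMdictParser.py. The value types stored under each key (string, list, number) are not stated here, so the code returns them unchanged.

```python
import json


def load_compound_dict(path="compound_dict.json"):
    """Load the exported dictionary from a JSON file (UTF-8 encoding assumed)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def describe(compound_dict, word):
    """Return (reading, meaning, jlpt) for word, or None if the word is absent."""
    entry = compound_dict.get(word)
    if entry is None:
        return None
    # The three keys below are the ones named in this README.
    return entry["reading"], entry["meaning"], entry["jlpt"]
```

For example, `describe(load_compound_dict(), word)` returns the three fields for a kanji word present in the file, and `None` for a word that is not a key.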

License information

JMdictParser.py is free to use and modify. kanji_dict.json is built using the script found at https://github.com/nramkissoon/Kanjidicparser and uses data from the KANJIDIC project. JLPT data from the KANJIDIC project is subject to the conditions found at https://www.edrdg.org/edrdg/licence.html. JMdict data is subject to the conditions found at http://www.edrdg.org/.

