japanese language data and dictionary

sph-mn/nihongo

a fast dictionary, lists related to vocabulary and kanji, and scripts to compile the lists.

nihongo dictionary

  • kanji stroke order, meaning, and common readings lookup
  • fuzzy search over the top 30000 words that matches similar pronunciations and sorts results by frequency
  • single-file browser application. the file under compiled/ can be downloaded and used offline in a browser but is also hosted here
  • compiled/nihongo-dictionary.html
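
as a rough illustration of what a pronunciation-based fuzzy search with frequency sorting can look like, here is a minimal sketch. it is not the dictionary's actual implementation: the use of levenshtein distance, the names editDistance and fuzzySearch, the rank field, and the sample entries are all assumptions made for this example.

```javascript
// levenshtein edit distance between two strings, counted in code points.
// assumption: the real dictionary may measure similarity differently
function editDistance(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  let previous = Array.from({length: t.length + 1}, (_, i) => i);
  for (let i = 1; i <= s.length; i++) {
    const current = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      current[j] = Math.min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost);
    }
    previous = current;
  }
  return previous[t.length];
}

// entries: [{word, reading, rank}], where a lower rank means a more frequent word.
// keeps entries whose reading is within maxDistance edits of the query
// and returns their words, most frequent first
function fuzzySearch(entries, query, maxDistance = 1) {
  return entries
    .map(entry => ({entry, distance: editDistance(entry.reading, query)}))
    .filter(x => x.distance <= maxDistance)
    .sort((x, y) => x.entry.rank - y.entry.rank)
    .map(x => x.entry.word);
}

// hypothetical entries with made-up ranks, not taken from the project's data
const sampleEntries = [
  {word: "今日", reading: "きょう", rank: 1},
  {word: "教師", reading: "きょうし", rank: 3},
  {word: "器用", reading: "きよう", rank: 2},
  {word: "猫", reading: "ねこ", rank: 4}
];

// prints the words 今日, 器用, 教師 in that order
console.log(fuzzySearch(sampleEntries, "きょう"));
```

the query matches its exact reading as well as readings one edit away (きよう, きょうし), and the result order comes only from the rank, not from the distance.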

csv lists

under data:

  • components-ck.csv: [components, kanji] alternative to radkfile
  • components-kc.csv: [kanji, components] alternative to kradfile
  • jouyou-kanji.csv: [kanji, meaning, readings] the 2136 jouyou kanji as of 2020 sorted by stroke count with single word meaning and common readings
    • some meanings use relatively uncommon english words, for example: acquiesce, adroit, ardent, beckon, confer, consign, consort, consummate, portent. in a few cases the words are ambiguous. for example "vice" isn't meant in the sense of "shortcoming" but in the sense of "deputy"
    • order by stroke count roughly corresponds to complexity, components come first
    • the jouyou kanji in general exclude some commonly seen kanji, for example: 嬉萌伊綺嘘菅貰縺繋呟也
  • jouyou-kanji-learning.csv: [[kanji, meaning, readings], [word, reading, meanings]] kanji information and example words with translations. sorted by number of common readings and readings alphabetically. kanji with few common readings come first
  • jouyou-kanji-only-words.csv: [word, readings, meanings] frequently used example words for the jouyou kanji
  • jouyou-kanji-with-words.csv: [kanji, meaning, readings, words] like jouyou-kanji.csv but with an additional column for newline separated example words
  • jouyou-stroke-count.csv: [kanji, stroke-count]
  • jouyou-two-shared-components.csv: [component, kanji ...] list of kanji that share at least two components
  • jouyou-with-shared-readings.csv: [kanji, readings, multiple-kanji-per-reading]
  • kanji-radicals.csv: [stroke-count, radical, meaning, variants, note, is_new]
  • multiple-kanji-to-reading.csv: [multiple-kanji, reading]
  • ideophones.csv: [romaji, meanings] onomatopoeia, sound symbolisms
  • jouyou-kanji-learning-oneline.csv: [kanji, meaning, readings, example-words] like jouyou-kanji-learning.csv but with the example words in one separate column
  • chinese-japanese-overlap.csv: sino-japanese cognates, words with the same characters in both languages
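
the two component lists describe the same relation in opposite directions, and the shared-components list can be derived from it. the following is a minimal sketch of both steps, not the code under src/. the function names, the assumption that components are stored as a plain string of characters, and the sample decompositions are hypothetical; the real files may split kanji differently.

```javascript
// rows in the shape of components-kc.csv: [kanji, components].
// illustrative decompositions only, not copied from the data files
const kanjiToComponents = [
  ["明", "日月"],
  ["時", "日土寸"],
  ["朝", "十日月"]
];

// builds a Map from each component to the kanji that contain it,
// which is the [components, kanji] direction of components-ck.csv
function invertComponents(rows) {
  const result = new Map();
  for (const [kanji, components] of rows) {
    for (const component of Array.from(components)) {
      if (!result.has(component)) result.set(component, []);
      result.get(component).push(kanji);
    }
  }
  return result;
}

// lists pairs of kanji that share at least minShared components,
// as [kanji, kanji, shared components]
function kanjiSharingComponents(rows, minShared = 2) {
  const pairs = [];
  for (let i = 0; i < rows.length; i++) {
    const a = new Set(rows[i][1]);
    for (let j = i + 1; j < rows.length; j++) {
      const shared = Array.from(new Set(rows[j][1])).filter(c => a.has(c));
      if (shared.length >= minShared) pairs.push([rows[i][0], rows[j][0], shared.join("")]);
    }
  }
  return pairs;
}

console.log(invertComponents(kanjiToComponents).get("日")); // the three sample kanji
console.log(kanjiSharingComponents(kanjiToComponents)); // 明 and 朝 share 日 and 月
```

the pairwise comparison is quadratic in the number of kanji, which is fine for a list of about two thousand entries.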

some lists can be customized, see exe/update-kanji-words.

anki deck

  • data/ja-kanji-learning.apkg: [kanji, [readings, meaning, example words]] and reverse cards

data sources

data is included. all other data of this project, including the source code, is licensed under cc-by-sa-4.0.

technical

  • the generator scripts use node.js and its package manager npm. code is written in coffeescript which is javascript with a reduced syntax
  • how to recreate the csv files
    • initialise the development environment once with "npm install" to install dependencies, which creates a node_modules directory in the current directory
    • see files under exe/, they are shell scripts and can be executed with ./exe/filename. some contain further configuration options
    • for source code see src/, especially main.coffee and dictionary.coffee
  • how to recreate nihongo-dictionary
    • create data files with "./exe/update-dictionary-data"
    • execute "./exe/update-dictionary"
    • the result file is built from src/dictionary-template.html
  • good to know regarding unicode: multiple kanji components/radicals and kanji that look exactly the same exist at separate codepoints. see wikipedia: kangxi radical unicode
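
the unicode note can be checked directly in node.js. this sketch compares the kangxi radical ⼀ (U+2F00) with the visually identical kanji 一 (U+4E00). the helper name codepoints is made up for this example, and whether the project's scripts apply any such normalization is not stated here.

```javascript
const kangxiRadical = "\u2F00"; // ⼀ KANGXI RADICAL ONE
const unifiedIdeograph = "\u4E00"; // 一 CJK UNIFIED IDEOGRAPH-4E00

// formats each code point of a string as U+XXXX
function codepoints(text) {
  return Array.from(text, c => "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

// the two characters look the same but are different strings
console.log(kangxiRadical === unifiedIdeograph); // false
console.log(codepoints(kangxiRadical + unifiedIdeograph)); // [ 'U+2F00', 'U+4E00' ]

// compatibility normalization (NFKC) maps the kangxi radical to the unified ideograph
console.log(kangxiRadical.normalize("NFKC") === unifiedIdeograph); // true
```

NFKC covers the kangxi radicals block, but other lookalike characters, for example many in the cjk radicals supplement block, are not folded this way, so a comparison of component lists from different sources may still need an explicit mapping.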