Compound words

Compound formation is very common in the Swedish language, and there are many compound words whose parts can be swapped to form another correct word, sometimes with a completely different meaning. For some reason @HerrLantz thinks these are really exciting so I wanted to help him find more of them 🔍

But I don't like to think for myself so I made this program to do that for me.

How-to

$   make                # defaults to `make all`

This command runs the following operations:

  1. Download dictionaries from source
  2. Combine and sanitize the data into a complete dataset
  3. Divide the dataset into partitions
  4. Find reversed compound words

Or run them separately:

$   make dictionary     # download the dictionaries from source
$   make prepare        # build the combined dataset from source
$   make divide         # divide the dataset into partitions for each letter
$   make conquer        # find reversed compound words using the partitions

The combined dictionary ends up in dump/dictionary.txt and the partitioned version is created in dump/lexicon/. The list of possible reversed compound words is piped into dump/compounds.txt.
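To give a rough idea of what the divide and conquer steps do, here is a minimal Python sketch. It is not the code this repository runs; the function names, the `min_part_len` parameter and the tiny sample word list are all made up for illustration. It assumes a compound counts as "reversible" when both of its parts and the swapped combination are themselves entries in the word list.

```python
from collections import defaultdict


def partition_by_first_letter(words):
    """Group words by first letter (hypothetical stand-in for the divide step)."""
    partitions = defaultdict(set)
    for word in words:
        if word:
            partitions[word[0]].add(word)
    return partitions


def find_reversed_compounds(words, min_part_len=2):
    """Return sorted pairs (a, b) where a = x + y and b = y + x are both words.

    min_part_len is an assumed cutoff that skips very short fragments.
    """
    partitions = partition_by_first_letter(words)

    def known(word):
        # Only the partition for the word's first letter needs to be searched.
        return word in partitions.get(word[0], ())

    pairs = set()
    for word in words:
        # Try every split point that leaves both parts long enough.
        for i in range(min_part_len, len(word) - min_part_len + 1):
            head, tail = word[:i], word[i:]
            swapped = tail + head
            if swapped != word and known(head) and known(tail) and known(swapped):
                # Store each pair once, regardless of which side was found first.
                pairs.add(tuple(sorted((word, swapped))))
    return sorted(pairs)


# Illustrative sample only, not taken from the real dataset.
sample = {"hus", "båt", "husbåt", "båthus", "glas", "fönster",
          "fönsterglas", "glasfönster", "bil"}

for a, b in find_reversed_compounds(sample):
    print(a, "<->", b)
```

Looking up each candidate in the partition for its first letter is just one way to keep the searches small; a single big set would give the same result on a list this size.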

Swedish dictionaries

All dictionaries are drawn from Nordic Words where they reside in the public domain as of 2018-03-19.

Specifically, the following dictionaries are incorporated:

The combined dataset has 306164 entries.

Encoding

The files stored in the dictionaries/ subdirectory are encoded as ISO-8859-1, which makes åäö behave a little weird in some contexts. The dictionary generated by make prepare is, however, encoded as UTF-8.
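The weirdness comes from åäö being one byte each in ISO-8859-1 but two bytes each in UTF-8, so reading the raw files as UTF-8 fails. The sketch below shows the kind of conversion involved; it is only an illustration in Python, the helper name `latin1_to_utf8` is made up, and it says nothing about how make prepare actually does it.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper)."""
    return raw.decode("iso-8859-1").encode("utf-8")


# Simulate a word as it would be stored in an ISO-8859-1 file.
raw = "räksmörgås".encode("iso-8859-1")

try:
    raw.decode("utf-8")
    readable_as_utf8 = True
except UnicodeDecodeError:
    # The single-byte ä/ö/å values are not valid UTF-8 sequences here.
    readable_as_utf8 = False

converted = latin1_to_utf8(raw)
print(readable_as_utf8)           # False
print(len(raw), len(converted))   # 10 13
print(converted.decode("utf-8"))  # räksmörgås
```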