Collection of datasets for Swedish words and tool for finding reverse compound formations

remnestal/compounds

Compound words

Compound formation is very common in Swedish, and many compounds can be reversed to form another correct word, sometimes with a completely different meaning. For some reason @HerrLantz thinks these are really exciting, so I wanted to help him find more of them 🔍

But I don't like to think for myself, so I made this program to do that for me.

How-to

$   make                # defaults to `make all`

This command runs the following operations:

  1. Download dictionaries from source
  2. Combine and sanitize the data into a complete dataset
  3. Divide the dataset into partitions
  4. Find reversed compound words

Or run them separately:

$   make dictionary     # download the dictionaries from source
$   make prepare        # build the combined dataset from source
$   make divide         # divide the dataset into partitions for each letter
$   make conquer        # find reversed compound words using the partitions

The combined dictionary ends up in dump/dictionary.txt and the partitioned version is created in dump/lexicon/. The list of possible reversed compound words is piped into dump/compounds.txt.
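
To give a feel for what the divide and conquer steps are doing, here is a minimal Python sketch of one way to find reversed compounds. It is not the code in this repository: the function names, the `min_part` parameter and the rule that both halves must themselves be dictionary words are assumptions made for illustration, and linking letters (such as the Swedish "s" between parts) are ignored.

```python
from collections import defaultdict


def partition_by_first_letter(words):
    """Group words by first letter (hypothetical stand-in for the divide step)."""
    partitions = defaultdict(set)
    for word in words:
        if word:
            partitions[word[0]].add(word)
    return partitions


def find_reversed_compounds(words, min_part=2):
    """Return sorted (compound, reversed) pairs found in the word list.

    Each word is split at every position into head + tail. A pair is
    reported when head, tail and tail + head are all known words.
    min_part is an assumed lower bound on the length of each half.
    """
    partitions = partition_by_first_letter(words)

    def known(word):
        # Only the partition for the word's first letter is consulted.
        return bool(word) and word in partitions.get(word[0], ())

    found = set()
    for word in words:
        for i in range(min_part, len(word) - min_part + 1):
            head, tail = word[:i], word[i:]
            swapped = tail + head
            if swapped != word and known(head) and known(tail) and known(swapped):
                found.add((word, swapped))
    return sorted(found)


# Illustrative input: "husbåt" (houseboat) and "båthus" (boathouse).
pairs = find_reversed_compounds(["hus", "båt", "husbåt", "båthus"])
```

Partitioning by first letter means each lookup only touches one small set, which is presumably the point of having a separate divide step. To try it on the real data, the word list could be read line by line from dump/dictionary.txt.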

Swedish dictionaries

All dictionaries are drawn from Nordic Words, where they reside in the public domain as of 2018-03-19.

Specifically, the following dictionaries are incorporated:

The combined dataset has 306164 entries.

Encoding

The files stored in the dictionaries/ subdirectory are encoded as ISO-8859-1, which makes åäö behave a little weird in contexts that assume another encoding. The dictionary generated by make prepare, however, is encoded as UTF-8.
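
If you want to read one of the raw files yourself, the re-encoding can be sketched in a few lines of Python. This is only an illustration of the ISO-8859-1 to UTF-8 conversion, not how make prepare does it; the function name and the file paths in the usage comment are hypothetical.

```python
from pathlib import Path


def latin1_to_utf8(source, target):
    """Re-encode a text file from ISO-8859-1 to UTF-8."""
    # Decode the bytes as ISO-8859-1 so that å, ä and ö come out as intended.
    text = Path(source).read_text(encoding="iso-8859-1")
    # Write the same text back out as UTF-8.
    Path(target).write_text(text, encoding="utf-8")


# Hypothetical usage; substitute a real file from dictionaries/:
# latin1_to_utf8("dictionaries/example.txt", "example-utf8.txt")
```

Opening an ISO-8859-1 file as UTF-8 without this step typically fails or garbles words that contain åäö, because those letters are single bytes in ISO-8859-1 but two-byte sequences in UTF-8.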