keeleleek/grapheme-stats

Simple grapheme statistics of text files

This is a simple Python program that reads a text file and outputs a CSV table with statistics for every grapheme encountered in that file.

It is meant for debugging text problems, such as the same grapheme being represented by different codepoint sequences within one file.

A grapheme is a user-perceived character that can consist of one or more Unicode codepoints. For example, the grapheme õ can be represented either as the single codepoint LATIN SMALL LETTER O WITH TILDE or as the sequence LATIN SMALL LETTER O + COMBINING TILDE.
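The two representations of õ can be compared directly with Python's standard `unicodedata` module. This is only an illustrative sketch, not code taken from grapheme-stats.py; the helper name `codepoint_names` is an assumption made for this example.

```python
import unicodedata

# Two ways to write the same grapheme
composed = "\u00f5"      # LATIN SMALL LETTER O WITH TILDE (one codepoint)
decomposed = "o\u0303"   # LATIN SMALL LETTER O + COMBINING TILDE (two codepoints)


def codepoint_names(text):
    """Return the Unicode name of every codepoint in text (hypothetical helper)."""
    return [unicodedata.name(ch) for ch in text]


print(codepoint_names(composed))
print(codepoint_names(decomposed))

# The strings look identical when rendered but do not compare equal
print(composed == decomposed)

# NFC normalization composes the sequence into the single codepoint
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Because the two strings render the same way but compare as different, a file that mixes them can break searching, sorting, and counting. That is the kind of problem the statistics table is meant to expose.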

Output statistics

The output statistics file includes, for each grapheme, the grapheme itself, its count, its number of codepoints, and the names of those codepoints. Here is an example output table that could be used to find double representations of the grapheme õ.

| grapheme | count | number of codepoints | codepoint names |
|----------|-------|----------------------|-----------------|
| õ | 3 | 1 | LATIN SMALL LETTER O WITH TILDE |
| õ | 1 | 2 | LATIN SMALL LETTER O; COMBINING TILDE |
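A table like the one above could be produced roughly as follows. This is a minimal sketch under stated assumptions, not the actual grapheme-stats.py implementation. The function names (`split_graphemes`, `grapheme_stats`, `to_csv`) are hypothetical, and the grapheme splitting is a simplification that only attaches combining marks to the preceding base character instead of applying the full Unicode grapheme cluster rules.

```python
import csv
import io
import unicodedata
from collections import Counter


def split_graphemes(text):
    """Simplified splitting: a base character plus any following combining marks.

    Assumption: this is not full Unicode grapheme cluster segmentation.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # attach the combining mark to the previous cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


def grapheme_stats(text):
    """Return (grapheme, count, number of codepoints, codepoint names) rows."""
    counts = Counter(split_graphemes(text))
    rows = []
    for grapheme, count in counts.most_common():
        names = "; ".join(unicodedata.name(ch, "UNKNOWN") for ch in grapheme)
        rows.append((grapheme, count, len(grapheme), names))
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["grapheme", "count", "number of codepoints", "codepoint names"])
    writer.writerows(rows)
    return buf.getvalue()


# Three precomposed õ and one decomposed õ, as in the example table
sample = "\u00f5\u00f5\u00f5o\u0303"
print(to_csv(grapheme_stats(sample)))
```

Keying the counter on the raw codepoint sequence, without normalizing first, is what keeps the two representations of õ on separate rows.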

Usage

Run the program with the command python grapheme-stats.py filename.txt.
