kojiishi/east_asian_spacing

East Asian Contextual Spacing

This directory contains tools for the OpenType Contextual Half-width Spacing feature for Japanese/Chinese/Korean typography.

This feature enables the typography described in JLREQ 3.1.2 Positioning of Punctuation Marks (Commas, Periods and Brackets) 句読点や,括弧類などの基本的な配置方法 for Japanese, and CLREQ 3.1.6.1 Punctuation Adjustment Space 标点符号的调整空间 標點符號的調整空間 for Chinese. Following is a figure from JLREQ:

An early discussion in the Adobe CJK Type blog article and its Part II may help you understand the feature better.

Demo

You can find sample text here. This sample page uses fonts built with this tool.

OpenType Font Features

OpenType defines 4 feature tags for fonts to support this feature:

  • The "chws" feature tag, and the "vchw" feature tag as its vertical flow counterpart.
  • The "halt" feature tag, and the "vhal" feature tag as its vertical flow counterpart.

All 4 features are desired, as each feature is applied in a different context.

This package adds these features to any OpenType/TrueType fonts when they are missing, by computing the feature tables from data such as Unicode code points and glyph outlines.
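To check which of the four features a font already has, you can look at the feature list of its GPOS table. The following is a minimal sketch, assuming the fontTools library is installed; the function names are hypothetical and are not part of this package.

```python
# The four feature tags named above.
SPACING_FEATURES = ("chws", "vchw", "halt", "vhal")


def missing_spacing_features(present_tags):
    """Return the spacing feature tags that are not in `present_tags`."""
    present = set(present_tags)
    return [tag for tag in SPACING_FEATURES if tag not in present]


def gpos_feature_tags(font_path, font_number=0):
    """Collect the GPOS feature tags of a font file (assumes fontTools)."""
    from fontTools.ttLib import TTFont  # third-party; imported lazily

    with TTFont(font_path, fontNumber=font_number) as font:
        if "GPOS" not in font:
            return set()
        feature_list = font["GPOS"].table.FeatureList
        if feature_list is None:
            return set()
        return {record.FeatureTag for record in feature_list.FeatureRecord}
```

For example, `missing_spacing_features(gpos_feature_tags("fonts/input.otf"))` lists the tags that are still missing from the font.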

Adding the features to your fonts

Install

You can install this tool with pip or pipx.

pipx install east-asian-spacing
pip install east-asian-spacing

Please be aware that, if you install with pip in the global environment, its dependencies may conflict with other packages. If all you need is the command line tool, pipx can install it globally while still isolating it in a virtual environment.

Please also see the install package section if you want to use this package from your Python program, or the clone and install section if you want to diagnose fonts or the code in more detail.

Command Line Usages

The following example adds the feature to input-font-file and saves it to the build directory.

east-asian-spacing -o build input-font-file

The testing section has resources for checking the differences and testing fonts you built.

For other options and usages, the --help option shows the full list of options.
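If you prefer to drive the command line tool from a Python script, the command above can be wrapped as below. This is a sketch that uses only the "-o" option shown in this section; the helper names are hypothetical.

```python
import subprocess


def spacing_command(input_font, output_dir="build"):
    """Build the argument list for the command shown above."""
    return ["east-asian-spacing", "-o", str(output_dir), str(input_font)]


def run_spacing(input_font, output_dir="build"):
    """Run the tool; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(spacing_command(input_font, output_dir), check=True)
```

Passing the arguments as a list avoids shell quoting problems when font paths contain spaces.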

Supported Fonts

The algorithm is applicable to any CJK fonts. Following fonts are tested on each release:

CJK fonts at fonts.google.com are tested in the chws_tool package. Several other fonts were also tested during the development.

When adding the features to your fonts, the test HTML is a handy tool to check the results. If you encounter any problems with your fonts, please report them in the issues.

Please also see the Advanced Topics below if you want to customize the default behaviors for your fonts.

TrueType Collection (TTC)

When the input font file is a TrueType Collection (TTC), this tool adds the feature to all fonts in the TTC by default.

If you want to add the feature to only some of the fonts in the TTC, you can specify a comma-separated list of font indices. The following example adds the feature to the font indices 0 and 1, but not to other fonts in the TTC.

east-asian-spacing --index=0,1 input-font-file.ttc
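To find out which indices are valid, you can read the number of fonts from the TTC header, which starts with the "ttcf" tag, two 16-bit version fields, and a 32-bit font count, all big-endian. The following sketch uses only the Python standard library; the function names are hypothetical.

```python
import struct


def ttc_font_count(path):
    """Return the number of fonts in a collection, or None if not a TTC."""
    with open(path, "rb") as f:
        header = f.read(12)
    if len(header) < 12 or header[:4] != b"ttcf":
        return None
    # After the tag: majorVersion and minorVersion (uint16), numFonts (uint32).
    _major, _minor, num_fonts = struct.unpack(">HHI", header[4:12])
    return num_fonts


def index_option(indices):
    """Format font indices as the comma-separated value for --index."""
    return "--index=" + ",".join(str(i) for i in sorted(set(indices)))
```

Valid font indices are 0 to `ttc_font_count(path) - 1`.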

API

Install Package

You can install this package using your favorite package management tools such as poetry, pipenv, or pip.

pip install east-asian-spacing
pipenv install east-asian-spacing
poetry add east-asian-spacing

Please also see the clone and install section if you want to diagnose fonts or the code in more detail.

Sample Code

The following example creates a font with the features in the "build" directory if the features are applicable.

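import asyncio

import east_asian_spacing

async def main_async():
    # Add the features to the font and save the result under "build".
    builder = east_asian_spacing.Builder("fonts/input.otf")
    output_path = await builder.build_and_save("build")
    if output_path:
        print(f"Saved to {output_path}")
    else:
        # No output path means the features were not applicable.
        print("Skipped")

# Run the coroutine; without this call main_async() is never executed.
asyncio.run(main_async())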
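To process several fonts, the same two calls can be put in a loop. The sketch below relies only on Builder and build_and_save as used in the sample above. The build_all function and its builder_factory parameter are hypothetical, added so that the loop can be tried without real fonts.

```python
import asyncio


async def build_all(font_paths, output_dir, builder_factory=None):
    """Build fonts one at a time; return {input path: output path or None}."""
    if builder_factory is None:
        import east_asian_spacing  # imported lazily; needs the package installed

        builder_factory = east_asian_spacing.Builder
    results = {}
    for path in font_paths:
        builder = builder_factory(path)
        # As in the sample above, a falsy result means the font was skipped.
        results[path] = await builder.build_and_save(output_dir) or None
    return results
```

For example, `asyncio.run(build_all(["fonts/a.otf", "fonts/b.otf"], "build"))` returns a dictionary that maps each input to its output path, or to None when it was skipped.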

Testing

Test HTML

A test HTML page is available to check the behavior of fonts on browsers.

It can test fonts you built locally.

  1. Save the page to your local drive. The HTML is a single file, so saving the HTML file alone should work.
  2. Add your font files to the "fonts" list at the beginning of the <script> block.
  3. Open it in your browser and choose your font.

Note, when you want to test a TTC (TrueType Collection) but your browser can load only the first font in the TTC, the following command extracts all OpenType fonts (.otf or .ttf) from an OpenType Collection font file (.ttc or .otc).

east-asian-spacing ttc build/NotoSansCJK-Regular.ttc

Dump

The dump sub-command can create various types of text dump files.

The simplest usage is to show a list of tables. This is similar to the "-l" option of TTX, except that for TrueType Collections (TTC), this tool can show the tables of all fonts in the TTC, along with which tables are shared with which fonts.

east-asian-spacing dump build/NotoSansCJK-Regular.ttc

The "-o" option creates table list files in the specified directory:

east-asian-spacing dump -o build/dump build/*.ttc

The "--ttx" option creates TTX text dumps of all tables in addition to the table list files. This is similar to the "-s" option of TTX, except that it can dump all tables in TrueType Collections (TTC).

east-asian-spacing dump -o build/dump --ttx build/*.ttc

Diff

The dump sub-command can also create dump files of two font files and compare them. This helps visualize the differences between two fonts, specifically, between the font files you created and the original font files.

east-asian-spacing dump -o build/diff --diff source_fonts_dir build/NotoSansCJK.ttc

The example above computes the differences between source_fonts_dir/NotoSansCJK.ttc and build/NotoSansCJK.ttc by creating the following 3 sets of files:

  1. The table list and TTX text dump files for build/NotoSansCJK.ttc in the build/diff/dump directory.
  2. The table list and TTX text dump files for source_fonts_dir/NotoSansCJK.ttc in the build/diff/src directory.
  3. Diff files of the two sets of dump files in the build/diff directory.

Note: The "--diff" option is more efficient than doing all these, especially for large fonts, because it skips creating TTX of tables when they are binary-equal.
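The idea behind this shortcut can be illustrated with the Python standard library. The following is a simplified sketch, not the code of this package: it compares the raw bytes first, and converts to text and diffs only when they differ. The to_text callback is a hypothetical stand-in for an expensive TTX dump.

```python
import difflib


def diff_dumps(src_bytes, dst_bytes, to_text, name="table"):
    """Return unified-diff lines, skipping the text dump when bytes match."""
    if src_bytes == dst_bytes:
        return []  # binary-equal: no text dump is needed at all
    src_lines = to_text(src_bytes).splitlines(keepends=True)
    dst_lines = to_text(dst_bytes).splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            src_lines, dst_lines, fromfile=f"src/{name}", tofile=f"dump/{name}"
        )
    )
```

With large CJK fonts, most tables are unchanged by the build, so most of the conversion work is skipped.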

The -o option is optional. When it is omitted, the sub-command outputs the diff to stdout.

east-asian-spacing dump --diff source_fonts_dir build/NotoSansCJK.ttc | less

To create diff files for all fonts you built, you can pipe the output as below:

east-asian-spacing -p *.otf | east-asian-spacing dump -o build/diff -

The "-p" option prints the input and output font paths to stdout in the tab-separated-values format. The dump sub-command with the "-" argument reads this list from stdin, and creates their text dump and diff files in the build/diff directory. The "--diff" option is not necessary in this case, because the source font paths are provided from the pipe.
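If you want to consume the "-p" output in your own Python script instead, the list can be parsed as below. This sketch assumes that each line has the input path in the first column and the output path in the second; the exact columns are an assumption, and the function name is hypothetical.

```python
import io


def read_font_pairs(stream):
    """Yield (input_path, output_path) tuples from tab-separated lines."""
    for line in stream:
        columns = line.rstrip("\r\n").split("\t")
        # Ignore blank or incomplete lines instead of failing on them.
        if len(columns) >= 2 and columns[0] and columns[1]:
            yield columns[0], columns[1]
```

For example, `list(read_font_pairs(sys.stdin))` collects the pairs when the script is placed at the end of the pipe (with `import sys`).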

References

Once you have reviewed the diff files created above, or tested the fonts you built, you can copy the diff files into the references directory. Then when you want to build them again, such as when the fonts are updated or when the build environment is changed, you can compare the diff files with the reference files to know how the new fonts differ from previous builds.

With the "-r" option, the dump sub-command creates diff files between two font files, and compares the diff files with the once-reviewed diff files in the references directory.

The typical usage of this option is as below:

east-asian-spacing -p -g=build/glyphs *.otf |
    east-asian-spacing dump -o=build/diff -r=references -

Please see the Diff section for the "-p" option and piping.

The build*.sh scripts include this option.
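The comparison with references amounts to checking each new diff file against the file of the same name in the references directory. The following sketch illustrates that idea with the Python standard library. It is not the code behind the "-r" option, and the function name is hypothetical.

```python
import filecmp
from pathlib import Path


def compare_with_references(diff_dir, references_dir):
    """Return names of diff files that are new or differ from the references."""
    changed = []
    for diff_file in sorted(Path(diff_dir).iterdir()):
        if not diff_file.is_file():
            continue
        reference = Path(references_dir) / diff_file.name
        # shallow=False compares file contents rather than os.stat() signatures.
        if not reference.is_file() or not filecmp.cmp(
            diff_file, reference, shallow=False
        ):
            changed.append(diff_file.name)
    return changed
```

An empty result means the new build produced the same differences as the reviewed ones.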

Shape Test

The shape testing shapes test strings and checks whether the contextual spacing is applied or not.

The --test option sets the level of the shape testing.

east-asian-spacing --test 2 -v -o build input-font-file

Level 0 disables the shape testing. Level 1 runs a smoke test using a small set of samples. Level 2 runs the shape testing using a large set of test strings. The default value is 1.

Advanced Topics

Algorithm

The algorithm is language agnostic and is applicable to any CJK fonts.

This package determines the glyph pairs whose spacings should be adjusted from a set of Unicode code points defined in the Config class.

Then for each pair, it checks if the spacings are applicable by examining glyph outlines and computing ink bounding boxes of glyphs. For example, when glyphs are very thick, glyphs may not have enough internal spacings, and applying the spacings may cause glyphs to collide. This package automatically detects such cases and avoids applying spacings to such pairs.

This automatic behavior can be disabled by specifying the languages below, or by setting Config.use_ink_bounds to False in your Python program.
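The ink bounds check can be pictured as follows. This is a simplified illustration, not the actual implementation in this package: it assumes that a side of a full-width glyph can lose half of the advance only when the ink fits entirely in the other half. The function name and the "start"/"end" labels are hypothetical.

```python
def trimmable_sides(advance, ink_min, ink_max):
    """Return which sides have at least half an advance of blank space.

    `ink_min` and `ink_max` are the ink bounds along the advance direction,
    in the same font units as `advance`.
    """
    half = advance / 2
    sides = set()
    if ink_min >= half:
        sides.add("start")  # ink is in the trailing half, e.g. an opening bracket
    if ink_max <= half:
        sides.add("end")  # ink is in the leading half, e.g. a Japanese comma
    return sides
```

A centered glyph, or a glyph so thick that its ink crosses the middle, yields an empty set, so no spacing would be applied to it.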

Languages

There are language-specific conventions for where punctuation characters are placed in the glyph spaces. For example, U+3002 IDEOGRAPHIC FULL STOP should be placed at the bottom-left corner of the glyph space in Japanese, while it should be placed at the center in Traditional Chinese.

By default, this package determines such differences from glyph outlines as described in the Algorithm section above. But you can specify the OpenType language system tag to let this package follow the language convention instead of using glyph outlines. The following example disables the automatic determination by glyph outlines, and specifies that the font is a Japanese font.

east-asian-spacing --language=JAN input-font-file

For TrueType Collections (TTC), the language option applies to all fonts in the TTC by default. When you want to specify a different language for each font in the TTC, it accepts a comma-separated list. The following example specifies Korean for the font index 1, Simplified Chinese for the font index 2, and automatic for all other fonts.

east-asian-spacing --language=,KOR,ZHS input-font-file.ttc
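When a script generates this command line, the positional list can be built from a mapping of font index to language tag. This is a small sketch for the case without the "--index" option; the function name is hypothetical. Empty entries stand for automatic, as in the example above.

```python
def language_option(languages_by_index):
    """Format {font index: language tag} as the value for --language."""
    if not languages_by_index:
        raise ValueError("at least one language is required")
    count = max(languages_by_index) + 1
    # Indices without an entry become empty strings, which mean automatic.
    tags = [languages_by_index.get(i, "") for i in range(count)]
    return "--language=" + ",".join(tags)
```

For example, `language_option({1: "KOR", 2: "ZHS"})` produces the option used in the command above.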

You can combine these two options. The following example applies JAN to the index 2, and ZHS to the index 3. Other fonts in the TTC are not changed.

east-asian-spacing --index=2,3 --language=JAN,ZHS input-font-file.ttc

Character-Pairs

You may want to adjust which character pairs have their spacings adjusted, for example when your fonts do not have the expected spacings for some characters. Currently, this is possible only from Python programs.

For a simple example, please see the test_config function in tests/config_test.py.

The chws_tool project is an actual example of customizing this package.

HarfBuzz

This package uses the HarfBuzz shaping engine through its Cython bindings, uharfbuzz.

If you want to use a specific build of HarfBuzz, this tool can invoke the external hb-shape command line tool instead when you set the SHAPER environment variable.

export SHAPER=hb-shape

To install hb-shape for Linux:

sudo apt install libharfbuzz-bin

To install hb-shape for Mac with homebrew:

brew install harfbuzz

Instructions for other platforms may be available at command-not-found.com.
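When you run the tool from a Python script, the SHAPER variable can be set for the child process only. The following sketch first checks that the command is on the PATH; the function name and its parameters are hypothetical.

```python
import os
import shutil


def env_with_external_shaper(command="hb-shape", base_env=None, which=shutil.which):
    """Return a copy of the environment with SHAPER set to `command`.

    Raises FileNotFoundError if `command` is not found on the PATH.
    """
    if which(command) is None:
        raise FileNotFoundError(f"{command} was not found on PATH")
    env = dict(os.environ if base_env is None else base_env)
    env["SHAPER"] = command
    return env
```

The result can be passed as the env argument of subprocess.run when invoking east-asian-spacing, which leaves the environment of the calling script unchanged.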

Clone and Install

If you need to diagnose fonts or the code, cloning and installing using poetry is recommended:

git clone https://github.com/kojiishi/east_asian_spacing
cd east_asian_spacing
poetry install
poetry shell

This method has the following advantages:

  • Installs the exact versions of dependencies.
  • Installs in the editable mode (i.e., pip "-e" option or setuptools "development mode").
  • Installs testing tools too. You can run unit tests to verify your installation if needed.
  • Creates the virtual environment automatically.

You can also install the cloned directory using pip if you prefer:

git clone https://github.com/kojiishi/east_asian_spacing
cd east_asian_spacing
pip install .

Unit Tests

This repository contains unit tests using pytest. The unit tests cover the basic functionalities including shape tests, adding the feature to a test font, and comparing it with references.

If you followed the clone and install section, tools for unit testing are already installed. Before you run them for the first time, you need to download fonts for testing:

./tests/download_fonts.py

You can then run the tests by:

pytest

or run them with multiple versions of Python using tox:

tox

Scripts

The scripts directory has some small shell scripts.

The build*.sh scripts are useful to build fonts, compute diffs from source fonts, and compare the diff files with references. The following are example usages.

./scripts/build.sh input-font-file.otf -v
./scripts/build-noto-cjk.sh ~/fonts/noto-cjk -v