This repository has been archived by the owner on Jan 21, 2022. It is now read-only.

Adding character maps - Getting the available characters. #132

Open
phoenixenero opened this issue Dec 6, 2015 · 5 comments

Comments

@phoenixenero
Contributor

One of the difficulties of displaying character maps is that you have to access the font files themselves to find out which glyphs are actually accessible. Of course you can do the "dumb" solution and loop over every Unicode value, but that would take a really long time and would leave a lot of gaps.

I finished writing two Ruby scripts that will automate that function. They depend on the woff2sfnt command line tool, and the ttfunk gem.

get-available-chars.rb outputs a file containing all the Unicode values that have glyphs mapped to them. It works by following these steps:

  • First, we create a temporary copy of the .WOFF file and convert it to OpenType/TrueType using woff2sfnt. This is necessary for ttfunk to access the OpenType/TrueType tables.
  • Using ttfunk, we access the cmap (character map) table and get the Unicode keys.
  • We then write them to a file as a comma-delimited list. Once we have the Unicode values, we also delete the temporary font copy.
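For anyone following along, the steps above could be sketched roughly like this. This is not the actual script, just a minimal sketch under a few assumptions: that woff2sfnt writes the decoded font to stdout, that ttfunk exposes the Unicode cmap subtables through `cmap.unicode` with a Hash-like `code_map`, and that the output file sits next to the font. The method names and the `U+XXXX` formatting are made up for illustration.

```ruby
require 'tempfile'

# Hypothetical helper: format code points as a comma-delimited list.
# The "U+XXXX" notation is an assumption; the original output format may differ.
def format_codepoints(codepoints)
  codepoints.uniq.sort.map { |cp| format('U+%04X', cp) }.join(',')
end

# Hypothetical helper: convert a WOFF to a temporary sfnt file and read its cmap.
def available_codepoints(woff_path)
  require 'ttfunk' # loaded lazily so format_codepoints works without the gem

  # The block form removes the temporary font copy once we are done with it.
  Tempfile.create(['font', '.ttf']) do |tmp|
    tmp.close
    # Assumption: woff2sfnt prints the decoded font on stdout.
    ok = system('woff2sfnt', woff_path, out: tmp.path)
    raise "woff2sfnt failed for #{woff_path}" unless ok

    font = TTFunk::File.open(tmp.path)
    # Assumption: each Unicode subtable exposes a Hash of code point => glyph id.
    font.cmap.unicode.flat_map { |subtable| subtable.code_map.keys }
  end
end

# Hypothetical entry point: write "<font name><ext>" next to the font.
def write_charmap(woff_path, ext = '.unc')
  out_path = woff_path.delete_suffix(File.extname(woff_path)) + ext
  File.write(out_path, format_codepoints(available_codepoints(woff_path)))
  out_path
end
```

Calling `write_charmap('Heuristica-Regular.woff')` (file name hypothetical) would then leave a `Heuristica-Regular.unc` beside the font, provided both woff2sfnt and ttfunk are available.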

get-available-chars.rb takes two options:

  • The file name of the font
  • The extension of the output files. By default, it's set to ".unc"

glob-get-available-chars.rb is a wrapper for get-available-chars.rb. This script takes two options:

  • The glob pattern (ex: "*.woff")
  • The extension of the output files.

It then loops through all the files that match the pattern and runs get-available-chars.rb on each one with the options above.
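A rough sketch of what such a wrapper might look like, assuming get-available-chars.rb takes the font path and the extension as positional arguments in that order. The helper names are hypothetical and this is not the actual wrapper script.

```ruby
# Hypothetical helper: map every font matching the glob to its output path.
def charmap_targets(pattern, ext = '.unc')
  Dir.glob(pattern).sort.to_h do |font_path|
    [font_path, font_path.delete_suffix(File.extname(font_path)) + ext]
  end
end

# Hypothetical wrapper: run the per-font script once for each match.
# Assumption: the script accepts "<font file> <extension>" as its arguments.
def run_all(pattern, ext = '.unc', script: 'get-available-chars.rb')
  charmap_targets(pattern, ext).each_key do |font_path|
    ok = system('ruby', script, font_path, ext)
    warn "failed: #{font_path}" unless ok
  end
end
```

Keeping the path mapping separate from the loop makes it easy to check which files would be generated before running anything.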

Here's the sample output (I dropped the files from my VM into Windows):

[Screenshot: ss 2015-12-06 at 06 24 50]

From left-to-right: Heuristica Regular, Montserrat Regular, Charter Regular

We can now use these values to generate a table containing each font's available characters.

The script can be run after each update through GitHub Pages. To keep the repository from getting bloated, we can add the Unicode character map file's extension (say, .unc) to .gitignore.

I had some difficulties writing this due to my inexperience with Ruby, and ttfunk's glaring lack of documentation, but eventually it all worked out. I plan on making a PR which outputs the character map (if it exists) on the font catalog page.

Anyway, that's all for today!

Note: if you're hosting on Ubuntu, you can get the woff2sfnt tool through launchpad.ubuntu.org

Edit: Subsetting fonts might actually be possible thanks to ttfunk, though I imagine that'll be a bit difficult to implement :)

@phoenixenero phoenixenero changed the title Adding character maps part 1 - Getting the available characters. Adding character maps - Getting the available characters. Dec 6, 2015
@alfredxing
Owner

This has definitely been on my to-do list for quite a while. I've been doing a bit of searching for tools to programmatically grab info from fonts, and came across https://github.com/behdad/fonttools/, which seems pretty robust and feature-filled. I think it's a part of the Google Fonts toolchain.

@phoenixenero
Contributor Author

Now that I think about it, that is better. That is what Google uses for subsets.

@phoenixenero
Contributor Author

So I was using the fonttools package. This command dumps the cmap tables of a font into a .ttx (XML) file.

ttx -t cmap Aileron-Black.woff

This seems to be a lot simpler than my previous workings, lol...

Though I haven't tested it yet, we can access the table data with the Nokogiri Ruby gem. Of course we could just use a regex, but that wouldn't be future-proof.
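As an untested-against-real-fonts sketch of the XML approach: this uses REXML, which ships with Ruby, as a stand-in for Nokogiri, and assumes the .ttx dump contains a cmap element whose subtables hold `map` entries with `code` and `name` attributes. The method name is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical helper: read code point => glyph name pairs from a ttx cmap dump.
# Assumption: entries look like <map code="0x41" name="A"/> inside //cmap/*.
def glyph_names_from_ttx(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//cmap/*/map').each_with_object({}) do |el, result|
    # Integer(..., 16) accepts the "0x" prefix used in the dump.
    result[Integer(el.attributes['code'], 16)] = el.attributes['name']
  end
end
```

It could be fed with something like `glyph_names_from_ttx(File.read('Aileron-Black.ttx'))`, assuming the ttx command above wrote its output under that name.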

I will test this further. Will post results.

@phoenixenero
Contributor Author

Here's a new version of my script, now condensed into 1 .rb file: https://gist.github.com/phoenixenero/c8d40a390bb1acabcf9c

It supports a -f flag, which takes a file glob pattern (ex: *.otf, fonts/*.woff) as input and outputs the Unicode character map of every file matched.

@phoenixenero
Contributor Author

[Screenshot: ss 2015-12-07 at 10 41 50]

Alright, seems like my implementation is working! It outputs table markup with the codes. I might need to sort these though; it seems the .charmap generation is unsorted.

Edit: Apparently it's not uncommon for fonts to have multiple cmap tables. Will correct.
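One possible way to handle both the sorting and the multiple-table case, sketched under the assumption that each cmap table has already been read into an array of integer code points (the method name is hypothetical):

```ruby
require 'set'

# Hypothetical helper: merge the code points of several cmap tables into one
# sorted list without duplicates.
def merge_cmap_tables(tables)
  merged = tables.each_with_object(Set.new) { |codes, seen| seen.merge(codes) }
  merged.to_a.sort
end
```

Since the same code point often appears in more than one table, merging through a Set keeps each value once before sorting.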
