
Making a minimal build for ICU58 or later

John Thomson edited this page May 5, 2017 · 6 revisions

Making a minimal build of Icu4C (such as MinimumStaticallyLinked58)

This example describes what I did to make MinimumStaticallyLinked58. Change 58 to whatever version you are making.

  1. Clone the Icu4C project (https://github.com/sillsdev/icu4c.git) if you don't already have it.
  2. Find the tag for the ICU version you want to base on, e.g., release58-1.
  3. Check it out and make your branch (e.g., MinimumStaticallyLinked58).
  4. Open the solution. I used source/allinone/allinone.sln. (I had to tell VS2017 not to update the projects or change the tools version.)
  5. Because I had problems later, I tried a build at this point. Build All on the solution eventually succeeded, but it added 3910 new files to git!
  6. Add a .gitignore with the following rules:
*.user
# git artifacts
*.orig
# visual studio local artifacts
**/.vs/**

# various things the build process inserts into the source tree in a release build
source/data/out/**
include/unicode/**
**/*.tlog
source/test/testdata/out/**
**/*.exp
**/x86/Release/**
source/extra/uconv/pkgdatain.txt
source/extra/uconv/resources/fr.res
source/extra/uconv/resources/root.res
source/stubdata/stubdatabuilt.txt
  7. Modify icu/source/common/unicode/uconfig.h: change the line
#define UCONFIG_ONLY_COLLATION 0

to

#define UCONFIG_ONLY_COLLATION 1

Similarly, make sure that UCONFIG_NO_LEGACY_CONVERSION is defined as 1.
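If you repeat this for each new ICU version, the two edits can be scripted. The following is a minimal Python sketch, not part of the original procedure. It accepts both the `#define NAME 0` form shown above and an indented `#   define NAME 0` form, and it raises an error instead of silently doing nothing if a macro isn't found. The function name `set_uconfig_defines` and the script name in the usage comment are hypothetical.

```python
import re
import sys

# Macro values this page asks for in source/common/unicode/uconfig.h.
SETTINGS = {
    "UCONFIG_ONLY_COLLATION": 1,
    "UCONFIG_NO_LEGACY_CONVERSION": 1,
}


def set_uconfig_defines(text: str, settings: dict[str, int]) -> str:
    """Return text with each '#define NAME <number>' line set to the requested value."""
    for name, value in settings.items():
        # Accept '#define NAME 0' as well as the indented '#   define NAME 0' style.
        pattern = re.compile(
            rf"^([ \t]*#[ \t]*define[ \t]+{re.escape(name)}[ \t]+)\d+",
            re.MULTILINE,
        )
        text, count = pattern.subn(rf"\g<1>{value}", text)
        if count == 0:
            # Fail loudly so a changed uconfig.h layout gets noticed.
            raise ValueError(f"{name} not found; edit uconfig.h by hand")
    return text


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python set_uconfig.py source/common/unicode/uconfig.h
    # newline="" keeps the file's existing line endings.
    with open(sys.argv[1], encoding="utf-8", newline="") as f:
        original = f.read()
    with open(sys.argv[1], "w", encoding="utf-8", newline="") as f:
        f.write(set_uconfig_defines(original, SETTINGS))
```

Check the result with `git diff` before building.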

  8. Modify icu/source/i18n/sharedpluralrules.h: change the destructor of SharedPluralRules to
virtual ~SharedPluralRules() { delete ptr; }

At this point, building produces an icudt58.dll of 24.9 MB, compared to 938 KB for the ICU54.net build shipped with libpalaso. (icuin58.dll is 486 KB, compared to 527 KB for icuin54.dll; icuuc58.dll is 986 KB, compared to 1.05 MB for icuuc54.dll.) We still need to get that data down a lot!

At this point I found that one test DLL failed to build. This doesn't matter in itself, except that it prevents the NuGet package build from completing. If you run into this, open the allinone solution properties, choose Configuration Properties > Configuration, select All Configurations and All Platforms in the combo boxes at the top, and clear the Build check box for the failing test project. (You'll need to do something different if the failing project isn't a test project.)

  9. Customize ICU's data. This is the main step that really minimizes the size. The original instructions say to use the tool at http://apps.icu-project.org/datacustom/index.html to generate a data file (deselect everything except Collators/coll/ucadata.icu), download the resulting data file, and extract it to icu/source/data/in. This is no longer possible because the tool at that URL has not been maintained for versions after ICU 57.

Instead, we have to edit source files. The starting point for understanding what's going on is the makefile for the makedata project, source\data\makedata.mak. The interesting part starts around line 167. If source/data/in/icu58_.pkg exists, the makefile builds from that; this is the file the customization tool would have generated.

Since that file does not exist, makedata.mak instead includes a number of other .mk files from all over the source/data folder. Each of them defines one or more lists of files to be included in the icudt58.dll package. You could edit these makefiles, but the comments in them recommend against it, because the changes would then have to be reconciled with the versions of those files in a later ICU. Instead, the build process looks for a 'local' file corresponding to each .mk file and includes it if present. These local files can override what is in the corresponding main makefile.

For example, source/data/lang/resfiles.mk defines LANG_SOURCE to be a long list of files. The customization process is primarily designed for adding resources, and makedata.mak anticipates this: if the 'local' makefile is found, it replaces LANG_SOURCE with the concatenation of LANG_SOURCE and LANG_SOURCE_LOCAL. So to add files, you can just define LANG_SOURCE_LOCAL in a new file, source/data/lang/reslocal.mk.

In theory, you remove files by simply redefining LANG_SOURCE in reslocal.mk. In practice, it wasn't that simple, because of a bug that is repeated in several places in makedata.mak. If LANG_SOURCE_LOCAL is empty, concatenating it onto the end of LANG_SOURCE leaves a trailing space. Some very tricky macros then insert "lang\" into each file name: first they replace ".txt " with ".res lang\" (which is supposed to put a lang\ after every file but the last), and then they replace "lang\ " with "lang\" to remove any space left over. A trailing space breaks this: an extra lang\ is inserted after the last file name, and it becomes a target that nmake does not know how to build.

There are two ways to work around this. If you don't want to modify makedata.mak, you can define LANG_SOURCE to be empty and LANG_SOURCE_LOCAL to contain just one file. In a few cases it was trickier than this, but that's the basic strategy. You need to check makedata.mak to see exactly what it defines and how it manipulates those definitions; I had to define three pairs of symbols to make brklocal.mk work. Using this strategy I was able to get icudt58.dll down to 1986 KB.
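To see why the trailing space matters, and why the empty-LANG_SOURCE workaround avoids it, here is a simplified Python model of the substitutions described above. It is not the literal makedata.mak code: the helper name `expand_lang_files` is hypothetical, and giving the first file an explicit `lang\` prefix is an assumption about how the makefile handles it. nmake's `$(MACRO:old=new)` substitution replaces every occurrence of `old`, which `str.replace` mirrors.

```python
def expand_lang_files(lang_source: str, lang_source_local: str) -> list[str]:
    """Simplified model of how makedata.mak turns LANG_SOURCE into build targets."""
    # LANG_SOURCE=$(LANG_SOURCE) $(LANG_SOURCE_LOCAL): the separating space is
    # added even when LANG_SOURCE_LOCAL is empty, leaving a trailing space.
    combined = f"{lang_source} {lang_source_local}"
    # ".txt " -> ".res lang\" puts lang\ after every file except the last one
    # (assumption: the first file gets an explicit lang\ prefix).
    files = "lang\\" + combined.replace(".txt ", ".res lang\\")
    # The last file had no following space, so convert its extension separately.
    files = files.replace(".txt", ".res")
    # "lang\ " -> "lang\" removes a space left between lang\ and a file name.
    files = files.replace("lang\\ ", "lang\\")
    return files.split()


# Redefining LANG_SOURCE with no local additions: a bogus "lang\" target appears.
print(expand_lang_files("af.txt de.txt", ""))
# ['lang\\af.res', 'lang\\de.res', 'lang\\']

# Workaround: empty LANG_SOURCE, exactly one file in LANG_SOURCE_LOCAL.
print(expand_lang_files("", "en.txt"))
# ['lang\\en.res']
```

In the workaround case the only stray space ends up right after the leading `lang\`, where the last substitution removes it, so no bogus target is produced.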

The other solution is to modify makedata.mak. Each line like LANG_SOURCE=$(LANG_SOURCE) $(LANG_SOURCE_LOCAL) should be wrapped in a test such as !IFDEF LANG_SOURCE_LOCAL ... !ENDIF. Then reslocal.mk can simply !UNDEF LANG_SOURCE to get rid of all the data from the lang directory.
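A minimal Python sketch of that makedata.mak change, assuming each concatenation line has exactly the form `NAME=$(NAME) $(NAME_LOCAL)` quoted above; lines written any other way are left alone and need editing by hand. The function and script names are hypothetical, and this is not the patch actually committed on the MinimumStaticallyLinked58 branch.

```python
import re
import sys

# Matches e.g. "LANG_SOURCE=$(LANG_SOURCE) $(LANG_SOURCE_LOCAL)": the same NAME three times.
CONCAT_LINE = re.compile(
    r"^[ \t]*(?P<name>\w+)[ \t]*=[ \t]*"
    r"\$\((?P=name)\)[ \t]+\$\((?P=name)_LOCAL\)[ \t]*$"
)


def wrap_local_concatenations(text: str) -> str:
    """Wrap each NAME=$(NAME) $(NAME_LOCAL) line in !IFDEF NAME_LOCAL ... !ENDIF."""
    out: list[str] = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        eol = line[len(body):]
        match = CONCAT_LINE.match(body)
        if match is None:
            out.append(line)
            continue
        guard = f"!IFDEF {match['name']}_LOCAL"
        # Leave lines that are already wrapped alone, so a second run changes nothing.
        if out and out[-1].strip() == guard:
            out.append(line)
            continue
        sep = eol or "\n"
        # nmake directives must start in column 0, so the guard lines are not indented.
        out.append(f"{guard}{sep}{body}{sep}!ENDIF{eol}")
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python wrap_local.py source/data/makedata.mak
    with open(sys.argv[1], encoding="utf-8", newline="") as f:
        original = f.read()
    with open(sys.argv[1], "w", encoding="utf-8", newline="") as f:
        f.write(wrap_local_concatenations(original))
```

One caveat from the nmake documentation: `!IFDEF` treats a macro defined with an empty value as defined, so reslocal.mk should leave LANG_SOURCE_LOCAL undefined rather than set it to nothing.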

The various *local.mk files I added in the MinimumStaticallyLinked58 branch can probably be cherry-picked into a new branch, and my changes to makedata.mak should be a good pattern (possibly even mergeable) for fixing the bug, if ICU hasn't fixed it already. Using this approach I got icudt58.dll down to 1061 KB.

  10. Build the ICU Visual Studio solution, icu/source/allinone/allinone.sln.

    • Choose the Release configuration and the Win32 platform. You can ignore errors about "tstfiles.mk not found".
  11. To validate the results, get the icu-dotnet project (https://github.com/sillsdev/icu-dotnet). First build it as-is and verify that all tests pass. Next, go to output/Debug and rename the x86 folder to x-86disabled. (This ensures the x86 DLLs are not used to run the tests; if you happen to be on a 32-bit machine, reverse this strategy.) Then go into x64 and remove (or move to a Disabled directory) all the DLLs except the three icuXX54.dll ones (these are an earlier minimized set). Run the tests again. I found that 90 failed. Filter the results to show only the failed tests and copy the test output window to a file.

Next, remove the icuXX54 DLLs, run an x64 build of icu4c's makedata project, and copy the corresponding three icuXX58 DLLs from the bin64 directory to icu-dotnet's output\Debug\x64. Run the tests again. They should now be using your DLLs; hopefully, they work at least as well as the old ones! You can again save the failed test output and use a diff tool to see any changes.
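As an alternative to a line-by-line diff, a small Python sketch can compare the two saved outputs as sets, which ignores differences in ordering between test runs. It assumes each failing test appears on its own line in the saved files (the exact format depends on how you copied the output); the script name and file names in the usage comment are hypothetical.

```python
import sys
from pathlib import Path


def load_entries(path: str) -> set[str]:
    """Read a saved test-output file as a set of non-blank, stripped lines."""
    # utf-8-sig also accepts files saved with a byte-order mark.
    text = Path(path).read_text(encoding="utf-8-sig")
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare_failures(before: set[str], after: set[str]) -> tuple[list[str], list[str]]:
    """Return (lines only in the new output, lines only in the old output)."""
    return sorted(after - before), sorted(before - after)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_failures.py failed-icu54.txt failed-icu58.txt
    new_only, old_only = compare_failures(load_entries(sys.argv[1]), load_entries(sys.argv[2]))
    print(f"{len(new_only)} lines only in the ICU58 run:")
    for line in new_only:
        print("  +", line)
    print(f"{len(old_only)} lines only in the ICU54 run:")
    for line in old_only:
        print("  -", line)
```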

  12. There is a series of commits common to the Minimum branches that should be cherry-picked (or equivalent changes made):
  • If you don't already have a satisfactory .gitignore and ReadMe, you can cherry-pick a commit like I90d97635b866c16f420a4bbb687ba2773617aa64 ("Add readme and .gitignore file").
  • Make the build statically linked (as explained in the ReadMe). c0d66dc38a5276f2c0c62a59745d38082e9de605 ("Make a staticallly-linked build") is typical, but you probably don't want to cherry-pick it, as conflicts in the project files are likely to be horrific. Instead, do a global search-and-replace of MultiThreadedDLL with MultiThreaded. Check the results in git, build, and you should be good.
  • Add NuGet packaging. Cherry-pick a commit like I277ad9c5c1483dbf8ee873625fb246298c399343 ("Add nuget packaging"), then fix the version number where icu4c.proj has <icu_ver Condition="'$(icu_ver)' == ''">58</icu_ver>.
  • Add a file to make Jenkins actually build the NuGet packages. Cherry-pick a commit like Ic15182b882eafc236148518a270a7892c12d81fb ("Add Jenkinsfile"). You might have to adjust the msbuild version number (e.g., msbuild14 instead of msbuild12).
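The two mechanical edits in these bullets, switching the runtime library and setting the NuGet version, can also be scripted. Below is a minimal Python sketch under these assumptions: the project files are UTF-8 (optionally with a byte-order mark), the runtime setting only needs changing in *.vcxproj files (a true global search may find other occurrences), and icu4c.proj contains the `<icu_ver ...>58</icu_ver>` element quoted above. The script and function names are hypothetical. Like the search-and-replace described above, it only touches the literal text MultiThreadedDLL, so MultiThreadedDebugDLL in Debug configurations is left unchanged.

```python
import re
import sys
from collections.abc import Callable
from pathlib import Path

BOM = b"\xef\xbb\xbf"
ICU_VER = re.compile(r"(<icu_ver\b[^>]*>)\d+(</icu_ver>)")


def make_static_runtime(text: str) -> str:
    """Switch the C runtime from the DLL (/MD) to the static (/MT) library."""
    return text.replace("MultiThreadedDLL", "MultiThreaded")


def set_icu_version(text: str, version: str) -> str:
    """Set the default ICU version in icu4c.proj's <icu_ver> element."""
    new_text, count = ICU_VER.subn(rf"\g<1>{version}\g<2>", text)
    if count == 0:
        raise ValueError("no <icu_ver> element found")
    return new_text


def patch_file(path: Path, transform: Callable[[str], str]) -> bool:
    """Apply transform to a UTF-8 file, keeping its BOM and line endings."""
    raw = path.read_bytes()
    bom = BOM if raw.startswith(BOM) else b""
    text = raw[len(bom):].decode("utf-8")
    new_text = transform(text)
    if new_text == text:
        return False
    path.write_bytes(bom + new_text.encode("utf-8"))
    return True


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python prepare_branch.py <icu4c checkout root> 59
    root, version = Path(sys.argv[1]), sys.argv[2]
    for project in root.rglob("*.vcxproj"):
        if patch_file(project, make_static_runtime):
            print("static runtime:", project)
    for proj in root.rglob("icu4c.proj"):
        if patch_file(proj, lambda text: set_icu_version(text, version)):
            print("version set:", proj)
```

As with the manual approach, check the results in git and build before committing.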
  13. Publishing the NuGet package that Jenkins builds is a manual step. Contact Eberhard if he hasn't updated this page to say how to do it.

  14. If necessary, edit ReadMe.md at the project root to list the new long-running branch. More recently we've been describing the branches generically, so this may not be needed. FieldWorks is currently the default branch, so its version of the ReadMe is the one that actually shows up on the web page.