ICU+Docx4J based MS Word file font converter for legacy Ethiopic font encodings

lights7/DocxConverter

Docx Converter from Tibetan Ededris Font to Unicode

Download ConvertT.jar and run it on any system that has Java installed.

If you want to run the converter from the command line, read on.

About

This repository can be imported into Eclipse as a simple Maven Java project. The "Eclipse IDE for Java Developers" download includes both Maven and Git integration, so no additional installation is needed to build this project. Maven will in turn retrieve all dependencies, primarily the docx4j (6.0.1) and ICU (63.1) Java libraries.

The converter presently supports two legacy (non-Unicode) systems: Brana (I & II) and Feedel Ge'ez (New A & B). The conversion mappings come directly from the long-defunct LibEth C language library, which supported conversion of many more legacy encoding systems. Support for additional encoding systems can be ported from LibEth as the need arises. Please feel free to request support for additional systems.

Usage (Executable Jar with GUI)

In the GitHub "releases" folder you can find and download the "DocxConverter-0.2.0-full-gui.jar" file. This version contains the converter and all of its dependencies (jar libraries). Double-clicking the jar file launches a user interface where fonts and files can be selected with the mouse and menus.

Output files have "-Abyssinica" appended to their names. The application uses the Abyssinica SIL font as the default for output. If you do not have the Abyssinica SIL font installed, this will not be an issue: Microsoft Word will substitute another font such as Nyala.
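
As an illustration of the naming rule above, the following sketch derives an output file name from an input name. It assumes the suffix is inserted before the file extension; the README does not say exactly where the suffix goes, and the class and method names are hypothetical rather than taken from the converter's source.

```java
// Hypothetical helper for illustration; not part of the DocxConverter source.
final class OutputNameSketch {
    static final String SUFFIX = "-Abyssinica";

    // Assumption: the suffix is placed before the extension,
    // so "Report.docx" becomes "Report-Abyssinica.docx".
    static String outputName(String inputName) {
        int dot = inputName.lastIndexOf('.');
        if (dot <= 0) {
            // No extension (or a leading dot only): append at the end.
            return inputName + SUFFIX;
        }
        return inputName.substring(0, dot) + SUFFIX + inputName.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(outputName("MyFileIn.docx"));
    }
}
```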

Command Line Usage Examples

% java -cp [path to libraries] org.geez.convert.docx.ConvertDocx <system> <file-in> <file-out>

<system> is either: brana or geeznewab
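
The argument layout above could be checked along the following lines before any conversion starts. This is only a sketch: the class name, method name, and error messages are hypothetical, and the real ConvertDocx class may validate its arguments differently (for example, it may accept other spellings of the system names).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical argument check; not taken from the converter's source.
final class UsageCheckSketch {
    // The two system names listed in this README.
    static final Set<String> SYSTEMS =
            new LinkedHashSet<>(Arrays.asList("brana", "geeznewab"));

    // Returns null when the arguments look valid, otherwise an error message.
    static String check(String[] args) {
        if (args.length != 3) {
            return "usage: <system> <file-in> <file-out>";
        }
        if (!SYSTEMS.contains(args[0])) {
            return "unknown system '" + args[0] + "', expected one of " + SYSTEMS;
        }
        return null;
    }

    public static void main(String[] args) {
        String error = check(args);
        System.out.println(error == null ? "arguments look valid" : error);
    }
}
```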

Brana 90

The Brana 90 encoding system did not decompose letters into base forms and separate diacritical marks. Instead, it split the full syllabary across two fonts, Brana I and Brana II. Brana 90 was a proprietary application that used the HighEdit document format as its native system.
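
Because the syllabary is split across two fonts, the same character code can stand for different letters depending on which font a run of text uses, so a conversion has to be keyed on the font name as well as the character. The sketch below shows that idea with plain Java maps. The two table entries are invented placeholders, not real Brana mappings (those come from LibEth), and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical font-keyed lookup; the entries are placeholders, not real Brana data.
final class FontKeyedMappingSketch {
    private static final Map<String, Map<Character, String>> TABLES = new HashMap<>();

    static {
        Map<Character, String> branaOne = new HashMap<>();
        branaOne.put('a', "\u1200"); // placeholder: maps to Ethiopic syllable HA
        Map<Character, String> branaTwo = new HashMap<>();
        branaTwo.put('a', "\u1208"); // placeholder: maps to Ethiopic syllable LA
        TABLES.put("Brana I", branaOne);
        TABLES.put("Brana II", branaTwo);
    }

    // Converts one run of text using the table for the run's font.
    // Text in other fonts, and characters without a mapping, pass through unchanged.
    static String convertRun(String fontName, String text) {
        Map<Character, String> table = TABLES.get(fontName);
        if (table == null) {
            return text;
        }
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            String mapped = table.get(c);
            out.append(mapped != null ? mapped : String.valueOf(c));
        }
        return out.toString();
    }
}
```

The same input character yields a different result under "Brana I" than under "Brana II", which is why the font of each run must be known at conversion time.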

  1. In the Brana 90 application, save a document as RTF.
  2. Open the RTF document in Microsoft WordPad and Save As... Office Open XML Document (.docx)
    (MS Word 2016 is unable to open RTF documents saved from Brana 90).
  3. The converter can be run directly from Eclipse. At the command line, you need to specify the paths to the dependency libraries, in a form similar to:
% java -cp DocxConverter-0.2.0.jar:docx4j-6.0.1.jar:dependencies/*:icu4j-63_1.jar:slf4j-1.7.25/slf4j-nop-1.7.25.jar org.geez.convert.docx.ConvertDocx brana MyFileIn.docx MyFileOut.docx 
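
The command above joins its classpath entries with ":", which is the separator on Unix-like systems; on Windows the Java classpath separator is ";". As a sketch, the same command can be assembled in Java with the platform's separator supplied by File.pathSeparator. The class and method names here are hypothetical, and the jar locations are simply copied from the example and may differ on your machine.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that rebuilds the example command; not part of the converter.
final class LaunchCommandSketch {
    // Classpath entries copied from the example command in this README.
    static final List<String> ENTRIES = Arrays.asList(
            "DocxConverter-0.2.0.jar",
            "docx4j-6.0.1.jar",
            "dependencies/*",
            "icu4j-63_1.jar",
            "slf4j-1.7.25/slf4j-nop-1.7.25.jar");

    // Builds the argument list for the java launcher.
    static List<String> command(String separator, String system, String in, String out) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(String.join(separator, ENTRIES));
        cmd.add("org.geez.convert.docx.ConvertDocx");
        cmd.add(system);
        cmd.add(in);
        cmd.add(out);
        return cmd;
    }

    public static void main(String[] args) {
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        List<String> cmd = command(File.pathSeparator, "brana", "MyFileIn.docx", "MyFileOut.docx");
        System.out.println(String.join(" ", cmd));
    }
}
```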

Feedel

The Feedel company produced three encoding systems that all took the approach of decomposing letters into base forms and separate diacritical marks. The most widely used of the three encoding systems used two fonts: GeezNewA and GeezNewB. The Feedel application was a keyboard utility that could be used in Microsoft Windows systems up until Windows XP. The following steps assume that Feedel documents were composed in older versions of Microsoft Word:

  1. Open a Feedel .doc file in a recent version of Microsoft Word (2007 or later).
  2. Save the document from Word as a Word Document (.docx)
  3. The converter can be run directly from Eclipse. At the command line, you need to specify the paths to the dependency libraries, in a form similar to:
% java -cp DocxConverter-0.2.0.jar:docx4j-6.0.1.jar:dependencies/*:icu4j-63_1.jar:slf4j-1.7.25/slf4j-nop-1.7.25.jar org.geez.convert.docx.ConvertDocx geeznewab MyFileIn.docx MyFileOut.docx 
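
Since the Feedel fonts store a letter as a base form followed by a separate diacritical mark, converting to Unicode means recombining each pair into a single syllable. The sketch below illustrates one way to do that, relying on the fact that regular Ethiopic consonant rows in Unicode occupy consecutive code points starting at the first-order form (for example U+1208, U+1209, U+120A). It assumes the base glyph has already been mapped to its first-order Unicode syllable. The mark characters '#' and '$' are invented placeholders rather than real GeezNewA or GeezNewB codes, and the class and method names are hypothetical. Irregular rows, such as the labialized forms, would need explicit tables instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical recomposition sketch; the mark characters are placeholders.
final class FeedelComposeSketch {
    // Placeholder legacy mark -> offset from the first-order syllable.
    private static final Map<Character, Integer> MARK_OFFSETS = new HashMap<>();

    static {
        MARK_OFFSETS.put('#', 1); // placeholder for the second-order ("u") mark
        MARK_OFFSETS.put('$', 2); // placeholder for the third-order ("i") mark
    }

    // First-order forms of regular rows sit on multiples of 8 in U+1200..U+1357.
    static boolean isFirstOrder(char c) {
        return c >= 0x1200 && c <= 0x1357 && (c & 7) == 0;
    }

    // Folds each mark into the first-order syllable that precedes it.
    // Marks with no suitable base are copied through unchanged.
    static String compose(String decomposed) {
        StringBuilder out = new StringBuilder();
        for (char c : decomposed.toCharArray()) {
            Integer offset = MARK_OFFSETS.get(c);
            int last = out.length() - 1;
            if (offset != null && last >= 0 && isFirstOrder(out.charAt(last))) {
                out.setCharAt(last, (char) (out.charAt(last) + offset));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```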
