CJK fonts and Ghostscript integration

Font setup, especially for CJK (Chinese-Japanese-Korean) fonts, is complicated, mostly due to the diversity of formats, the diversity of configuration locations, and, last but not least, the inflexibility of Ghostscript. The script we are developing aims at a one-shot solution that searches all available fonts and makes those it recognizes available in Ghostscript.

cjk-ghostscript

Download and development place

The script is available from CTAN and also from TeX Live, package name cjk-gs-integrate.

The script and some test files are developed in the GitHub cjk-gs-support repository. Currently nothing more than the script itself is needed, as the font database is included in the script.

Direct link to the latest version: cjk-gs-integrate.pl (only this script is necessary)

Operation

This script searches a list of directories (see below) for CJK fonts, and makes them available to an installed Ghostscript. In the simplest case, with sufficient privileges, a run without arguments should result in a complete setup of Ghostscript.

In particular, the script tries the following actions:

TrueType (TTF) fonts

For each font found and recognized, the script:

  • creates an entry in cidfmap.local in the Ghostscript Resource/Init directory
  • links the font into Ghostscript’s Resource/CIDFSubst directory
  • creates for each encoding a small code snippet for the recoded font in Resource/Font

OpenType (OTF) fonts

For each font found and recognized, the script:

  • links the font into Ghostscript’s Resource/CIDFont directory, under the standardized PSName
  • creates for each encoding a small code snippet for the recoded font in Resource/Font

Aliases

The script also tries to set up a set of aliases, defined via Provides statements in the font database. These aliases are necessary because they are the names used by TeX and other applications. The aliases are saved into cidfmap.aliases in the Ghostscript Resource/Init directory.

Integration

Finally, the script adds code to Ghostscript’s Resource/Init/cidfmap to load the generated cidfmap.local and cidfmap.aliases.

Usage

The script allows several parameters to be adjusted, in particular:

  • change the output directory in case of insufficient permissions
  • override alias settings automatically computed

Warning: When run on a Unix system without adjusted output option, the script normally requires root privileges, as it tries to write to the Resource directory of an installed Ghostscript.

Without any further customization via command line options, the script locates Ghostscript via the gs program in the PATH and tries to set it up for all fonts found.

[perl] cjk-gs-integrate[.pl] [OPTIONS]

Options

Various options influence the operation of the script, as well as the amount of output:

  -n, --dry-run         do not actually output anything
  --remove              try to remove instead of create
  -f, --fontdef FILE    specify alternate set of font definitions, if not
                        given, the built-in set is used
  -o, --output DIR      specifies the base output dir, if not provided,
                        the Resource directory of an installed Ghostscript
                        is searched and used.
  -a, --alias LL=RR     defines an alias, or overrides a given alias
                        illegal if LL is provided by a real font, or
                        RR is neither available as real font or alias
                        can be given multiple times
  --filelist FILE       read list of available font files from FILE
                        instead of searching with kpathsea
  --link-texmf [DIR]    link fonts into
                           DIR/fonts/opentype/cjk-gs-integrate
                        and
                           DIR/fonts/truetype/cjk-gs-integrate
                        where DIR defaults to TEXMFLOCAL
  --machine-readable    output of --list-aliases is machine readable
  --force               do not bail out if linked fonts already exist
  -q, --quiet           be less verbose
  -d, --debug           output debug information, can be given multiple times
  -v, --version         outputs only the version information
  -h, --help            this help

Command-like options

These options change the behavior in the sense that nothing is actually set up, but information about found fonts and alias ordering is output.

  
  --only-aliases        regenerate only the cidfmap.aliases file instead of everything
  --list-aliases        lists the available aliases and their options, with the 
                        selected option on top
  --list-all-aliases    list all possible aliases without searching for actually
                        present files
  --list-fonts          lists the fonts found on the system
  --info                combines the information of the above two options

Directories searched

Fonts are searched for using the kpathsea library, in particular the kpsewhich program. Thus, an installed TeX system is necessary. By default the following directories are searched:

  • all TEXMF trees
  • /Library/Fonts, /System/Library/Fonts, /Library/Fonts/Microsoft, /Network/Library/Fonts, and ~/Library/Fonts (all if available)
  • c:/windows/fonts (on Windows)
  • the directories in the OSFONTDIR environment variable

In case you want to add some directories to the search path, adapt the OSFONTDIR environment variable accordingly. Example:

OSFONTDIR="/usr/local/share/fonts/truetype//:/usr/local/share/fonts/opentype//" cjk-gs-integrate

With the above invocation, the two given directories are searched for fonts in addition to the default ones.

Output files

If no output option is given, the program searches for a Ghostscript interpreter gs and determines its Resource directory. This might fail, in which case one needs to pass the output directory manually.

Since the program adds files and links to this directory, sufficient permissions are necessary.

Handling of aliases

Aliases are managed via the Provides values in the font database. At the moment entries for the basic font names for CJK fonts are added:

Japanese

Ryumin-Light, GothicBBB-Medium, FutoMinA101-Bold, FutoGoB101-Bold, Jun101-Light

Korean

HYGoThic-Medium, HYSMyeongJo-Medium, HYSMyeongJoStd-Medium

Chinese

MHei-Medium, MKai-Medium, MSungStd-Light, STHeiti-Light, STHeiti-Regular, STKaiti-Regular, STSongStd-Light

Additional aliases

In addition, we also include Provides entries for the OTF Morisawa names:

RyuminPro-Light, , FutoMinA101Pro-Bold, , A-OTF-Jun101Pro-Light

The order in which fonts are selected among alternative providers is determined by the priority in the Provides setting in the font database. For the Japanese fonts it is currently: Morisawa Pr6, Morisawa, Hiragino ProN, Hiragino, Yu OSX, Yu Win, Kozuka ProN, Kozuka, IPAex, IPA

That is, the first font found in this order will be used to provide the alias if necessary.
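The selection rule can be sketched in a few lines of Python. This is only an illustration of the idea, not the script's actual Perl code: the function name `select_alias_providers`, the tuple layout, and the priority numbers in the sample data are all assumptions made for the example.

```python
def select_alias_providers(provides, available_fonts):
    """Pick, for each alias, the available font with the lowest priority value.

    provides: iterable of (font_name, alias_name, priority) tuples
    available_fonts: set of font names actually found on the system
    """
    best = {}
    for font, alias, priority in provides:
        if font not in available_fonts:
            continue  # font is not installed, so it cannot provide anything
        if alias in available_fonts:
            continue  # a real font with that name exists, no alias needed
        if alias not in best or priority < best[alias][0]:
            best[alias] = (priority, font)
    return {alias: font for alias, (_prio, font) in best.items()}


# Hypothetical sample data: the priority values are invented for illustration.
provides = [
    ("HiraMinPro-W3", "Ryumin-Light", 40),
    ("IPAMincho", "Ryumin-Light", 110),
    ("KozMinPro-Regular", "Ryumin-Light", 90),
]
chosen = select_alias_providers(provides, {"IPAMincho", "KozMinPro-Regular"})
print(chosen)
```

In the sample, the Hiragino font is not installed, so the alias falls through to the installed font with the next-lowest priority value.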

Overriding aliases

Using the command line option --alias LL=RR one can add arbitrary aliases, or override ones selected by the program. For this to work the following requirements of LL and RR must be fulfilled:

  • LL is not provided by a real font
  • RR is available either as real font, or as alias (indirect alias)
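The two requirements can be expressed as a small check. The following Python sketch is illustrative only; `check_alias_override` and its arguments are hypothetical names, and the script itself may report problems differently.

```python
def check_alias_override(ll, rr, real_fonts, aliases):
    """Return None if an override LL=RR is acceptable, else an error message.

    real_fonts: set of names of fonts actually found
    aliases: set of alias names already defined
    """
    if ll in real_fonts:
        return f"{ll} is provided by a real font"
    if rr not in real_fonts and rr not in aliases:
        return f"{rr} is neither a real font nor an alias"
    return None


real_fonts = {"IPAMincho", "IPAGothic"}
aliases = {"Ryumin-Light"}
print(check_alias_override("GothicBBB-Medium", "IPAGothic", real_fonts, aliases))
print(check_alias_override("IPAMincho", "IPAGothic", real_fonts, aliases))
```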

Font database

The format of the font database is as follows: Entries are defined by paragraphs, that is, they are separated by one or more empty lines. Each entry needs a line specifying the name (in particular the PSName), the type, the class, a list of possible file names, and a list of provides.

Example entry:

Name: HiraKakuPro-W3
Class: Japan
Provides(40): GothicBBB-Medium
Provides(40): A-OTF-GothicBBBPro-Medium
Filename(20): ヒラギノ角ゴ Pro W3.otf
Filename(10): HiraKakuPro-W3.otf

The possible values are:

  • Name – Free form, but should agree with the PSName of the fonts
  • Class – One of Japan, Korea, GB (for simplified Chinese), or CNS (for traditional Chinese)
  • Provides – A free-form name of a font to be provided. This will generate an alias entry unless the provided font is actually available as a real font. Provides can have a priority in case several fonts provide the same alias. The optional priority is written in parentheses between Provides and the colon. Lower priority values are preferred.
  • Filename – A possible file name that provides the font. These file names are searched for on the system. As with Provides, file names can have an attached priority to select among multiple candidate file names.
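For readers who want to experiment with the format, here is a minimal Python sketch of a parser for such entries. It is not the parser used by the script; the function name `parse_font_database` and the returned dictionary layout are assumptions, and a missing priority is simply stored as `None` rather than guessing the script's default.

```python
import re

# Matches "Key: value" as well as "Key(PRIORITY): value".
LINE_RE = re.compile(r"^(\w+)(?:\((\d+)\))?:\s*(.*)$")


def parse_font_database(text):
    """Parse blank-line-separated entries into a list of dictionaries."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {"provides": [], "filenames": []}
        for line in block.splitlines():
            m = LINE_RE.match(line.strip())
            if not m:
                continue  # ignore lines that are not of the form "Key: value"
            key, prio, value = m.group(1), m.group(2), m.group(3).strip()
            prio = int(prio) if prio is not None else None
            if key == "Provides":
                entry["provides"].append((prio, value))
            elif key == "Filename":
                entry["filenames"].append((prio, value))
            else:
                entry[key.lower()] = value  # Name, Class, ...
        if "name" in entry:
            entries.append(entry)
    return entries


SAMPLE = """\
Name: HiraKakuPro-W3
Class: Japan
Provides(40): GothicBBB-Medium
Provides(40): A-OTF-GothicBBBPro-Medium
Filename(20): ヒラギノ角ゴ Pro W3.otf
Filename(10): HiraKakuPro-W3.otf
"""

entries = parse_font_database(SAMPLE)
print(entries[0]["name"], entries[0]["class"])
```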

The current font database is contained in the script at the very end. Users who want to experiment can save the database to an external file, make changes to their liking, and run the script with the -f or --fontdef option to provide an alternative font definition.

If you find a missing font or missing/wrong information, please inform me via a GitHub issue or email. Thanks.

Authors, Contributors, and Copyright

The script and its documentation were written by Norbert Preining, based on research and work by Yusuke Kuroki, Bruno Voisin, Munehiro Yamamoto and the TeX Q&A wiki page. Big thanks go to all of them and many more for their excellent work.

The script is licensed under GNU General Public License Version 3 or later. The contained font data is not copyrightable.

Changelog since first release

  • 20160115.0:
    • fix link names of otf font links
    • safer remove option
    • ensure that we don’t create circular links
  • 20151002.0:
    • add support for OSX 10.11 El Capitan provided fonts
    • added 2004-{H,V} encodings for Japanese fonts
    • fix incorrect link name
    • rename --link-texmflocal to --link-texmf [DIR] with an optional argument
    • add a --remove option to revert the operation (does clean up completely only if the same set of fonts is found)

Technical notes

Font formats

I will not go into details of font formats; there are thousands of pages out there doing this. I only want to list the font formats with which we are concerned, namely OpenType fonts. Due to their history, these fonts come in two different variations: the more standard TTF form, containing TrueType outlines using quadratic Bézier curves, and the OTF/CFF form, containing CFF outlines using cubic Bézier curves. Basically, OpenType fonts are TrueType fonts with some additional tables, and in the case of OTF fonts one special table contains the CFF outlines.
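As a side note, the two variations can usually be told apart by the first four bytes of the font file: OpenType files with CFF outlines start with the tag `OTTO`, files with TrueType outlines start with the version number 0x00010000 (or the older Apple tag `true`), and collections start with `ttcf`. The following Python sketch shows such a check; the function names are made up for this example, and the script itself does not necessarily work this way.

```python
def sfnt_flavor(header: bytes) -> str:
    """Classify a font by the first four bytes of its file."""
    tag = header[:4]
    if tag == b"OTTO":
        return "OTF/CFF"      # OpenType with CFF outlines
    if tag in (b"\x00\x01\x00\x00", b"true"):
        return "TTF"          # TrueType outlines
    if tag == b"ttcf":
        return "collection"   # TrueType/OpenType collection (.ttc)
    return "unknown"


def font_file_flavor(path):
    """Read the first four bytes of a font file and classify it."""
    with open(path, "rb") as fh:
        return sfnt_flavor(fh.read(4))


print(sfnt_flavor(b"OTTO\x00\x0c"))
```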

CID-keyed fonts were developed to deal with the huge number of glyphs in CJK fonts. For each registry (mostly Adobe), each ordering (for example Japan1 or CNS), and each supplement (a number), there is a defined list that maps glyphs to CID keys (integers). Detailed (and big) lists are available for Adobe-Japan1-6, Adobe-CNS1-6, Adobe-GB1-5, and Adobe-Korea1-2.

Furthermore, the kind of font saved into an OTF/CFF font can also vary: there are CID fonts saved into OTF/CFF, and non-CID fonts.

In the following, when referring to OTF/CFF we will assume it contains actual CID fonts, and not any other font data.

CID fonts and encodings

To actually access glyphs in a CID font, one usually applies an encoding to it. The idea is that for each encoding there is a file (a CMap) that maps the character codes of that encoding to the right CID numbers. Encodings can be Unicode, some local standard, or the CID values themselves.

Ghostscript and (CID) fonts

Ghostscript’s Usage documentation describes the usage of CID fonts in Sections 8.2 and 8.3. Fonts and encodings in Post(Ghost)Script are specified by

CIDResource-Encoding

and Section 8.2 states that if the CID Resource is available, the font resource will be auto-constructed.

As it turned out, this is not the case for OTF/CFF fonts copied/linked into the CIDFont directory of the Ghostscript Resource directory. According to one of the developers the reason is the following:

We try to parse out the registry and ordering without actually loading the entire CIDFont (which can be very time consuming), but the code that does that, doesn’t understand OTTO fonts, and probably doesn’t understand CFF fonts either.

(OTTO is the tag at the start of OpenType files with CFF outlines, so OTTO fonts here are presumably the OTF/CFF fonts)

TTF fonts

For OTF/TTF fonts, that is, those that look like normal TTF fonts and contain quadratic splines, the encoding is again handled differently. The reason is that Ghostscript is in principle a PostScript interpreter, so TTF fonts are not really within its realm. That changed long ago, but the difference in handling remains.

So for TTF fonts, one has to specify an Explicit CIDFont Substitution (again, see Use.html, Section 8.4) by providing some code that loads the TTF font and encodes it properly.

Here we were advised not to put the TTF fonts into the Resource/Font directory, because they are not font resources in the PostScript sense.

How to set up the fonts

So, how does one piece all that together to make a CID font available to Ghostscript? The procedure varies according to font type.

OTF/CFF fonts

Again, assuming the font file contains a real CID font, there are two steps involved: copying or linking the file into Ghostscript’s Resource/CIDFont directory under the correct name, and creating pre-made snippets for the necessary font/encoding pairs.

  • Resource/CIDFont: One needs to find out the PostScript name. Normally the file name without extension is the PostScript name, but this is not always the case. To find out the PostScript name, use otfinfo from the LCDF TypeTools package, normally available in most distributions:
    $ otfinfo --postscript-name A-OTF-RyuminPro-Medium.otf 
    RyuminPro-Medium
    

    (here an example where the name differs!). Having found the correct PostScript name, copy or link the otf file to the CIDFont directory, where the link name is the PostScript name, without extension. In our case that would be

    Resource/CIDFont/RyuminPro-Medium -> /path/to/A-OTF-RyuminPro-Medium.otf
    
  • Font snippets: Here we have to create small files in Resource/Font for each combination of CIDFont and needed Encoding. Note that CIDFont here means the PostScript name. One such file, for CIDFONT-ENCODING, looks like:
    %!PS-Adobe-3.0 Resource-Font
    %%DocumentNeededResources: ENCODING (CMap)
    %%IncludeResource: ENCODING (CMap)
    %%BeginResource: Font (CIDFONT-ENCODING)
    (CIDFONT-ENCODING)
    (ENCODING) /CMap findresource
    [(CIDFONT) /CIDFont findresource]
    composefont
    pop
    %%EndResource
    %%EOF
    

    Here all CIDFONT and ENCODING need to be replaced by proper values: For CIDFONT there should be a file Resource/CIDFont/CIDFONT, and for ENCODING a file Resource/CMap/ENCODING.

With these two things in place, the font should be available to Ghostscript by using

/CIDFONT-ENCODING findfont
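The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the function name `install_otf_cidfont` is made up, the PostScript name is expected to be passed in by the caller (for instance as obtained from otfinfo), and the snippet is written with the usual `%!` and `%%` comment prefixes.

```python
import os

SNIPPET_TEMPLATE = """%!PS-Adobe-3.0 Resource-Font
%%DocumentNeededResources: {enc} (CMap)
%%IncludeResource: {enc} (CMap)
%%BeginResource: Font ({font}-{enc})
({font}-{enc})
({enc}) /CMap findresource
[({font}) /CIDFont findresource]
composefont
pop
%%EndResource
%%EOF
"""


def install_otf_cidfont(resource_dir, psname, font_path, encodings):
    """Link an OTF/CFF font under its PostScript name and write font snippets.

    resource_dir: Ghostscript Resource directory (or any output directory)
    psname: PostScript name of the font, without extension
    font_path: path to the .otf file
    encodings: CMap names for which snippets should be created
    Returns the list of snippet files written.
    """
    cidfont_dir = os.path.join(resource_dir, "CIDFont")
    font_dir = os.path.join(resource_dir, "Font")
    os.makedirs(cidfont_dir, exist_ok=True)
    os.makedirs(font_dir, exist_ok=True)

    # Step 1: Resource/CIDFont/PSNAME -> /path/to/font.otf
    link = os.path.join(cidfont_dir, psname)
    if not os.path.lexists(link):
        os.symlink(os.path.abspath(font_path), link)

    # Step 2: one snippet per font/encoding pair in Resource/Font
    written = []
    for enc in encodings:
        target = os.path.join(font_dir, f"{psname}-{enc}")
        with open(target, "w", encoding="ascii") as fh:
            fh.write(SNIPPET_TEMPLATE.format(font=psname, enc=enc))
        written.append(target)
    return written
```

A call such as `install_otf_cidfont("/tmp/Resource", "RyuminPro-Medium", "/path/to/A-OTF-RyuminPro-Medium.otf", ["UniJIS-UTF16-H"])` would then create the link and one snippet; writing into a scratch directory first avoids the need for root privileges while experimenting.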

TTF fonts

There are two steps involved: Creating entries in the file Resource/Init/cidfmap, and creating pre-made snippets for the necessary font/encoding pairs.

  • cidfmap entries: For each TTF font, one needs to know the ordering and supplement. This is not trivial, and I don’t know how to find that out automatically. If you know them, say ORDER for the ordering and SUPP for the supplement, add a line:
    /NAME << /FileType /TrueType   /Path (PATH-TO-TTF-FILE)   /SubfontID 0   /CSI [(ORDER) SUPP] >> ;
    

    Note that for TrueType collections (.ttc) you need to select the proper Subfont ID.

  • Font snippets: As above, with NAME as the first part. (This might not be strictly necessary, though)

With this in place, the TTF fonts should be available, too, in the same way as the OTF/CFF fonts.

Addition: using relative path for TTF

In case you want to create setups in a way that they are self-contained, i.e., not referring to fonts somewhere in the system, you might copy or link the TTF fonts into the directory Resource/CIDFSubst, and then use the following code to load it:

/NAME << /FileType /TrueType    /Path pssystemparams /GenericResourceDir get    (CIDFSubst/FILENAME.ttf) concatstrings   /SubfontID 0   /CSI [(ORDER) SUPP] >> ;

Here the FILENAME.ttf is the link/copy of your font.
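Both forms of cidfmap entry shown above can be generated from a small helper. The Python sketch below is an illustration only; `cidfmap_entry` and `ps_string` are hypothetical names, and the escaping of parentheses and backslashes in the path is an addition for safety that the entries above do not show.

```python
def ps_string(text):
    """Quote text as a PostScript string literal, escaping \\, ( and )."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def cidfmap_entry(name, ordering, supplement, path=None,
                  cidfsubst_file=None, subfont_id=0):
    """Build one cidfmap line for a TrueType font.

    Give either path (absolute path to the font file) or cidfsubst_file
    (file name of a copy or link inside Resource/CIDFSubst), not both.
    """
    if (path is None) == (cidfsubst_file is None):
        raise ValueError("give exactly one of path or cidfsubst_file")
    if path is not None:
        path_expr = ps_string(path)
    else:
        # Resolve the file relative to Ghostscript's resource directory.
        path_expr = ("pssystemparams /GenericResourceDir get "
                     + ps_string("CIDFSubst/" + cidfsubst_file)
                     + " concatstrings")
    return (f"/{name} << /FileType /TrueType /Path {path_expr} "
            f"/SubfontID {subfont_id} /CSI [({ordering}) {supplement}] >> ;")


print(cidfmap_entry("IPAMincho", "Japan1", 6, path="/usr/share/fonts/ipam.ttf"))
print(cidfmap_entry("IPAMincho", "Japan1", 6, cidfsubst_file="ipam.ttf"))
```

The font name, ordering, supplement, and path in the two calls are example values; for a TrueType collection, `subfont_id` would have to be set to the proper index.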

Vertical typesetting

We realized that vertical typesetting does not work in all cases. In fact, it works only with TrueType fonts, but not with OTF fonts, which is a big disappointment. During a chat session with the Ghostscript developers it turned out that the problem lies in the fact that the OTF/CID fonts are not completely loaded, so the vertical metrics are not available. There is no quick fix for that, so for now those wanting to have vertical typesetting need to use TrueType fonts.

Multiple invocations

(not implemented!)

This script can be run several times, in case new fonts are installed into TEXMFLOCAL or font dirs.

If run a second time, it also:

  • checks whether links in Resource/CIDFont that point into TEXMF trees have become dangling, and removes them if so (links pointing outside the TEXMF trees are left alone)
  • checks the Resource/Font files for the comment header of the script, and removes them or syncs them with the newly found fonts
  • regenerates cidfmap.local
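Since this part is not implemented, the following Python sketch only illustrates how the first check could look. The function name `remove_dangling_links` and its arguments are assumptions; it removes only those links whose target lies inside one of the given TEXMF trees and no longer exists.

```python
import os


def remove_dangling_links(cidfont_dir, texmf_roots):
    """Remove dangling symlinks in cidfont_dir that point into texmf_roots.

    Returns the names of the links that were removed. Links pointing
    outside the given trees are never touched.
    """
    roots = [os.path.abspath(r) for r in texmf_roots]
    removed = []
    for name in sorted(os.listdir(cidfont_dir)):
        link = os.path.join(cidfont_dir, name)
        if not os.path.islink(link):
            continue
        target = os.readlink(link)
        if not os.path.isabs(target):
            # Relative link targets are relative to the link's directory.
            target = os.path.join(cidfont_dir, target)
        target = os.path.abspath(target)
        inside = any(os.path.commonpath([target, r]) == r for r in roots)
        if inside and not os.path.exists(target):
            os.remove(link)
            removed.append(name)
    return removed
```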

Interoperation with kanji-config-updmap

(open – not clear if this is an aim at all)