Writing Japanese in LaTeX: Part 1 – Introduction
For Japanese people it may be obvious how to write Japanese documents in LaTeX, and there are many pages (in Japanese) explaining the details. But for a foreigner who can read and write a bit of Japanese (especially with the computer) and needs to prepare handouts or slides in Japanese (as I have to do for my lectures), writing Japanese text mixed with mathematics and Roman-letter text is a big hurdle, especially since there are so many different options.
I am setting off to write down my limited knowledge on how to write documents containing Japanese, based on my own experience and help from many friends here in Japan. This first part introduces the various writing systems in use in Japan, the TeX engines and packages that can be used to typeset Japanese text, and a list of features. I finish with a very personal comparison of the options.
The Japanese writing system
(probably boring for most, but for completeness) In Japan at least 4 different writing systems are used:
- Romaji: the normal Roman letters.
- Hiragana ひらがな: A syllabary of 46 basic characters, that is, most glyphs represent a consonant followed by a vowel. Generally used for grammatical expressions, connectives, or words where the Kanji are too complicated 😉
- Katakana カタカナ: Absolutely parallel to the Hiragana, but the forms are a bit more squarish. Used for foreign words.
- Kanji 日本文化: logographic characters imported from China. They make up most of the written words.
Typesetting in Japanese is governed by a long list of rules. A good explanation can be found in this article by Haruhiko Okumura (奥村 晴彦). Very detailed and extensive documentation can be found in this W3C Working Group Note (although you need quite some time to read through all of it!). TeX by itself does not support all the particularities of the Japanese writing conventions, so in 1987 Yasuki Saito (斉藤 康己) from NTT set out to create NTT jTeX. Three years later, in 1990, Shunji Ohno (大野 俊治) and Ryoichi Kurasawa (倉沢 良一) from the ASCII Corporation added vertical typesetting capabilities, which Japanese publishers need. pTeX, or Publishing TeX, was born. More details can be found in the article by Okumura-sensei mentioned above.
Although originally there was only the pTeX engine available, over the years several alternatives and extensions have been developed. To my knowledge, the following options are currently available in TeX Live:
- (e)p(la)tex: the original pTeX and its extension with eTeX primitives, which is needed to work with current LaTeX
- up(la)tex: another extension of pTeX for UTF8 support, which also includes the eTeX primitives
- xe(la)tex: another long-term player, with UTF8 support and support for many other languages and writing systems
- pdf(la)tex with CJK package: supports not only Japanese, but also Chinese and Korean, but is somewhat outdated
- pdf(la)tex with bxcjkjatype package: a more modern package, very easy to handle, but with several limitations
- lua(la)tex with luatexja package: the new player, with excellent features and support
There are a few other options on CTAN, especially zhmCJK, but it isn’t in TeX Live, and I had a hard time setting it up, so I cannot comment on it.
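To give a feeling for the traditional route from the list above, here is a minimal sketch of a upLaTeX document. It assumes a TeX Live installation that includes the jsclasses bundle (which provides the jsarticle class); the file name hello.tex and the sample text are only illustrations. The source file has to be saved as UTF8.

```latex
% Sketch only: compile with `uplatex hello.tex`, then `dvipdfmx hello.dvi`
% (hello.tex is just an example file name).
\documentclass[uplatex,dvipdfmx]{jsarticle} % jsarticle: a Japanese article class
\begin{document}
% Japanese, Roman letters, and mathematics in one line
こんにちは、世界！ Hello, world! $a^2 + b^2 = c^2$
\end{document}
```

The two-step compilation reflects that this family of engines produces DVI output, which is then converted to PDF by a separate program.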
Requirements on the engine/package
When writing Japanese text, there are many features one may want to have. Let us list those I know of (or care about):
- eTeX primitives: a set of primitives first implemented in the eTeX engine, which are now required for the LaTeX format
- Japanese typographic rules: as mentioned above, the rules for typesetting Japanese are quite intricate
- Vertical typesetting: non-technical text, books, newspapers, and many other printed items are set vertically
- UTF8 support: pTeX traditionally used a Japanese standard encoding, which is not optimal, especially for foreigners; most new computer systems run natively on UTF8
- Commercial font support: the ability to choose different fonts for Japanese characters
- Graphic support: inclusion of images of various types
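As a sketch of what font choice and graphics inclusion can look like in practice, the following LuaLaTeX document uses luatexja-fontspec to select the Japanese fonts and graphicx to include an image. The font names (IPAexMincho, IPAexGothic) and the image name (example-image, from the mwe package) are assumptions about what is installed; substitute whatever is available on your system.

```latex
% Sketch only: compile with `lualatex`.
\documentclass{article}
\usepackage{luatexja-fontspec} % loads luatexja and fontspec
\usepackage{graphicx}
% Assumed fonts: the IPAex fonts as shipped with TeX Live.
\setmainjfont{IPAexMincho} % Japanese serif (mincho) font
\setsansjfont{IPAexGothic} % Japanese sans-serif (gothic) font
\begin{document}
% \textgt switches to the gothic Japanese font family
明朝体の文章。\textgt{ゴシック体の文章。}

% Assumed image: example-image is a placeholder from the mwe package.
\includegraphics[width=0.4\textwidth]{example-image}
\end{document}
```

Other engine/package combinations select fonts and include graphics in different ways, so this shows only one of the options.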
The engine/package feature matrix, as far as I understand it, looks like the one below. Here O stands for excellent support, V for partial support, X for no support, and ? for “I don’t know”:
Conclusions (for today)
Selecting the “right” option is non-trivial. For my first steps I used pdflatex/CJK, and then (u)ptex. Then, since I had to make slides using tikz etc., I switched to pdflatex/bxcjkjatype. Recently I have been converting all my documents to luatexja, as it provides the best mix of features for me.
Here is a short summary of my view of the advantages and disadvantages:
- (u)ptex: There are two advantages of this group of engines. One is that they support vertical typesetting; in fact, if you need vertical typesetting, there is no other way to go than one of them. The other advantage is that these engines are the original implementation and closest to the typographic standard of Japan. Unfortunately, there is no development going on for ptex, while uptex at least sees small changes. The big disadvantage of this group of engines is that they are based on old code, so they can neither produce pdf directly nor support many different graphic formats.
- xe(la)tex: here I hardly have any experience, so I cannot comment on it
- pdftex/CJK: simply outdated for Japanese; I wouldn’t recommend it
- pdftex/bxcjkjatype: has excellent features and works very well without much setup, but has two disadvantages. One is that all non-ASCII characters are treated as Kanji, which means you cannot type characters like ö (for Gödel) as is, but have to write them as \"o, which I don’t like.
The other disadvantage is a very particular one, linked to the way Kanji are included from the fonts: I use Rikaichan a lot, a plugin for Firefox and Thunderbird that translates from Japanese to English when you move the mouse pointer over text in the browser. No clicking, no copying, and it even works with inflections. So I sometimes load pdfs into the browser (recent versions have pdf viewers built in), just to easily read complicated texts with lots of Kanji unknown to me. But this doesn’t work with bxcjkjatype, as the glyphs often come from different fonts (the thousands of glyphs necessary for Japanese are split into several different Type 1 fonts).
- luatexja: The absolute winner for me. As I normally don’t need vertical typesetting, it combines all the best features and lets me work as in a normal TeX document. The problems with Rikaichan are also gone, tikz can be used easily, and all kinds of graphics can be included. My recommendation.
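To illustrate the points about accented letters and tikz, here is a small sketch for LuaLaTeX with luatexja. It assumes a TeX Live installation with luatexja and tikz available; the sample text and the diagram are made up for illustration only.

```latex
% Sketch only: compile with `lualatex`.
\documentclass{article}
\usepackage{luatexja}
\usepackage{tikz}
\begin{document}
% With luatexja the ö can be typed directly; with pdflatex/bxcjkjatype
% one would have to write G\"odel instead.
Gödel の不完全性定理

% Japanese text inside TikZ nodes and edge labels
\begin{tikzpicture}
  \node[draw, rounded corners] (a) at (0,0) {公理};
  \node[draw, rounded corners] (b) at (3,0) {定理};
  \draw[->] (a) -- node[above] {証明} (b);
\end{tikzpicture}
\end{document}
```

The direct ö relies on luatexja's default settings treating Latin-1 letters as non-Japanese characters; if those settings are changed, the behaviour may differ.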
In the next part I will take a simple Hello World document and prepare it for the different engines/packages, so that we see how to write the simplest documents.
Thanks for any comments, suggestions, corrections, improvements.