Writing Japanese in LaTeX : Part 1 – Introduction

For Japanese people, writing Japanese documents in LaTeX may be obvious and easy, and there are many pages (in Japanese) explaining the details. But for a foreigner who can read and write a bit of Japanese (especially with a computer) and needs to prepare handouts or slides in Japanese (as I do for my lectures), writing Japanese text with mathematics, mixed with Roman-letter text, is a big hurdle, especially since there are so many different options.

I am setting off to write down my limited knowledge on how to write documents containing Japanese, based on my own experience and help from many friends here in Japan. This first part introduces the various writing systems in use in Japan, the TeX engines and packages that can be used to typeset Japanese text, and a list of features. I finish with a very personal comparison of the options.

The Japanese writing system

(Probably boring for most, but included for completeness.) At least four different writing systems are used in Japan:

  • Romaji: the normal Roman letters.
  • Hiragana ひらがな: a syllabary of 46 basic characters, that is, most glyphs carry both a consonant and a vowel sound. Generally used for grammatical expressions, connectives, or words whose Kanji are too complicated 😉
  • Katakana カタカナ: exactly parallel to Hiragana, but the shapes are more angular. Used for foreign words.
  • Kanji 日本文化: logographic characters imported from China. They make up most written words.

In normal text, these four scripts are constantly mixed. The following screenshot of the Japanese Wikipedia entry on Japan shows all four writing styles in a short paragraph:

[Screenshot: a paragraph from the Japanese Wikipedia entry on Japan]

Typesetting in Japanese is governed by a long list of rules. A good explanation can be found in this article by Haruhiko Okumura (奥村 晴彦). Very detailed and extensive documentation can be found in this W3C Working Group Note (although it takes quite some time to read through all of it!). TeX by itself does not support all the particularities of the Japanese writing conventions, so in 1987 Yasuki Saito (斉藤 康己) of NTT set out to create NTT jTeX. Three years later, in 1990, Shunji Ohno (大野 俊治) and Ryoichi Kurasawa (倉沢 良一) of the ASCII Corporation added vertical typesetting capabilities, which Japanese publishers needed, and pTeX, or Publishing TeX, was born. More details can be found in the article by Okumura-sensei mentioned above.

Typesetting Japanese

Although originally there was only the pTeX engine available, over the years several alternatives and extensions have been developed. To my knowledge, the following options are currently available in TeX Live:

  • (e)p(la)tex: the original pTeX, and its extension with eTeX primitives, which is needed to work with current LaTeX
  • up(la)tex: another extension of pTeX with UTF8 support, which also includes the eTeX primitives
  • xe(la)tex: another long-term player, with UTF8 support and support for many other languages and writing systems
  • pdf(la)tex with cjk package: supports not only Japanese but also Chinese and Korean, but is somewhat outdated
  • pdf(la)tex with bxcjkjatype package: a more modern package, very easy to handle, but with several limitations
  • lua(la)tex with luatexja package: the new player, with excellent features and support

There are a few other options on CTAN, especially zhmCJK, but it isn’t in TeX Live, and I had a hard time setting it up, so I cannot comment on it.

Requirements on the engine/package

When writing Japanese text, there are many features one may want to have. Here are those I know of (or care about):

eTeX primitives
A set of primitives first implemented in the eTeX engine and now required by the LaTeX format.
Japanese typographic rules
As mentioned above, the rules for typesetting Japanese are quite intricate.
vertical typesetting
Non-technical text, books, newspapers, and many other printed items are set vertically.
UTF8 support
pTeX traditionally used a Japanese standard encoding, which is not optimal, especially for foreigners. Most new computer systems run natively on UTF8.
Commercial font support
The ability to choose different fonts for Japanese characters.
Graphic support
Inclusion of images of various types.
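
To make the vertical typesetting and UTF8 items above a bit more concrete, here is a minimal sketch of a vertically typeset document for upLaTeX. It assumes the utarticle class (the vertical article class that ships with upLaTeX in TeX Live) and a source file saved in UTF8. The file name tate.tex and the sample sentence are purely illustrative, and the two-step compile shown in the comments is one common workflow, not the only one.

```latex
% tate.tex (hypothetical file name), saved as UTF8.
% Assumed workflow: uplatex writes a DVI file, dvipdfmx converts it to PDF:
%   uplatex tate.tex
%   dvipdfmx tate.dvi
\documentclass{utarticle} % vertical (tategaki) article class for upLaTeX
\begin{document}
% Text runs top to bottom; lines progress from right to left.
吾輩は猫である。名前はまだ無い。
\end{document}
```

The same source would not compile with pdflatex or lualatex, since utarticle relies on the vertical typesetting primitives of the pTeX family.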

The engine/package feature matrix, as far as I understand it, looks like the one below. Here O stands for excellent support, V for partial support, X for no support, and ? for “I don’t know”:

engine/package        eTeX/LaTeX  typo  vertical  utf8  comm. fonts  graphics
ptex                  X           O     O         X     O            V
eptex                 O           O     O         X     O            V
uptex                 O           O     O         O     O            V
xetex                 O           V     V         O     O            O
pdftex/cjk            O           V     X         O     ?            O
pdftex/bxcjkjatype    O           V     X         O     X            O
luatexja              O           O     X         O     O            O

Conclusions (for today)

Selecting the “right” option is non-trivial. For my first steps I used pdflatex/CJK, and then (u)ptex. Then, since I had to make slides using tikz etc., I switched to pdflatex/bxcjkjatype. Recently I have been converting all my documents to luatexja, as it provides the best mix of features for me.

Here is a short summary of my view of the advantages and disadvantages:

  • (u)ptex: There are two advantages of this group of engines. One is that they support vertical typesetting; in fact, if you need vertical typesetting, there is no other way to go than one of them. The other advantage is that these engines are the original implementation and closest to the typographic standard of Japan. Unfortunately, there is no development going on for ptex, while uptex at least sees small changes. The big disadvantage of this group of engines is that they are based on old code, so they can neither produce pdf directly nor support many different graphics formats.
  • xe(la)tex: here I hardly have any experience, so I cannot comment on it
  • pdftex/CJK is simply outdated for Japanese, I wouldn’t recommend it
  • pdftex/bxcjkjatype has excellent features and works very well without much setup, but has two disadvantages: One is that all non-ASCII characters are treated as Kanji, which means you cannot type characters like ö (for Gödel) as is, but have to write them as \"o, which I don’t like.
    The other disadvantage is a very particular one, linked to the way Kanji are included from the fonts: I use Rikaichan a lot, a plugin for Firefox and Thunderbird that translates from Japanese to English when you move the mouse pointer over text in the browser. No clicking, no copying, and it even works with inflections. So I sometimes load pdfs into the browser (recent versions have pdf viewers built in), just to easily read complicated texts with lots of kanji unknown to me. But this doesn’t work with bxcjkjatype, as the glyphs often come from different fonts (the thousands of glyphs necessary for Japanese are split into several different type1 fonts).
  • luatexja: The absolute winner for me. As I normally don’t need vertical typesetting, it combines all the best features and lets me work as in a normal TeX document. The problems with Rikaichan are also gone, and tikz and all kinds of graphics can be used easily. My recommendation.
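
Purely as a preview of the last option, a minimal luatexja document might look like the sketch below. It assumes the standard article class plus the luatexja package with its default Japanese fonts, compiled with lualatex; the file name and the sample text are made up for illustration. Note that ö is typed directly here, whereas under pdftex/bxcjkjatype the same name would have to be written as G\"odel.

```latex
% hello-ja.tex (hypothetical file name), saved as UTF8.
% Assumed compile command: lualatex hello-ja.tex
\documentclass{article}
\usepackage{luatexja} % Japanese support for LuaLaTeX, with default Japanese fonts
\begin{document}
% Japanese text, typed as is:
これは日本語の文章です。

% Roman-letter text with a directly typed umlaut, plus mathematics:
Gödel's name needs no escaping, and $e^{i\pi} + 1 = 0$ works as usual.
\end{document}
```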

In the next part I will take a simple Hello World document and prepare it for the different engines/packages, so that we see how to write the simplest documents.

Thanks for any comments, suggestions, corrections, improvements.

6 Responses

  1. asdfjkl says:

    since you didn’t comment on xelatex, I’ll throw in the following:
    … when I rewrote the thesis proposal stylesheet for CS students at our all beloved University in Ishikawa I found pTex extremely difficult to set up and use. The main difficulty is to actually edit a document in Shi(f)t JIS (or was it EUC?). Emacs could handle it (of course), but I couldn’t get any of the fancy Tex-IDEs (i.e. Texmaker) working.
    I could however get xelatex on the other hand working with a simple sudo apt-get install, and unicode support (“Mr. Äüüo Öäüöö”), custom font-selection (i.e. using Microsoft Mincho, Gothic or the new fancy Meiryo) was no problem, and if I remember correctly, tategaki was also not difficult…
    The only main drawback was the lack of some Japanese typographic rules, but these were quite special problems like kunten (and maybe microtypography?)…

    • Hi

      thanks for your comment. Yes, this is a known problem. Fortunately, uptex can use UTF-8, and current ptex from TeX Live also accepts UTF-8 (by default on Unix, by command line switch on Windows).

      It is true that xelatex gives quick and easy results, but vertical writing is achieved by rotating the whole page (AFAIR).

      I will probably write one part on setting up various editors (emacs, texworks, texstudio maybe) to work for Japanese typesetting, too.

  2. Anonymous says:

    Can you tell me how to get luatexja to recognize NotoSansCJK? I’ve just installed TexLive 2015, and I get gibberish when I try to use NotoSansCJK.

    Thanks
