Kobo Japanese Dictionary Enhancer

The following script takes a Japanese dictionary as provided by Kobo (dicthtml-ja.zip for firmwares before 3.17.0, dicthtml-jaxxdjs.zip since 3.17.0) and enriches the dictionary with definitions from some other dictionary, like English or German translations. In effect you get a Japanese-English or Japanese-German dictionary (which is not provided by Kobo) plus the Japanese explanations.

Prerequisites

The script is neither foolproof nor fully automatic, and it requires a few external programs. In detail:

  • Unix/Linux computer: I haven’t tried any of this on a Windows machine, but I would be happy to get feedback. I am working on a version that does not depend on external programs and thus might be much more portable.
  • Dictionaries: A copy of the Edict dictionary; see below for details on dictionaries.
  • 7z: A standard zip/unzip program that also takes the locale into account when unpacking (unlike the unzip that I have access to).
  • Perl modules: Various Perl modules that should be standard on most installations: Getopt::Long, File::Temp, File::Basename, and Cwd.

Supported dictionaries

At the moment the program can use dictionaries of the following two formats: Edict2 and Japanese3.

Edict2

The Edict2 format was developed by the JMdict/Edict Project. They provide a very complete dictionary for English translations, and many other dictionaries are based on the English Edict. Dictionaries for other languages, e.g. German, are also available in the Edict2 format.
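
To give an idea of what an Edict2 entry looks like, here is a minimal Python sketch that splits one line into headwords, readings, and glosses. It assumes the commonly documented layout "HEADWORDS [READINGS] /gloss/gloss/.../EntLnnnnnnnX/"; the sample line is only modelled on that layout, and the function names are hypothetical. This is not how enhance-dictionary.pl itself parses the file.

```python
import re

# Assumed layout: "HEADWORDS [READINGS] /gloss/gloss/.../"; the reading part is optional.
EDICT2_LINE = re.compile(
    r"^(?P<kanji>\S+)(?:\s+\[(?P<kana>[^\]]+)\])?\s+/(?P<rest>.*)/\s*$"
)

def strip_markers(form):
    """Remove trailing markers such as "(P)" or "(iK)" from a headword or reading."""
    return re.sub(r"(?:\([^)]*\))+$", "", form)

def parse_edict2_line(line):
    """Return (kanji_forms, kana_forms, glosses), or None if the line does not match."""
    match = EDICT2_LINE.match(line.strip())
    if match is None:
        return None
    kanji = [strip_markers(form) for form in match.group("kanji").split(";")]
    kana_field = match.group("kana")
    kana = [strip_markers(form) for form in kana_field.split(";")] if kana_field else []
    # Drop the priority marker and the entry id, keep the actual glosses.
    glosses = [
        gloss for gloss in match.group("rest").split("/")
        if gloss and gloss != "(P)" and not gloss.startswith("EntL")
    ]
    return kanji, kana, glosses

# Illustrative line, written to resemble the format (not copied from the edict2 file).
sample = "食べる(P);喰べる(iK) [たべる] /(v1,vt) (1) to eat/(2) to live on (e.g. a salary)/(P)/EntL1358280X/"
print(parse_edict2_line(sample))
```

Each headword and each reading can serve as a lookup key, which is presumably why words can be matched both in Kanji and in Hiragana writing.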

English

To use the dictionary with the current program, you need to download the edict2.gz file and unpack it with gunzip edict2.gz. If you put it into the same directory as the Perl script, nothing else needs to be done; otherwise, use the command line option --dict PATH-TO-EDICT to specify the location of the edict2 file.

German

The Wadoku project has a very extensive Japanese-German dictionary, and they provide their dictionary in Edict2 format here.

Japanese3

Japanese3 is a dictionary application for iOS which provides another very complete Japanese-English dictionary. My feeling is that 90% or more of it is based on Edict, so adding it will not gain you much, but still a bit (see below for how much!).

If you have purchased this dictionary/application, and you manage to get access to your iOS device (via jailbreaking or some other tools), then you need to get the file Japanese3.db from the application folder, and then generate a file via sqlite3 as follows:

$ sqlite3 Japanese3.db
.output japanese3-data
select Entry, Furigana, Summary from entries;

If you save the generated file japanese3-data into the same directory as the Perl script, nothing else needs to be done; otherwise, use the command line option --japanese3 PATH-TO-J3-DATA to specify the location of the Japanese3 data file.
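
If the sqlite3 command line client is not at hand, the same export can be sketched with Python's standard sqlite3 module. The table and column names are the ones from the query above; the function name export_japanese3 is hypothetical. The sketch writes the columns separated by "|", which is the default output format of the sqlite3 client, on the assumption that this is the layout the script expects.

```python
import sqlite3

def export_japanese3(db_path, out_path):
    """Write Entry, Furigana and Summary of every row, '|'-separated, one row per line.

    Returns the number of rows written.
    """
    connection = sqlite3.connect(db_path)
    count = 0
    try:
        rows = connection.execute("SELECT Entry, Furigana, Summary FROM entries")
        with open(out_path, "w", encoding="utf-8") as out:
            for row in rows:
                # NULL columns become empty strings, as in the sqlite3 client's list mode.
                fields = ["" if value is None else str(value) for value in row]
                out.write("|".join(fields) + "\n")
                count += 1
    finally:
        connection.close()
    return count
```

A call such as export_japanese3("Japanese3.db", "japanese3-data") would then produce the data file in the current directory.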

Example runs

Simple run

Assume that we have the following files in the current working directory: enhance-dictionary.pl, edict2, and dicthtml-jaxxdjs.zip. Under these conditions a simple invocation of enhance-dictionary.pl will already provide an enriched dictionary with English translations:

$ ls
dicthtml-jaxxdjs.zip edict2 enhance-dictionary.pl
$ perl enhance-dictionary.pl 
Using the following dictionaries as source for translations: edict2:edict2
loading edict2 type from edict2 ... done
loading Japanese3 data from japanese3-data ... done
loading and merging dict files ... done
total entries: 922380
entries used from edict2: 326064
creating output html ... done
creating update dictionary in dicthtml-jaxxdjs-YYYYMMDDDmmss.zip ... done
$

German translations

If you have downloaded and unpacked the Wadoku Edict2 file, you can choose this one instead of the English version:

$ ls
dicthtml-jaxxdjs.zip enhance-dictionary.pl wadokudict_20150705/
$ perl enhance-dictionary.pl --dict=wadokudict_20150705/wadokudict
Using the following dictionaries as source for translations: wadokudict_20150705/wadokudict
loading edict2 type from wadokudict_20150705/wadokudict ... done
loading and merging dict files ... done
total entries: 922380
entries used from wadokudict_20150705/wadokudict: 368943
creating output html ... done
creating update dictionary in dicthtml-jaxxdjs-YYYYMMDDDmmss.zip ... done
$

As you can see, the German dictionary actually contributes more translations than the English one (368943 versus 326064 entries used).

Merged translations

One feature that is quite nice is to have both English and German translations merged into the Japanese dictionary. This is possible with the --merge command line option:

$ ls
dicthtml-jaxxdjs.zip edict2 enhance-dictionary.pl wadokudict_20150705/
$ perl enhance-dictionary.pl --dict=wadokudict_20150705/wadokudict --dict=edict2 --merge
Using the following dictionaries as source for translations: wadokudict_20150705/wadokudict edict2
loading edict2 type from wadokudict_20150705/wadokudict ... done
loading edict2 type from edict2 ... done
loading and merging dict files ... done
total entries: 922380
entries used from wadokudict_20150705/wadokudict: 367755
entries used from edict2: 324938
creating output html ... done
creating update dictionary in dicthtml-jaxxdjs-YYYYMMDDDmmss.zip ... done
$

As a result, both German and English translations are shown; see the following screenshot:

kobo-dict-de-en
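
The difference between a plain multi-dictionary run and a merged one can be modelled roughly as follows. This is only a sketch of the idea, not the script's actual logic: it assumes that without merging the first dictionary (in command line order) that knows a word wins, and that with merging all hits are kept in that order. The function name lookup and the toy data are hypothetical.

```python
def lookup(word, dictionaries, merge=False):
    """Look up a word in several dictionaries given in priority order.

    dictionaries is a list of (name, mapping) pairs. Without merge, only the
    hit from the highest-priority dictionary is returned; with merge, the hits
    from all dictionaries are returned in priority order.
    """
    hits = [(name, entries[word]) for name, entries in dictionaries if word in entries]
    return hits if merge else hits[:1]

# Toy data standing in for the loaded Wadoku and Edict2 files.
german = {"猫": "Katze"}
english = {"猫": "cat", "犬": "dog"}
sources = [("wadokudict", german), ("edict2", english)]

print(lookup("猫", sources))              # German only, it comes first
print(lookup("猫", sources, merge=True))  # German and English
print(lookup("犬", sources))              # falls back to English
```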

Command line options

The program supports the following command line options:

  • -h, --help Print this message and exit.
  • -i, --input location of the original Kobo GloHD dict, defaults to dicthtml-jaxxdjs.zip. This can of course point to one of the old dictionaries named dicthtml-ja.zip to enhance those.
  • -o, --output name of the output file, defaults to dicthtml-jaxxdjs-TIMESTAMP.zip
  • --dict=[TYPE:]PATH specifies dictionaries and their type to be used, can be given multiple times, possible values for TYPE are edict2 and japanese3. The order determines the priority of the dictionary.
  • plus some debugging and dev options, see the output of --help or read the code.
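
The --dict=[TYPE:]PATH syntax can be read as an optional type prefix followed by a path. A small sketch of how such a value might be split is given below; the function name parse_dict_spec is hypothetical, and the default type edict2 is an assumption based on the example runs above, where a bare path is loaded as "edict2 type".

```python
KNOWN_TYPES = ("edict2", "japanese3")
DEFAULT_TYPE = "edict2"  # assumption: a bare path is treated as an Edict2 file

def parse_dict_spec(spec):
    """Split a "[TYPE:]PATH" value into (type, path)."""
    prefix, separator, rest = spec.partition(":")
    if separator and prefix in KNOWN_TYPES:
        return prefix, rest
    # No recognised type prefix: the whole value is the path.
    return DEFAULT_TYPE, spec

# The list keeps the command line order, which determines the priority.
specs = ["wadokudict_20150705/wadokudict", "edict2:edict2", "japanese3:japanese3-data"]
print([parse_dict_spec(spec) for spec in specs])
```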

Note that the unpacked directory contains lots of .gif and .html files that are actually gzip-compressed but lack the .gz extension.
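
If you want to inspect such a file, you cannot rely on the extension. One way to handle this, sketched here in Python, is to check for the two-byte gzip signature and decompress only when it is present. The function name read_maybe_gzipped is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes

def read_maybe_gzipped(path):
    """Return the file content, transparently decompressing gzip data."""
    with open(path, "rb") as handle:
        data = handle.read()
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data
```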

If you want to unpack the dictionary, be advised that the file names in the Zip directory are already encoded in UTF-8, but normal programs (unzip, 7z) assume that they are in some other encoding. Thus, if you are using a UTF-8 locale, it is necessary to set LC_CTYPE to C so that the file names are not converted a second time, as in

LC_CTYPE=C 7z x ...
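
The same problem shows up when reading the archive with Python's zipfile module, which decodes member names as CP437 unless the archive marks them as UTF-8. Assuming the Kobo archive does not set that marker (I have not verified this), the original name can be recovered by reversing the wrong decoding. The function name recover_utf8_name is hypothetical.

```python
def recover_utf8_name(name):
    """Undo a CP437 mis-decoding of a UTF-8 file name.

    Names that cannot be mapped back (for example because they were decoded
    correctly in the first place) are returned unchanged.
    """
    try:
        return name.encode("cp437").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name

# Simulate what a CP437-assuming reader would make of a UTF-8 encoded name.
garbled = "あ.html".encode("utf-8").decode("cp437")
print(repr(garbled), "->", recover_utf8_name(garbled))
```

With zipfile, the function would be applied to the filename attribute of each entry returned by ZipFile.infolist().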

Installing the dictionary

After creating the enhanced dictionary, install the generated file as KOBO/.kobo/dicts/dicthtml-jaxxdjs.zip (where KOBO is the mount point of the eReader). The dictionary should be picked up automatically, and the next lookup should give you something like the following:
kobo-glohd-enhanced-ja-dict
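
The copy step can also be scripted. The following Python sketch copies the generated archive to the location given above; the function name install_dictionary and its parameters are hypothetical, and the mount point has to be supplied by you.

```python
import shutil
from pathlib import Path

def install_dictionary(enhanced_zip, kobo_mount, name="dicthtml-jaxxdjs.zip"):
    """Copy the enhanced dictionary to KOBO/.kobo/dicts/ under the expected name."""
    target_dir = Path(kobo_mount) / ".kobo" / "dicts"
    if not target_dir.is_dir():
        # Refuse to create the directory: a missing one usually means a wrong mount point.
        raise FileNotFoundError(f"{target_dir} not found; is the reader mounted?")
    target = target_dir / name
    shutil.copyfile(enhanced_zip, target)
    return target
```

For one of the old dictionaries, name="dicthtml-ja.zip" would be passed instead.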

There is only one caveat: syncing with Kobo will re-download the original dictionary and overwrite the enhanced one. There is at least one workaround, which I use myself; see this post on MobileRead.

Current status

Practically all non-compound words have translations, both in Kanji and Hiragana writing. Words that are variants or compounds often don’t have one. In my daily routine of reading several Japanese books, I normally find translations for all the words I look up.

Download and development place

I am using the GitHub repository kobo-ja-dict-enhance to develop the program. Please report bugs and feature requests there.

The file for the last released version can be downloaded from here: enhance-dictionary.pl

Future plans

I am planning to get rid of the 7z dependency and use Archive::Zip for all the unpacking and packing. This would also allow everything to be done in memory and thus make it faster. There is a branch on GitHub with ongoing work on this matter.

Conclusions

Even if Kobo does not provide us with a decent Japanese-English dictionary, adding at least a huge amount of translations to the current dictionary is now easily possible. For serious students of Japanese who are reading – or starting to read – Japanese eBooks, this will be of great help.

Enjoy, and don’t forget to give feedback, suggestions, and improvements either here or, better, via the GitHub page.