Japanese-English dictionary for Kobo

For a long time the Kobo firmware has allowed downloading a bunch of dictionaries, most of which I don’t need. As I am fluent in most languages I read and write, the only real dictionary I would like to see is a Japanese-English one (I don’t dare ask for Japanese-German). Unfortunately, Kobo never shipped one. On the other hand, starting with firmware 3.16.10 they ship two different English-Japanese dictionaries and one excellent Japanese-Japanese dictionary, but not a single Japanese-English one. So I took the liberty of writing a script that allows everyone to enrich the shipped Japanese-Japanese dictionary with English definitions.


Until now I have used Tshering’s excellent Japanese dictionary, which was enriched with English definitions. It was based on the older Japanese-Japanese dictionary shipped with firmware versions before 3.16.10. I didn’t want to lose the more complete dictionary from the new firmware, so here is a script that updates the dictionary for you…

Prerequisites

The script is neither fool-proof, nor does it do everything by itself. Furthermore, it requires a set of programs. In detail:

  • Unix/Linux computer: I haven’t tried any of this on a Windows machine, but I am happy to get feedback. I am working on a version that does not depend on external programs, and thus might be much more portable.
  • Dictionaries: A copy of the Edict dictionary – see below for details on dictionaries.
  • 7z: A standard zip/unzip program that also takes the locale into account when unpacking (unlike the unzip I have access to).
  • Perl modules: Various Perl modules that should be standard on most installations: Getopt::Long, File::Temp, File::Basename, and Cwd.

Supported dictionaries

At the moment the program can use the following two dictionaries as sources: Edict2 and Japanese3.

Edict2

The Edict dictionary is a free dictionary which is the base of most other dictionaries. Created by the JMdict/EDICT Project, it provides a very complete Japanese-English dictionary.

To use the dictionary with the current program, one needs to download the edict2.gz file and unpack it with gunzip edict2.gz. If you put it into the same directory as the Perl script, nothing else needs to be done; otherwise one can use the command line option --edict PATH-TO-EDICT to specify the location of the edict2 file.
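The unpacking step can be wrapped in a small helper. This is only a minimal sketch, assuming the downloaded file is named edict2.gz as above; the function name unpack_edict is made up for illustration and is not part of the script.

```shell
# Hypothetical helper (not part of enhance-dictionary.pl):
# unpack edict2.gz into a target file while keeping the downloaded archive.
unpack_edict() {
    archive=$1            # path to the downloaded edict2.gz
    target=${2:-edict2}   # where to write the unpacked file (default: ./edict2)
    gunzip -c "$archive" > "$target" || return 1
    # Refuse to continue with an empty file, e.g. after a broken download.
    [ -s "$target" ]
}
```

It could be used as unpack_edict edict2.gz /some/dir/edict2, followed by perl enhance-dictionary.pl --edict /some/dir/edict2 (the directory is just a placeholder).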

Japanese3

Japanese3 is a dictionary application for iOS which provides another very complete Japanese-English dictionary. My feeling is that it is 90% or more based on Edict, so adding it will not buy you much, but still a bit (see below for how much!).

If you have purchased this dictionary/application, and you manage to get access to your iOS device (via jailbreaking or some other tools), then you need to get the file Japanese3.db from the application folder, and then generate a file via sqlite3 as follows:

$ sqlite3 Japanese3.db
.output japanese3-data
select Entry, Furigana, Summary from entries;

If you save the generated file japanese3-data into the same directory as the Perl script, nothing else needs to be done; otherwise one can use the command line option --japanese3 PATH-TO-J3-DATA to specify the location of the Japanese3 data file.
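The same export can probably be done without the interactive session. The following is a sketch only: it assumes sqlite3 runs in its default list output mode, the table and column names are the ones from the session above, and the function name export_japanese3 is invented for illustration.

```shell
# Hypothetical helper: run the query from the session above non-interactively.
export_japanese3() {
    db=$1                      # path to Japanese3.db
    out=${2:-japanese3-data}   # output file (default: the name the script looks for)
    sqlite3 "$db" 'select Entry, Furigana, Summary from entries;' > "$out"
}
```

Calling export_japanese3 Japanese3.db writes japanese3-data into the current directory.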

Command line options

The program supports the following command line options:

  • -h, --help Print this message and exit.
  • -i, --input location of the original Kobo GloHD dict, defaults to dicthtml-jaxxdjs.zip. This can of course point to one of the old dictionaries named dicthtml-ja.zip to enhance those.
  • -o, --output name of the output file, defaults to dicthtml-jaxxdjs-TIMESTAMP.zip
  • --dicts dictionaries to be used, can be given multiple times, possible values are edict2 and japanese3. The order determines the priority of the dictionary.
  • -e, --edict location of the edict2 file, defaults to edict2
  • -j, --japanese3 location of the japanese3 file, defaults to japanese3-data
  • --keep-input keep the unpacked directory
  • --keep-output keep the updated output directory
  • -u, --unpacked location of an already unpacked original dictionary

Note that in case you pass in the option --unpacked, the files should be properly named (encodings are a problem!). Furthermore, note that the unpacked directory contains lots of .gif and .html files that are actually gzip-compressed files, but lacking the .gz extension.
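To look into one of these disguised gzip files, something like the following might help. It is a sketch with made-up helper names (is_gzip, peek), assuming the usual gzip, head, od, and tr tools are installed.

```shell
# Hypothetical helpers for inspecting an unpacked dictionary directory.

# Succeeds if the file starts with the gzip magic bytes 0x1f 0x8b.
is_gzip() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Print the decompressed content of a gzip file regardless of its extension.
# Reading from stdin avoids gzip complaining about an unknown suffix.
peek() {
    gzip -dc < "$1"
}
```

For any .html file in the unpacked directory, is_gzip FILE && peek FILE | head then shows the beginning of the actual HTML.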

If you want to unpack the dictionary yourself, be advised that the file names in the zip archive are already encoded in UTF-8, but normal programs (unzip, 7z) assume that they are in some other encoding. Thus, if you are using a UTF-8 locale, it is necessary to set LC_CTYPE to C so that the file names are not converted again, as in

LC_CTYPE=C 7z x ...
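One detail worth keeping in mind: if LC_ALL is set in your shell, it takes precedence over LC_CTYPE, so the line above would have no effect. A small wrapper can guard against that. This is a sketch; the function name unpack_kobo_dict is hypothetical, and -o is 7z's switch for the output directory.

```shell
# Hypothetical wrapper around the 7z call above.
unpack_kobo_dict() {
    zip=$1   # e.g. dicthtml-jaxxdjs.zip
    dir=$2   # directory to unpack into
    # An empty LC_ALL counts as unset, so LC_CTYPE=C takes effect
    # even if LC_ALL is exported in the calling shell.
    LC_ALL= LC_CTYPE=C 7z x -o"$dir" "$zip" > /dev/null
}
```

The resulting directory could then be handed to the script via --unpacked.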

Typical run

In the following example we use both the Edict2 and Japanese3 dictionaries, and prefer Edict2:

$ perl enhance-dictionary.pl --dicts edict2 --dicts japanese3
Using the following dictionaries as source for translations: edict2 japanese3
loading edict2 ... done
loading Japanese3 data ... done
unpacking original dictionary ... done
loading dict files ... done
searching for words and updating ... done
total words 805521, matches: 130563 (edict: 123388, japanese3: 7175)
creating output html ... done
creating update dictionary in dicthtml-jaxxdjs-201508072201.zip ... done
$
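To answer the question of how much Japanese3 adds, the summary line can be turned into percentages. The following is a rough sketch that assumes the summary line has exactly the format shown in the run above; the function name match_stats is made up.

```shell
# Hypothetical helper: read the script's output on stdin and print
# the share of matched words and the share of matches from Japanese3.
match_stats() {
    sed -n 's/^total words \([0-9]*\), matches: \([0-9]*\) (edict: \([0-9]*\), japanese3: \([0-9]*\))$/\1 \2 \3 \4/p' |
    awk '{ printf "coverage %.1f%%, from japanese3 %.1f%%\n", 100*$2/$1, 100*$4/$2 }'
}
```

For the run above this gives a coverage of about 16.2% of all words, with about 5.5% of the matches coming from Japanese3.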

Installing the dictionary

After having created the enhanced dictionary, one can install the generated file as KOBO/.kobo/dicts/dicthtml-jaxxdjs.zip (where KOBO is the mount point of the eReader). The dictionary should be picked up automatically, and the next lookup should give you something like the following:
[Screenshot: kobo-glohd-enhanced-ja-dict]
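The copy step could look like the following sketch. The function name install_kobo_dict and the optional backup argument are my additions for illustration; the target path is the one given above.

```shell
# Hypothetical helper: copy the generated zip onto the mounted eReader.
install_kobo_dict() {
    generated=$1   # e.g. dicthtml-jaxxdjs-201508072201.zip
    mount=$2       # mount point of the eReader (KOBO in the text above)
    backup=$3      # optional: where to save the dictionary currently installed
    dicts="$mount/.kobo/dicts"
    target="$dicts/dicthtml-jaxxdjs.zip"
    # Stop if the reader is not mounted where we expect it.
    [ -d "$dicts" ] || { echo "no $dicts, is the reader mounted?" >&2; return 1; }
    # Save the existing dictionary on the computer if a backup path was given.
    if [ -n "$backup" ] && [ -f "$target" ]; then
        cp "$target" "$backup" || return 1
    fi
    cp "$generated" "$target"
}
```

A call would then be install_kobo_dict dicthtml-jaxxdjs-201508072201.zip /path/to/KOBO original-dict.zip, where the mount point and the backup name are placeholders.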

There is only one caveat: syncing with Kobo will re-download the original dictionary and overwrite the enhanced one. There is at least one workaround, which I am employing; see this post on MobileRead.

Download and development place

I am using the GitHub repository kobo-ja-dict-enhance to develop the program. Please report bugs and feature requests there.

The file for the last released version can be downloaded from here: enhance-dictionary.pl

Future plans

I am planning to get rid of the 7z dependency and use Archive::Zip for all the unpacking and packing. This would also allow doing everything in memory and thus make it faster. There is a branch on GitHub with ongoing work on this matter.

I am also thinking about including translations for Hiragana words by providing the translations of all possible Kanji readings.

Conclusions

Even if Kobo does not provide us with a decent Japanese-English dictionary, adding at least a huge amount of translations to the current dictionary is now easily possible. For serious students of Japanese who are reading – or starting to read – Japanese eBooks, this will be of great help.

Enjoy, and don’t forget to give feedback, suggestions, and improvements either here or, better, via the GitHub page.

5 Responses

  1. kitkatneko says:

    Good stuff – and I thought I was going to buy a kindle!

  2. Youkai says:

    I just did this; thanks!

    I don’t have access to an iOS device, but you can do this using the Android version of the Japanese app. If you have an Android device, download the Android version of the Japanese by renzo app (it’s free!), then on your internal memory card, go to /sdcard/Android/obb/com.renzo.japanese and copy the .obb file to your PC. Then download apktool and then use it on the .obb file to extract its contents. Somewhere in there is the Japnese4.db file. Use the sqlite instructions above and make sure the outputted file is named Japanese3-data so Norbert’s program will detect the file. Then follow the rest of the instructions as originally written.

    I also had to use the UTF-8 version of Edict2 as using the normal version spat out errors using the enhance program. But it all seemed to work perfectly. Thanks!

    • Hi Youkai,
      thanks for the information. I didn’t know that Japanese has an Android version, too. Mostly because I have switched to aedict, which is absolutely spectacular, and also allows on-the-fly translation (via copy) and usage of multiple edict dicts, like the German wadoku. So I am completely out of the Japanese dict for now, but again, thanks for your investigation and explanations!
