Important: The information in this document is obsolete and should not be used for new development.

Inside Macintosh: Text /
Chapter 8 - Dictionary Manager


About Dictionaries for Input Methods

Input methods for 2-byte script systems use dictionaries, data files with information essential to the text conversions they perform. An input method uses its dictionary to convert the raw text entered by the user. For a discussion of raw text, conversion, and input methods, see the chapter "Text Services Manager" in this book.

Input methods commonly rely on two or more dictionaries to perform conversion efficiently. The main dictionary lists all standard conversion options for any valid syllabic or phonetic input. A main dictionary may have thousands to tens of thousands of entries, and its content is usually fixed. The user dictionary, also called an editable dictionary, is a complementary file to which users can add specialized or custom information that does not exist in the main dictionary. Because the main dictionaries of many input methods have only about 80 percent of the needed conversion options, a user dictionary is extremely valuable to users who customize the input process to improve its precision.
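The relationship between the two dictionaries can be sketched in a few lines of Python. This is only an illustration, not the Dictionary Manager interface: the `lookup` function, the dictionary layout (a reading mapped to a list of conversions), and the sample entries are all assumptions made for the example, as is the choice to list the user's entries ahead of the main dictionary's.

```python
# Hypothetical main dictionary: reading -> standard conversion options.
# Real main dictionaries hold thousands of entries and are usually fixed.
MAIN_DICTIONARY = {
    "かんじ": ["漢字", "感じ", "幹事"],
    "にほん": ["日本"],
}


def lookup(reading, user_dictionary, main_dictionary=MAIN_DICTIONARY):
    """Return conversion candidates for a reading.

    Assumption for this sketch: user entries are listed first, followed by
    main-dictionary entries, with duplicates dropped.
    """
    candidates = []
    for source in (user_dictionary, main_dictionary):
        for candidate in source.get(reading, []):
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates


# Hypothetical user dictionary: a personal name and a product name that
# the main dictionary does not contain.
user_dictionary = {
    "かんじ": ["寛治"],
    "あっぷる": ["アップル"],
}

print(lookup("かんじ", user_dictionary))
print(lookup("あっぷる", user_dictionary))
```

In this sketch the user dictionary only supplements the main dictionary; it never hides a standard conversion option.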

Users can also turn on dictionary learning. Learning allows the input method to record frequency information as the user works, so that conversions take into account how often each combination has been chosen in a particular grammatical context. This makes a user dictionary even more valuable to a person who has worked with it for a long time.
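One way to picture dictionary learning is as a table of counts that reorders the candidates. The following Python sketch assumes a simplified model in which the grammatical context is a plain string label; the `LearningRanker` class and its method names are hypothetical and do not correspond to any routine described in this chapter.

```python
from collections import Counter


class LearningRanker:
    """Reorders conversion candidates by how often the user chose them."""

    def __init__(self):
        # (context, reading, candidate) -> number of times chosen
        self.counts = Counter()

    def record_choice(self, context, reading, candidate):
        """Remember that the user picked this candidate in this context."""
        self.counts[(context, reading, candidate)] += 1

    def rank(self, context, reading, candidates):
        """Return candidates, most frequently chosen first.

        sorted() is stable, so candidates with equal counts keep the
        order in which the dictionary supplied them.
        """
        return sorted(
            candidates,
            key=lambda c: -self.counts[(context, reading, c)],
        )


ranker = LearningRanker()
candidates = ["漢字", "感じ", "幹事"]

# The user picks one candidate twice and another once in the same context.
ranker.record_choice("after-noun", "かんじ", "幹事")
ranker.record_choice("after-noun", "かんじ", "幹事")
ranker.record_choice("after-noun", "かんじ", "感じ")

print(ranker.rank("after-noun", "かんじ", candidates))
print(ranker.rank("sentence-start", "かんじ", candidates))
```

Because the counts are keyed by context, choices made in one context do not change the ordering offered in another.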

In principle, the dictionaries for different input methods of a given writing system should be very similar. For instance, most Japanese dictionaries contain information relevant to the conversion of Hiragana to Kanji. Korean dictionaries consist of data necessary for the conversion of Hangul to Hanja. Chinese dictionaries have entries relevant to the conversion of radicals to Hanzi, Zhuyinfuhao to Hanzi, or Pinyin to Hanzi.

In practice, however, many currently available Chinese, Japanese, and Korean input methods use their own dictionary formats. Each input method has independently implemented operations to insert, delete, and search for the entries in its own dictionaries.
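The three operations that each input method reimplements are small when taken on their own. As a rough sketch, assuming an in-memory table that maps a reading to its conversions, they might look like the following; the `EditableDictionary` class is hypothetical and says nothing about any vendor's file format.

```python
class EditableDictionary:
    """A toy editable dictionary: reading -> list of conversions."""

    def __init__(self):
        self._entries = {}

    def insert(self, reading, conversion):
        """Add an entry; return False if it is already present."""
        conversions = self._entries.setdefault(reading, [])
        if conversion in conversions:
            return False
        conversions.append(conversion)
        return True

    def delete(self, reading, conversion):
        """Remove an entry; return False if it was not found."""
        conversions = self._entries.get(reading, [])
        if conversion not in conversions:
            return False
        conversions.remove(conversion)
        if not conversions:
            # Drop the reading entirely once its last conversion is gone.
            del self._entries[reading]
        return True

    def search(self, reading):
        """Return a copy of the conversions stored for a reading."""
        return list(self._entries.get(reading, []))


dictionary = EditableDictionary()
dictionary.insert("ぴんいん", "拼音")
print(dictionary.search("ぴんいん"))
dictionary.delete("ぴんいん", "拼音")
print(dictionary.search("ぴんいん"))
```

The operations themselves are simple; what varies between input methods is how each one stores the entries on disk, which is why one method cannot read another's dictionaries.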

Input methods that use their own dictionary formats can understand only the dictionaries they create. This may be desirable for main dictionaries, because the features of a main dictionary can distinguish the quality of one input method from another; input method developers may be hesitant to share such dictionaries with other vendors. But for user dictionaries, incompatible formats create serious difficulties for users, particularly when a user dictionary contains many entries.

Consider the following situation. A user purchases an input method and uses it for perhaps a year, making numerous entries in the user dictionary. Then a new and better input method is introduced, but the new input method cannot understand the customized user dictionary. Because there is no general dictionary format, the user is forced to choose between two undesirable alternatives: creating an entirely new user dictionary by manually keying in thousands of previous entries, or continuing to use the old input method, forgoing the benefits of the new one.

This chapter describes a dictionary format that allows user dictionaries to be carried over from one input method to another, which avoids the difficulty just described. Although dictionaries are primarily of use to input methods and are discussed in that context here, other text services such as thesauri or spelling checkers can also benefit from using dictionaries with this format.


© Apple Computer, Inc.
6 JUL 1996