Inside Macintosh: Sound /
Chapter 4 - Speech Manager / Using the Speech Manager


Important: Inside Macintosh: Sound is deprecated as of Mac OS X v10.5. For new audio development in Mac OS X, use Core Audio. See the Audio page in the ADC Reference Library.

Including Pronunciation Dictionaries

No matter how sophisticated a speech synthesis system is, there will always be words that it does not automatically pronounce correctly. Proper nouns (names of people, place names, and so on) are a clear example of words that are often mispronounced. The Speech Manager supports pronunciation dictionaries, which allow applications to override the default pronunciations of words. A pronunciation dictionary is a list of words along with their associated pronunciations, stored in a resource of resource type 'dict'.

The application is free to store dictionaries in either the resource fork or the data fork of a file. The application is responsible for loading the individual dictionaries into RAM and then passing a handle to the dictionary data to the Speech Manager. The initial release of the Speech Manager does not include any routines that can add entries to dictionaries or manipulate them in other ways. It does, however, include the UseDictionary function, which you can use to install one or more pronunciation dictionaries in a speech channel.

A multimedia application might store such a pronunciation dictionary resource in its own resource fork to specify the pronunciations of selected words used in a narration. A word-processing application, meanwhile, could allow a user to add words to a pronunciation dictionary stored in the resource fork of a text file. Or, a text-services application dedicated to speech generation might include large specialized dictionaries--for example, of medical terms--to specify pronunciation of words in particular subject areas. Because the Speech Manager allows your application to install as many pronunciation dictionaries as desired in a speech channel, it can use pronunciation dictionaries in one or more of these ways.

Note
The Dictionary Manager, described in Inside Macintosh: Text, cannot be used with pronunciation dictionaries.

Whenever a speech synthesizer needs to determine the proper phonemic representation for a particular word, it first looks for the word in its pronunciation dictionaries. Pronunciation dictionary entries contain information that enables precise conversion between text and the correct phoneme codes, as described in "Phonemic Representation of Speech" beginning on page 4-32. Pronunciation dictionary entries also provide stress, intonation, and other information to help speech synthesizers produce more natural speech, as described in "Prosodic Control Symbols" beginning on page 4-34. Note that you cannot use punctuation marks (as described in Table 4-5) in pronunciation dictionaries.

A single pronunciation dictionary entry cannot be used to specify the pronunciation of an entire phrase, because the Speech Manager checks its pronunciation dictionary on a word-by-word basis. Thus, the textual portion of a pronunciation dictionary entry must not contain any spaces.
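If your application lets users add entries, it might check the textual portion before storing it. The following is a minimal sketch in C, not a Speech Manager routine: the function name entry_text_is_single_word is hypothetical, and rejecting empty strings is an added assumption rather than a documented rule.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of the Speech Manager): returns true when
   `text` could serve as the textual portion of one pronunciation dictionary
   entry, that is, when it is non-empty and contains no space characters.
   Rejecting empty text is an assumption made for this sketch. */
bool entry_text_is_single_word(const char *text)
{
    if (text == NULL || text[0] == '\0')
        return false;               /* nothing to look up */

    for (const char *p = text; *p != '\0'; p++) {
        if (*p == ' ')
            return false;           /* a space would make this a phrase */
    }
    return true;
}
```

With this check, "REDSOX" is accepted and "RED SOX" is rejected, which matches the way the sample dictionary in Listing 4-10 spells that entry as a single word.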

If the pronunciation dictionaries installed in a speech channel do not include an indication of how a word should be pronounced, then the Speech Manager uses its own pronunciation rules and internal dictionary to pronounce the word. In general, you need to create a dictionary only for unusual words that your application requires but the Speech Manager ordinarily pronounces incorrectly. You might also allow a user who is not pleased with the default pronunciation of a word to add the correct pronunciation to a pronunciation dictionary. You can create a dictionary using MPW Rez or another appropriate tool. See "The Pronunciation Dictionary Resource" beginning on page 4-89 for a discussion of the format of the pronunciation dictionary resource and the meaning of its fields.
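The lookup order described above (installed dictionary first, built-in rules otherwise) can be modeled as follows. This is only an illustrative sketch in C: the PronEntry structure and the find_pronunciation function are hypothetical, the in-memory layout is not the layout of a 'dict' resource, and exact, case-sensitive matching is an assumption. The scan is linear because the entries are not assumed to be sorted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of one dictionary entry. The real 'dict'
   resource uses a different binary layout. */
typedef struct {
    const char *text;      /* the word, with no spaces */
    const char *phonemes;  /* the phonemic replacement for that word */
} PronEntry;

/* Searches `entries` for `word` and returns its phoneme string, or NULL when
   the word is absent. A NULL result stands for "fall back to the default
   pronunciation rules". Matching is exact and case-sensitive, which is an
   assumption of this sketch. */
const char *find_pronunciation(const PronEntry *entries, size_t count,
                               const char *word)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(entries[i].text, word) == 0)
            return entries[i].phonemes;
    }
    return NULL;
}
```

A caller would pass each word of the text in turn and use the returned phonemes when the result is not NULL.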

To install a pronunciation dictionary resource in a speech channel, you must read the resource into memory and pass it to the UseDictionary function. Because the UseDictionary function requires that you specify a speech channel, you might need to reinstall the dictionary whenever your application allocates a new speech channel or whenever it resets an existing speech channel. Listing 4-9 shows how you can use the UseDictionary function to install a pronunciation dictionary resource in a speech channel.

Listing 4-9 Installing a pronunciation dictionary resource into a speech channel

PROCEDURE MyUseDictionary (chan: SpeechChannel; resID: Integer);
VAR
   myDict:     Handle;                          {handle to dictionary data}
   myErr:      OSErr;
BEGIN
   myDict := GetResource('dict', resID);        {load the dictionary}
   IF (myDict <> NIL) AND (ResError = noErr) THEN
   BEGIN
      myErr := UseDictionary(chan, myDict);     {install the dictionary}
      IF myErr <> noErr THEN
         DoError(myErr);                        {respond to an error}
      ReleaseResource(myDict);                  {release the resource}
   END;
END;

The MyUseDictionary procedure defined in Listing 4-9 attempts to find a resource of resource type 'dict' with resource ID resID and uses the Resource Manager to read it into memory. If your application stores pronunciation dictionaries in the data fork of files, it can instead use analogous File Manager routines to read the data. If the data is read in correctly, MyUseDictionary calls the UseDictionary function to install the dictionary on the specified speech channel. Because the speech synthesizer copies all necessary data from the dictionary to its internal buffers, the application is free to release the memory occupied by the dictionary, as illustrated by the ReleaseResource call.

The pronunciation dictionary resource in Listing 4-10 consists of pronunciation dictionary entries in Rez format. Each entry specifies a word in textual format and its phonemic equivalent.

Listing 4-10 A sample pronunciation dictionary resource

resource 'dict' (1, "TestDict") {
   smRoman, langEnglish, verUS, ThisSecond,
   {
      pron, {tx, "ROOSEVELT",    ph, "_1EHf_d1IY_1AAr"},
      pron, {tx, "CHELSEA",      ph, "_C1EHls2IY"},
      pron, {tx, "AMHERST",      ph, "_2UXmAXrst"},
      pron, {tx, "REDSOX",       ph, "_r1EHd_s1AAks"},
      pron, {tx, "HALLOWEEN",    ph, "_h1AAl2OW_w1IYn"},
      pron, {tx, "FELIX",        ph, "_f1IYl2IHks_D2UX_k1AEt"},
      pron, {tx, "WEDNESDAY",    ph, "_m1IHd_w1IYk"},
   },
};

Note that the phonemic string in an entry need not resemble the usual pronunciation of the word listed; the entry for WEDNESDAY in Listing 4-10, for example, is spoken as "midweek." Typically, however, pronunciation dictionaries contain entries for words that the Speech Manager pronounces unsatisfactorily.

Also, note that a pronunciation dictionary's entries need not be in any particular order. In particular, you should not assume that a pronunciation dictionary is in alphabetical order unless your application creates the dictionary and maintains that order.

The pronunciation dictionary resource header consists of nine fields, of which four must be explicitly defined in a Rez definition such as the one in Listing 4-10. The first three of these fields specify the script, language, and region code of the language for which the pronunciation dictionary is designed. Note that you must create a separate pronunciation dictionary for each region, language, or script. The fourth of these fields is the date the pronunciation dictionary was last modified, expressed in seconds since midnight, January 1, 1904. In Listing 4-10, it is assumed that the constant ThisSecond is defined to be such a date. For information on obtaining the current date in this format, see Inside Macintosh: Operating System Utilities.
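If you generate a value such as ThisSecond with a tool running on another system, you may need to convert from that system's clock. The sketch below, in C, converts a count of seconds since January 1, 1970 into seconds since January 1, 1904. The function name mac_seconds_from_unix is hypothetical. The sketch assumes a 32-bit unsigned result and ignores time zones, so apply any local-time adjustment your tool requires separately.

```c
#include <stdint.h>

/* Seconds from midnight, January 1, 1904 to midnight, January 1, 1970:
   66 years, 17 of them leap years, is 24107 days of 86400 seconds each. */
#define MAC_EPOCH_OFFSET INT64_C(2082844800)

/* Hypothetical helper: converts seconds since 1970-01-01 into seconds since
   1904-01-01. No time zone adjustment is made. The result is truncated to
   32 bits, so inputs outside the representable range wrap around. */
uint32_t mac_seconds_from_unix(int64_t unix_seconds)
{
    return (uint32_t)(unix_seconds + MAC_EPOCH_OFFSET);
}
```

For example, an input of 0 (midnight, January 1, 1970) yields 2082844800.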


© Apple Computer, Inc.
2 JUL 1996