
Inside Macintosh: Sound /
Chapter 4 - Speech Manager / Using the Speech Manager



Important: Inside Macintosh: Sound is deprecated as of Mac OS X v10.5. For new audio development in Mac OS X, use Core Audio. See the Audio page in the ADC Reference Library.

Phonemic Representation of Speech

The Speech Manager allows your application to process text phonemically. If your application speaks only text that the user writes, this feature is unlikely to be useful, because you cannot anticipate what the user might enter. However, if your application frequently converts certain sentences into speech, it might be useful to represent parts of those sentences phonemically rather than textually.

Converting your text into phonemes during application development can also reduce the amount of memory required to speak. If your application does not require the text-to-phoneme conversion portion of the speech synthesizer, some synthesizers might need significantly less RAM to speak.

Additionally, you might be able to use a higher-quality text-to-phoneme conversion process (even one that does not work in real time) to generate precise phonemic information. This data can then be used with any speech synthesizer to produce better speech. For example, you might convert textual data into phonemic data on a future version of the Speech Manager that performs such conversions more accurately than the Speech Manager currently does; that phonemic data could then be used to generate speech with any version of the Speech Manager. The Speech Manager's TextToPhonemes function provides an easy way to convert text into its default phonemic equivalent.

To help the Speech Manager distinguish a textual representation of a word from a phonemic one, you must embed commands in the text that tell the Speech Manager to switch into a mode in which it interprets a buffer of text as a phonemic representation of speech, where particular combinations of letters represent particular phonemes. (You can also use the SetSpeechInfo function to change to phoneme mode.) To indicate that subsequent text is a phonemic representation of the text to be spoken, embed the [[inpt PHON]] command within a string or buffer that your application passes to the SpeakString, SpeakText, or SpeakBuffer function. To indicate that the Speech Manager should revert to textual interpretation of the buffer, embed the [[inpt TEXT]] command. For example, passing the string

Hello, I am [[inpt PHON]]mAYkAXl[[inpt TEXT]], the talking computer.
to SpeakString, SpeakText, or SpeakBuffer would result in the generation of the sentence, "Hello, I am Michael, the talking computer."

Some, but not all, speech synthesizers allow you to embed a command that causes the Speech Manager to interpret a buffer of text as a series of allophones.

Phonemic Symbols

Table 4-3 summarizes the set of standard phonemes recognized by American English speech synthesizers. Other languages and dialects require different phoneme inventories. Phonemes divide into two groups: vowels and consonants. All vowel symbols are pairs of uppercase letters. The symbol for a simple consonant is that consonant's lowercase letter; the symbols for blends and complex consonants are uppercase.
Table 4-3 American English phoneme symbols
Symbol  Example        Opcode     Symbol  Example   Opcode
%       silence        0          D       them      21
@       breath intake  1          f       fin       22
AE      bat            2          g       gain      23
EY      bait           3          h       hat       24
AO      caught         4          J       jump      25
AX      about          5          k       kin       26
IY      beet           6          l       limb      27
EH      bet            7          m       mat       28
IH      bit            8          n       nat       29
AY      bite           9          N       tang      30
IX      roses          10         p       pin       31
AA      cot            11         r       ran       32
UW      boot           12         s       sin       33
UH      book           13         S       shin      34
UX      bud            14         t       tin       35
OW      boat           15         T       thin      36
AW      bout           16         v       van       37
OY      boy            17         w       wet       38
b       bin            18         y       yet       39
C       chin           19         z       zen       40
d       din            20         Z       measure   41

You can obtain information similar to that in Table 4-3 for whatever language a synthesizer supports by using the GetSpeechInfo function on a channel using the synthesizer with the soPhonemeSymbols selector. The information is returned in a phoneme descriptor record, whose structure is described on page 4-53.

Prosodic Control Symbols

The symbols listed in Table 4-4 are recognized as modifiers to the basic phonemes described in the preceding section. You can use them to more precisely control the quality of speech that is described in terms of raw phonemes.
Table 4-4 Prosodic control symbols
Type                Symbol  Symbol name    Description or illustration of effect
Lexical stress:                            Marks stress within a word (optional)
  Primary stress    1                      AEnt2IHsIXp1EYSAXn ("anticipation")
  Secondary stress  2
Syllable breaks:                           Marks syllable breaks within a word (optional)
  Syllable mark     =       (equal)        AEn=t2IH=sIX=p1EY=SAXn ("an-ti-ci-pa-tion")
Word prominence:                           Placed before the affected word
  Destressed        ~       (asciitilde)   Used for words with minimal informational content
  Normal stress     _       (underscore)   Used for information-bearing words
  Emphatic stress   +       (plus)         Used for words requiring special emphasis
Prosodic:                                  Placed before the affected phoneme
  Pitch rise        /       (slash)        Pitch will rise on the following phoneme
  Pitch fall        \       (backslash)    Pitch will fall on the following phoneme
  Lengthen phoneme  >       (greater)      Lengthens the duration of the following phoneme
  Shorten phoneme   <       (less)         Shortens the duration of the following phoneme

Note
Like all other phonemes, the "silence" phoneme (%) and the "breath intake" phoneme (@) can be lengthened or shortened using the > and < symbols.
The prosodic control symbols (/, \, <, and >) can be concatenated to provide exaggerated or cumulative effects. The specific nature of the effect is dependent on the speech synthesizer. Speech synthesizers also often extend or enhance the controls described in the table.

Table 4-5 indicates the effect of punctuation marks on sentence prosody. In particular, the table shows the effect of punctuation marks on speech pitch and indicates to what extent each mark causes a pause. Note that because some languages might not use these punctuation marks, some synthesizers might not interpret them correctly. In general, speech synthesizers strive to mimic the pauses and pitch changes of actual speakers in response to punctuation, so to obtain the best results, punctuate according to standard grammatical rules.

Table 4-5 Effect of punctuation marks on English-language synthesizers
Symbol  Symbol name                        Effect on pitch                                 Effect on timing
&       (ampersand)                        Forces no addition of silence between phonemes  No additional effect
:       (colon)                            End of clause, no change in pitch               Short pause follows
,       (comma)                            Continuation rise in pitch                      Short pause follows
...     (ellipsis)                         End of clause, no change in pitch               Pause follows
!       (exclam)                           End-of-sentence sharp fall in pitch             Pause follows
-       (hyphen)                           End of clause, no change in pitch               Short pause follows
(       (parenleft)                        Start reduced pitch range                       Short pause precedes
)       (parenright)                       End reduced pitch range                         Short pause follows
.       (period)                           End-of-sentence fall in pitch                   Pause follows
?       (question)                         End-of-sentence rise in pitch                   Pause follows
" '     (quotedblleft, quotesingleleft)    Varies depending on context                     Varies
" '     (quotedblright, quotesingleright)  Varies depending on context                     Varies
;       (semicolon)                        Continuation rise in pitch                      Short pause follows

Specific pitch contours associated with these punctuation marks might vary according to other considerations in the analysis of the text. For example, if a question is rhetorical or begins with a word the synthesizer recognizes as a question word, the pitch might fall at the question mark. Consequently, the effects above should be regarded only as guidelines, not absolutes. The same applies to the timing effects, which vary according to the current rate setting.



© Apple Computer, Inc.
2 JUL 1996