
Inside Macintosh: Sound /
Chapter 4 - Speech Manager / Using the Speech Manager



Important: Inside Macintosh: Sound is deprecated as of Mac OS X v10.5. For new audio development in Mac OS X, use Core Audio. See the Audio page in the ADC Reference Library.

Phonemic Representation of Speech

The Speech Manager allows your application to process text phonemically. If your application speaks only text that the user writes, this feature is unlikely to be useful, because you cannot anticipate what the user might enter. However, if your application frequently converts certain sentences into speech, it might be useful to represent parts of those sentences phonemically rather than textually.

Converting your text into phonemes during application development can also reduce the amount of memory required to speak. If your application does not require the text-to-phoneme conversion portion of the speech synthesizer, some synthesizers might need significantly less RAM to speak.

Additionally, you might be able to use a higher-quality text-to-phoneme conversion process (even one that does not work in real time) to generate precise phonemic information. This data can then be used with any speech synthesizer to produce better speech. For example, you might convert textual data into phonemic data on a future version of the Speech Manager that performs such conversions more accurately than the Speech Manager currently does; that phonemic data could then be used to generate speech with any version of the Speech Manager. The Speech Manager's TextToPhonemes function provides an easy way to convert text into its default phonemic equivalent.

To help the Speech Manager distinguish a textual representation of a word from a phonemic one, you must embed commands in the text that tell the Speech Manager to switch into a mode in which it interprets a buffer of text as a phonemic representation of speech, where particular combinations of letters represent particular phonemes. (You can also use the SetSpeechInfo function to change to phoneme mode.) To indicate that subsequent text is a phonemic representation of the text to be spoken, embed the [[inpt PHON]] command within a string or buffer that your application passes to the SpeakString, SpeakText, or SpeakBuffer function. To indicate that the Speech Manager should revert to textual interpretation of the buffer, embed the [[inpt TEXT]] command. For example, passing the string

Hello, I am [[inpt PHON]]mAYkAXl[[inpt TEXT]], the talking computer.
to SpeakString, SpeakText, or SpeakBuffer would result in the generation of the sentence, "Hello, I am Michael, the talking computer."

Some, but not all, speech synthesizers allow you to embed a command that causes the Speech Manager to interpret a buffer of text as a series of allophones.

Phonemic Symbols

Table 4-3 summarizes the set of standard phonemes recognized by American English speech synthesizers. Other languages and dialects require different phoneme inventories. Phonemes divide into two groups: vowels and consonants. All vowel symbols are pairs of uppercase letters. The symbol for a simple consonant is that consonant's lowercase letter; the symbols for blends and complex consonants are uppercase.
Table 4-3 American English phoneme symbols
Symbol  Example        Opcode     Symbol  Example   Opcode
%       silence        0          D       them      21
@       breath intake  1          f       fin       22
AE      bat            2          g       gain      23
EY      bait           3          h       hat       24
AO      caught         4          J       jump      25
AX      about          5          k       kin       26
IY      beet           6          l       limb      27
EH      bet            7          m       mat       28
IH      bit            8          n       nat       29
AY      bite           9          N       tang      30
IX      roses          10         p       pin       31
AA      cot            11         r       ran       32
UW      boot           12         s       sin       33
UH      book           13         S       shin      34
UX      bud            14         t       tin       35
OW      boat           15         T       thin      36
AW      bout           16         v       van       37
OY      boy            17         w       wet       38
b       bin            18         y       yet       39
C       chin           19         z       zen       40
d       din            20         Z       measure   41

You can obtain information similar to that in Table 4-3 for whatever language a synthesizer supports by using the GetSpeechInfo function on a channel using the synthesizer with the soPhonemeSymbols selector. The information is returned in a phoneme descriptor record, whose structure is described on page 4-53.

Prosodic Control Symbols

The symbols listed in Table 4-4 are recognized as modifiers to the basic phonemes described in the preceding section. You can use them to more precisely control the quality of speech that is described in terms of raw phonemes.
Table 4-4 Prosodic control symbols
Type                Symbol  Symbol name    Description or illustration of effect
Lexical stress:                            Marks stress within a word (optional)
  Primary stress    1                      AEnt2IHsIXp1EYSAXn ("anticipation")
  Secondary stress  2
Syllable breaks:                           Marks syllable breaks within a word (optional)
  Syllable mark     =       (equal)        AEn=t2IH=sIX=p1EY=SAXn ("an-ti-ci-pa-tion")
Word prominence:                           Placed before the affected word
  Destressed        ~       (asciitilde)   Used for words with minimal informational content
  Normal stress     _       (underscore)   Used for information-bearing words
  Emphatic stress   +       (plus)         Used for words requiring special emphasis
Prosodic:                                  Placed before the affected phoneme
  Pitch rise        /       (slash)        Pitch will rise on the following phoneme
  Pitch fall        \       (backslash)    Pitch will fall on the following phoneme
  Lengthen phoneme  >       (greater)      Lengthens the duration of the following phoneme
  Shorten phoneme   <       (less)         Shortens the duration of the following phoneme

Note
Like all other phonemes, the "silence" phoneme (%) and the "breath intake" phoneme (@) can be lengthened or shortened using the > and < symbols.
The prosodic control symbols (/, \, <, and >) can be concatenated to provide exaggerated or cumulative effects. The specific nature of the effect is dependent on the speech synthesizer. Speech synthesizers also often extend or enhance the controls described in the table.

Table 4-5 indicates the effect of punctuation marks on sentence prosody. In particular, the table shows the effect of punctuation marks on speech pitch and indicates to what extent each mark causes a pause. Note that because some languages might not use these punctuation marks, some synthesizers might not interpret them correctly. In general, speech synthesizers strive to mimic the pauses and pitch changes of actual speakers in response to punctuation, so to obtain the best results, punctuate according to standard grammatical rules.

Table 4-5 Effect of punctuation marks on English-language synthesizers
Symbol  Symbol name                        Effect on pitch                                 Effect on timing
&       (ampersand)                        Forces no addition of silence between phonemes  No additional effect
:       (colon)                            End of clause, no change in pitch               Short pause follows
,       (comma)                            Continuation rise in pitch                      Short pause follows
...     (ellipsis)                         End of clause, no change in pitch               Pause follows
!       (exclam)                           End-of-sentence sharp fall in pitch             Pause follows
-       (hyphen)                           End of clause, no change in pitch               Short pause follows
(       (parenleft)                        Start reduced pitch range                       Short pause precedes
)       (parenright)                       End reduced pitch range                         Short pause follows
.       (period)                           End-of-sentence fall in pitch                   Pause follows
?       (question)                         End-of-sentence rise in pitch                   Pause follows
" '     (quotedblleft, quotesingleleft)    Varies depending on context                     Varies
" '     (quotedblright, quotesingleright)  Varies depending on context                     Varies
;       (semicolon)                        Continuation rise in pitch                      Short pause follows

Specific pitch contours associated with these punctuation marks might vary according to other considerations in the analysis of the text. For example, if a question is rhetorical or begins with a word the synthesizer recognizes as a question word, the pitch might fall at the question mark. Consequently, the effects above should be regarded only as guidelines, not absolutes. The same applies to the timing effects, which vary according to the current rate setting.



© Apple Computer, Inc.
2 JUL 1996