AVSpeechSynthesizer system voices (SLA clarification)

Hello,

I am building an iOS-only commercial app that uses AVSpeechSynthesizer with system voices, using only the APIs Apple provides. Before distributing the app, I want to confirm that my current implementation does not conflict with the iOS Software License Agreement (SLA) and is aligned with Apple’s intended usage.

For a better playback experience (more accurate estimation of utterance duration and smoother skipping forward/backward during playback), I currently synthesize speech using the following steps (a minimal sketch appears after this list):

  • AVSpeechSynthesizer.write(_:toBufferCallback:)
  • Converting the received AVAudioPCMBuffer buffers into audio data
  • Storing the audio inside the app sandbox
  • Playing it back using AVAudioPlayer / AVAudioEngine
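For context, here is a minimal sketch of that caching path. The SpeechCacher type, the cacheURL parameter, and the reduced error handling are illustrative placeholders, not my exact production code:

```swift
import AVFoundation

final class SpeechCacher {
    private let synthesizer = AVSpeechSynthesizer()
    private var audioFile: AVAudioFile?

    /// Synthesizes `text` and writes the PCM buffers to `cacheURL`,
    /// a location inside the app's private container (e.g. a .caf file).
    func cache(_ text: String, to cacheURL: URL,
               completion: @escaping (Error?) -> Void) {
        let utterance = AVSpeechUtterance(string: text)
        synthesizer.write(utterance) { [weak self] buffer in
            guard let self, let pcm = buffer as? AVAudioPCMBuffer else { return }
            // A zero-length buffer marks the end of synthesis.
            if pcm.frameLength == 0 {
                self.audioFile = nil
                completion(nil)
                return
            }
            do {
                // Create the file lazily, using the format of the first buffer.
                if self.audioFile == nil {
                    self.audioFile = try AVAudioFile(forWriting: cacheURL,
                                                     settings: pcm.format.settings)
                }
                try self.audioFile?.write(from: pcm)
            } catch {
                completion(error)
            }
        }
    }
}
```

Playback then happens from the cached file, e.g. via AVAudioPlayer(contentsOf: cacheURL), which is what enables the timeline and seek controls.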

The cached audio is:

  • Generated fully on-device using system voices
  • Stored only inside the app’s private container
  • Used only for internal playback controls (timeline, seek, skip ±5 seconds)
  • Never shared, exported, uploaded, or exposed outside the app

The alternative approaches would be:

  • Keeping the generated audio entirely in memory (RAM) for playback, without ever writing it to the file system (a sketch of this variant follows the list)
  • Using AVSpeechSynthesizer.speak(_:) and playing speech strictly in real time, which gives a poorer user experience than my approach
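To make the first alternative concrete, here is a rough sketch of the in-memory variant, where buffers are scheduled on an AVAudioPlayerNode and never touch disk. The InMemorySpeechPlayer name and the minimal error handling are illustrative assumptions:

```swift
import AVFoundation

final class InMemorySpeechPlayer {
    private let synthesizer = AVSpeechSynthesizer()
    private let engine = AVAudioEngine()
    private let playerNode = AVAudioPlayerNode()
    private var configured = false

    func speakBuffered(_ text: String) {
        let utterance = AVSpeechUtterance(string: text)
        synthesizer.write(utterance) { [weak self] buffer in
            guard let self,
                  let pcm = buffer as? AVAudioPCMBuffer,
                  pcm.frameLength > 0 else { return }
            if !self.configured {
                // Wire up the engine once, using the synthesis format.
                self.engine.attach(self.playerNode)
                self.engine.connect(self.playerNode,
                                    to: self.engine.mainMixerNode,
                                    format: pcm.format)
                try? self.engine.start()
                self.playerNode.play()
                self.configured = true
            }
            // The node retains each buffer while it is scheduled,
            // so the audio exists only in RAM.
            self.playerNode.scheduleBuffer(pcm)
        }
    }
}
```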

I have reviewed the current iOS Software License Agreement:
https://www.apple.com/legal/sla/docs/iOS18_iPadOS18.pdf

In particular, section (f) mentions restrictions around System Characters, Live Captions, and Personal Voice, including the following excerpt:

“…use … only for your personal, non-commercial use…
No other creation or use of the System Characters, Live Captions, or Personal Voice is permitted by this License, including but not limited to the use, reproduction, display, performance, recording, publishing or redistribution in a … commercial context.”

I do not see a specific reference in the SLA to system text-to-speech voices used via AVSpeechSynthesizer, and I want to be certain that temporarily caching synthesized speech for internal, non-exported playback is acceptable in a commercial app.

My question is:

Is caching AVSpeechSynthesizer system-voice output inside the app sandbox for internal playback acceptable, or is Apple’s recommended approach to rely only on real-time playback via speak(_:), or on strictly in-memory buffering without file storage?

If this question falls outside DTS technical scope and is instead a policy or licensing matter, I would appreciate guidance on the authoritative Apple documentation or the correct Apple team/contact.

Thank you.
