Best approach for animating a speaking avatar in a macOS/iOS SwiftUI application

I am developing a macOS application using SwiftUI (with an iOS version as well).

One feature we are exploring is displaying an avatar that reads or speaks text dynamically generated by an AI service.

The basic flow would be:

  • Text generated by an AI service
  • Text converted to speech using a TTS engine
  • An avatar (2D or 3D) rendered in the app that animates lip movement synchronized with the speech

Ideally the avatar would render locally on the device.
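For context, here is a minimal sketch of the TTS step as I currently understand it. It uses AVSpeechSynthesizer's word-boundary delegate callback, which (as far as I can tell) only reports character ranges per word, not phoneme or viseme timings — which is part of what I am asking about below. The class and method names other than the AVFoundation APIs are my own placeholders:

```swift
import AVFoundation

// Sketch: speak generated text and receive word-boundary callbacks
// that could drive a coarse mouth animation. AVSpeechSynthesizer
// appears to report only per-word character ranges, not visemes.
final class SpeakingAvatarDriver: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ text: String) {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        synthesizer.speak(utterance)
    }

    // Fires just before each word is spoken; the avatar's mouth could
    // open here and close on didFinish -- a crude stand-in for lip-sync.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           willSpeakRangeOfSpeechString characterRange: NSRange,
                           utterance: AVSpeechUtterance) {
        let word = (utterance.speechString as NSString).substring(with: characterRange)
        print("Speaking word: \(word)") // placeholder: trigger avatar animation here
    }
}
```

This gives word-level granularity at best, which is why I am asking whether finer-grained timing data is available.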

Questions:

  1. What Apple frameworks would be most appropriate for implementing a speaking avatar?

    • SceneKit
    • RealityKit
    • SpriteKit (for 2D avatars)
  2. Is there any recommended way to drive lip-sync animation from speech audio using Apple frameworks?

  3. Does AVSpeechSynthesizer expose phoneme or viseme timing information that could be used for avatar animation?

  4. If such timing information is not available, what is the recommended approach for synchronizing character mouth animation with speech audio on macOS/iOS?

  5. Are there examples of real-time character animation synchronized with speech on macOS/iOS?

Any architectural guidance or references would be greatly appreciated.
