Resolving co-channel interference in VoIP

Subject: Inquiry Regarding Architectural Overhead and Buffer Access in the Push to Talk Framework for Real-Time Core ML Blind Source Separation

Dear Apple Engineering Team,

We are currently developing an Apple-native communication platform that utilizes the Push to Talk framework alongside Core ML to handle real-time, on-device audio processing. We are working to resolve the issue of single-channel, co-channel interference (overlapping voice streams) directly on the edge.

Our current challenge lies in the pipeline latency and background lifecycle constraints when intercepting incoming audio buffers. To cleanly separate overlapping voices before they hit the audio output mixer, we need to process the raw PCM data immediately upon arrival.

Could you please provide guidance on the following architectural questions:

Low-Latency Buffer Interception: What is the recommended design pattern within the PTChannelManagerDelegate flow to pass raw incoming audio buffers directly to a Core ML model running on the Apple Neural Engine (ANE) before the system routes them to AVAudioEngine for playback?

Background Thread Management: Given the strict background execution boundaries enforced by the Push to Talk framework, how can we best optimize thread scheduling to ensure our speech separation model completes its execution without triggering an OS background processing timeout or process termination?

Dynamic UI Manifestation: Once a combined audio stream is separated into two clean, distinct voice vectors on-device, what is the best approach for registering multiple PTParticipant states simultaneously so that the native system UI (like the Dynamic Island) accurately reflects both speakers?

Thank you for your time, insights, and continued support of developer innovation within the iOS and iPadOS ecosystems.

Best regards,

Ken Zakreski
Founder, Marine Link Pro

What is the recommended design pattern within the PTChannelManagerDelegate flow to pass raw incoming audio buffers directly to a Core ML model running on the Apple Neural Engine (ANE) before the system routes them to AVAudioEngine for playback?

I don't know what you mean here. The PushToTalk framework, and PTChannelManagerDelegate in particular, don't actually have any interaction with your app's audio data. All of that is handled through the standard audio APIs.
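Because the app itself, not the PushToTalk framework, receives the remote audio and hands it to the standard audio APIs, separation can happen in the app's own receive path before the samples are scheduled for playback. The following is a minimal pure-Swift sketch of where that step would sit, assuming the app's own transport delivers decoded mono PCM frames. `Frame`, `VoiceSeparator`, `PassthroughSeparator`, and `ReceivePipeline` are hypothetical names; the placeholder separator performs no real separation, and the Core ML invocation and AVAudioEngine scheduling are deliberately left out.

```swift
/// One mono frame of decoded PCM samples (hypothetical representation).
typealias Frame = [Float]

/// Hypothetical stand-in for an on-device separation model (for example a
/// Core ML model). This is NOT a PushToTalk or Core ML API.
protocol VoiceSeparator {
    /// Returns one frame per separated speaker.
    func separate(_ mixture: Frame) -> [Frame]
}

/// Placeholder that performs no separation: it returns the mixture as a
/// single "speaker". A real implementation would run the model here.
struct PassthroughSeparator: VoiceSeparator {
    func separate(_ mixture: Frame) -> [Frame] {
        [mixture]
    }
}

/// Hypothetical receive path owned entirely by the app:
/// transport -> decode -> separate -> playback queue.
final class ReceivePipeline {
    private let separator: any VoiceSeparator
    /// Separated frames waiting to be scheduled on the app's playback graph
    /// (for example an AVAudioPlayerNode attached to an AVAudioEngine).
    private(set) var playbackQueue: [[Frame]] = []

    init(separator: any VoiceSeparator) {
        self.separator = separator
    }

    /// Called by the app's own networking code when a decoded frame arrives.
    func didReceive(frame: Frame) {
        playbackQueue.append(separator.separate(frame))
    }
}

// Usage: feed one decoded frame through the pipeline.
let pipeline = ReceivePipeline(separator: PassthroughSeparator())
pipeline.didReceive(frame: [0.1, -0.2, 0.3])
print(pipeline.playbackQueue.count) // 1
```

Keeping the separator behind a protocol lets a model-backed implementation replace the passthrough without touching the transport or playback code.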

Background Thread Management: Given the strict background execution boundaries enforced by the Push to Talk framework, how can we best optimize thread scheduling to ensure our speech separation model completes its execution without triggering an OS background processing timeout or process termination?

The background execution model for PTT is actually pretty simple: your app will stay awake as long as it's actively transmitting. In practice, that gives you a huge amount of implementation flexibility, so you shouldn't really have any trouble keeping your app awake when it needs to be.

Do you need to continue executing code after you've stopped transmitting? How much time do you need?
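If the background lifetime is not the limiting factor while transmitting, the practical constraint left for a streaming separation model is throughput: each frame has to be processed in less time than the frame takes to play. A minimal sketch of that budget check follows; the 20 ms frame, 16 kHz sample rate, 8 ms inference time, and 0.5 budget fraction are illustrative assumptions, not measured or recommended values.

```swift
/// Duration of one audio frame, in seconds.
func frameDuration(samples: Int, sampleRate: Double) -> Double {
    Double(samples) / sampleRate
}

/// Real-time factor (RTF): processing time divided by the duration of the
/// audio processed. Below 1.0 the model keeps up with the incoming stream;
/// at or above 1.0 a backlog builds up.
func realTimeFactor(processingSeconds: Double, samples: Int, sampleRate: Double) -> Double {
    processingSeconds / frameDuration(samples: samples, sampleRate: sampleRate)
}

/// True if the model stays within `budgetFraction` of each frame's duration,
/// leaving the rest for decoding, mixing, and other audio work.
/// The default of 0.5 is an illustrative assumption.
func keepsUp(processingSeconds: Double, samples: Int, sampleRate: Double,
             budgetFraction: Double = 0.5) -> Bool {
    realTimeFactor(processingSeconds: processingSeconds,
                   samples: samples, sampleRate: sampleRate) < budgetFraction
}

// Hypothetical figures: 20 ms frames at 16 kHz (320 samples) and 8 ms of
// measured inference time per frame.
let rtf = realTimeFactor(processingSeconds: 0.008, samples: 320, sampleRate: 16_000)
let ok = keepsUp(processingSeconds: 0.008, samples: 320, sampleRate: 16_000)
print("RTF:", rtf, "keeps up:", ok)
```

In practice the processing time would come from timing real inference calls on the target device rather than a fixed constant.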

Dynamic UI Manifestation: Once a combined audio stream is separated into two clean, distinct voice vectors on-device, what is the best approach for registering multiple PTParticipant states simultaneously so that the native system UI (like the Dynamic Island) accurately reflects both speakers?

You shouldn't necessarily think of "PTParticipant" as specifically representing a single individual. As far as the system is concerned, it's basically just a string and a picture it's going to show to the user. If multiple people are speaking, you show that by giving the system a name and a picture that accurately conveys what's going on.
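Following that suggestion, several separated speakers can be summarized into one name rather than registered as several participants. A minimal sketch, assuming the separation stage yields a label per active speaker; `participantDisplayName(for:)` and the labels are hypothetical, and the PushToTalk calls are omitted. In the app, the resulting string (plus an image) would become the name of a single `PTParticipant` passed to the channel manager's `setActiveRemoteParticipant` call.

```swift
/// Builds one display name summarizing every speaker the separation stage
/// currently detects. Returns nil when nobody is speaking, in which case no
/// remote participant would be shown.
func participantDisplayName(for speakers: [String]) -> String? {
    switch speakers.count {
    case 0:
        return nil
    case 1:
        return speakers[0]
    case 2:
        return "\(speakers[0]) & \(speakers[1])"
    default:
        return "\(speakers[0]) + \(speakers.count - 1) others"
    }
}

// Usage with hypothetical speaker labels produced by the separation stage.
if let name = participantDisplayName(for: ["Speaker A", "Speaker B"]) {
    print(name) // Speaker A & Speaker B
    // In the app, `name` (plus an image) would go into a single PTParticipant.
}
```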

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thank you, Kevin.
