PTT Framework has a compatibility issue with the .voiceChat AVAudioSession mode

As I've mentioned before, our app uses the PTT Framework to record and send audio messages. In one of the modes supported by the app we use the WebRTC.org library for that purpose. Internally, the WebRTC.org library uses the Voice-Processing I/O Unit (kAudioUnitSubType_VoiceProcessingIO subtype) to retrieve audio from the mic. According to https://developer.apple.com/documentation/avfaudio/avaudiosession/mode-swift.struct/voicechat, using the Voice-Processing I/O Unit implicitly enables the .voiceChat AVAudioSession mode (i.e. it appears to be impossible to use the Voice-Processing I/O Unit without the .voiceChat mode).

The problem is the following: when the user starts an outgoing PTT transmission, the PTT Framework plays an audio notification, but with the .voiceChat mode enabled that sound plays distorted or doesn't play at all.

Questions:

  1. Is this a known issue?
  2. Is there any way to work around it?
Answered by DTS Engineer in 826597022

Let me start here:

The problem is the following: when the user starts an outgoing PTT transmission, the PTT Framework plays an audio notification, but with the .voiceChat mode enabled that sound plays distorted or doesn't play at all.

I don't think the voiceChat mode itself is the issue. The PTT Framework is directly derived from CallKit, particularly in terms of how it handles audio, and CallKit has no issue with this, as our CallKit sample specifically uses that mode.

However, what IS a known issue is problems with integrating audio libraries that weren't specifically written with PTT/CallKit in mind:

As I've mentioned before, our app uses the PTT Framework to record and send audio messages. In one of the modes supported by the app we use the WebRTC.org library for that purpose. Internally, the WebRTC.org library uses the Voice-Processing I/O Unit (kAudioUnitSubType_VoiceProcessingIO subtype) to retrieve audio from the mic.

The big issue here is that most audio libraries do their own session activation, and that pattern doesn't work for PTT/CallKit. Depending on exactly what the library does (and when), that can cause exactly the kinds of problems you're describing, when the audio session ends up misconfigured in a way the library isn't expecting. The solution here is basically "don't activate the audio session yourself". The PTT framework should handle all session activation, with the only exception being the interruption handler. See our CallKit sample for how this should work (again, CallKit and PTT handle audio in the same way).
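As a rough sketch of that "configure, but never activate" pattern: the app sets the session's category/mode up front, and only starts or stops its audio I/O in the PushToTalk framework's activation callbacks. (The startWebRTCAudio()/stopWebRTCAudio() calls below are hypothetical placeholders for whatever hooks your audio library exposes; the important part is that neither you nor the library ever calls setActive(true).)

```swift
import AVFoundation
import PushToTalk

final class ChannelAudioDelegate: NSObject {

    // Configure the session up front, but do NOT call setActive(true) yourself.
    // The PTT framework owns session activation.
    func prepareAudioSession() throws {
        try AVAudioSession.sharedInstance()
            .setCategory(.playAndRecord, mode: .voiceChat, options: [.allowBluetooth])
    }

    // The system has activated the session for you; only start your audio I/O here.
    func channelManager(_ channelManager: PTChannelManager,
                        didActivate audioSession: AVAudioSession) {
        // startWebRTCAudio()  // hypothetical: start the library's capture/render
    }

    // Likewise, stop your audio I/O here; the system deactivates the session.
    func channelManager(_ channelManager: PTChannelManager,
                        didDeactivate audioSession: AVAudioSession) {
        // stopWebRTCAudio()  // hypothetical: stop the library's capture/render
    }

    // (The remaining PTChannelManagerDelegate requirements, such as the
    // join/leave and transmission callbacks, are omitted from this sketch.)
}
```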

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware


@DTS Engineer yes, I know about this restriction, and I've disabled AudioSession activation inside the WebRTC.org library (but I'll check one more time to be 100% sure), and I still see the issue. Also, as I mentioned before, our code has a path using AVAudioRecorder for the same purpose, and adding the .voiceChat mode there gives the identical result. Everything is fine without it.

Also, to rule out hard-to-understand side effects from third-party libraries, I did a small experiment with AVAudioEngine:

let engine = AVAudioEngine()
defer { engine.stop() }
let inputNode = engine.inputNode // just to create node accessing mic
try inputNode.setVoiceProcessingEnabled(false) // if you remove this, the PTT start sound will be distorted!!!
try engine.start()
try await Task.sleep(for: .milliseconds(150))

And as mentioned in the comment, if you remove try inputNode.setVoiceProcessingEnabled(false), the PTT start sound becomes distorted. I.e. it's something specific to voice processing exactly (the .voiceChat mode enables the same thing, I believe).

And I'm not talking about audio recording/playback done by the app; I'm talking only about the sound played by the PTT Framework to indicate PTT start. That's why the comparison with CallKit is meaningless, since CallKit doesn't play such a sound.

So, let me start by providing a bit of background context on why CallKit is relevant and helpful here:

And I'm not talking about audio recording/playback done by the app; I'm talking only about the sound played by the PTT Framework to indicate PTT start. That's why the comparison with CallKit is meaningless, since CallKit doesn't play such a sound.

Architecturally, the PTT framework is an extension of CallKit, not an independent API. That is, the PTT session your app manages is actually a modified variant of the same "calls" that callservicesd manages. This is particularly true for audio handling, as the PTT framework basically doesn't implement any of its "own" in-process audio handling, relying entirely on CallKit's implementation.

Now, that leads to here:

And as mentioned in the comment, if you remove try inputNode.setVoiceProcessingEnabled(false), the PTT start sound becomes distorted.

Throwing out some educated guesses, are you:

  1. Testing with a bluetooth audio device?

  2. Changing the audio session's category or configuration while audio is playing?

I ask because those two factors can cause exactly the issue you mentioned here:

Also, the way the PTT start sound is distorted sounds to me very similar to what I hear if I start some audio playback with the .playback category and switch to the .playAndRecord category while the audio is still playing...

...for exactly the same reason. The issue with bluetooth in particular is that bluetooth has two different specifications for dealing with audio:

  1. Advanced Audio Distribution Profile (A2DP) -> This is playback-only and is what speakers use when playing audio.

  2. Hands-Free Profile (HFP) -> This is bidirectional, allowing playback and recording.

The distinction here matters because A2DP has significantly higher fidelity than HFP and simply "sounds" better*.

*Note that this is for entirely practical reasons. HFP has roughly "half" the playback bandwidth of A2DP, since it has to divide its bandwidth between playback and recording. In addition, HFP uses different (and less efficient) audio compression codecs because its latency is MUCH lower (~30ms vs. ~100ms+).

In any case, the net result is that the switch from A2DP to HFP caused by something like:

start some audio playback with the .playback category and switch to the .playAndRecord category while the audio is still playing

...will unavoidably cause a quite noticeable* (and unpleasant) decline in audio quality. That same behavior will occur when enabling voice processing, for the same reason: voice processing enables input and output, which is effectively the same as playAndRecord.

*The change will occur on wired I/O as well, but the effect is most significant on bluetooth.
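If you want to watch this A2DP -> HFP switch happening in your own app, one way (a sketch using the standard AVAudioSession route-change notification) is to log the output ports whenever the route changes:

```swift
import AVFoundation

// Observe audio route changes. When voice processing (or a switch to
// .playAndRecord) kicks in, a bluetooth route will typically flip from
// a .bluetoothA2DP output port (playback-only) to .bluetoothHFP (bidirectional).
let observer = NotificationCenter.default.addObserver(
    forName: AVAudioSession.routeChangeNotification,
    object: nil,
    queue: .main
) { notification in
    let session = AVAudioSession.sharedInstance()
    for output in session.currentRoute.outputs {
        print("Output port: \(output.portName), type: \(output.portType.rawValue)")
    }
    if let reasonValue = notification.userInfo?[AVAudioSessionRouteChangeReasonKey] as? UInt,
       let reason = AVAudioSession.RouteChangeReason(rawValue: reasonValue) {
        print("Route change reason: \(reason.rawValue)")
    }
}
```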

With all that context, let me go back to here:

Is there any way to work around it?

So, the basic answer here is that you need to either ensure that the transition only occurs when no playback is occurring (which isn't really possible) or ensure that the session "stays" in playAndRecord.

Being more specific, I have two answers:

  1. If you're using PTTransmissionMode.halfDuplex, stop and switch to PTTransmissionMode.fullDuplex. In hindsight, we probably shouldn't have bothered implementing halfDuplex, as it basically forces exactly this kind of awkward transition. More to the point, my experience has been that products which truly are half duplex often end up using fullDuplex anyway, because the mechanics of our halfDuplex implementation don't actually match up with their implementation. PTTransmissionMode.fullDuplex ends up just working "better", as they can use the additional flexibility to implement exactly the behavior they want*.

  2. If you're using PTTransmissionMode.fullDuplex, then (I think) leaving voiceChat enabled at all times will avoid the problem. If it doesn't, then you should take a closer look at exactly what and where your app does audio configuration. My guess is that you're doing "something" which is pushing you back to playback-only, setting up the distortion when you start recording again.

*Keep in mind that the system enabling transmission/recording does NOT mean your app actually has to "do" anything with the audio it receives from the system. For example, you can implement floor-"claiming" mechanics by requesting the floor when you receive the transmission request, then playing a sound (so the user knows it happened), and only actually recording and sending audio once they have the floor.
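A minimal sketch of option 2, assuming the PTT framework remains responsible for activating the session (the code below only sets the configuration, once, and your app then avoids ever switching the category or mode back):

```swift
import AVFoundation

// Configure the session once, up front, in a record-capable configuration,
// and never switch back to a playback-only category while the channel is live.
// Note: no setActive(true) here; the PTT framework owns session activation.
func configureSessionOnce() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord,
                            mode: .voiceChat,
                            options: [.allowBluetooth])
}
```

Keeping the session in .playAndRecord with .voiceChat at all times means the A2DP -> HFP route transition (and the voice-processing reconfiguration) happens once, before any PTT notification sound plays, rather than in the middle of playback.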

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
