AVAudioSessionCategoryPlayback is not allowed while CallKit call is active

We require assistance in resolving a critical audio design conflict within our Push-to-Talk (PTT) application. Our current volume amplification strategy—which relies on applying a GAIN factor to PCM samples in conjunction with setting the AVAudioSession category to Playback—is working successfully when PTT is used independently. However, upon integrating and reporting the same PTT call through the CallKit framework, this amplification effect is lost. The CallKit integration appears to be forcing a different, non-amplifying audio session category or configuration, negatively impacting the user's perceived call volume. We need guidance on how to maintain the AVAudioSessionCategoryPlayback setting, or an equivalent high-volume configuration, while operating under the control of CallKit.

So, let me start here:

The CallKit integration appears to be forcing a different, non-amplifying audio session category or configuration, negatively impacting the user's perceived call volume.

Basically, yes, that is exactly what's going on. Internally, both CallKit and the PTT framework really run on a private audio session configuration which is similar to PlayAndRecord, but not identical. Ironically, the most notable difference is that its maximum playback volume is slightly higher than a standard PlayAndRecord session.

However, the bigger benefit is that this session configuration has a higher priority than any other audio session type, which is why it can't be interrupted by any of the "standard" audio APIs.

We require assistance in resolving a critical audio design conflict within our Push-to-Talk (PTT) application.

Just to clarify, are you using CallKit or the PTT framework? Your answer won't really change the audio side of this, but I'd like to have a clearer picture of what you're actually doing.

That leads to here:

Our current volume amplification strategy—which relies on applying a GAIN factor to PCM samples in conjunction with setting the AVAudioSession category to Playback—is working successfully when PTT is used independently.

I don't know what exactly is going wrong here, but it sounds like you've fallen into a common development anti-pattern I've seen before. That is, I think you:

  1. Built a "foreground only implementation" on the standard audio framework.

  2. Retrofitted CallKit/PushToTalk support onto that implementation to "handle the background case".

  3. Noticed that the two cases don't sound the same and are now trying to force #2 to sound the same as #1.

Unfortunately, this approach doesn't really work. The biggest issue is that using #1 AT ALL means that you’re running as a standard priority audio session, which can then be interrupted by CallKit at any time. That's disruptive in the foreground (cutting your audio without warning) and catastrophic in the background (where losing audio will force app suspension). You also can't really transition "smoothly" between 1 & 2, so you'll either end up with audio glitches (as you end one to start the other) or you'll end up continuing #1 into the background, creating an inconsistent user experience. Even worse, that transition process ITSELF tends to create problems, as #1 requires manual AudioSession activation and calling "setActive" is what actually causes the vast majority of audio problems in CallKit/PTT.

*Note that this issue was the primary reason we created CallKit in the first place.

Finally, those two different configurations cause exactly the problem you've noticed: our audio session types don't actually sound "exactly" the same in every detail, which makes switching between them a poor user experience. Note that part of this is because the sessions are "different", but the largest factor is simply that the device tracks the "device volume" and "phone volume" separately, which is why playing loud music when a call comes in doesn't change the volume of the call you then answer.

The solution to all of this is to use CallKit/PTT for "all" of your audio. That prevents all of the technical issues I outlined above and it also means that you can tune the audio behavior of that particular configuration to work however you want. In terms of the specific technique here:

...which relies on applying a GAIN factor to PCM samples in conjunction...

You may need to apply a different gain factor to account for the differences in configuration, but I'd expect the basic technique to be exactly the same in both cases.
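As a rough illustration of that technique, here is a minimal sketch of a per-sample gain stage. It assumes your decoded audio is available as Float32 samples normalized to [-1, 1]; the function name applyGain and the hard clamp are illustrative choices, not anything taken from your app or from an Apple API. The gain value itself is the part you would retune for the CallKit/PTT session.

```swift
/// Multiplies Float32 PCM samples by a linear gain factor in place.
/// Results are hard-clamped to [-1, 1] so amplified peaks clip
/// instead of wrapping around. (Hypothetical helper for illustration.)
func applyGain(_ gain: Float, to samples: inout [Float]) {
    for i in samples.indices {
        samples[i] = min(max(samples[i] * gain, -1.0), 1.0)
    }
}

// Usage: the same routine serves both configurations; only the
// gain constant changes between them.
var frame: [Float] = [0.25, -0.5, 0.75]
applyGain(2.0, to: &frame)
```

A hard clamp distorts loud passages, so a production gain stage might use a limiter or soft clipper instead; that choice is independent of which audio session you are running under.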

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thank you for the clear explanation.

We must emphasize that, as a mission-critical Push-to-Talk (PTT) application, the perceived volume and clarity of the audio are fundamental user requirements. We currently achieve the necessary amplification and clarity using a custom PCM gain factor. Therefore, the ability for our application to apply a specific gain factor to PCM samples for amplification is essential.

Could you please advise on which framework or API Apple recommends for a third-party application to intercept and process the audio stream—specifically, to apply a GAIN factor to PCM samples—during an active CallKit call?

Could you please advise on which framework or API Apple recommends for a third-party application to intercept and process the audio stream—specifically, to apply a GAIN factor to PCM samples—during an active CallKit call?

AVAudioEngine is the API we generally recommend for lower level audio processing.
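To make that a bit more concrete, here is a rough sketch of what playback through AVAudioEngine with your own gain stage could look like. It is only an outline under several assumptions: the class name CallAudioPlayer and the enqueue method are hypothetical, incoming audio is assumed to already be decoded into deinterleaved Float32 AVAudioPCMBuffer objects, and the gain value is a placeholder you would tune. Note that the sketch never calls setActive; the intent is that start() is called from the system's "audio session did activate" delegate callback (for example provider(_:didActivate:) in CallKit).

```swift
import AVFoundation

/// Scales every sample of a deinterleaved Float32 buffer by `gain`,
/// clamping to [-1, 1]. Other buffer layouts are left untouched.
func applyGain(_ gain: Float, to buffer: AVAudioPCMBuffer) {
    guard !buffer.format.isInterleaved,
          let channels = buffer.floatChannelData else { return }
    let frameCount = Int(buffer.frameLength)
    for ch in 0..<Int(buffer.format.channelCount) {
        let samples = channels[ch]
        for i in 0..<frameCount {
            samples[i] = min(max(samples[i] * gain, -1.0), 1.0)
        }
    }
}

/// Hypothetical wrapper: plays received PCM buffers through AVAudioEngine.
final class CallAudioPlayer {
    private let engine = AVAudioEngine()
    private let player = AVAudioPlayerNode()
    var gain: Float = 1.0   // placeholder; tune for the call session

    init(format: AVAudioFormat) {
        engine.attach(player)
        engine.connect(player, to: engine.mainMixerNode, format: format)
    }

    /// Intended to be called once the system has activated the audio
    /// session for the call, not after a manual setActive.
    func start() throws {
        try engine.start()
        player.play()
    }

    func stop() {
        player.stop()
        engine.stop()
    }

    /// Applies the gain, then queues the buffer for playback.
    func enqueue(_ buffer: AVAudioPCMBuffer) {
        applyGain(gain, to: buffer)
        player.scheduleBuffer(buffer, completionHandler: nil)
    }
}
```

The same engine can be used for capture by installing a tap on its inputNode, so both directions of the call stay inside the one session that CallKit/PTT manages.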

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
