Push To Talk framework doesn't activate audio session in background

We are trying to extend our app with Push To Talk functionality by integrating the Push To Talk framework. We are extensively testing what happens if the app is running in the foreground, in the background or not running at all.

When the app is in the foreground and the user has joined a channel, we maintain an open connection to our server. When a remote participant starts streaming audio, we immediately call setActiveRemoteParticipant on our PTChannelManager instance. The PTT system will then call our delegate's channelManager:didActivate audioSession method, and we can successfully play the incoming audio.

When the app is not running at all, there is of course no active connection initially. When another participant starts talking, we send a push notification. The PTT system starts our app in the background and calls the incomingPushResult method on our delegate. After we return the remote participant, the PTT framework calls the channelManager:didJoin delegate method, which we use to re-establish the server connection. The framework then calls our channelManager:didActivate audioSession delegate method, and we can successfully play audio.
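
The cold-start path described above could look roughly like the following sketch. The payload key `activeSpeaker` and the helper names are assumptions for illustration, since the thread does not show the real push payload; `PTPushResult` and `PTParticipant` come from the PushToTalk framework. A delegate would forward to `pushResult(for:)` from its `incomingPushResult(channelManager:channelUUID:pushPayload:)` implementation.

```swift
import Foundation

/// Extracts the talking participant's name from a PTT push payload.
/// The "activeSpeaker" key is a hypothetical, server-defined field.
func activeSpeakerName(from payload: [String: Any]) -> String? {
    guard let name = payload["activeSpeaker"] as? String, !name.isEmpty else {
        return nil
    }
    return name
}

#if canImport(PushToTalk)
import PushToTalk

/// Maps a push payload to the result the PTT system expects from
/// incomingPushResult(channelManager:channelUUID:pushPayload:).
@available(iOS 16.0, *)
func pushResult(for payload: [String: Any]) -> PTPushResult {
    guard let name = activeSpeakerName(from: payload) else {
        // No speaker in the payload: this sketch treats that as "leave the channel".
        return .leaveChannel
    }
    // Reporting an active remote participant is what leads to the
    // didJoin and didActivate callbacks described in the post.
    return .activeRemoteParticipant(PTParticipant(name: name, image: nil))
}
#endif
```

Keeping the payload parsing in a plain function makes it easy to check without a device, while the framework-specific mapping stays behind the `canImport` guard.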

Now the problem. When the application was initially in the foreground with an established server connection, we keep that connection active when the app enters the background, until a certain timeout or until the system decides our app needs to be killed / removed from memory. This allows us to finish an incoming audio stream, react quickly to incoming responses, etc. When we then receive an incoming audio stream after a certain delay (for example 5 seconds), we call the channelManager.setActiveRemoteParticipant method (using try await syntax). This finishes successfully, without any error; however, the channelManager:didActivate audioSession delegate method is never called. Manually setting up an audio session is not allowed either and returns an error.
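
One way to make this symptom visible in logs is to record each successful participant update and check whether the activation callback followed. The sketch below is a diagnostic aid only; `ActivationTracker`, its method names, and the 2-second grace period are hypothetical, and the comments mark where the PushToTalk calls from the post would go.

```swift
import Foundation

/// Tracks whether an audio session activation followed a successful
/// "set remote participant" call. All names here are illustrative.
struct ActivationTracker {
    private(set) var pendingSince: Date?

    /// Call right after the awaited set-remote-participant call returns without throwing.
    mutating func participantWasSet(at time: Date = Date()) {
        pendingSince = time
    }

    /// Call from the delegate's channelManager:didActivate audioSession method.
    mutating func audioSessionDidActivate() {
        pendingSince = nil
    }

    /// True when a participant was set but no activation arrived within `gracePeriod` seconds.
    func isActivationMissing(at time: Date = Date(), gracePeriod: TimeInterval = 2) -> Bool {
        guard let since = pendingSince else { return false }
        return time.timeIntervalSince(since) > gracePeriod
    }
}
```

Checking `isActivationMissing()` from a timer or the next server event would flag exactly the case described here: the call succeeds, but the delegate is never told the session is active.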

Our current workaround for this issue is to close the server connection as soon as the app goes into the background. This makes sure our server sends a push notification, which does activate the audio session, after which we can play audio. However, this means we need to re-establish the connection, which introduces an unnecessary delay before we can start playback (and currently means we lose some audio). It also means we need extra checks when going to the background to make sure there is no active incoming stream. After each incoming stream we have to check again whether we are in the background and disconnect immediately to make sure we get a push notification next time. This can of course also lead to race conditions in an active conversation, where we might need to disconnect between incoming streams, and if we don't do this in time we might never get an activated audio session.
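
The bookkeeping behind this workaround can be sketched as a small state value. This is a hypothetical model of the logic described above, not the actual app code; `ConnectionPolicy` and its members are invented names.

```swift
/// Models the workaround: drop the server connection whenever the app is in the
/// background and no incoming stream is active, so the next stream arrives via push.
struct ConnectionPolicy {
    var isInBackground = false
    var hasActiveIncomingStream = false

    /// True when the server connection should be closed right now.
    var shouldDisconnect: Bool {
        isInBackground && !hasActiveIncomingStream
    }
}

// Example: the app moves to the background while a stream is still playing.
var policy = ConnectionPolicy()
policy.isInBackground = true
policy.hasActiveIncomingStream = true
let disconnectDuringStream = policy.shouldDisconnect   // keep the connection for now

// The stream ends, so the connection has to be dropped before the next one starts.
policy.hasActiveIncomingStream = false
let disconnectAfterStream = policy.shouldDisconnect
```

The policy has to be re-evaluated on every app-state change and at the end of every incoming stream, which is where the race comes from: a new stream can start on the still-open connection before the disconnect completes.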

Now this might be by design, as Apple might not want us to keep the server connection active when the application enters the background state. But if that's the case, I would expect the channelManager.setActiveRemoteParticipant method to throw an error, and it doesn't. It returns successfully, after which we would expect the audio session to be activated as well. So maybe we are not setting the capabilities of our project correctly (we might need other background permissions as well, although we already experimented with that), or we need to do something else to make this work?

Answered by DTS Engineer in 872136022

Now the problem. When the application was initially in the foreground and has an established server connection, we initially keep the server connection active when the app enters the background state, until a certain timeout or the system decides our app needs to be killed/removed from memory. This allows us to finish an incoming audio stream, quickly react on incoming responses, etc. When we then receive an incoming audio stream after a certain delay (for example, 5 seconds), we call the channelManager.setActiveRemoteParticipant method (using try await syntax).

So, the short summary is that this should "just work". More specifically, all PTT apps are allowed to initiate playback at any time by calling setActiveRemoteParticipant(), even if they're in the background.

In particular, what you're describing here:

we initially keep the server connection active when the app enters the background state, until a certain timeout or the system decides our app needs to be killed/removed from memory.

...is actually pretty common to most PTT apps, as it helps keep the conversation stream live/current compared to relying entirely on PTT pushes.

Similarly, just to be clear:

Now this might be by design, as Apple might not want us to keep the server connection active when the application enters the background state.

...no, this is not something we expect or require. You can drop the connection if you choose, but that's not something we're trying to enforce.

That leads to here:

I've created an example app to demonstrate my problem. It can be found here: https://github.com/egeniq/ptt-audio-activation-test-ios

Looking at your sample, the main thing I noticed is that you're not configuring the audio session or requesting record access. After modifying your start() to include that functionality:

func start() {
	Task {
		NSLog("start()")
		do {
			channelManager = try await PTChannelManager.channelManager(delegate: self, restorationDelegate: self)
		} catch {
			NSLog("Failed to create channel manager: \(error)")
		}
		// Ask for microphone access up front.
		AVAudioSession.sharedInstance().requestRecordPermission { granted in
			NSLog("Audio session record permission granted: \(granted)")
		}
		do {
			// Configure the shared session for two-way audio (playback and recording).
			try AVAudioSession.sharedInstance().setCategory(.playAndRecord, mode: .default, options: [.allowBluetoothHFP, .allowBluetoothA2DP, .defaultToSpeaker])
		} catch {
			NSLog("Failed to configure audio session: \(error)")
		}
	}
}

...activation started happening in the background as expected:

Set remote participant after 5s fired
Setting active participant: DELAYED
Audio session activated

I don't know how that translates to your real app (which, I assume, configures its audio session), but that's the problem with your test app.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

I've created an example app to demonstrate my problem. It can be found here: https://github.com/egeniq/ptt-audio-activation-test-ios

To reproduce, do the following:

  • Change team selection in Xcode.
  • Run the app.
  • Choose join.
  • Tap on the "Set remote participant NOW" button.
  • See in the log output that the audio session gets activated.
  • Tap on the "Clear remote participant" button.
  • See in the log output that the audio session gets deactivated.
  • Tap on the "Set remote participant after 5s" button.
  • Immediately go to your phone's Home Screen.
  • After 5 seconds, see in the log output that the remote participant is successfully set.
  • However, also note that no audio session activation occurs.

I indeed used a simplified example for demonstration purposes where I didn't add the microphone permission and audio session setup. In my real application I do.

However, your answer did trigger me to check that code again, and I noticed that I accidentally added .mixWithOthers to the options. And it seems that causes the audio session not to get activated when running in the background. So I removed that option and now everything is running fine!
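
A small guard could catch this kind of slip before it reaches the audio session. The sketch below is based only on the finding in this thread, not on documented behavior; `isPTTCompatible` and `configurePTTAudioSession` are hypothetical names, and the option set shown is just an example.

```swift
#if os(iOS)
import AVFoundation

/// Returns false when the options include .mixWithOthers, which in this thread
/// prevented the PTT audio session from being activated in the background.
func isPTTCompatible(_ options: AVAudioSession.CategoryOptions) -> Bool {
    !options.contains(.mixWithOthers)
}

/// Configures the shared session for Push To Talk, logging a warning if the
/// requested options would make the session mixable.
func configurePTTAudioSession(options: AVAudioSession.CategoryOptions = [.allowBluetoothA2DP, .defaultToSpeaker]) throws {
    if !isPTTCompatible(options) {
        NSLog("Warning: .mixWithOthers is set; background activation may not happen")
    }
    try AVAudioSession.sharedInstance().setCategory(.playAndRecord, mode: .default, options: options)
}
#endif
```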

Also great to hear that it is ok to keep things running in the background when there is an active PTT session. It indeed improves the conversation flow a lot.

Thanks!

However, your answer did trigger me to check that code again, and I noticed that I accidentally added .mixWithOthers to the options. And it seems that causes the audio session not to get activated when running in the background.

Ahh... That makes sense.

So, as a bit of history: before late iOS 12/13, PTT apps worked by using a mixable PlayAndRecord session (non-mixable activation never worked). Mixable sessions are expected and allowed to activate in the background (for example, that's how things like turn-by-turn directions work), but that probably shouldn't have been allowed for recording sessions, since it basically lets an app start recording whenever it wants.

To fix that, we disabled all background recording activation in late iOS 12, so PTT apps were moved to a CallKit-based workaround (iOS 12 through iOS 15) and later to the PTT framework (iOS 16+). The CallKit workaround (like the PTT framework today) came with other major benefits[1], so the "mixable" session configuration quickly went away as existing PTT apps reconfigured their code to comply with CallKit.

In any case, I suspect there is still some code deep in the audio system from the original iOS 12 change, and that's what's causing activation to fail.

[1] Notably, using a standard (non-CallKit) audio session meant that PTT apps operated under the "standard" audio session priority system, which meant that ANY incoming call would IMMEDIATELY trigger an audio interruption. That both cut off any existing audio and meant that the app was immediately forced to suspend, leaving the app without any good way to recover. Ironically, this was the EXACT same problem standard VoIP apps had, which is what originally led to the creation of CallKit in iOS 10. The CallKit workaround is one of the few cases I can think of where a bug workaround was actually a BETTER solution than the original solution.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Wow, thank you for the extensive background information!

Wow, thank you for the extensive background information!

You're very welcome.

Kevin Elliott
DTS Engineer, CoreOS/Hardware
