So, the first thing to understand is that what you're describing here:
Our app receives a CallKit VoIP call. When the user taps “Answer”, the app launches and automatically connects to a real-time audio session using WebRTC or MobileRTC.
...is NOT what actually happens on iOS. Your app doesn't "receive" a CallKit call, nor is CallKit something that really "controls" how your app works. This is how incoming voip pushes actually work:
- The device receives a voip push for your app.
- The system either launches or wakes your app (depending on whether or not your app is running).
- Your app receives the voip push.
- Your app reports a new call into CallKit.
- The system presents the incoming call UI, sending updates back to your app about the actions the user takes in that UI.
- If the user answers the call, the system activates the audio session you previously configured.
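As a rough sketch of those steps in code (not a complete implementation), this is approximately what the PushKit and CallKit side looks like. The class name `CallManager` and the `"callerName"` payload key are assumptions made up for illustration, and `CXProviderConfiguration()` assumes iOS 14 or later. Your WebRTC or MobileRTC connection logic would hang off the answer action and the audio session activation callback.

```swift
import AVFoundation
import CallKit
import PushKit

// Hypothetical manager object; keep a strong reference to it for the app's lifetime.
final class CallManager: NSObject, PKPushRegistryDelegate, CXProviderDelegate {
    private let registry = PKPushRegistry(queue: DispatchQueue.main)
    private let provider: CXProvider

    override init() {
        let configuration = CXProviderConfiguration() // iOS 14+
        configuration.supportsVideo = true
        configuration.supportedHandleTypes = [.generic]
        provider = CXProvider(configuration: configuration)
        super.init()
        provider.setDelegate(self, queue: nil) // nil delivers callbacks on the main queue
        registry.delegate = self
        registry.desiredPushTypes = [.voIP]
    }

    // MARK: PKPushRegistryDelegate

    func pushRegistry(_ registry: PKPushRegistry,
                      didUpdate pushCredentials: PKPushCredentials,
                      for type: PKPushType) {
        // Send pushCredentials.token to your server (not shown).
    }

    // Your app receives the voip push...
    func pushRegistry(_ registry: PKPushRegistry,
                      didReceiveIncomingPushWith payload: PKPushPayload,
                      for type: PKPushType,
                      completion: @escaping () -> Void) {
        guard type == .voIP else { completion(); return }

        // "callerName" is an assumed payload key, not something the system defines.
        let caller = payload.dictionaryPayload["callerName"] as? String ?? "Unknown"
        let update = CXCallUpdate()
        update.remoteHandle = CXHandle(type: .generic, value: caller)
        update.hasVideo = true

        // Configure (but do not activate) the audio session up front.
        try? AVAudioSession.sharedInstance().setCategory(.playAndRecord, mode: .voiceChat)

        // ...and reports a new call into CallKit, which presents the incoming call UI.
        provider.reportNewIncomingCall(with: UUID(), update: update) { _ in
            completion()
        }
    }

    // MARK: CXProviderDelegate

    func providerDidReset(_ provider: CXProvider) {
        // Tear down any in-progress calls here.
    }

    // The user tapped "Answer" in the system UI.
    func provider(_ provider: CXProvider, perform action: CXAnswerCallAction) {
        // Start connecting your media stack here (not shown), then fulfill.
        action.fulfill()
    }

    // The system has activated the audio session you previously configured.
    func provider(_ provider: CXProvider, didActivate audioSession: AVAudioSession) {
        // Safe point to start call audio.
    }
}
```

Note that nothing in this sketch does any networking. CallKit only tells you what the user did in the system UI; what your app does in response is up to you.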
The critical thing to understand here is that CallKit is best understood as an "interface" framework (albeit a very narrowly focused one), NOT a "voip calling" framework. As the most obvious example of this, the CallKit sample "Speakerbox" is in effect a FULLY functional voip app, including the PushKit support... except for the small detail that it doesn't actually do any networking.
That leads to here:
We would like to confirm whether the following flow (“CallKit Answer → app opens → automatic WebRTC or MobileRTC audio session connection”) complies with Apple’s VoIP Push / CallKit policy.
One thing to note here is that it's not guaranteed that answering a video call will actually cause the app to be opened into the foreground. That is what happens when you swipe to answer on the lock screen AND the device is able to unlock, but it won't happen if the device can't unlock or the answer came from another source (like CarPlay).
In addition, our service also provides real-time video-class functionality using the Zoom Meeting SDK (MobileRTC). When an incoming CallKit VoIP call is answered, the app launches and the user is automatically taken to the Zoom-based video lesson flow: the app opens → the user is landed on the Zoom Meeting pre-meeting room → MobileRTC initializes immediately. In the pre-meeting room, audio and video streams can already be active and MobileRTC establishes a connection, but the actual meeting screen is not joined until the user explicitly taps “Join”. We would like to confirm whether this flow for video lessons (“CallKit Answer → app opens → pre-meeting room (audio/video active) → user taps ‘Join’ → enter actual meeting”) is also compliant with Apple’s VoIP Push and CallKit policy.
The decision here is ultimately up to App Review but, IMHO, this is really about the overall user experience, not the general "flow" you're describing. More specifically, what I'd be concerned about is the user experience if the user answers the call, but your app isn't brought to the foreground.
For example, if your app simply starts the call and then does "nothing", then that's a pretty poor user experience which makes your app look pretty broken. On the other hand, if answering the call plays a brief description of the upcoming meeting and/or "hold" music so the user knows what's going on, then I don't really see any issue with that. Similar issues come up when joining meetings before their scheduled start time, and there's no problem with simply leaving the call active until the meeting actually starts. In any case, the key point here is that it needs to be clear to the user what's going on, not necessarily the details of your implementation.
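To make the "hold music" idea concrete, here's a minimal sketch of one way it could be done, assuming you bundle a short audio file with your app. The class name `HoldAudioController`, the `hold.caf` resource, and the `meetingDidStart()` hook are all hypothetical; the point is only that audio starts once the system activates the session, whether or not your app ever reaches the foreground.

```swift
import AVFoundation
import CallKit

// Hypothetical delegate that plays looping hold audio after the call is answered.
final class HoldAudioController: NSObject, CXProviderDelegate {
    private var holdPlayer: AVAudioPlayer?

    func providerDidReset(_ provider: CXProvider) {
        stopHoldAudio()
    }

    func provider(_ provider: CXProvider, perform action: CXAnswerCallAction) {
        // Begin connecting to the meeting in the background (not shown).
        action.fulfill()
    }

    // Called once the system has activated the call's audio session.
    func provider(_ provider: CXProvider, didActivate audioSession: AVAudioSession) {
        // "hold.caf" is an assumed resource name; substitute your own file.
        guard let url = Bundle.main.url(forResource: "hold", withExtension: "caf") else {
            return
        }
        holdPlayer = try? AVAudioPlayer(contentsOf: url)
        holdPlayer?.numberOfLoops = -1 // loop until the meeting starts
        holdPlayer?.play()
    }

    func provider(_ provider: CXProvider, didDeactivate audioSession: AVAudioSession) {
        stopHoldAudio()
    }

    // Hypothetical hook: call this when the user has actually joined the meeting.
    func meetingDidStart() {
        stopHoldAudio()
    }

    private func stopHoldAudio() {
        holdPlayer?.stop()
        holdPlayer = nil
    }
}
```

A spoken description of the upcoming meeting would work the same way; what matters is that the user hears something that explains what's going on.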
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware