Project Background:
I am developing a third-party custom keyboard for iOS whose primary feature is real-time voice input.
In my current design, responsibilities are split as follows:
1. The container (main) app is responsible for:
   - Audio recording
   - Speech recognition (ASR)
2. The keyboard extension is responsible for:
   - Providing the keyboard UI
   - Initiating the voice input workflow
   - Receiving transcription results via an App Group
   - Inserting recognized text into the active text field using textDocumentProxy.insertText(_:)
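To make the App Group hand-off concrete, here is a minimal sketch of sharing transcription text between the two targets through UserDefaults(suiteName:). The group identifier "group.com.example.voicekeyboard" and the key name are placeholders; use whatever your targets' App Group entitlement actually defines.

```swift
import Foundation

// Hypothetical App Group identifier -- replace with the group that
// both the container app and the keyboard extension share in their
// entitlements.
let appGroupID = "group.com.example.voicekeyboard"

// Container app side: publish the latest transcription.
func publishTranscription(_ text: String) {
    let shared = UserDefaults(suiteName: appGroupID)
    shared?.set(text, forKey: "latestTranscription")
}

// Keyboard extension side: read the latest transcription.
func readTranscription() -> String? {
    let shared = UserDefaults(suiteName: appGroupID)
    return shared?.string(forKey: "latestTranscription")
}
```

Note that the keyboard extension can only read the shared container if the user has granted it Full Access in Settings.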
Intended User Flow
The intended workflow is:
- The user is typing in a third-party app (for example, WeChat) using my custom keyboard.
- The user taps a “Voice Input” button in the keyboard extension.
- The keyboard extension activates the container app so that audio recording and ASR can begin.
- After recording has started, control returns to the original app where the user was typing.
- The container app continues running in the background, maintaining active audio recording and ASR.
- Recognized text is continuously streamed back to the keyboard extension and inserted into the current cursor position in real time.
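One possible keyboard-side sketch of the streaming-insertion step above, assuming the App Group sharing described earlier (the group identifier and key name are placeholders). It polls the shared store on a timer and inserts only the newly recognized suffix at the cursor; a Darwin notification (CFNotificationCenterGetDarwinNotifyCenter) would avoid polling, but a timer keeps the sketch simple:

```swift
import UIKit

class VoiceKeyboardViewController: UIInputViewController {
    // Hypothetical App Group identifier.
    private let shared = UserDefaults(suiteName: "group.com.example.voicekeyboard")
    private var insertedCount = 0
    private var timer: Timer?

    override func viewDidLoad() {
        super.viewDidLoad()
        // Periodically check the shared store for new transcription text.
        timer = Timer.scheduledTimer(withTimeInterval: 0.2, repeats: true) { [weak self] _ in
            self?.insertNewText()
        }
    }

    private func insertNewText() {
        guard let text = shared?.string(forKey: "latestTranscription"),
              text.count > insertedCount else { return }
        // Insert only the portion we have not inserted yet.
        let newPart = String(text.dropFirst(insertedCount))
        textDocumentProxy.insertText(newPart)
        insertedCount = text.count
    }
}
```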
Observed Industry Behavior
Some popular third-party keyboards on iOS, such as WeChat Keyboard and Doubao Keyboard, appear to provide a similar user experience in which:
- Voice input can be initiated directly from the keyboard while typing in another app.
- The user remains (or returns) in the original typing context after voice input starts.
- Speech recognition continues and text is streamed into the active text field without interrupting the typing experience.
I would like to better understand how this type of workflow aligns with iOS platform capabilities and supported APIs.
My Questions
- Is it supported by iOS public APIs for a custom keyboard extension to activate its container app to start audio recording and ASR, then return to the original host app while the container app continues recording and performing ASR in the background?
- If this workflow is not supported, are there any Apple-recommended or supported alternative architectures for achieving a similar user experience, especially when audio recording and ASR logic are currently implemented in the container app rather than in the keyboard extension?
Goal
My goal is to design a solution that is fully compliant with iOS public APIs and platform constraints, while providing a real-time voice input experience comparable to existing third-party keyboards on the platform.
Any guidance on supported APIs, recommended architectures, or relevant documentation would be greatly appreciated.
I see the community bringing this up from time to time, so I'd like to take this chance to hopefully get it sorted. For the clarity of the discussion, I'll split this question into three parts:
- Does App Review guideline allow a keyboard extension to launch its container app?
- If yes, is there any API to do so?
- Is there any API for the container app to bring the host app back to the foreground?
The first part is about App Review policy, and so only the App Review team has the final answer. That said, looking into the App Review Guidelines, you can see the following related to keyboard extensions:
4.4.1 Keyboard extensions have some additional rules.
...
They must not:
Launch other apps besides Settings;
Here the guideline uses "other apps," rather than simply "apps" or "any app." I believe that means launching the container app is allowed. To make sure my understanding is correct, I've reached out to my App Review colleagues, and they've confirmed that launching the container app from a keyboard extension (to do something like barcode scanning) is allowed.
That being said, at the time of writing, it is safe to say that the answer to the first part of the question is yes.
As for the second part, to launch the container app from a keyboard extension, the following code, which walks up the responder chain to find the UIApplication instance and then calls its open(_:options:completionHandler:), works for me:
/**
 "mytestapp" is the URL scheme the container app defines.
 */
guard let url = URL(string: "mytestapp://") else {
    return
}

/**
 Walk up the responder chain to find the UIApplication instance,
 then call its open(_:options:completionHandler:).
 */
var responder: UIResponder? = self
while responder != nil {
    guard let application = responder as? UIApplication else {
        responder = responder?.next
        continue
    }
    application.open(url) { success in
        print("open URL: \(success)")
    }
    break
}
In the above code example, mytestapp:// is the URL scheme the container app defines. For more information about that topic, see Defining a custom URL scheme for your app.
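On the container app side, you then handle the incoming URL and kick off recording. A minimal sketch for a scene-based app is below; startVoiceInput() is a placeholder for your own recording and ASR entry point, not a system API:

```swift
import UIKit

class SceneDelegate: UIResponder, UIWindowSceneDelegate {
    var window: UIWindow?

    // Called when the app is opened via its custom URL scheme.
    func scene(_ scene: UIScene, openURLContexts URLContexts: Set<UIOpenURLContext>) {
        guard let url = URLContexts.first?.url,
              url.scheme == "mytestapp" else { return }
        startVoiceInput()
    }

    private func startVoiceInput() {
        // Placeholder: configure AVAudioSession, then start your
        // audio engine and speech recognizer here.
    }
}
```

If the app is launched cold from the URL, the contexts arrive in scene(_:willConnectTo:options:) via connectionOptions.urlContexts instead, so a production implementation should handle both paths.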
As for the third part of the question, the answer is no: there is no API for the container app to bring the host app back to the foreground automatically. However, after the container app is launched from a keyboard extension, the system shows a back arrow in the screen's top-left corner, which the user can tap to go back to the host app. I hope that's good enough for your implementation.
Hopefully, this information helps.
Best,
——
Ziqiao Chen
Worldwide Developer Relations.