Speech Recognition Problem in iOS 18.0

Question

Created Aug ’24


It looks like Apple has added some new API(s) to SFSpeechRecognition. My app, which is currently listed on the App Store, features speech recognition. Yet using it under iOS 18.0 produces errors:

-[SFSpeechRecognitionTask localSpeechRecognitionClient:speechRecordingDidFail:]_block_invoke Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)"

What happens is that after several words are transcribed and displayed, the next sentence makes the previous words disappear. That's probably what this portion of the error text means: Ignoring subsequent local speech recording error: Error Domain=kAFAssistantErrorDomain Code=1101 "(null)".

The problem occurs ONLY when the app is running under iOS 18.0. Even when it's built with Xcode 16.0, everything works fine under iOS 17.5. Any suggestions?

Answered by iaborodin

Finally, there's good news: in the iOS 26 RC, the problem (speech recognition in a custom implementation) appears to be fixed! No more disappearing transcribed text after a pause. At least in my tests, it works as it did before upgrading to iOS 18. Thank you, Apple!

Answer 1

iaborodin OP

Aug ’24

A follow-up to the previous post. It looks like if I pause during dictation, even briefly (1-2 seconds), the previously transcribed words disappear and the transcription starts anew. Let me reiterate: that does NOT happen under iOS 17. The app downloaded from the store functions as expected there. Yet the very same code running under iOS 18 shows the problem: even a 2-second pause in speaking truncates the previously transcribed text and starts anew. The quickest and simplest test is to run Apple's demo app 'SpokenWord'. Any insights would be greatly appreciated.

Answer 2

iaborodin OP

Sep ’24

One more attempt to get an answer, or at least a confirmation that the problem does exist. Would somebody be willing to spend just a couple of minutes running Apple's own demo app 'SpokenWord' (https://developer.apple.com/documentation/speech/recognizing_speech_in_live_audio?language=objc) under iOS 18.0? Here's what I ask: start dictation and make a natural pause of, say, 1-2 seconds. You will see that the previously transcribed text is completely truncated and the transcription starts anew. Once again: this is Apple's own demo app.
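To confirm the symptom without eyeballing the screen, one option is to compare consecutive partial results and flag the moment earlier words vanish. The function below is a hypothetical heuristic (`transcriptWasReset` is not a Speech framework API) that needs only Foundation: it treats a result as a reset when the text has gone empty, or when it has fewer words than the previous result and no longer starts with the same first word. Partial results legitimately revise words, so expect occasional false positives or misses.

```swift
import Foundation

/// Hypothetical heuristic (not a Speech framework API): returns true when the
/// newest partial transcript looks like the recognizer dropped earlier words.
func transcriptWasReset(previous: String, current: String) -> Bool {
    let previousWords = previous.split(whereSeparator: { $0.isWhitespace })
    let currentWords = current.split(whereSeparator: { $0.isWhitespace })

    // Nothing had been recognized yet, so nothing can have been lost.
    guard let previousFirst = previousWords.first else { return false }
    // Text was recognized before, but the transcript is now empty.
    guard let currentFirst = currentWords.first else { return true }

    // Ordinary revisions usually keep the opening word; a reset starts over.
    let sameOpening = previousFirst.lowercased() == currentFirst.lowercased()
    return currentWords.count < previousWords.count && !sameOpening
}
```

In a result handler, keep the last `result.bestTranscription.formattedString`, compare it with the new one, and print a timestamped marker whenever this returns true.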

Answer 3

34534543456789098767654 OP

Sep ’24

@iaborodin Did you get anywhere with this? I’m looking into fixes and workarounds now with some urgency. Happy to chat, you can reach me at 34534543456789098767654 @ mailer.city.

Answer 4

iaborodin OP

Sep ’24

@34534543456789098767654 No, I haven't found a solution to the problem (intermittent truncation of transcribed dictation). So I reverted to Apple's standard approach, its keyboard dictation functionality, which works as expected. By the way, for some unclear reason, after a few instances the error messages I mentioned in my initial post stopped appearing.

Answer 5

righteoustales OP

Sep ’24

(Context: iPhone 12 running 17.6.1, Xcode Version 15.4 (15F31d))

I see exactly what the OP above reported when running the Apple SpokenWord sample with no changes. However, changing this one line (the request's requiresOnDeviceRecognition flag) from true to false fixes the problem:

Screenshot 2024-09-13 at 5.40.55 PM.png
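For reference, the flag being toggled is `requiresOnDeviceRecognition` on the recognition request, as the later posts in this thread name it. Below is a minimal sketch, assuming iOS 13 or later; the helper name `makeRecognitionRequest(for:preferOnDevice:)` is hypothetical. It also checks `supportsOnDeviceRecognition`, since requiring on-device recognition where the recognizer cannot provide it is expected to make the request fail. Per later posts, setting the flag to false stopped helping on iOS 18.0.

```swift
#if canImport(Speech)
import Speech

/// Hypothetical helper: builds a live-audio request with the on-device flag set explicitly.
@available(iOS 13, macOS 10.15, *)
func makeRecognitionRequest(for recognizer: SFSpeechRecognizer,
                            preferOnDevice: Bool) -> SFSpeechAudioBufferRecognitionRequest {
    let request = SFSpeechAudioBufferRecognitionRequest()
    // Deliver results before the audio ends, as SpokenWord does.
    request.shouldReportPartialResults = true
    // true: recognition must stay on the device.
    // false: the workaround from this post (Speech may use the server when reachable).
    request.requiresOnDeviceRecognition = preferOnDevice && recognizer.supportsOnDeviceRecognition
    return request
}
#endif
```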

I'm fine with the quality of the recognition being different between local and remote (presumably because the cloud might be better), but this is not that, and it feels very broken. Valid, recognized text is simply thrown away after brief pauses in speech in the local-required case, but not in the local-not-required case. In addition, setting the flag to false (not requiring local recognition) still fixes it even when I have completely disabled all network connectivity on the iPhone (i.e., it cannot make a remote call, so the recognition is, by definition, done locally).

Other notes of potential interest:

Even if the workaround fixes it, part of my requirement is that the app always works, whether remote calls are possible or not. That's why I set the flag requiring local recognition to true in the first place.
As reported above, the "isFinal" flag is never set to true at the point the earlier results are discarded.
I'm hearing that iOS 18 is even worse, specifically that setting requiresOnDeviceRecognition to false does not help as a workaround. I have not yet verified this on iOS 18 because it is in beta at this time.

Example to repro the bug:

[with requiresOnDeviceRecognition = true] speaking "add 1+2+3+4+ (go as long as you want with no brief pauses)" results in exactly what was spoken. Doing the same with a brief pause followed by "5+6" results in all text preceding "5+6" being thrown away. By "brief pause" I mean 1 1/2 to 2 seconds.

[with requiresOnDeviceRecognition = false] speaking exactly the same thing with a pause as long as 2 minutes (maybe longer; I stopped testing at 2 minutes) before adding "5+6" results in the full spoken text being returned (i.e., the result contains "add 1+2+3+4+5+6"). Again, this works even if iPhone networking is completely disabled.

Answer 6

righteoustales OP

Sep ’24

Quick update now that iOS 18 is released rather than in beta.

The nice workaround I documented above, setting requiresOnDeviceRecognition to false, no longer works. As of iOS 18, words recognized before a brief pause are always lost, regardless of whether that flag is set to true or false.

Apple folks: It would be nice to hear back from you on this. Do you concur that this reported behavior is a bug? Or, if by design, is there a recommended approach for coping with it?

Answer 7

Squids OP

Sep ’24

I just wanted to mention that I’m also experiencing this issue and haven’t found a great working solution. I’ve reached out to Apple.

It's also worth noting something I've only seen on iOS 18.0: after you finish speaking and pause, the transcribe function is called again, producing a duplicate transcript. This usually results in the following transcription overwriting the previous one.

Screenshot 2024-09-16 at 11.00.59 PM.png

Screenshot 2024-09-16 at 10.59.05 PM.png

Answer 8

iaborodin OP

Sep ’24

@Squids "I’ve reached out to Apple"

Me too :), but so far no response.

Answer 9

konstantinfromnizhniynovgorod OP

Sep ’24

Same problem here.

Answer 10

BryanKim84 OP

Sep ’24

Same here. I wonder whether this problem occurs on all devices. As far as I've tested, only the iPhone 15 Pro Max works correctly.

Answer 11

righteoustales OP

Sep ’24

I shared via Dropbox (should be publicly accessible) a quick video from my iPhone illustrating the issue:

https://www.dropbox.com/scl/fi/ci16tz76q9trxsuv1k1dx/audioBug.MP4?rlkey=pkywy8hanqasxya5myca3ezq4&e=1&dl=0

Answer 12

Squids OP

Sep ’24

I got an email back from the DTS team, and they told me to submit a bug report via Feedback Assistant (https://feedbackassistant.apple.com), which I've done under the Developer Technologies & SDKs topic.

I encourage everyone else to submit one as well. Hopefully if enough people submit a report they'll take a look at this bug.

Answer 13

DTS Engineer OP

Apple

Sep ’24

What was that bug number?

Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Answer 14

Squids OP

Sep ’24

Sure thing, it's FB15166325

Answer 15

righteoustales OP

Sep ’24

Also. FB15192539

Answer 16

Squids OP

Sep ’24

I may have come up with a solution for now. I looked more closely at SFSpeechRecognitionResult -> SFSpeechRecognitionMetadata and saw that there is a property 'speechDuration'.

It turns out that speechDuration reports how long the previous utterance was, and while speech is coming in it is nil. So I created another published var, "accumulatedTranscript", and checked whether speechDuration != nil; if so, I append the current transcript to it and then reset the transcript to an empty string (to clear the UI's text).

For the UI I'm using a combined var of accumulatedTranscript + transcript to give the appearance of a continuous stream of text. As my screenshots show, it uses the last transcript (final result) that comes in after the pause.
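The approach above can be sketched as a small state machine. This is a minimal sketch, assuming (as observed in this post) that a result carries non-nil metadata only once an utterance has ended. In the API it is `result.speechRecognitionMetadata` (iOS 14.5+) that is optional, while `speechDuration` itself is a plain `TimeInterval`, so the sketch tests the metadata for nil. `TranscriptAccumulator` is a hypothetical name, and the duplicate check is an addition aimed at the repeated end-of-pause callback reported earlier in the thread.

```swift
import Foundation
#if canImport(Speech)
import Speech
#endif

/// Hypothetical type sketching the "accumulatedTranscript + transcript" workaround.
struct TranscriptAccumulator {
    private(set) var accumulated = ""   // text committed at earlier pauses
    private(set) var current = ""       // live partial text for the current utterance
    private var lastCommitted = ""      // last segment appended to `accumulated`

    /// What the UI shows: committed text followed by the live partial text.
    var display: String {
        [accumulated, current].filter { !$0.isEmpty }.joined(separator: " ")
    }

    /// `utteranceEnded` stands in for "speechRecognitionMetadata is non-nil".
    mutating func update(transcript: String, utteranceEnded: Bool) {
        guard utteranceEnded else {
            current = transcript
            return
        }
        // Skip a repeated end-of-utterance callback with the same text when
        // no new partial text has arrived since the last commit.
        if current.isEmpty && transcript == lastCommitted { return }
        if !transcript.isEmpty {
            accumulated = [accumulated, transcript].filter { !$0.isEmpty }.joined(separator: " ")
            lastCommitted = transcript
        }
        current = ""
    }
}

#if canImport(Speech)
@available(iOS 14.5, macOS 11.3, *)
extension TranscriptAccumulator {
    /// Feeds a real recognition result into the accumulator.
    mutating func update(with result: SFSpeechRecognitionResult) {
        update(transcript: result.bestTranscription.formattedString,
               utteranceEnded: result.speechRecognitionMetadata != nil)
    }
}
#endif
```

In the recognition task's result handler, one would call `accumulator.update(with: result)` and bind the UI to `accumulator.display`. The duplicate check is a judgment call: it only skips a commit when nothing new was heard since the previous one, so a genuinely repeated word spoken after a pause still gets appended.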

Some things worth noting:

I haven't seen iOS 17 report a non-nil speech duration, so this solution shouldn't affect how iOS 17 works, but there may be edge cases I can't think of right now.
The newly appended transcript will begin with a capital letter; you'll want to deal with this however your app needs (for me, I'll just make everything past the first word lowercase, since the pause timer is finicky).
I haven't done a robust test of this solution yet; I've tested only on the iOS 18 simulator and a physical iOS 18 device, and on the iOS 17 simulator.
I'm not sure how this workaround will interact with any changes Apple might make to address this, so keep that in mind.
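For the capital-letter issue, a gentler alternative to lower-casing everything after the first word is to lower-case only the first character of each appended segment. This is a swapped-in technique, not the poster's, and `joinSegment(_:to:)` is a hypothetical name. It will also turn a leading pronoun "I" or a proper noun at the start of a segment into lower case, so adjust for your language and app.

```swift
import Foundation

/// Hypothetical helper: appends a recognized segment to the running text,
/// lower-casing only the segment's first character.
func joinSegment(_ segment: String, to text: String) -> String {
    // The very first segment keeps its capital letter.
    guard !text.isEmpty else { return segment }
    // Nothing to append.
    guard let first = segment.first else { return text }
    let adjusted = first.lowercased() + String(segment.dropFirst())
    return text + " " + adjusted
}
```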

Screenshot 2024-09-20 at 3.03.03 PM.png Screenshot 2024-09-20 at 3.04.11 PM.png Screenshot 2024-09-20 at 3.33.01 PM.png

IMG_1304 1.png

Answer 17

DTS Engineer OP

Apple

Sep ’24

Thanks for those bug numbers (FB15166325, FB15192539). Both are quite new, filed within the last few days, so there's no news to report on that front yet.

Does anyone have a bug they filed earlier in the beta cycle?


Answer 18

iaborodin OP

Sep ’24

@DTS Engineer

Another one: FB15245186, though it's even more recent than the previous ones.

Answer 19

peterwarbo OP

Sep ’24

Wow, it's pretty incredible that this bug snuck into iOS 18.

Answer 20

apple-man OP

Sep ’24

There is also FB15110263 and FB15110251

Answer 21

LokeshKumar OP

Oct ’24

I'm experiencing the same issue on iOS 18, although it works fine on older versions. The problem is that I'm receiving partial results, but the text disappears and returns as empty later in the repeated callbacks.

Adding the screenshot and code for reference here.

```swift
import UIKit
import Speech
import AVFoundation

public protocol SpeechRecognizerWrapperDelegate: AnyObject {
    func speechRecognitionFinished(transcription: String)
    func speechRecognitionPartialResult(transcription: String)
    func speechRecognitionRecordingNotAuthorized(statusMessage: String)
    func speechRecognitionTimedOut()
}

public class SpeechRecognizerWrapper: NSObject, SFSpeechRecognizerDelegate {
    public weak var delegate: SpeechRecognizerWrapperDelegate?

    // LocalData and LanguageCode are app-specific types not shown in this post.
    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: (LocalData.sharedInstance.UPAppLanguage == LanguageCode.Hindi.rawValue) ? "hi-IN" : "en-IN"))!

    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private let audioEngine = AVAudioEngine()

    var notAuthorise = true
    var noAuthStatus = ""
    var allPermissionGranted: (() -> Void)?

    public override init() {
        super.init()
        setupSpeechRecognition()
    }

    private func setupSpeechRecognition() {
        speechRecognizer.delegate = self
    }

    func requestAuthorization() {
        if SFSpeechRecognizer.authorizationStatus() == .authorized && AVAudioSession.sharedInstance().recordPermission == .granted {
            self.notAuthorise = false
            return
        }
        self.notAuthorise = true
        SFSpeechRecognizer.requestAuthorization { [weak self] authStatus in
            guard let self = self else { return }
            /*
             The callback may not be called on the main thread. Add an
             operation to the main queue to update the record button's state.
             */
            OperationQueue.main.addOperation {
                if authStatus != .authorized {
                    self.notAuthorise = true
                    self.noAuthStatus = ""
                    if authStatus == .denied {
                        self.noAuthStatus = "User denied access to speech recognition"
                    } else if authStatus == .restricted {
                        self.noAuthStatus = "Speech recognition restricted on this device"
                    }
                } else {
                    self.checkTheRecord()
                    self.notAuthorise = false
                }
            }
        }
    }

    func checkTheRecord() {
        switch AVAudioSession.sharedInstance().recordPermission {
        case .granted:
            // self.allPermissionGranted?()
            break
        case .denied:
            break
        case .undetermined:
            AVAudioSession.sharedInstance().requestRecordPermission { [weak self] granted in
                if granted {
                    // self?.allPermissionGranted?()
                } else {
                    self?.notAuthorise = true
                }
            }
        default:
            break
        }
    }

    private var speechRecognitionTimeout: Timer?

    public var speechTimeoutInterval: TimeInterval = 2 {
        didSet {
            restartSpeechTimeout()
        }
    }

    private func restartSpeechTimeout() {
        speechRecognitionTimeout?.invalidate()
        speechRecognitionTimeout = Timer.scheduledTimer(timeInterval: speechTimeoutInterval, target: self, selector: #selector(timedOut), userInfo: nil, repeats: false)
    }

    public func startRecording() throws {
        // Cancel any previous task and remove the old tap before starting over.
        if let recognitionTask = recognitionTask {
            recognitionTask.cancel()
            audioEngine.stop()
            audioEngine.inputNode.removeTap(onBus: 0)
            self.recognitionTask = nil
            self.recognitionRequest = nil
        }

        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

        let inputNode = audioEngine.inputNode

        // Note: the tap below is installed on inputNode, so this mixer is not used for recognition.
        let mixerNode = AVAudioMixerNode()
        audioEngine.attach(mixerNode)
        audioEngine.connect(inputNode, to: mixerNode, format: nil)

        guard let recognitionRequest = recognitionRequest else { return }

        // Configure request so that results are returned before audio recording is finished
        recognitionRequest.shouldReportPartialResults = true

        // A recognition task represents a speech recognition session.
        // We keep a reference to the task so that it can be cancelled.
        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { [weak self] result, error in
            guard let self = self else { return }
            var isFinal = false
            if let result = result {
                print("formattedString: \(result.bestTranscription.formattedString)")
                isFinal = result.isFinal
                self.delegate?.speechRecognitionPartialResult(transcription: result.bestTranscription.formattedString)
            }

            if error != nil || isFinal {
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                self.recognitionRequest = nil
                self.recognitionTask = nil
            }

            if isFinal, let result = result {
                self.delegate?.speechRecognitionFinished(transcription: result.bestTranscription.formattedString)
                self.stopRecording()
            } else if error == nil {
                self.restartSpeechTimeout()
            } else {
                // cancel voice recognition
            }
        }

        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { [weak self] (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
            guard let self = self else { return }
            self.recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()
    }

    @objc private func timedOut() {
        stopRecording()
        self.delegate?.speechRecognitionTimedOut()
    }

    public func stopRecording() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0) // Remove tap on bus when stopping recording.

        recognitionRequest?.endAudio()

        speechRecognitionTimeout?.invalidate()
        speechRecognitionTimeout = nil
    }
}
```

Screenshot 2024-10-01 at 3.24.53 PM.png

Answer 22

jsnbro OP

Oct ’24

iOS 18.1 Beta 5 (22B5054e) seems to have resolved this issue and improved U.S. English language recognition & punctuation.

https://developer.apple.com/download/

Here's hoping its Speech framework makes it into the next release.

Answer 23

righteoustales OP

Oct ’24

18.1 Beta 5 (22B5054e) does not fix it. Not quite.

--- @jsnbro stated above "iOS 18.1 Beta 5 (22B5054e) seems to have resolved this issue and improved U.S. English language recognition & punctuation."

I upgraded to 22B5054e to re-test. What I am seeing is not quite a fix. It seems to have reverted to the behavior I saw (and reported in this thread on page 1) on iOS 17.6, specifically this:

the bug does not manifest if you set requiresOnDeviceRecognition = false
the bug does manifest if you set requiresOnDeviceRecognition = true

As before I am using Apple's SpokenWord example app to test.

My first bug report here was using: (Context: iPhone 12 running 17.6.1, Xcode Version 15.4 (15F31d))

For this update: (Context: iPhone 12 running 18.1 Beta (22B5054e), Xcode Version 16.0 (16A242d))

Tagging you, @DTS Engineer. Looks like your efforts are helping.

Answer 24

DTS Engineer OP

Apple

Oct ’24

Looks like your efforts are helping.

Nah, I’m just watching the bugs go by |-:

Seriously though folks, if you have a product that’s affected by this issue and you haven’t already filed a bug, please do so, and post your bug number here, just for the record.


FB15166325, FB15192539, FB15245186, FB15110263, FB15110251

Answer 25

iaborodin OP

Oct ’24

@DTS Engineer

I did receive a request from Apple to clarify the framework(s):

"Apple Sep 26, 2024 at 1:53 PM Engineering has requested the following information regarding your report:

Is this with mainstream Dictation or Voice Control?"

Sure enough I clarified that it's Dictation and the frameworks I used are SFSpeechRecognizer, SFSpeechAudioBufferRecognitionRequest, etc.

As you can see it was a week ago, and so far I haven't heard from them.

What surprises me is that I don't see other reports on that bug. All of the numbers except mine (FB15245186) return 'Not found'. Needless to say, if I get any response, I'll post it here.