SoundRecognition causes input/output callbacks to have varying buffer sizes and introduces glitching

Hello,

We have noticed an issue where Sound Recognition causes audio glitching in the AudioUnit setup we use in Smule.

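  • The input and output callbacks receive inconsistent frame counts.
  • The input frame count does not match the buffer size implied by [AVAudioSession sharedInstance].IOBufferDuration (a duration in seconds, so the expected frame count is IOBufferDuration × sample rate).
  • My best guess is that Sound Recognition affects the input frame size but not the output frame size.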
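A minimal sketch of how the mismatch could be detected from inside the render callbacks, written in C because AudioUnit render callbacks are C functions. `FrameCountTracker`, `tracker_update`, and `expected_frames` are hypothetical names, not taken from the example app or any Apple API; `in_number_frames` stands in for the callback's `inNumberFrames`, and `io_buffer_duration` for the value read from `[AVAudioSession sharedInstance].IOBufferDuration`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not an Apple API): remembers the frame count a
   callback saw last time, so a change is logged once rather than every call. */
typedef struct {
    const char *label;    /* e.g. "record" or "playback" */
    uint32_t last_frames; /* 0 until the first callback arrives */
} FrameCountTracker;

/* Call with the callback's inNumberFrames value.
   Returns 1 (and logs) when the count differs from the previous call, else 0.
   printf is for illustration only: logging on the real-time audio thread
   can itself cause glitches. */
static int tracker_update(FrameCountTracker *t, uint32_t in_number_frames) {
    assert(t != NULL);
    if (in_number_frames == t->last_frames) {
        return 0;
    }
    t->last_frames = in_number_frames;
    printf("updated %s frame counts: %" PRIu32 "\n", t->label, in_number_frames);
    return 1;
}

/* Frames per callback implied by IOBufferDuration (seconds) at a given
   sample rate, rounded to the nearest frame (assumes non-negative inputs). */
static uint32_t expected_frames(double io_buffer_duration, double sample_rate) {
    return (uint32_t)(io_buffer_duration * sample_rate + 0.5);
}
```

One tracker per callback (record and playback) would then be compared against `expected_frames(IOBufferDuration, sampleRate)`. With the values in the example log, 0.023 s at 48000 Hz gives 1104 expected frames, while the record callback reports 1024.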

To reproduce, use the example app here: https://github.com/MarkoGill/SoundRecognitionBug

Hardware/OS

  • iPhone 14 Pro on iOS 18 -> Experiences the problem
  • iPhone 11 on iOS 18 -> Experiences the problem
  • iPhone 15 on iOS 18 -> Does not experience the problem

Reproduction Steps

  1. Enable Sound Recognition (Settings > Accessibility > Sound Recognition > On)
  2. Enable a Sound for detection (Sounds > Dog > On)
  3. Open the example app with a headset connected (the app routes input to output)
  4. Observe that the audio glitches
  5. Check the logs: the record and playback buffer sizes differ
Example Log:
AU input sample rate: 48000.000000
AU output sample rate: 48000.000000
hardware sample rate: 48000.000000
hardware buffer size: 1104.000000
updated record frame counts:  1024
updated playback frame counts:  1104
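Assuming the logged hardware buffer size is IOBufferDuration multiplied by the hardware sample rate, the two logged frame counts correspond to different callback periods at the same 48 kHz rate:

```latex
\begin{aligned}
T_{\text{playback}} &= \frac{1104\ \text{frames}}{48000\ \text{Hz}} = 0.023\ \text{s} = 23\ \text{ms} \\
T_{\text{record}}   &= \frac{1024\ \text{frames}}{48000\ \text{Hz}} \approx 0.02133\ \text{s} \approx 21.33\ \text{ms} \\
\Delta              &= 1104 - 1024 = 80\ \text{frames} \approx 1.67\ \text{ms per callback}
\end{aligned}
```

So each record callback would deliver 80 fewer frames than each playback callback consumes, and a pass-through that copies one input buffer into each output buffer would come up short by that amount.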

Notes:

If you disable Sound Recognition and restart the app, playback behaves correctly.

Hello @M_Dawgy, thank you for your post. Our engineering teams need to investigate this issue, as resolution may involve changes to Apple's software. I'd greatly appreciate it if you could open a bug report, include a sysdiagnose, and post the FB number here once you do. Bug Reporting: How and Why? has tips on creating your bug report.

Here is the requested bug report, including a sysdiagnose:

https://feedbackassistant.apple.com/feedback/15387048

We see a similar problem when Voice Commands are enabled.

We're seeing the same issue, particularly when Vocal Shortcuts is enabled: https://developer.apple.com/forums/thread/769245 (I forked your project to extract some additional details while testing.)
