So,
I've been wondering how fast a an offline STT -> ML Prompt -> TTS roundtrip would be.
Interestingly, for many tests, the SpeechTranscriber (STT) takes the bulk of the time, compared to generating a FoundationModel response and creating the Audio using TTS.
E.g.
InteractionStatistics:
- listeningStarted: 21:24:23 4480 2423
- timeTillFirstAboveNoiseFloor: 01.794
- timeTillLastNoiseAboveFloor: 02.383
- timeTillFirstSpeechDetected: 02.399
- timeTillTranscriptFinalized: 04.510
- timeTillFirstMLModelResponse: 04.938
- timeTillMLModelResponse: 05.379
- timeTillTTSStarted: 04.962
- timeTillTTSFinished: 11.016
- speechLength: 06.054
- timeToResponse: 02.578
- transcript: This is a test.
- mlModelResponse: Sure! I'm ready to help with your test. What do you need help with?
Here, between my audio input ending and the Text-2-Speech starting top play (using AVSpeechUtterance) the total response time was 2.5s.
Of that time, it took the SpeechAnalyzer 2.1s to get the transcript finalized, FoundationModel only took 0.4s to respond (and TTS started playing nearly instantly).
I'm already using reportingOptions: [.volatileResults, .fastResults] so it's probably as fast as possible right now?
I'm just surprised the STT takes so much longer compared to the other parts (all being CoreML based, aren't they?)
Audio
RSS for tagDive into the technical aspects of audio on your device, including codecs, format support, and customization options.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Hello,
I have an existing AUv3 instrument plugin. In the plug in, users can access files (audio files, song projects) via a UIDocumentPickerViewController
In Logic Pro, (and some other hosts, but not all), the document picker is unable to receive touches, while a keyboard case is attached to the iPad.
Removing the case (this is an Apple brand iPad case) allows the interactions to resume and allows me to pick files in the usual way.
One of my users reports this non-responsive behavior occurs even after disconnecting their keyboard.
I have fiddled with entitlements all day, and have determined that is not the issue, since the keyboard disconnection appears to fix it every time for me.
Here is my, very boilerplate, presentation code :
guard let type = UTType("com.my.type") else {
return
}
let fileBrowser = UIDocumentPickerViewController(forOpeningContentTypes: [type])
fileBrowser.overrideUserInterfaceStyle = .dark
fileBrowser.delegate = self
fileBrowser.directoryURL = myFileFolderURL()
self.present(fileBrowser, animated: true) {
Description
As of iOS 18, AVAudioSession.setPreferredIOBufferDuration ignores the requested buffer size when Sound Recognition or Vocal Shortcuts is enabled. This results in 1) much larger buffer sizes and 2) mismatched buffer sizes between input and output buffers, which causes ‘glitchy’ audio and increased latency.
Additionally, when this issue occurs AVAudioSession.setPreferredIOBufferDuration continues to return ‘true’ and no error is produced.
Steps to Reproduce:
Enable Vocal Shortcuts on a device running iOS 18. Enable at least one shortcut (e.g. Control Center).
Open or clone the example project (https://github.com/cwalo/SoundRecognitionBug)
Build and install the example project
Attach a headset and launch the application
Observe console logs showing
a requested buffer size of 0.005805 (256 samples @ 48k)
an actual buffer size of 0.023220 (1104 samples @48k - this is regularly the resulting buffer size in all of our tests)
Quit the app and detach the headset. Enable mutesOutput in AudioSystem.mm (to avoid feedback)
Launch the application
Observe
Same result from step 4
Mismatched hardware buffer size of 1104 and recorded frame count of 1024
Mismatched playbackCount and recordCount
Quit the app and disable vocal shortcuts
Launch the app
Observe IOBufferDuration matching the requested duration and matched buffer sizes (expected behavior)
Expected results:
Requested IOBufferDuration is respected or AVAudioSession returns false or error is produced
Input and output buffer sizes match
Device(s): iPhone 11 Pro, iPad Pro
OS: iOS 18.0.1
Environment: Xcode 16.1
FB: FB15715421
Related to: https://forums.developer.apple.com/forums/thread/765477
I have an app under development - demo here - https://youtu.be/VbAfUk_eYl0?si=s6EDBx-4G6P_QbZO - which is sort of an audio player for airdropped files - something useful to musicians who dump work in progress to their phone, make notes, revise and update.
I've been testing my handling of audio session interruption notifications, but seems to be a lot of inconsistency in how, when and why iOS delivers them, and I'm wondering if there is some rhyme or reason to it that I'm just not detecting.
For example, I am playing a song in my app. Switch to Apple Music and start playing a song there. My app gets an interruption began notification - this is consistent.
Switch back to my app, and about half the time, I will get an interruption ended notification (coupled often with a blast of the tail of whatever audio buffer was partially played when the interruption started, even though the engine was stopped - and followed by call to my AVAudioPlayerNodeCompletionCallback - is there some way to avoid this?). Half the time I don't get an interruption ended notification; my app can (as expected) end the interruption by activating the AVAudioSession and playing something.
I have not been able to determine any pattern to this behavior, other than that if my app started playing using AVAudioPlayerNode.scheduleSegment rather than scheduleFile I think the notification will be consistently delivered on app activation rather than when I activate the session programmatically.
I would like my app to behave deterministically, and would appreciate any help in deciphering what causes the inconsistent behavior in notifications from iOS.
After upgrading to iOS 18.4, I'm no longer able to establish an AirPlay v1 connection to an audio system. The symptom is that the AirPlay route picker just spins when trying to connect to an audio system. It eventually gives up.
I tested this on an iPhone 14, connecting to a HomePod, AirPort express, AppleTV and a Wiim Pro. If I try connecting with AirPlay v2, ex: using Apple Music, the connection succeeds and audio can be played.
I'm the developer of an app that plays audio over AirPlay while also recording. My app has to use AirPlay v1 because AvAudioSession doesn't allow the policy .longFormAudio when the category is .playAndRecord. This issue is a real pain as it means my app is suddenly broken for many thousands of users.
Is anyone else seeing this issue? Any suggestions for a workaround?
I'd like to find out: Can backgrounded apps record audio?
In the past as I recall, I found that backgrounded apps were pretty restricted and couldn't do much of anything.
However I'm not familiar with the current state of affairs.
With iOS 15.8 and above, can backgrounded apps record audio if they've been given permission by the user to access the microphone?
Thanks.
Topic:
Media Technologies
SubTopic:
Audio
My app encountered problems when trying to open an x86 audioUnit v2 on a Silicon Mac (although Rosetta is installed).
There seems to be a XPC connection issue with the AUHostingService that I don't know how to fix.
I observed other host apps opening the same plugins without problem, so there is probably something wrong or incompatible in my codes.
I noticed that:
The issue occurs whether or not the app is sandboxed.
The issue does no longer occur when the app itself runs under Rosetta.
There is no error reported by CoreAudio during allocation and initialization of the audio unit. The first notified errors appears when the unit calls AudioUnitRender from the rendering callback.
With most x86 plugins, the error is on first call:
kAudioUnitErr_RenderTimeout
and on any subsequent call:
kAudioComponentErr_InstanceInvalidated
On the UI side, when the Cocoa View is loaded, it appears shortly, then disappears immediately leaving its superview empty.
With another x86 plugin, the Cocoa View is loaded normally, but CoreAudio still emits
kAudioUnitErr_NoConnection
from AudioUnitRender, whether the view has been loaded or not, and the plugin produces no sound.
I also find these messages in the console (printed in that order):
CLIENT ERROR: RemoteAUv2ViewController does not override - and thus cannot react to catastrophic errors beyond logging them
AUAudioUnit_XPC.mm:641 Crashed AU possible component description: aumu/Helm/Tyte
My app uses the AUv2 API and I suspect that working with the AUv3 API would spare me these problems.
However, considering how my audio system is built (audio units are wrapped into C++ classes and most connections between units are managed on the fly from the rendering callback), it would be a lot of work to convert, and I’m even not sure that all I do with the AUv2 API would be possible with the AUv3 API.
I could possibly find an intermediate solution, but in the immediate future I'm looking for the simplest and fastest possible fix. If I cannot find better, I see two fallback options:
In this part of the doc: “Beginning with macOS 11, the system loads audio units into a separate process that depends on the architecture or host preference”, does “host preference” means that it would be possible to disable the “out of process” behavior, for example from the app entitlements or info.plist?
Otherwise, as a last resort, I could completely disable the use of x86 audioUnits when my app runs under ARM64, for at least making things cleaner. But the Audio Component API doesn’t give any info about the plugin architecture, how could I found it?
Any tip or idea about this issue will be much appreciated.
Thanks in advance!
Hi, when using ApplicationMusicPlayer from MusicKit my app automatically gets the media controls on the lock screen: Play/ Pause, Skip Buttons, Playback Position etc.
I would like to customize these. Tried a bunch of things, e.g. using MPRemoteCommandCenter. So far I haven't had any success.
Does anyone know how I can customize the media controls of ApplicationMusicPlayer.
Thank you.
I am trying to debug the AAX version of my plugin (MIDI effect) on Pro Tools, but I am getting the following error (Mac console) when attempting to load it:
dlsym cannot find symbol g_dwILResult in CFBundle etc..
I used Xcode 16.4 to build the plugin.
Has anybody come across the same or a similar message?
Best,
Achillefs
Axart Labs
I'm experiencing audio issues while developing for visionOS when playing PCM data through AVAudioPlayerNode.
Issue Description:
Occasionally, the speaker produces loud popping sounds or distorted noise
This occurs during PCM audio playback using AVAudioPlayerNode
The issue is intermittent and doesn't happen every time
Technical Details:
Platform: visionOS
Device: vision pro / simulator
Audio Framework: AVFoundation
Audio Node: AVAudioPlayerNode
Audio Format: PCM
I would appreciate any insights on:
Common causes of audio distortion with AVAudioPlayerNode
Recommended best practices for handling PCM playback in visionOS
Potential configuration issues that might cause this behavior
Has anyone encountered similar issues or found solutions? Any guidance would be greatly helpful.
Thank you in advance!
I am unable to access the Int32 error from the errors that CoreAudio throws in Swift type AudioHardwareError. This is critical. There is no way to access the errors or even create an AudioHardwareError to test for errors.
do {
_ = try AudioHardwareDevice(id: 0).streams // will throw
} catch {
if let error = error as? AudioHardwareError { // cast to AudioHardwareError
print(error) // prints error code but not the errorDescription
}
}
How can get reliably get the error.Int32? Or create a AudioHardwareError with an error constant? There is no way for me to handle these error with code or run tests without knowing what the error is.
On top of that, by default the error localizedDescription does not contain the errorDescription unless I extend AudioHardwareError with CustomStringConvertible.
extension AudioHardwareError: @retroactive CustomStringConvertible {
public var description: String {
return self.localizedDescription
}
}
I'm experiencing a significant limitation with MusicKit's Dolby Atmos implementation on macOS and would appreciate clarification on whether this is intended behavior or if there are solutions available.
When streaming Dolby Atmos content through MusicKit's ApplicationMusicPlayer, the output is limited to 2-channel stereo, even when:
Audio MIDI Setup is configured for 7.1.4 (12-channel) output
The same tracks play in full multichannel through the native Apple Music app
Dolby Atmos is set to "Automatic" in Apple Music preferences
Please let me know if there is anyway to enable this. If not, is this documented anywhere? Thanks!
I was testing audio playback from YouTube in Safari, and the sound was clipping heavily. At first, I thought it might be due to the poor quality of my small sound system. However, when I took a screenshot and the screenshot sound effect itself produced a loud clipping noise, it became clear that this is not a mechanical problem with my speakers, nor an issue specific to YouTube or Safari. This appears to be a system-wide audio issue in macOS Tahoe 26 - Beta 5.
I am trying to use AVAudioEngine for recording and playing for a voice chat kind of app, but when the speaker plays any audio while recording, the recording take the speaker audio as input. I want to filter that out. Are there any suggestions for the swift code
I have a simple AVAudioEngine graph as follows:
AVAudioPlayerNode -> AVAudioUnitEQ -> AVAudioUnitTimePitch -> AVAudioUnitReverb -> Main mixer node of AVAudioEngine.
I noticed that whenever I have AVAudioUnitTimePitch or AVAudioUnitVarispeed in the graph, I noticed a very distinct crackling/popping sound in my Airpods Pro 2 when starting up the engine and playing the AVAudioPlayerNode and unable to find the reason why this is happening. When I remove the node, the crackling completely goes away. How do I fix this problem since i need the user to be able to control the pitch and rate of the audio during playback.
import AVKit
@Observable @MainActor
class AudioEngineManager {
nonisolated private let engine = AVAudioEngine()
private let playerNode = AVAudioPlayerNode()
private let reverb = AVAudioUnitReverb()
private let pitch = AVAudioUnitTimePitch()
private let eq = AVAudioUnitEQ(numberOfBands: 10)
private var audioFile: AVAudioFile?
private var fadePlayPauseTask: Task<Void, Error>?
private var playPauseCurrentFadeTime: Double = 0
init() {
setupAudioEngine()
}
private func setupAudioEngine() {
guard let url = Bundle.main.url(forResource: "Song name goes here", withExtension: "mp3") else {
print("Audio file not found")
return
}
do {
audioFile = try AVAudioFile(forReading: url)
} catch {
print("Failed to load audio file: \(error)")
return
}
reverb.loadFactoryPreset(.mediumHall)
reverb.wetDryMix = 50
pitch.pitch = 0 // Increase pitch by 500 cents (5 semitones)
engine.attach(playerNode)
engine.attach(pitch)
engine.attach(reverb)
engine.attach(eq)
// Connect: player -> pitch -> reverb -> output
engine.connect(playerNode, to: eq, format: audioFile?.processingFormat)
engine.connect(eq, to: pitch, format: audioFile?.processingFormat)
engine.connect(pitch, to: reverb, format: audioFile?.processingFormat)
engine.connect(reverb, to: engine.mainMixerNode, format: audioFile?.processingFormat)
}
func prepare() {
guard let audioFile else { return }
playerNode.scheduleFile(audioFile, at: nil)
}
func play() {
DispatchQueue.global().async { [weak self] in
guard let self else { return }
engine.prepare()
try? engine.start()
DispatchQueue.main.async { [weak self] in
guard let self else { return }
playerNode.play()
fadePlayPauseTask?.cancel()
playPauseCurrentFadeTime = 0
fadePlayPauseTask = Task { [weak self] in
guard let self else { return }
while true {
let volume = updateVolume(for: playPauseCurrentFadeTime / 0.1, rising: true)
// Ramp up volume until 1 is reached
if volume >= 1 { break }
engine.mainMixerNode.outputVolume = volume
try await Task.sleep(for: .milliseconds(10))
playPauseCurrentFadeTime += 0.01
}
engine.mainMixerNode.outputVolume = 1
}
}
}
}
func pause() {
fadePlayPauseTask?.cancel()
playPauseCurrentFadeTime = 0
fadePlayPauseTask = Task { [weak self] in
guard let self else { return }
while true {
let volume = updateVolume(for: playPauseCurrentFadeTime / 0.1, rising: false)
// Ramp down volume until 0 is reached
if volume <= 0 { break }
engine.mainMixerNode.outputVolume = volume
try await Task.sleep(for: .milliseconds(10))
playPauseCurrentFadeTime += 0.01
}
engine.mainMixerNode.outputVolume = 0
playerNode.pause()
// Shut down engine once ramp down completes
DispatchQueue.global().async { [weak self] in
guard let self else { return }
engine.pause()
}
}
}
private func updateVolume(for x: Double, rising: Bool) -> Float {
if rising {
// Fade in
return Float(pow(x, 2) * (3.0 - 2.0 * (x)))
} else {
// Fade out
return Float(1 - (pow(x, 2) * (3.0 - 2.0 * (x))))
}
}
func setPitch(_ value: Float) {
pitch.pitch = value
}
func setReverbMix(_ value: Float) {
reverb.wetDryMix = value
}
}
struct ContentView: View {
@State private var audioManager = AudioEngineManager()
@State private var pitch: Float = 0
@State private var reverb: Float = 0
var body: some View {
VStack(spacing: 20) {
Text("🎵 Audio Player with Reverb & Pitch")
.font(.title2)
HStack {
Button("Prepare") {
audioManager.prepare()
}
Button("Play") {
audioManager.play()
}
.padding()
.background(Color.green)
.foregroundColor(.white)
.cornerRadius(10)
Button("Pause") {
audioManager.pause()
}
.padding()
.background(Color.red)
.foregroundColor(.white)
.cornerRadius(10)
}
VStack {
Text("Pitch: \(Int(pitch)) cents")
Slider(value: $pitch, in: -2400...2400, step: 100) { _ in
audioManager.setPitch(pitch)
}
}
VStack {
Text("Reverb Mix: \(Int(reverb))%")
Slider(value: $reverb, in: 0...100, step: 1) { _ in
audioManager.setReverbMix(reverb)
}
}
}
.padding()
}
}
iPadOS 18.3 beta 3 (22D5055b) fixed the issue for me and my 7th generation iPad.
Topic:
Media Technologies
SubTopic:
Audio
Hello,
I’m new here. I'm developing an iOS app and I’d like to know whether it is possible to detect if a phone call is being recorded by another app running in the background.
I’ve already reviewed the documentation for CallKit and AVAudioSession, but I couldn’t find anything related. My expectation was that iOS might provide some callback or API to indicate if a call is being recorded (third-party apps), but so far I haven’t found a way.
My questions are:
Does iOS expose any API to detect if a call is being recorded?
If not, is there any indirect, Apple's policy compliant method (e.g., microphone usage events) that can be relied upon?
Or is this something that iOS explicitly prevents for privacyreasons?
Expecting solutions that align with Apple’s policies and would be accepted under the App Store Review Guidelines.
Thanks in advance for any guidance.
I’ve been researching how to achieve a recording playback effect in iOS similar to the hands-free calling effect in the system’s phone app. How can this be implemented? I tried using the voice chat recording method, but found that the volume of the speaker output is too low. How should this issue be addressed? I couldn’t find a suitable API. Could you provide me with some documentation or sample code? Thank you.
Hi!
I get personal recommendations MusicItemCollection using this code:
func getRecommendations() async throws -> MusicItemCollection<MusicPersonalRecommendation> {
let request = MusicPersonalRecommendationsRequest()
let response = try await request.response()
let recommendations = response.recommendations
return recommendations
}
However, all recommendations contain no more than 12 MusicItem's, while the Music.app application provides much more for some recommendations, for example, for the You recently listened recommendation, the Music.app application displays 40 items. Each recommendation has an items property that contains a collection of musical items MusicItemCollection<MusicPersonalRecommendation.Item>, the hasNextBatch property for these collections is always false. I expected that for some collections loading of new items would be available. Please tell me if I'm doing something wrong or is this a MusicKit bug?
Thank you!
New to iOS development and I've been trying to make heads or tails of the documentation. I know there is a difference between the data fields returned from songs from the user library and from the category, but whenever I search on the apple site I can't find a list of each. For example, Im trying to get the releaseDate of a song in my library, but it seems I'll have to cross-query either the catalog entry for the using song.catalogID or the song.irsc but when I try to use them I can't find a cross reference between the two. I'm totally turned around.
Also trying to determine if a song in my library has been favorited or not? isFavorited (or something similar) doesn't seem to be a thing. Using this code and trying to find a way to display a solid star if the song has been favorited or an empty one if it's not. Seems like a basic request but I can't find anything on how to do it. I've searched docs, googled, tried.
Does apple want us to query the user's Favorited Songs playlist or something? How do I know which playlist that is?
I know isFavorited isn't a thing, just using it here so you can see what my intension is:
HStack(spacing: 10) {
Image(systemName: song.isFavorited ? "star.fill" : "star")
.foregroundColor(song.isFavorited ? .yellow : .gray)
Image(systemName: "magnifyingglass")
}