According to the documentation (https://developer.apple.com/documentation/avfoundation/avplayeritem/externalmetadata), AVPlayerItem should have an externalMetadata property. However, it does not appear to be visible to my app. When I try to use it, I get:
Value of type 'AVPlayerItem' has no member 'externalMetadata'
Documentation states iOS 12.2+; I am building with a minimum deployment target of iOS 18.
Code snippet:
import Foundation
import AVFoundation
/// ... in function ...
// create metadata as described in https://developer.apple.com/videos/play/wwdc2022/110338
var title = AVMutableMetadataItem()
title.identifier = .commonIdentifierAlbumName
title.value = "My Title" as NSString?
title.extendedLanguageTag = "und"
var playerItem = await AVPlayerItem(asset: composition)
playerItem.externalMetadata = [ title ]
Let's consider the following code.
I've created an actor that loads a list of .mp3 files from a Bundle and makes them available for audio playback.
Unfortunately, I'm experiencing a memory leak at the play method, specifically on the call to player.play().
From Instruments I get:
_malloc_type_malloc_outlined libsystem_malloc.dylib
start_wqthread libsystem_pthread.dylib
private actor AudioActor {
    enum Failure: Error {
        case soundsNotLoaded([AudioPlayerClient.Sound: Error])
    }

    enum Player {
        case music(AVAudioPlayer)
    }

    var players: [Sound: Player] = [:]
    let bundles: [Bundle]

    init(bundles: UncheckedSendable<[Bundle]>) {
        self.bundles = bundles.wrappedValue
    }

    func load(sounds: [Sound]) throws {
        try AVAudioSession.sharedInstance().setActive(true, options: [])
        var errors: [Sound: Error] = [:]
        for sound in sounds {
            // search the provided bundles for the sound file
            guard let url = bundles
                .compactMap({ $0.url(forResource: sound.name, withExtension: "mp3") })
                .first
            else { continue }
            do {
                self.players[sound] = try .music(AVAudioPlayer(contentsOf: url))
            } catch {
                errors[sound] = error
            }
        }
        guard errors.isEmpty
        else { throw Failure.soundsNotLoaded(errors) }
    }

    func play(sound: Sound, loops: Int?) throws {
        guard let player = self.players[sound]
        else { return }
        switch player {
        case let .music(player):
            player.numberOfLoops = loops ?? -1
            player.play()
        }
    }

    func stop(sound: Sound) throws {
        guard let player = self.players[sound]
        else { throw Failure.soundsNotLoaded([:]) }
        switch player {
        case let .music(player):
            player.stop()
        }
    }
}
I've got a setup using AVAudioEngine with several tone generator nodes, each with a chain of processing nodes, the chains then mixed into the main output.
Generator ➡️ Effect ➡️... ➡️ .mainMixerNode ➡️ .outputNode
Generator ➡️ Effect ➡️... ⤴️
...
Generator ➡️ Effect ➡️... ⤴️
The user should be able to mute any chain individually. I've found several potential approaches to muting, but I'm not terribly happy with any of them:
1. Adjust the amplitudes directly in my tone generators. Issue: consumes CPU even when completely muted. Four generators add ~15% CPU, even when all chains are muted.
2. Detach/attach chains as they are muted/unmuted. Issue: causes loud clicking/popping sounds whenever a chain is muted or unmuted.
3. Fade the mixer output volume while detaching/attaching a chain (just cutting the volume immediately to 0 doesn't get rid of the clicking/popping). Issue: causes all channels to fade during the transition, so not ideal.
The rest of these ideas are variations on making volume control plus detach/attach work for individual chains, since approach #3 worked well.
4. Add an AVAudioMixerNode to the end of each chain (just for volume control), sketched below. Issue: only the mixer on the final chain functions -- the others block all output. Not sure what's going on there.
5. Use a matrix mixer (for multi-input volume control), plus detach/attach to reduce CPU if necessary. Not yet attempted, due to perceived complexity and reports of fragility in the order of wiring it in. A bunch of effort before I even know if it's going to work.
6. Develop my own fader node to put on the end of each channel. Unlike the tone generator (a simple AVAudioSourceNode), developing an effect node seems complex and time consuming, and it might not even fix the CPU use.
I'm not completely averse to the learning curve of either 5 or 6, but would rather get some guidance on the best approach before diving in. They both seem likely to take more effort than I'd like for the simple behavior I'm trying to achieve.
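For clarity, here is roughly what I mean by the per-chain fader in approach 4 (a minimal sketch; the generator and effect nodes are placeholders and connection formats are left as nil):
import AVFoundation

let engine = AVAudioEngine()

// Build one chain: generator -> effects... -> per-chain fader -> mainMixerNode.
// Returns the fader so the chain can be muted/unmuted on its own.
func makeChain(generator: AVAudioNode, effects: [AVAudioNode]) -> AVAudioMixerNode {
    let fader = AVAudioMixerNode()
    engine.attach(generator)
    effects.forEach { engine.attach($0) }
    engine.attach(fader)

    var previous: AVAudioNode = generator
    for effect in effects {
        engine.connect(previous, to: effect, format: nil)
        previous = effect
    }
    engine.connect(previous, to: fader, format: nil)
    engine.connect(fader, to: engine.mainMixerNode, format: nil)
    return fader
}

// Muting a single chain would then just be a volume change on its fader,
// ideally ramped over a few milliseconds to avoid clicks:
// chainFader.outputVolume = 0
In my actual setup, wiring things this way is where only the final chain's fader passes audio.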
I'm using the new SpeechAnalyzer framework to detect certain commands and want to improve accuracy by giving context. It seems like AnalysisContext is the solution for this, but I couldn't find any usage example, so I want to make sure whether I'm doing it right.
let context = AnalysisContext()
context.contextualStrings = [
    AnalysisContext.ContextualStringsTag("commands"): [
        "set speed level",
        "set jump level",
        "increase speed",
        "decrease speed",
        ...
    ],
    AnalysisContext.ContextualStringsTag("vocabulary"): [
        "speed", "jump", ...
    ]
]
try await analyzer.setContext(context)
With this implementation, it still gives outputs like "Set some speed level", "It's speed level", etc.
Also, is it possible to make it expect a number after those commands, in order to eliminate results like "set some speed level to" (with "to" transcribed instead of "two")?
Session player regions populate blank, with no sound media when tracks or regions are created.
Hi folks - I'm having trouble finding specific documentation about Audio Unit MIDI plugins - as in MIDI-only. Any suggestions welcome, as searches aren't returning much. (Too niche? User error?)
I am developing an app that uses MusicKit to play music. I then need to have spoken words played to the user while ducking the audio coming from MusicKit (ApplicationMusicPlayer).
The built-in Siri voices are not of sufficient quality, so I am using an external service to create an mp3 file and then playing it back using AVAudioSession.
Sample code below.
The problem I am having is that .duckOthers is not ducking the ApplicationMusicPlayer output.
Is this a bug, or am I doing this wrong?
// Configure audio session for system-wide ducking
try AVAudioSession.sharedInstance().setCategory(.playback, mode: .spokenAudio, options: [.duckOthers, .mixWithOthers])
try AVAudioSession.sharedInstance().setActive(true)
// Request a short IO buffer duration
try AVAudioSession.sharedInstance().setPreferredIOBufferDuration(0.005)
// Create and configure audio player
self.audioPlayer = try AVAudioPlayer(data: audioData)
self.audioPlayer?.delegate = self
self.audioPlayer?.volume = 1.0 // Ensure full volume for speech
self.audioPlayer?.prepareToPlay()
// Set the audio player's settings for maximum clarity
self.audioPlayer?.enableRate = false
self.audioPlayer?.pan = 0.0 // Center the audio
self.audioPlayer?.play()
Hi all,
As soon as audio is played in any app, coreaudiod inserts a sleep-prevention assertion for both the system AND the display.
Can I somehow stop the insertion of the display sleep assertion?
pid 223(coreaudiod): [0x00004e9e00058dc2] 00:03:18 PreventUserIdleDisplaySleep named: "com.apple.audio.AppleGFXHDAEngineOutputDP:10001:0:{B31A-08C6-00000000}.context.preventuseridledisplaysleep"
Created for PID: 4145.
where PID 4145 is Spotify, but it doesn't matter which app is playing the audio.
Any help would be appreciated. Thanks!
Is there a way to destroy MIDIUMPMutableEndpoint again?
In my app, the user has a setting to enable and disable MIDI 2.0. If MIDI 2.0 should not be supported (or if iOS version < 18), it creates a virtual destination and a virtual source. And if MIDI 2.0 should be enabled, it instead creates a MIDIUMPMutableEndpoint, which itself creates the virtual destination and source automatically.
So here is my problem: I didn't find any way to destroy the MIDIUMPMutableEndpoint again. There is a method to disable it (setEnabled:NO), but that doesn't destroy or hide the virtual destination and source. So when the user turns MIDI 2.0 support off, I will have two virtual destinations and sources, and cannot get rid of the 2.0 ones.
What is the correct way to get rid of the MIDIUMPMutableEndpoint once it is created?
I’m developing a macOS audio monitoring app using AVAudioEngine, and I’ve run into a critical issue on macOS 26 beta where AVFoundation fails to detect any input devices, and AVAudioEngine.start() throws the familiar error 10877.
FB#: FB19024508
Strange Behavior:
AVAudioEngine.inputNode shows no channels or input format on bus 0.
AVAudioEngine.start() fails with -10877 (AudioUnit connection error).
AVCaptureDevice.DiscoverySession returns zero audio devices.
Microphone permission is granted (authorized), and the app is properly signed and sandboxed with com.apple.security.device.audio-input.
However, CoreAudio HAL does detect all input/output devices:
Using AudioObjectGetPropertyDataSize and AudioObjectGetPropertyData with kAudioHardwarePropertyDevices, I can enumerate 14+ devices, including AirPods, USB DACs, and BlackHole.
This suggests the lower-level audio stack is functional.
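The enumeration I use for that check is essentially the following (a condensed sketch of the kAudioHardwarePropertyDevices query mentioned above):
import CoreAudio

// Condensed sketch: list all audio device IDs known to the HAL.
func allAudioDeviceIDs() -> [AudioDeviceID] {
    var address = AudioObjectPropertyAddress(
        mSelector: kAudioHardwarePropertyDevices,
        mScope: kAudioObjectPropertyScopeGlobal,
        mElement: kAudioObjectPropertyElementMain)

    var dataSize: UInt32 = 0
    guard AudioObjectGetPropertyDataSize(AudioObjectID(kAudioObjectSystemObject),
                                         &address, 0, nil, &dataSize) == noErr else { return [] }

    var deviceIDs = [AudioDeviceID](repeating: 0,
                                    count: Int(dataSize) / MemoryLayout<AudioDeviceID>.size)
    guard AudioObjectGetPropertyData(AudioObjectID(kAudioObjectSystemObject),
                                     &address, 0, nil, &dataSize, &deviceIDs) == noErr else { return [] }
    return deviceIDs
}

// On the macOS 26 beta this returns 14+ devices even while AVFoundation reports none.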
I have tried:
Resetting CoreAudio with sudo killall coreaudiod
Rebuilding and re-signing the app
Clearing TCC with tccutil reset Microphone
Running on Apple Silicon and testing Rosetta/native detection via sysctl.proc_translated
Using a fallback mechanism that logs device info from HAL and rotates logs for submission via Feedback Assistant
I have submitted logs and a reproducible test case via Feedback Assistant: FB19024508.
Hello,
I can successfully match music using ShazamKit on Apple platforms with SwiftUI: a simple app that lets the user load an audio file and extracts the corresponding match. However, I am unable to match music using ShazamKit on Android. I am trying to build the same simple app, but I cannot match music, as I get MATCH_ATTEMPT_FAILED every time I try. I don't know what I am doing wrong, but the ShazamKit part of the Kotlin Android code is in this method:
suspend fun processAudioFileInBackground(
    filePath: String,
    developerTokenProvider: DeveloperTokenProvider
) = withContext(Dispatchers.IO) {
    val bufferSize = 1024 * 1024
    val audioFile = FileInputStream(filePath)
    val byteBuffer = ByteBuffer.allocate(bufferSize)
    byteBuffer.order(ByteOrder.LITTLE_ENDIAN)
    var bytesRead: Int
    while (audioFile.read(byteBuffer.array()).also { bytesRead = it } != -1) {
        val signatureGenerator = (ShazamKit.createSignatureGenerator(AudioSampleRateInHz.SAMPLE_RATE_44100) as ShazamKitResult.Success).data
        signatureGenerator.append(byteBuffer.array(), bytesRead, System.currentTimeMillis())
        val signature = signatureGenerator.generateSignature()
        println("Signature: ${signature.durationInMs}")
        val catalog = ShazamKit.createShazamCatalog(developerTokenProvider, Locale.ENGLISH)
        val session = (ShazamKit.createSession(catalog) as ShazamKitResult.Success).data
        val matchResult = session.match(signature)
        println("MatchResult : $matchResult")
        setMatchResult(matchResult)
        byteBuffer.clear()
    }
    audioFile.close()
}
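For comparison, the iOS side that works for me is essentially this (condensed Swift sketch; error handling omitted):
import ShazamKit
import AVFAudio

// Condensed sketch of the iOS flow that matches successfully for me.
final class FileMatcher: NSObject, SHSessionDelegate {
    private let session = SHSession()

    func match(fileURL: URL) throws {
        session.delegate = self

        // Read the whole file into one PCM buffer and build a signature from it.
        let file = try AVAudioFile(forReading: fileURL)
        let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                      frameCapacity: AVAudioFrameCount(file.length))!
        try file.read(into: buffer)

        let generator = SHSignatureGenerator()
        try generator.append(buffer, at: nil)
        session.match(generator.signature())
    }

    func session(_ session: SHSession, didFind match: SHMatch) {
        print("Matched: \(match.mediaItems.first?.title ?? "?")")
    }

    func session(_ session: SHSession, didNotFindMatchFor signature: SHSignature, error: Error?) {
        print("No match: \(String(describing: error))")
    }
}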
I noticed that changing the Locale in catalog creation gives different results: I get NoMatch without an exception. Can you please help me with this? Do I need to create a custom catalog?
I'm seeing crashes in _MPRemoteCommandEventDispatch on iOS 26.x devices in 3 apps. According to Bugsnag logs they are:
NSInternalInconsistencyException: event dispatch <_MPRemoteCommandEventDispatch: <MPRemoteCommandEvent: 0x11c049500 commandID=THV0 command=<MPRemoteCommand: 0x109ad1ea0 type=Play (0) enabled=YES handlers=[0x109b6a310]> sourceID=(null) ([HostedRoutingSessionDataSource] handleControlSendingCommand<2W5E>)> state:201> deallocated without calling continuation
I attached a log from Xcode organizer matching Bugsnag crash.
mpr_remote_command_event.crash
When I set a breakpoint on -[_MPRemoteCommandEventDispatch dealloc], I can see it's hit every time I tap the play or pause button on the lock screen.
Thread 0 Crashed:
0 libsystem_kernel.dylib 0x00000002370420cc __pthread_kill + 8 (:-1)
1 libsystem_pthread.dylib 0x00000001e975c810 pthread_kill + 268 (pthread.c:1721)
2 libsystem_c.dylib 0x0000000198f8ff64 abort + 124 (abort.c:122)
3 libc++abi.dylib 0x000000018a7cf808 __abort_message + 132 (abort_message.cpp:66)
4 libc++abi.dylib 0x000000018a7be484 demangling_terminate_handler() + 304 (cxa_default_handlers.cpp:76)
5 libobjc.A.dylib 0x000000018a6cff78 _objc_terminate() + 156 (objc-exception.mm:496)
6 xxxxxxxxxxxxxx 0x00000001003a7db8 CPPExceptionTerminate() + 416 (BSG_KSCrashSentry_CPPException.mm:156)
7 libc++abi.dylib 0x000000018a7cebdc std::__terminate(void (*)()) + 16 (cxa_handlers.cpp:59)
8 libc++abi.dylib 0x000000018a7ceb80 std::terminate() + 108 (cxa_handlers.cpp:88)
9 CoreFoundation 0x000000018d7341c4 __CFRunLoopPerCalloutARPEnd + 256 (CFRunLoop.c:769)
10 CoreFoundation 0x000000018d70bb5c __CFRunLoopRun + 1976 (CFRunLoop.c:3179)
11 CoreFoundation 0x000000018d70aa6c _CFRunLoopRunSpecificWithOptions + 532 (CFRunLoop.c:3462)
12 GraphicsServices 0x000000022e31c498 GSEventRunModal + 120 (GSEvent.c:2049)
13 UIKitCore 0x00000001930ceba4 -[UIApplication _run] + 792 (UIApplication.m:3902)
14 UIKitCore 0x0000000193077a78 UIApplicationMain + 336 (UIApplication.m:5577)
15 xxxxxxxxxxxxxx 0x00000001000c0134 main + 308 (main.swift:15)
16 dyld 0x000000018a722e28 start + 7116 (dyldMain.cpp:1477)
Is the crash happening when the app is being terminated?
Thank you!
I have an AUv3 plugin which uses an FFT - which requires n samples before it can produce any output - so, depending on the relation between the host's buffer size and the FFT window size, it may receive several buffers of samples, producing no output, and then dump out what it has once a sufficient number of samples have been received.
This means that output is produced in fits and starts, in batches that match the FFT size (modulo oversampling) - e.g. if being fed buffers of 256 samples with an fft size of 1024, the output buffer sizes will be 0 for the first 3 buffers, and upon the fourth, the first 256 processed samples are returned and the remaining 768 cached; the next three buffers will return the remaining cached samples while processing and buffering subsequent ones, and so forth.
The internal mechanics of that I have solved, caching output if the current output buffer is too small, and so forth - so it all works as advertised, and the plugin reports its latency correctly. And when run as an app in demo-mode, playback works as expected.
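(For reference, the latency reporting mentioned above is just the standard AUAudioUnit latency property; expressed in Swift for brevity, it is roughly the sketch below, where fftSize and the sample rate are placeholders for my actual values.)
import AudioToolbox

// Sketch: how the plug-in reports its FFT-induced latency to the host (placeholder values).
class FFTAudioUnit: AUAudioUnit {
    let fftSize = 1024          // analysis window, in samples
    let sampleRate = 44_100.0   // current output sample rate

    override var latency: TimeInterval {
        Double(fftSize) / sampleRate    // e.g. 1024 / 44100 ≈ 23 ms
    }
}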
In the plugin's render block, it captures the number of frames written, and if it is less than the number of frames passed in, adjusts the mDataByteSize of the output buffers to match the actual quantity of data being returned:
unsigned int framesWritten = (unsigned int) processHelper->processWithEvents(inAudioBufferList, outAudioBufferList, timestamp, frameCount, realtimeEventListHead);
if (framesWritten < frameCount) {
    for (UInt32 i = 0; i < outAudioBufferList->mNumberBuffers; ++i) {
        outAudioBufferList->mBuffers[i].mDataByteSize = framesWritten * 4; // assume 4 byte floats
    }
}
However, there are a couple of serious issues:
auval -v fails it with: Render Test at 64 frames, sample rate: 22050 Hz ERROR: Output Buffer Size does not match requested
When connected to Logic Pro, it appears that mDataByteSize is ignored and the entire allocated buffer is read - the audio has sections of silence snipped into it, corresponding to the number of empty buffers being returned
If I set Logic's buffer size to 1024 and use a 1024-sample FFT window, the plugin works correctly - but of course a plugin cannot dictate buffer size, and 1024 is too small a window size to be useful for anything but filtering very high frequencies
This seems like it has to be a solvable problem, and most likely the issue is in how my code reports the number of usable samples in the returned buffer.
So, what is the correct way for a plugin to report that it has no samples to return, but will, uh, real soon now?
I know I could convert this plugin to be one that does offline rendering of the entire input, but this is real-time processing, just with a fixed amount of latency, so that should not be necessary.
I have an AUv3 that passes all validation and can be loaded into Logic Pro without issue. The UI for the plug-in can be any aspect ratio, but Logic insists on presenting it in a view with a fixed aspect ratio: when resizing, both the height and width are resized together. I have never managed to work out what I need to specify to Logic to allow the user to resize width and height independently of each other.
Can anyone tell me what I need to specify in the AU code to inform Logic that the view can be resized from any side of the window/panel?
Hi 👋! We have a SpriteKit-based app where we play AVAudio sounds in three different ways:
Effects (incl. UI sounds) with AVAudioPlayer.
Long looping tracks with AVAudioPlayer.
Short animation effects on the timeline of SpriteKit's SKScene files (effectively SKAudioNode nodes).
We've found that when you exit the app or otherwise interrupt audio playback, future plays often fail. For example, there's a WebKit-based video trailer inside the app, and if you play it, our looping background music track (2.) stops playing and won't resume when you close the trailer (return from WebKit). This is probably due to us not manually restarting the track (so it may well be easily fixed). Periodically played AVAudioPlayer audio (1.) is not affected.
However, the more concerning thing is that the audio tracks on SKScene file timelines (3.) will no longer play. My hypothesis is that AVAudioEngine gets interrupted and needs to be restarted for those AVAudioNode elements to regain functionality. Thing is, we don't currently deal with AVAudioEngine at all in the app, meaning it is never explicitly initialized to begin with.
Obviously things return to normal when you remove the app from short-term memory and restart it. However, it seems many of our users aren't doing this, and often report audio failing presumably due to some interruption in the past without the app ever being cleared from memory.
Any idea why timeline-run SKAudioNodes would fail like this? Should the app react to app backgrounding/foregrounding regarding audio?
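For context, the kind of foregrounding hook we're wondering about would look something like this (a minimal sketch, not something the app currently does; backgroundMusicPlayer stands for our looping AVAudioPlayer):
import AVFoundation
import UIKit

// Sketch: restart the looping track when the app becomes active again.
func observeForegrounding(backgroundMusicPlayer: AVAudioPlayer) {
    NotificationCenter.default.addObserver(
        forName: UIApplication.didBecomeActiveNotification,
        object: nil,
        queue: .main
    ) { _ in
        if !backgroundMusicPlayer.isPlaying {
            backgroundMusicPlayer.play()
        }
    }
}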
Any help would be very much appreciated ✌️!
I’m seeing what appears to be an iOS audio-session issue that occurs only when a phone call happens while the app is in the background.
API: AVAudioSession, AVAudioRecorder
Background Modes: Audio enabled (UIBackgroundModes = audio)
Category: .playAndRecord
Microphone permission: granted
Expected Behavior
If the app is recording audio in the background and a phone call interrupts it:
AVAudioSession.interruptionNotification(.began) fires
Call ends
AVAudioSession.interruptionNotification(.ended) fires
App should be able to re-activate its audio session and resume or restart recording
Apple documentation suggests this should be supported for background audio apps.
Actual Behavior
When the app is in the background and the phone call ends:
AVAudioSession.interruptionNotification(.ended) does fire
Attempting to reactivate the audio session always fails:
Error Domain=NSOSStatusErrorDomain
Code=560557684 ("!int")
"Session activation failed"
The session appears to remain permanently “interrupted”
Retrying activation (with delays) does not help
Recreating AVAudioRecorder does not help
Reactivation works only after the app is opened again
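For completeness, the reactivation path being attempted looks roughly like this (a sketch; recorder stands for the app's AVAudioRecorder):
import AVFoundation

// Sketch of the interruption handling described above.
func handleInterruption(_ notification: Notification, recorder: AVAudioRecorder?) {
    guard let info = notification.userInfo,
          let typeValue = info[AVAudioSessionInterruptionTypeKey] as? UInt,
          let type = AVAudioSession.InterruptionType(rawValue: typeValue) else { return }

    switch type {
    case .began:
        recorder?.pause()
    case .ended:
        do {
            // In the background, this call is what fails with 560557684 ("!int").
            try AVAudioSession.sharedInstance().setActive(true)
            recorder?.record()
        } catch {
            print("Session reactivation failed: \(error)")
        }
    @unknown default:
        break
    }
}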
Hey everyone,
I'm encountering an issue with audio sample rate conversion that I'm hoping someone can help with. Here's the breakdown:
Issue Description:
I've installed a tap on an input device to convert audio to an optimal sample rate.
There's a converter node added on top of this setup.
The problem arises when joining Zoom or FaceTime calls—the converter gets deallocated from memory, causing the program to crash.
Symptoms:
The converter node is being deallocated during video calls.
The program crashes entirely when this happens.
Traditional methods of monitoring sample rate changes (tracking nominal or actual sample rates) aren't working as expected.
The Big Challenge:
I can't figure out how to properly monitor sample rate changes.
Listeners set up to track these changes don't trigger when the device joins a Zoom or FaceTime call.
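For reference, the kind of listener I've set up looks roughly like this (a sketch; deviceID is obtained elsewhere, and I'm watching the device's nominal sample rate):
import Foundation
import CoreAudio

// Sketch: observe nominal sample rate changes on a specific device.
func watchSampleRate(of deviceID: AudioObjectID) {
    var address = AudioObjectPropertyAddress(
        mSelector: kAudioDevicePropertyNominalSampleRate,
        mScope: kAudioObjectPropertyScopeGlobal,
        mElement: kAudioObjectPropertyElementMain)

    _ = AudioObjectAddPropertyListenerBlock(deviceID, &address, .main) { _, _ in
        var rate = Float64(0)
        var size = UInt32(MemoryLayout<Float64>.size)
        var rateAddress = AudioObjectPropertyAddress(
            mSelector: kAudioDevicePropertyNominalSampleRate,
            mScope: kAudioObjectPropertyScopeGlobal,
            mElement: kAudioObjectPropertyElementMain)
        AudioObjectGetPropertyData(deviceID, &rateAddress, 0, nil, &size, &rate)
        print("Nominal sample rate is now \(rate) Hz")
    }
}

// This block never fires when the device is grabbed by a Zoom or FaceTime call.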
Please, if anyone has experience with this or knows a solution, I'd really appreciate your help. Thanks in advance!
I am working on an app which requires an audio function in the background, used as an alert sound.
During debug testing, the function works fine, but once I test standalone without debugging, the function does not work; it only plays the sound when I come back to the app.
Is there any way to trace this issue?
A bit of a novice to app development here, but I have a paid developer account and I have registered the identifier for MusicKit on the developer website (using the bundle identifier I've selected in Xcode), yet the option to add MusicKit as a capability is not available in Xcode.
I've manually updated the certificates, closed the app and reopened it, started a new project, and tried with a different demo project.
Apologies if I am missing something obvious but could someone help me get this capability added?
I am a graduate student conducting research in speech/audio signal processing and multimodal interaction.
Apple Vision Pro is widely recognized as a multimodal interactive system supporting voice, eye, and gesture inputs. However, I could not find detailed specifications or documentation about the audio input sampling rate used by the device’s built-in microphone array when capturing user audio.
Specifically, I would like to understand:
What is the default audio input sampling rate (e.g., 16 kHz, 44.1 kHz, 48 kHz, etc.) for the Vision Pro’s microphones?
When developing with visionOS / AVAudioSession / AVAudioEngine, is there a documented or recommended sampling rate for audio capture?
Are there any best practices or settings for enabling high-quality voice capture on Vision Pro (especially for voice research tasks)?
For context, my work involves voice processing, analysis, and possibly on-device real-time speech recognition. Any pointers to relevant APIs, documentation or examples (especially regarding audio capture buffer size or available formats on visionOS) would be very helpful.
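For what it's worth, this is how I currently inspect the capture format at runtime (a minimal sketch; I don't know whether the reported values are authoritative for the microphone array):
import AVFoundation

// Sketch: print whatever sample rate / channel count the session and engine report.
func inspectCaptureFormat() {
    let session = AVAudioSession.sharedInstance()
    do {
        try session.setCategory(.record, mode: .measurement)
        try session.setActive(true)
    } catch {
        print("Session setup failed: \(error)")
        return
    }

    print("Session sample rate: \(session.sampleRate) Hz")
    print("Input channels: \(session.inputNumberOfChannels)")

    let engine = AVAudioEngine()
    let format = engine.inputNode.inputFormat(forBus: 0)
    print("Input node format: \(format.sampleRate) Hz, \(format.channelCount) ch")
}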
Thank you in advance!
Best regards.