Hi there,
I recently launched a DJ app on the Mac App Store and was wondering how I can access songs from Apple Music for mixing purposes, the way Serato, rekordbox, djay, and other DJ apps do?
Thanks,
Gunek
I'm facing a problem while trying to achieve spatial audio effects in my iOS 18 app. I have tried several approaches to get good 3D audio, but the effect never felt good enough, or it didn't work at all.
What troubles me most is that the AirPods I have don't recognize my app as one that plays spatial audio (in the audio settings it shows "Spatial Audio Not Playing"), so I guess my app isn't using its spatial audio potential.
The first approach uses AVAudioEnvironmentNode with AVAudioEngine. Changing the position of the player, as well as changing the listener's, doesn't seem to change anything about how the audio plays.
Here's, simply, how I initialize AVAudioEngine:
import Foundation
import AVFoundation

class AudioManager: ObservableObject {
    // important class variables
    var audioEngine: AVAudioEngine!
    var environmentNode: AVAudioEnvironmentNode!
    var playerNode: AVAudioPlayerNode!
    var audioFile: AVAudioFile?
    ...

    // Sound setup
    func setupAudio() {
        do {
            let session = AVAudioSession.sharedInstance()
            try session.setCategory(.playback, mode: .default, options: [])
            try session.setActive(true)
        } catch {
            print("Failed to configure AVAudioSession: \(error.localizedDescription)")
        }

        audioEngine = AVAudioEngine()
        environmentNode = AVAudioEnvironmentNode()
        playerNode = AVAudioPlayerNode()

        audioEngine.attach(environmentNode)
        audioEngine.attach(playerNode)
        audioEngine.connect(playerNode, to: environmentNode, format: nil)
        audioEngine.connect(environmentNode, to: audioEngine.mainMixerNode, format: nil)

        environmentNode.listenerPosition = AVAudio3DPoint(x: 0, y: 0, z: 0)
        environmentNode.listenerAngularOrientation = AVAudio3DAngularOrientation(yaw: 0, pitch: 0, roll: 0)
        environmentNode.distanceAttenuationParameters.referenceDistance = 1.0
        environmentNode.distanceAttenuationParameters.maximumDistance = 100.0
        environmentNode.distanceAttenuationParameters.rolloffFactor = 2.0

        // example.mp3 is a mono sound
        guard let audioURL = Bundle.main.url(forResource: "example", withExtension: "mp3") else {
            print("Audio file not found")
            return
        }
        do {
            audioFile = try AVAudioFile(forReading: audioURL)
        } catch {
            print("Failed to load audio file: \(error)")
        }
    }
    ...

    // Playing sound
    func playSpatialAudio(pan: Float) {
        guard let audioFile = audioFile else { return }

        // left side
        playerNode.position = AVAudio3DPoint(x: pan, y: 0, z: 0)
        playerNode.scheduleFile(audioFile, at: nil, completionHandler: nil)

        do {
            try audioEngine.start()
            playerNode.play()
        } catch {
            print("Failed to start audio engine: \(error)")
        }
        ...
    }
}
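For completeness, here is a sketch of the extra spatialization hints I've been experimenting with on top of the setup above (that these are what the system keys on for the "Spatial Audio" indicator is my assumption, not something I've confirmed):

// Sketch: extra hints layered on the AudioManager setup above; not a confirmed fix.
func applySpatializationHints() {
    // Advertise spatial/multichannel capability to the session (iOS 15+).
    try? AVAudioSession.sharedInstance().setSupportsMultichannelContent(true)

    // Render the environment for headphones and spatialize the mono source with HRTF.
    environmentNode.outputType = .headphones        // or .auto
    playerNode.renderingAlgorithm = .HRTFHQ         // via AVAudio3DMixing
    playerNode.sourceMode = .spatializeIfMono       // iOS 14+
}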
The second, more complex approach, using PHASE, did better. I've made an example app that lets the user move the audio source in 3D space. I added reverb, and sliders that move the source up to 10 meters in each direction from the listener, but the audio only really seems to change from left to right (the x axis). Again, I think the problem may be that the app isn't recognized as playing spatial audio.
// Crucial class variables:
class PHASEAudioController: ObservableObject {
    private var soundSourcePosition: simd_float4x4 = matrix_identity_float4x4
    private var audioAsset: PHASESoundAsset!
    private let phaseEngine: PHASEEngine
    private let params = PHASEMixerParameters()
    private var soundSource: PHASESource
    private var phaseListener: PHASEListener!
    private var soundEventAsset: PHASESoundEventNodeAsset?
    // (soundPipeline, rolloffFactor, and loadAudioAsset() are declared elsewhere in the class)

    // Initialization of PHASE
    init() {
        do {
            let session = AVAudioSession.sharedInstance()
            try session.setCategory(.playback, mode: .default, options: [])
            try session.setActive(true)
        } catch {
            print("Failed to configure AVAudioSession: \(error.localizedDescription)")
        }

        // Init PHASE engine
        phaseEngine = PHASEEngine(updateMode: .automatic)
        phaseEngine.defaultReverbPreset = .mediumHall
        phaseEngine.outputSpatializationMode = .automatic // nothing helps

        // Set listener position to (0,0,0) in world space
        let origin: simd_float4x4 = matrix_identity_float4x4
        phaseListener = PHASEListener(engine: phaseEngine)
        phaseListener.transform = origin
        phaseListener.automaticHeadTrackingFlags = .orientation
        try! self.phaseEngine.rootObject.addChild(self.phaseListener)

        do {
            try self.phaseEngine.start()
        } catch {
            print("Could not start PHASE engine")
        }

        audioAsset = loadAudioAsset()

        // Create sound source
        // Sphere
        soundSourcePosition.translate(z: 3.0) // translate(z:) is presumably a custom simd helper
        let sphere = MDLMesh.newEllipsoid(withRadii: vector_float3(0.1, 0.1, 0.1),
                                          radialSegments: 14,
                                          verticalSegments: 14,
                                          geometryType: MDLGeometryType.triangles,
                                          inwardNormals: false,
                                          hemisphere: false,
                                          allocator: nil)
        let shape = PHASEShape(engine: phaseEngine, mesh: sphere)
        soundSource = PHASESource(engine: phaseEngine, shapes: [shape])
        soundSource.transform = soundSourcePosition
        print(soundSourcePosition)
        do {
            try phaseEngine.rootObject.addChild(soundSource)
        } catch {
            print("Failed to add a child object to the scene.")
        }

        let simpleModel = PHASEGeometricSpreadingDistanceModelParameters()
        simpleModel.rolloffFactor = rolloffFactor
        soundPipeline.distanceModelParameters = simpleModel

        let samplerNode = PHASESamplerNodeDefinition(
            soundAssetIdentifier: audioAsset.identifier,
            mixerDefinition: soundPipeline,
            identifier: audioAsset.identifier + "_SamplerNode")
        samplerNode.playbackMode = .looping

        do {
            soundEventAsset = try phaseEngine.assetRegistry.registerSoundEventAsset(
                rootNode: samplerNode,
                identifier: audioAsset.identifier + "_SoundEventAsset")
        } catch {
            print("Failed to register a sound event asset.")
            soundEventAsset = nil
        }
    }

    // Playing sound
    func playSound() {
        // Fire a new sound event with the currently set properties
        guard let soundEventAsset else { return }
        params.addSpatialMixerParameters(
            identifier: soundPipeline.identifier,
            source: soundSource,
            listener: phaseListener)
        let soundEvent = try! PHASESoundEvent(engine: phaseEngine,
                                              assetIdentifier: soundEventAsset.identifier,
                                              mixerParameters: params)
        soundEvent.start(completion: nil)
    }
    ...
}
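For reference, this is roughly how the sliders drive the source position (a condensed sketch; updateSourcePosition is my own helper name, and x/y/z come straight from the sliders, in meters relative to the listener at the origin):

// Sketch: rebuild the source transform from the slider values.
func updateSourcePosition(x: Float, y: Float, z: Float) {
    var transform = matrix_identity_float4x4
    transform.columns.3 = simd_float4(x, y, z, 1) // translation column
    soundSource.transform = transform
}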
It may also be worth mentioning that I only have a Personal Team account.
I'm running the iOS 26.2 public beta and my album artwork is missing from the Music app (I'm not using Apple Music). I use Google to get my album artwork. Do I need to wait for a new update?
I have an AUv3 plugin which uses an FFT, which requires n samples before it can produce any output. So, depending on the relation between the host's buffer size and the FFT window size, it may receive several buffers of samples while producing no output, and then dump out what it has once a sufficient number of samples has been received.
This means that output is produced in fits and starts, in batches that match the FFT size (modulo oversampling). For example, if the plugin is fed buffers of 256 samples with an FFT size of 1024, the output buffer sizes will be 0 for the first 3 buffers; upon the fourth, the first 256 processed samples are returned and the remaining 768 are cached; the next three buffers return the remaining cached samples while processing and buffering subsequent ones, and so forth.
The internal mechanics of that I have solved, caching output if the current output buffer is too small, and so forth - so it all works as advertised, and the plugin reports its latency correctly. And when run as an app in demo-mode, playback works as expected.
In the plugin's render block, it captures the number of frames written, and if it is less than the number of frames passed in, adjusts the mDataByteSize of the output buffers to match the actual quantity of data being returned:
unsigned int framesWritten = (unsigned int) processHelper->processWithEvents(inAudioBufferList, outAudioBufferList, timestamp, frameCount, realtimeEventListHead);

if (framesWritten < frameCount) {
    for (UInt32 i = 0; i < outAudioBufferList->mNumberBuffers; ++i) {
        outAudioBufferList->mBuffers[i].mDataByteSize = framesWritten * 4; // assume 4 byte floats
    }
}
However, there are a couple of serious issues:
auval -v fails it with: Render Test at 64 frames, sample rate: 22050 Hz ERROR: Output Buffer Size does not match requested
When connected to Logic Pro, it appears that mDataByteSize is ignored and the entire allocated buffer is read; the audio has sections of silence snipped into it that correspond to the number of empty buffers being returned
If I set Logic's buffer size to 1024 and use a 1024-sample FFT window, the plugin works correctly, but of course a plugin cannot dictate buffer size, and 1024 is too small a window size to be useful for anything but filtering very high frequencies
This seems like it has to be a solvable problem, and most likely the issue is in how my code reports the number of usable samples in the returned buffer.
So, what is the correct way for a plugin to report that it has no samples to return, but will, uh, real soon now?
I know I could convert this plugin to be one that does offline rendering of the entire input, but this is real-time processing, just with a fixed amount of latency, so that should not be necessary.
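For context, the direction I'm leaning toward instead of shrinking mDataByteSize is to always emit frameCount frames, padding with silence while the FFT is still priming, and let the reported latency account for the delay. A rough Swift-level sketch of that idea (the class and helper names are mine, not actual code from the plugin):

import Foundation
import AudioToolbox

// Sketch: report the FFT priming delay as latency and always emit full buffers,
// writing silence until the first FFT window has been filled.
class FFTAudioUnit: AUAudioUnit {
    private let fftSize = 1024
    private var currentSampleRate: Double = 44_100

    // Hosts use this to delay-compensate: one FFT window of priming.
    override var latency: TimeInterval {
        Double(fftSize) / currentSampleRate
    }

    // Conceptual render step: never return fewer frames than requested.
    // If fewer processed samples are available, pad the shortfall with zeros
    // instead of shrinking mDataByteSize.
    func fill(_ output: UnsafeMutablePointer<Float>, frameCount: Int, available: Int) {
        let usable = min(available, frameCount)
        // ... copy `usable` processed samples into `output` here ...
        for i in usable..<frameCount {
            output[i] = 0
        }
    }
}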
I'm encountering numerous crashes involving the com.apple.coreaudio.AQClient thread in our application. The crash details are as follows:
#10 com.apple.coreaudio.AQClient
SIGSEGV
SEGV_ACCERR
0 libobjc.A.dylib _objc_msgSend + 44
1 AudioToolbox ClientMessageHandler::PropertyChanged(unsigned int) + 872
2 AudioToolbox ClientAudioQueue::FetchAndDeliverPendingCallbacks(unsigned int) + 924
3 AudioToolbox __XCallbackNotificationsAvailable + 212
4 libAudioToolboxUtility.dylib _mshMIGPerform + 260
5 CoreFoundation ___CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__ + 56
6 CoreFoundation ___CFRunLoopDoSource1 + 596
7 CoreFoundation ___CFRunLoopRun + 2392
8 CoreFoundation _CFRunLoopRunSpecific + 572
9 AudioToolbox CADeprecated::GenericRunLoopThread::Entry(void*) + 156
10 libAudioToolboxUtility.dylib CADeprecated::CAPThread::Entry(CADeprecated::CAPThread*) + 88
11 libsystem_pthread.dylib __pthread_start + 116
All these crashes occur on system versions below iOS/iPadOS 17, primarily when the device's available RAM is low. What steps can I take to resolve this issue? Any insights would be greatly appreciated!
My workout watch app supports audio playback during exercise sessions.
When users carry an Apple Watch, an iPhone, and AirPods, with the AirPods connected to the iPhone, I want to route audio from the Apple Watch to the AirPods for playback. I've implemented this functionality using the following code.
try? session.setCategory(.playback, mode: .default, policy: .longFormAudio, options: [])
try await session.activate()
When users are playing music on the iPhone and trigger my code in the watch app, the Apple Watch correctly guides them to select AirPods, pauses the iPhone's music, and plays my audio.
However, when playback finishes and I end the session using the code below:
try session.setActive(false, options:[.notifyOthersOnDeactivation])
the iPhone doesn't automatically resume the previously interrupted music playback; it requires manual intervention.
Is this expected behavior, or am I missing other important steps in my code?
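For reference, here is the overall flow condensed into one sketch, with deactivation deferred until the player reports completion (the class name is mine, and whether that deferral matters is exactly what I'm unsure about):

import AVFoundation

// Sketch: activate for long-form playback, play, and deactivate with
// notifyOthersOnDeactivation only after playback has actually finished.
final class WorkoutAudioPlayer: NSObject, AVAudioPlayerDelegate {
    let player: AVAudioPlayer   // assumed to be loaded with the workout audio

    init(player: AVAudioPlayer) {
        self.player = player
        super.init()
        player.delegate = self
    }

    // Activate for long-form playback (watchOS shows the route picker here), then play.
    func start() async throws {
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.playback, mode: .default, policy: .longFormAudio, options: [])
        _ = try await session.activate(options: [])
        player.play()
    }

    // Deactivate only once playback has ended, so the hand-back can occur.
    func audioPlayerDidFinishPlaying(_ player: AVAudioPlayer, successfully flag: Bool) {
        try? AVAudioSession.sharedInstance().setActive(false, options: [.notifyOthersOnDeactivation])
    }
}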
FaceTime's screen-share audio balance is badly broken right now. Whenever I share media, the system audio that gets sent through FaceTime is a tiny whisper even at full volume (even when connected to my speaker or headphones). The moment anyone on the call makes any noise at all, the shared audio ducks so hard it disappears, while the voice (or rustling, or air-conditioning noise) spikes to painful levels. It's impossible to watch or listen to anything together.
Also, the feature where FaceTime would shrink to a small square during screen sharing has been completely removed. That was a good feature and I'm really confused why it's gone. Now the FaceTime window stays as a long rectangle that covers part of the content I'm trying to share (unless I use a full-screen tile, but then I can't pull up any other windows during the call) and can't be made smaller than about a third of the screen. You can't resize the window or adjust its dimensions, so it ends up blocking the actual media you're trying to watch.
Here are some feature requests/fixes that would greatly improve the FaceTime screen-share experience:
Option to adjust the shared media volume independently of call audio.
Disable or toggle the extreme automatic audio ducking while screen-sharing.
Reintroduce the minimized “floating square” mode or allow full manual resizing and repositioning of the FaceTime window during screen-share sessions.
Overall, this setup makes FaceTime screen-sharing basically unusable. The audio balance is so inconsistent that it’s easier to switch to Zoom or Google Meet, which both handle shared sound correctly and let you move the call window out of the way. Until these issues are fixed, there’s no practical reason to use FaceTime for shared viewing at all.
Hi all,
I’ve implemented the new Core Audio Tap API (AudioHardwareCreateProcessTap with CATapDescription) and I’m seeing consistent level attenuation that scales with the number of stereo output pairs exposed by the target device.
What I observe
Device with 4 stereo pairs (8 outs) → tap shows −12.04 dB relative to source.
True 2-ch devices (built-in speakers, AirPods) → ~0 dB attenuation.
The attenuation appears regardless of whether I:
Create a global (default-output) tap via initStereoGlobalTapButExcludeProcesses:
Or create a per-process/per-device tap via initWithProcesses:andDeviceUID:withStream:
Additionally, the routing choice inside the sending app matters:
App output to “System/Default Output” → I often see no attenuation.
App output directly to a multi-out interface (e.g., RME Fireface) → I see the pair-count-scaled attenuation.
I can query Core Audio for the number of output channels/pairs and gain-compensate (+20·log10(N_pairs) dB) and that matches my measurements for many cases. However, this compensation is not universally correct because it seems to depend on where each process routes its audio (Default Output vs. direct device), even when those processes are included in the same tap aggregate.
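For reference, this is the compensation I currently apply, derived from the device's output channel count (a sketch; the pair-count rule is my empirical assumption, which is exactly what I'd like to confirm or replace with something official):

import CoreAudio
import Foundation

// Sketch: count a device's output channels and derive the gain compensation
// described above (assumes the "+20·log10(pairs) dB" rule actually holds).
func outputChannelCount(of deviceID: AudioObjectID) -> Int {
    var address = AudioObjectPropertyAddress(
        mSelector: kAudioDevicePropertyStreamConfiguration,
        mScope: kAudioObjectPropertyScopeOutput,
        mElement: kAudioObjectPropertyElementMain)
    var dataSize: UInt32 = 0
    guard AudioObjectGetPropertyDataSize(deviceID, &address, 0, nil, &dataSize) == noErr else { return 0 }

    let raw = UnsafeMutableRawPointer.allocate(byteCount: Int(dataSize),
                                               alignment: MemoryLayout<AudioBufferList>.alignment)
    defer { raw.deallocate() }
    guard AudioObjectGetPropertyData(deviceID, &address, 0, nil, &dataSize, raw) == noErr else { return 0 }

    let bufferList = UnsafeMutableAudioBufferListPointer(raw.assumingMemoryBound(to: AudioBufferList.self))
    return bufferList.reduce(0) { $0 + Int($1.mNumberChannels) }
}

// +20·log10(N_pairs) dB, which matches the −12.04 dB I measure for 4 stereo pairs.
func compensationDB(forOutputChannels channels: Int) -> Double {
    let pairs = max(1, channels / 2)
    return 20.0 * log10(Double(pairs))
}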
Question
Is there a supported way to obtain the raw, unattenuated streams for all processes through the Tap API—i.e., to bypass this automatic headroom/attenuation behavior entirely? If this attenuation is expected by design:
Is there a documented rule for when it applies (global vs. device taps, per-process taps, stream selection, etc.)?
Is there a property/flag to disable it, or a reliable, official method to compute the exact compensation (beyond counting stereo pairs)?
Any guidance on ensuring consistent levels when multiple processes route differently (Default Output vs. direct device) but are captured by the same tap?
Environment
API: AudioHardwareCreateProcessTap + CATapDescription
Devices: built-in output (2-ch), RME Fireface (8+ outs / 4+ stereo pairs)
Behavior reproducible with both global and per-process/per-device tap descriptions.
Attenuation example: 4 stereo pairs → −12.04 dB observed.
Happy to provide a minimal sample, measurements, and device logs. Thanks!
— David
I've tried SpeechTranscriber on a lot of my devices (iPhone 12 series through iPhone 17 series) without issues. However, the SpeechTranscriber.isAvailable value is false on my iPhone 11 Pro.
https://developer.apple.com/documentation/speech/speechtranscriber/isavailable
I'm curious why the iPhone 11 Pro is not supported. Are all iPhone 11-series devices unsupported intentionally, or is there a problem with my specific device?
I've also checked the supportedLocales, and the value is an empty array.
https://developer.apple.com/documentation/speech/speechtranscriber/supportedlocales
I am trying to use SpeechTranscriber from the Speech framework. Is it possible to use it in the iOS 26 Simulator (on macOS Tahoe)? The "supportedLocales" function returns an empty array.
I am trying to use the new SpeechAnalyzer framework in my Mac app, and am running into an issue for some languages.
When I call AssetInstallationRequest.downloadAndInstall() for some languages, it throws an error:
Error Domain=SFSpeechErrorDomain Code=1 "transcription.ar asset not found after attempted download."
The ".ar" appears to be the language code, which in this case was Arabic.
When I call AssetInventory.status(forModules:) before attempting the download, it is giving me a status of "downloading" (perhaps from an earlier attempt?). If this language was completely unsupported, I would expect it to return a status of "unsupported", so I'm not sure what's going on here.
For other languages (Polish, for example) SpeechTranscriber.supportedLocale(equivalentTo:) is returning nil, so that seems like a clearly unsupported language. But I can't tell if the languages I'm trying, like Arabic, are supported and something is going wrong, or if this error represents something I can work around.
Here's the relevant section of code. The error is thrown from downloadAndInstall(), so I never even get as far as setting up the SpeechAnalyzer itself.
private func setUpAnalyzer() async throws {
    guard let sourceLanguage else {
        throw Error.languageNotSpecified
    }
    guard let locale = await SpeechTranscriber.supportedLocale(equivalentTo: Locale(identifier: sourceLanguage.rawValue)) else {
        throw Error.unsupportedLanguage
    }

    let transcriber = SpeechTranscriber(locale: locale, preset: .progressiveTranscription)
    self.transcriber = transcriber

    let reservedLocales = await AssetInventory.reservedLocales
    if !reservedLocales.contains(locale) && reservedLocales.count == AssetInventory.maximumReservedLocales {
        if let oldest = reservedLocales.last {
            await AssetInventory.release(reservedLocale: oldest)
        }
    }

    do {
        let status = await AssetInventory.status(forModules: [transcriber])
        print("status: \(status)")
        if let installationRequest = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) {
            try await installationRequest.downloadAndInstall()
        }
    }
    ...
We require assistance in resolving a critical audio design conflict within our Push-to-Talk (PTT) application. Our current volume amplification strategy—which relies on applying a GAIN factor to PCM samples in conjunction with setting the AVAudioSession category to Playback—is working successfully when PTT is used independently. However, upon integrating and reporting the same PTT call through the CallKit framework, this amplification effect is lost. The CallKit integration appears to be forcing a different, non-amplifying audio session category or configuration, negatively impacting the user's perceived call volume. We need guidance on how to maintain the AVAudioSessionCategoryPlayback setting, or an equivalent high-volume configuration, while operating under the control of CallKit.
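For context, the amplification itself is simple per-sample gain applied to the Float32 PCM we play back; a condensed sketch of what we mean (the gain value and names are illustrative, not our exact code):

import AVFoundation
import Accelerate

// Sketch: boost a Float32 PCM buffer by a linear gain factor before playback.
func applyGain(_ gain: Float, to buffer: AVAudioPCMBuffer) {
    guard let channels = buffer.floatChannelData else { return }
    var linearGain = gain
    let frameCount = vDSP_Length(buffer.frameLength)
    for channel in 0..<Int(buffer.format.channelCount) {
        // Multiply each sample by the gain, in place.
        vDSP_vsmul(channels[channel], 1, &linearGain, channels[channel], 1, frameCount)
    }
}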
I'm not able to record audio in AAC format at a 96 kHz sample rate using AVAudioRecorder or Extended Audio File Services, with 96 kHz input audio from the input device. The audio recording settings used are:
let settings: [String: Any] = [
    AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
    AVSampleRateKey: sampleRate,
    AVNumberOfChannelsKey: 1,
    AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
]
When I tried using AVAudioEngine with AVAudioFile,

guard let audioFile = try? AVAudioFile(forWriting: fileURL,   // file extension .m4a
                                       settings: fileSettings,
                                       commonFormat: AVAudioCommonFormat.pcmFormatFloat32,
                                       interleaved: interleaved) else { return }
I got this error:
CodecConverterFactory.cpp:977 unable to select compatible encoder sample rate
AudioConverter.cpp:1017 Failed to create a new in process converter -> from 1 ch, 96000 Hz, Float32 to 1 ch, 96000 Hz, aac (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame, with status 1718449215
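For reference, the fallback I'm looking at is resampling to 48 kHz with AVAudioConverter before the AAC encode; a rough sketch (this assumes the encoder simply won't accept 96 kHz, which is what the error suggests to me; names like outputURL are illustrative):

import AVFoundation

// Sketch: convert 96 kHz mono Float32 input to 48 kHz before writing AAC.
let inputFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 96_000, channels: 1, interleaved: false)!
let encodeFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 48_000, channels: 1, interleaved: false)!
let converter = AVAudioConverter(from: inputFormat, to: encodeFormat)!

let outputURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent("out.m4a")
let outputFile = try AVAudioFile(forWriting: outputURL, settings: [
    AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
    AVSampleRateKey: 48_000,
    AVNumberOfChannelsKey: 1,
    AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
])

func writeDownsampled(_ inputBuffer: AVAudioPCMBuffer) throws {
    let ratio = encodeFormat.sampleRate / inputFormat.sampleRate
    let capacity = AVAudioFrameCount(Double(inputBuffer.frameLength) * ratio) + 1
    guard let outBuffer = AVAudioPCMBuffer(pcmFormat: encodeFormat, frameCapacity: capacity) else { return }

    var error: NSError?
    var consumed = false
    _ = converter.convert(to: outBuffer, error: &error) { _, outStatus in
        // Hand the converter one input buffer, then report no more data for now.
        if consumed { outStatus.pointee = .noDataNow; return nil }
        consumed = true
        outStatus.pointee = .haveData
        return inputBuffer
    }
    if let error { throw error }
    try outputFile.write(from: outBuffer)
}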
So experimenting with the new SpeechTranscriber, if I do:
let transcriber = SpeechTranscriber(
locale: locale,
transcriptionOptions: [],
reportingOptions: [.volatileResults],
attributeOptions: [.audioTimeRange]
)
only the final result has audio time ranges, not the volatile results.
Is this a performance consideration? If there is no performance problem, it would be nice to have the option to also get speech time ranges for volatile responses.
I'm not presenting the volatile text in the UI at all; I was just trying to keep statistics about the speech and non-speech noise levels, so that I can determine when the level falls under the noise floor for a while.
The goal was to finalize the recording automatically when the noise level indicates that the user has finished speaking.
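For context, the level statistics I mean are just per-buffer RMS measured on the audio I feed the transcriber; a minimal sketch of that metering (names and the empty-buffer floor are mine):

import AVFoundation
import Accelerate

// Sketch: per-buffer RMS level in dBFS, used to track the noise floor while
// volatile results arrive. The -160 value for empty buffers is arbitrary.
func rmsLevel(of buffer: AVAudioPCMBuffer) -> Float {
    guard let samples = buffer.floatChannelData?[0], buffer.frameLength > 0 else { return -160 }
    var rms: Float = 0
    vDSP_rmsqv(samples, 1, &rms, vDSP_Length(buffer.frameLength))
    return 20 * log10(max(rms, .leastNonzeroMagnitude)) // dBFS
}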
My app encounters problems when trying to open an x86 Audio Unit v2 on an Apple Silicon Mac (even though Rosetta is installed).
There seems to be an XPC connection issue with the AUHostingService that I don't know how to fix.
I have observed other host apps opening the same plugins without problems, so there is probably something wrong or incompatible in my code.
I noticed that:
The issue occurs whether or not the app is sandboxed.
The issue does no longer occur when the app itself runs under Rosetta.
There is no error reported by Core Audio during allocation and initialization of the audio unit. The first reported errors appear when the unit calls AudioUnitRender from the render callback.
With most x86 plugins, the error is on first call:
kAudioUnitErr_RenderTimeout
and on any subsequent call:
kAudioComponentErr_InstanceInvalidated
On the UI side, when the Cocoa view is loaded, it appears briefly, then disappears immediately, leaving its superview empty.
With another x86 plugin, the Cocoa View is loaded normally, but CoreAudio still emits
kAudioUnitErr_NoConnection
from AudioUnitRender, whether the view has been loaded or not, and the plugin produces no sound.
I also find these messages in the console (printed in that order):
CLIENT ERROR: RemoteAUv2ViewController does not override - and thus cannot react to catastrophic errors beyond logging them
AUAudioUnit_XPC.mm:641 Crashed AU possible component description: aumu/Helm/Tyte
My app uses the AUv2 API, and I suspect that working with the AUv3 API would spare me these problems.
However, considering how my audio system is built (audio units are wrapped in C++ classes and most connections between units are managed on the fly from the render callback), it would be a lot of work to convert, and I'm not even sure that everything I do with the AUv2 API would be possible with the AUv3 API.
I could possibly find an intermediate solution, but in the immediate future I'm looking for the simplest and fastest possible fix. If I cannot find better, I see two fallback options:
In this part of the docs, "Beginning with macOS 11, the system loads audio units into a separate process that depends on the architecture or host preference", does "host preference" mean that it would be possible to disable the out-of-process behavior, for example from the app's entitlements or Info.plist?
Otherwise, as a last resort, I could completely disable the use of x86 Audio Units when my app runs under arm64, at least to make things cleaner. But the Audio Component API doesn't give any information about a plugin's architecture; how could I find it?
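For what it's worth, the check I had in mind for that last option is a sketch like the following; the part I'm unsure about is reliably locating the component bundle on disk for a given AudioComponent (the path below is just an example):

import Foundation

// Sketch: inspect a component bundle's executable architectures.
// Locating the bundle for a given AudioComponent is the open question.
func architectures(ofComponentAt url: URL) -> [Int] {
    Bundle(url: url)?.executableArchitectures?.map(\.intValue) ?? []
}

// Example path; real code would enumerate /Library/Audio/Plug-Ins/Components, etc.
let url = URL(fileURLWithPath: "/Library/Audio/Plug-Ins/Components/Example.component")
let runsNativelyOnARM = architectures(ofComponentAt: url)
    .contains(NSBundleExecutableArchitectureARM64)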
Any tip or idea about this issue will be much appreciated.
Thanks in advance!
Hi,
I'm working on a project that uses the AVSpeechSynthesizer and AVSpeechUtterance.
I discovered by chance that AVSpeechSynthesizer automatically expands some words instead of just speaking the text it is given.
These are abbreviations for days of the week or months, but not all of them. I don't want any of them expanded automatically; I want only the specified text spoken. The expansion happens across languages.
I have written a short example program for demonstration purposes.
import SwiftUI
import AVFoundation
import Foundation

let synthesizer: AVSpeechSynthesizer = AVSpeechSynthesizer()

struct ContentView: View {
    var body: some View {
        VStack {
            Button {
                utter("mon")
            } label: {
                Text("mon")
            }
            .buttonStyle(.borderedProminent)

            Button {
                utter("tue")
            } label: {
                Text("tue")
            }
            .buttonStyle(.borderedProminent)

            Button {
                utter("thu")
            } label: {
                Text("thu")
            }
            .buttonStyle(.borderedProminent)

            Button {
                utter("feb")
            } label: {
                Text("feb")
            }
            .buttonStyle(.borderedProminent)

            Button {
                utter("feb", lang: "de-DE")
            } label: {
                Text("feb DE")
            }
            .buttonStyle(.borderedProminent)

            Button {
                utter("wed")
            } label: {
                Text("wed")
            }
            .buttonStyle(.borderedProminent)
        }
        .padding()
    }

    private func utter(_ text: String, lang: String = "en-US") {
        let utterance = AVSpeechUtterance(string: text)
        let voice = AVSpeechSynthesisVoice(language: lang)
        utterance.voice = voice
        synthesizer.speak(utterance)
    }
}

#Preview {
    ContentView()
}
Thank you
Christian
Hello!
We stumbled upon a problem with our karaoke app where a user on an iPhone 16e with iOS 18.5 has a problem with mic capture: other users cannot hear him. Mic capture works fine on 17.5 and 16.8. Maybe there is something else we need to do when configuring AVAudioSession for iOS 18.5?
Currently it's set up like this:
override func viewDidLoad() {
    super.viewDidLoad()
    UIApplication.shared.isIdleTimerDisabled = true
    mRoomId = appDelegate.getRoomId()

    let audioSession = AVAudioSession.sharedInstance()
    try! audioSession.setCategory(.playAndRecord, mode: .voiceChat, options: [.defaultToSpeaker])
    try! audioSession.setPreferredSampleRate(48000)
    try! audioSession.setActive(true, options: [])
}
Hello,
I am wondering if it is possible to have audio from my AirPods sent to my speech-to-text service while, at the same time, the built-in mic's audio input is sent to a video recording?
I ask because I want my users to be able to say "CAPTURE" so that I start recording a video (with audio from the built-in mic), and then, when the user says "STOP", I stop the recording.
Environment
Device: iPhone 16e
iOS Version: 18.4.1 - 18.7.1
Framework: AVFoundation (AVAudioEngine)
Problem Summary
On iPhone 16e (iOS 18.4.1-18.7.1), the installTap callback stops being invoked after resuming from a phone call interruption. This issue is specific to phone call interruptions and does not occur on iPhone 14, iPhone SE 3, or earlier devices.
Expected Behavior
After a phone call interruption ends and audioEngine.start() is called, the previously installed tap should continue receiving audio buffers.
Actual Behavior
After resuming from phone call interruption:
Tap callback is no longer invoked
No audio data is captured
No errors are thrown
Engine appears to be running normally
Note: Normal pause/resume (without phone call interruption) works correctly.
Steps to Reproduce
Start audio recording on iPhone 16e
Receive or make a phone call (triggers AVAudioSession interruption)
End the phone call
Resume recording with audioEngine.start()
Result: Tap callback is not invoked
Tested devices:
iPhone 16e (iOS 18.4.1-18.7.1): Issue reproduces ✗
iPhone 14 (iOS 18.x): Works correctly ✓
iPhone SE 3 (iOS 18.x): Works correctly ✓
Code
Initial Setup (Works)
let inputNode = audioEngine.inputNode

inputNode.installTap(onBus: 0, bufferSize: 4096, format: nil) { buffer, time in
    self.processAudioBuffer(buffer, at: time)
}

audioEngine.prepare()
try audioEngine.start()
Interruption Handling
NotificationCenter.default.addObserver(
    forName: AVAudioSession.interruptionNotification,
    object: AVAudioSession.sharedInstance(),
    queue: nil
) { notification in
    guard let userInfo = notification.userInfo,
          let typeValue = userInfo[AVAudioSessionInterruptionTypeKey] as? UInt,
          let type = AVAudioSession.InterruptionType(rawValue: typeValue) else {
        return
    }

    if type == .began {
        self.audioEngine.pause()
    } else if type == .ended {
        try? self.audioSession.setActive(true)
        try? self.audioEngine.start()
        // Tap callback doesn't work after this on iPhone 16e
    }
}
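For completeness, a variant of the .ended branch that also checks the standard shouldResume hint, in case that detail matters (a sketch; I have not confirmed it changes the iPhone 16e behavior):

// Sketch: honor the shouldResume option before restarting the engine.
if type == .ended {
    let optionsValue = (userInfo[AVAudioSessionInterruptionOptionKey] as? UInt) ?? 0
    let options = AVAudioSession.InterruptionOptions(rawValue: optionsValue)
    if options.contains(.shouldResume) {
        try? self.audioSession.setActive(true)
        try? self.audioEngine.start()
    }
}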
Workaround
Full engine restart is required on iPhone 16e:
func resumeAfterInterruption() {
    audioEngine.stop()
    inputNode.removeTap(onBus: 0)

    inputNode.installTap(onBus: 0, bufferSize: 4096, format: nil) { buffer, time in
        self.processAudioBuffer(buffer, at: time)
    }

    audioEngine.prepare()
    try? audioSession.setActive(true)
    try? audioEngine.start()
}
This works but adds latency and complexity compared to simple resume.
Questions
Is this expected behavior on iPhone 16e?
What is the recommended way to handle phone call interruptions?
Why does this only affect iPhone 16e and not iPhone 14 or SE 3?
Any guidance would be appreciated!
Hello,
We are developing a real-time speech recognition application and are utilizing AVAudioEngine with voice processing enabled on the input node. However, we have observed that enabling this mode interferes with the built-in iOS screen recording feature - specifically, the recorded video does not capture any audio when this mode is active.
Since we want users to be able to record their experience within our app, this issue significantly impacts our functionality. Is there a known workaround or recommended approach to ensure that both voice processing and screen recording can function simultaneously?
Any guidance would be greatly appreciated.
Thank you!