Here is the demo from Apple's site
This issues is specific to iOS 18.
When running this demo, we are getting new text when we have a gap in speaking, the recognitionTask(with:resultHandler:) provides new text which is only spoken after the gap and not the concatenation of old text and the new spoken text.
AVAudioEngine
RSS for tagUse a group of connected audio node objects to generate and process audio signals and perform audio input and output.
Posts under AVAudioEngine tag
41 Posts
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
I'm using an AVAudioConverter object to decode an OPUS stream for VoIP. The decoding itself works well, however, whenever the stream stalls (no more audio packet is available to decode because of network instability) this can be heard in crackling / abrupt stop in decoded audio. OPUS can mitigate this by indicating packet loss by passing a null pointer in the C-library to
int opus_decode_float (OpusDecoder * st, const unsigned char * data, opus_int32 len, float * pcm, int frame_size, int decode_fec), see https://opus-codec.org/docs/opus_api-1.2/group__opus__decoder.html#ga9c554b8c0214e24733a299fe53bb3bd2.
However, with AVAudioConverter using Swift I'm constructing an AVAudioCompressedBuffer like so:
let compressedBuffer = AVAudioCompressedBuffer(
format: VoiceEncoder.Constants.networkFormat,
packetCapacity: 1,
maximumPacketSize: data.count
)
compressedBuffer.byteLength = UInt32(data.count)
compressedBuffer.packetCount = 1
compressedBuffer.packetDescriptions!
.pointee.mDataByteSize = UInt32(data.count)
data.copyBytes(
to: compressedBuffer.data
.assumingMemoryBound(to: UInt8.self),
count: data.count
)
where data: Data contains the raw OPUS frame to be decoded.
How can I specify data loss in this context and cause the AVAudioConverter to output PCM data whenever no more input data is available?
More context:
I'm specifying the audio format like this:
static let frameSize: UInt32 = 960
static let sampleRate: Float64 = 48000.0
static var networkFormatStreamDescription =
AudioStreamBasicDescription(
mSampleRate: sampleRate,
mFormatID: kAudioFormatOpus,
mFormatFlags: 0,
mBytesPerPacket: 0,
mFramesPerPacket: frameSize,
mBytesPerFrame: 0,
mChannelsPerFrame: 1,
mBitsPerChannel: 0,
mReserved: 0
)
static let networkFormat =
AVAudioFormat(
streamDescription:
&networkFormatStreamDescription
)!
I've tried 1) setting byteLength and packetCount to zero and 2) returning nil but setting .haveData in the AVAudioConverterInputBlock I'm using with no success.
Hello!
I'm experiencing an issue with iOS's audio routing system when trying to use Bluetooth headphones for audio output while also recording environmental audio from the built-in microphone.
Desired behavior:
Play audio through Bluetooth headset (AirPods)
Record unprocessed environmental audio from the iPhone's built-in microphone
Actual behavior:
When explicitly selecting the built-in microphone, iOS reports it's using it (in currentRoute.inputs)
However, the actual audio data received is clearly still coming from the AirPods microphone
The audio is heavily processed with voice isolation/noise cancellation, removing environmental sounds
Environment Details
Device: iPhone 12 Pro Max
iOS Version: 18.4.1
Hardware: AirPods
Audio Framework: AVAudioEngine (also tried AudioQueue)
Code Attempted
I've tried multiple approaches to force the correct routing:
func configureAudioSession() {
let session = AVAudioSession.sharedInstance()
// Configure to allow Bluetooth output but use built-in mic
try? session.setCategory(.playAndRecord,
options: [.allowBluetoothA2DP, .defaultToSpeaker])
try? session.setActive(true)
// Explicitly select built-in microphone
if let inputs = session.availableInputs,
let builtInMic = inputs.first(where: { $0.portType == .builtInMic }) {
try? session.setPreferredInput(builtInMic)
print("Selected input: \(builtInMic.portName)")
}
// Log the current route
let route = session.currentRoute
print("Current input: \(route.inputs.first?.portName ?? "None")")
// Configure audio engine with native format
let inputNode = audioEngine.inputNode
let nativeFormat = inputNode.inputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: nativeFormat) { buffer, time in
// Process audio buffer
// Despite showing "Built-in Microphone" in route, audio appears to be
// coming from AirPods with voice isolation applied - welp!
}
try? audioEngine.start()
}
I've also tried various combinations of:
Different audio session modes (.default, .measurement, .voiceChat)
Different option combinations (with/without .allowBluetooth, .allowBluetoothA2DP)
Setting session.setPreferredInput() both before and after activation
Diagnostic Observations
When AirPods are connected:
AVAudioSession.currentRoute.inputs correctly shows "Built-in Microphone" after setPreferredInput()
The actual audio data received shows clear signs of AirPods' voice isolation processing
Background/environmental sounds are actively filtered out...
When recording a test audio played near the phone (not through the app), the recording is nearly silent. Only headset voice goes through.
Questions
Is there a workaround to force iOS to actually use the built-in microphone while maintaining Bluetooth output?
Are there any lower-level configurations that might resolve this issue?
Any insights, workarounds, or suggestions would be greatly appreciated. This is blocking a critical feature in my application that requires environmental audio recording while providing audio feedback through headphones 😅
Hi community,
I'm wondering how can I request the permission of "System Audio Recording Only" under the Privacy & Security -> Screen & System Audio Recording via swift?
Did a bunch of search but didn't find good documentation on it.
Tried another approach here https://github.com/insidegui/AudioCap/blob/main/AudioCap/ProcessTap/AudioRecordingPermission.swift which doesn't work very reliably.
Topic:
Privacy & Security
SubTopic:
General
Tags:
AudioToolbox
AVAudioEngine
Core Audio
AVFoundation
According to the header file the outputVolume properties supported range is 0.0-1.0:
/*! @property outputVolume
@abstract The mixer's output volume.
@discussion
This accesses the mixer's output volume (0.0-1.0, inclusive).
@property (nonatomic) float outputVolume;
However when setting the volume to 2.0 the audio does indeed play louder. Is the header file out of date and if so, what is the supported range for outputVolume?
Thanks
I have a simple AVAudioEngine graph as follows:
AVAudioPlayerNode -> AVAudioUnitEQ -> AVAudioUnitTimePitch -> AVAudioUnitReverb -> Main mixer node of AVAudioEngine.
I noticed that whenever I have AVAudioUnitTimePitch or AVAudioUnitVarispeed in the graph, I noticed a very distinct crackling/popping sound in my Airpods Pro 2 when starting up the engine and playing the AVAudioPlayerNode and unable to find the reason why this is happening. When I remove the node, the crackling completely goes away. How do I fix this problem since i need the user to be able to control the pitch and rate of the audio during playback.
import AVKit
@Observable @MainActor
class AudioEngineManager {
nonisolated private let engine = AVAudioEngine()
private let playerNode = AVAudioPlayerNode()
private let reverb = AVAudioUnitReverb()
private let pitch = AVAudioUnitTimePitch()
private let eq = AVAudioUnitEQ(numberOfBands: 10)
private var audioFile: AVAudioFile?
private var fadePlayPauseTask: Task<Void, Error>?
private var playPauseCurrentFadeTime: Double = 0
init() {
setupAudioEngine()
}
private func setupAudioEngine() {
guard let url = Bundle.main.url(forResource: "Song name goes here", withExtension: "mp3") else {
print("Audio file not found")
return
}
do {
audioFile = try AVAudioFile(forReading: url)
} catch {
print("Failed to load audio file: \(error)")
return
}
reverb.loadFactoryPreset(.mediumHall)
reverb.wetDryMix = 50
pitch.pitch = 0 // Increase pitch by 500 cents (5 semitones)
engine.attach(playerNode)
engine.attach(pitch)
engine.attach(reverb)
engine.attach(eq)
// Connect: player -> pitch -> reverb -> output
engine.connect(playerNode, to: eq, format: audioFile?.processingFormat)
engine.connect(eq, to: pitch, format: audioFile?.processingFormat)
engine.connect(pitch, to: reverb, format: audioFile?.processingFormat)
engine.connect(reverb, to: engine.mainMixerNode, format: audioFile?.processingFormat)
}
func prepare() {
guard let audioFile else { return }
playerNode.scheduleFile(audioFile, at: nil)
}
func play() {
DispatchQueue.global().async { [weak self] in
guard let self else { return }
engine.prepare()
try? engine.start()
DispatchQueue.main.async { [weak self] in
guard let self else { return }
playerNode.play()
fadePlayPauseTask?.cancel()
playPauseCurrentFadeTime = 0
fadePlayPauseTask = Task { [weak self] in
guard let self else { return }
while true {
let volume = updateVolume(for: playPauseCurrentFadeTime / 0.1, rising: true)
// Ramp up volume until 1 is reached
if volume >= 1 { break }
engine.mainMixerNode.outputVolume = volume
try await Task.sleep(for: .milliseconds(10))
playPauseCurrentFadeTime += 0.01
}
engine.mainMixerNode.outputVolume = 1
}
}
}
}
func pause() {
fadePlayPauseTask?.cancel()
playPauseCurrentFadeTime = 0
fadePlayPauseTask = Task { [weak self] in
guard let self else { return }
while true {
let volume = updateVolume(for: playPauseCurrentFadeTime / 0.1, rising: false)
// Ramp down volume until 0 is reached
if volume <= 0 { break }
engine.mainMixerNode.outputVolume = volume
try await Task.sleep(for: .milliseconds(10))
playPauseCurrentFadeTime += 0.01
}
engine.mainMixerNode.outputVolume = 0
playerNode.pause()
// Shut down engine once ramp down completes
DispatchQueue.global().async { [weak self] in
guard let self else { return }
engine.pause()
}
}
}
private func updateVolume(for x: Double, rising: Bool) -> Float {
if rising {
// Fade in
return Float(pow(x, 2) * (3.0 - 2.0 * (x)))
} else {
// Fade out
return Float(1 - (pow(x, 2) * (3.0 - 2.0 * (x))))
}
}
func setPitch(_ value: Float) {
pitch.pitch = value
}
func setReverbMix(_ value: Float) {
reverb.wetDryMix = value
}
}
struct ContentView: View {
@State private var audioManager = AudioEngineManager()
@State private var pitch: Float = 0
@State private var reverb: Float = 0
var body: some View {
VStack(spacing: 20) {
Text("🎵 Audio Player with Reverb & Pitch")
.font(.title2)
HStack {
Button("Prepare") {
audioManager.prepare()
}
Button("Play") {
audioManager.play()
}
.padding()
.background(Color.green)
.foregroundColor(.white)
.cornerRadius(10)
Button("Pause") {
audioManager.pause()
}
.padding()
.background(Color.red)
.foregroundColor(.white)
.cornerRadius(10)
}
VStack {
Text("Pitch: \(Int(pitch)) cents")
Slider(value: $pitch, in: -2400...2400, step: 100) { _ in
audioManager.setPitch(pitch)
}
}
VStack {
Text("Reverb Mix: \(Int(reverb))%")
Slider(value: $reverb, in: 0...100, step: 1) { _ in
audioManager.setReverbMix(reverb)
}
}
}
.padding()
}
}
I did watch WWDC 2019 Session 716 and understand that an active audio session is key to unlocking low‑level networking on watchOS. I’m configuring my audio session and engine as follows:
private func configureAudioSession(completion: @escaping (Bool) -> Void) {
let audioSession = AVAudioSession.sharedInstance()
do {
try audioSession.setCategory(.playAndRecord, mode: .voiceChat, options: [])
try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
// Retrieve sample rate and configure the audio format.
let sampleRate = audioSession.sampleRate
print("Active hardware sample rate: \(sampleRate)")
audioFormat = AVAudioFormat(standardFormatWithSampleRate: sampleRate, channels: 1)
// Configure the audio engine.
audioInputNode = audioEngine.inputNode
audioEngine.attach(audioPlayerNode)
audioEngine.connect(audioPlayerNode, to: audioEngine.mainMixerNode, format: audioFormat)
try audioEngine.start()
completion(true)
} catch {
print("Error configuring audio session: \(error.localizedDescription)")
completion(false)
}
}
private func setupUDPConnection() {
let parameters = NWParameters.udp
parameters.includePeerToPeer = true
connection = NWConnection(host: "***.***.xxxxx.***", port: 0000, using: parameters)
setupNWConnectionHandlers()
}
private func setupTCPConnection() {
let parameters = NWParameters.tcp
connection = NWConnection(host: "***.***.xxxxx.***", port: 0000, using: parameters)
setupNWConnectionHandlers()
}
private func setupWebSocketConnection() {
guard let url = URL(string: "ws://***.***.xxxxx.***:0000") else {
print("Invalid WebSocket URL")
return
}
let session = URLSession(configuration: .default)
webSocketTask = session.webSocketTask(with: url)
webSocketTask?.resume()
print("WebSocket connection initiated")
sendAudioToServer()
receiveDataFromServer()
sendWebSocketPing(after: 0.6)
}
private func setupNWConnectionHandlers() {
connection?.stateUpdateHandler = { [weak self] state in
DispatchQueue.main.async {
switch state {
case .ready:
print("Connected (NWConnection)")
self?.isConnected = true
self?.failToConnect = false
self?.receiveDataFromServer()
self?.sendAudioToServer()
case .waiting(let error), .failed(let error):
print("Connection error: \(error.localizedDescription)")
DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
self?.setupNetwork()
}
case .cancelled:
print("NWConnection cancelled")
self?.isConnected = false
default:
break
}
}
}
connection?.start(queue: .main)
}
Duplex in this context refers to two-way audio transmission simultaneously recording and sending audio while also receiving and playing back incoming audio, similar to a VoIP/SIP call.
The setup works fine on the simulator, which suggests that the core logic is correct. However, since the simulator doesn’t fully replicate WatchOS hardware behavior especially for audio sessions and networking issues might arise when running on a real device.
The problem likely lies in either the Watch’s actual hardware limitations, permission constraints, or specific audio session configurations.
I am reaching out to seek further assistance regarding the challenges I've been experiencing with establishing a UDP, TCP & web socket connection on watchOS using NWConnection for duplex audio streaming. Despite implementing the recommendations provided earlier, I am still encountering difficulties
From what I can see, your implementation is focused on streaming audio playback with the server. In my case, I'm looking for a slightly different approach: I want to capture audio and send buffers of a specific size to the server while playing audio simultaneously, essentially achieving full duplex streaming similar to a VOIP call. Additionally, I’d like to ensure that if no external audio route is connected, the Apple Watch speaker is used by default. Any thoughts or insights on adapting this setup for those requirements would be very welcome.
Topic:
Media Technologies
SubTopic:
Streaming
Tags:
AVAudioNode
Network
AVAudioSession
AVAudioEngine
I’m developing an ARKit application where I aim to attach procedurally generated audio to detected planes in the environment. While using a static audio file with SCNAudioSource and SCNAudioPlayer works as expected, integrating procedural audio via AVAudioSourceNode does not produce any sound, nor does it generate any error messages: Stack Overflow Post
Working Implementation with Static Audio File:
let audioPlayer = SCNAudioPlayer(source: audioSource)
node.addAudioPlayer(audioPlayer)
Attempted Implementation with Procedural Audio:
// Audio generation code
}
let audioPlayer = SCNAudioPlayer(avAudioNode: audioNode)
node.addAudioPlayer(audioPlayer)
In this setup, the AVAudioSourceNode successfully generates audio when connected directly to an AVAudioEngine. However, when used with SCNAudioPlayer and attached to an SCNNode, it fails to produce sound. What doesn’t work is creating some procedural audio with an AVAudioNode, as documented here:
Apple docs
Additionally, I explored the WWDC18 AR game project, SwiftShot, which utilizes SCNAudioPlayer(avAudioNode:). After updating it for the latest Xcode, the graphics function correctly, but the audio does not play. I also noted that the Apple documentation mentions an audioPlayerWithAVAudioNode: method, stating:
Using this initializer is typically not necessary. Instead, call the audioPlayerWithAVAudioNode: method, which returns a cached audio player object if one for the specified AVAudioNode object has already been created and is available for use.
However, this method does not appear to be available in Swift. Any insights or guidance on this matter would be greatly appreciated.
I am working on an app which plays audio - https://youtu.be/VbAfUk_eYl0?si=nJg5ayy2faWE78-g - and one of the features is, on restart, if you had paused playback of a file at the time the app was previously shut down (or were playing one at the time of shutdown), the paused state and position in the file is restored exactly as it was, on restart.
The functionality works. However, it seems impossible to get the "now playing" information in iOS into the right state to reflect that via the MediaPlayer API. On restart, handlers are attached to the play/pause/togglePlayPause actions on MPRemoteCommandCenter.shared(), and the map of media info is updated on MPNowPlayingInfoCenter.default().nowPlayingInfo.
What happens is that iOS's media view shows the audio as playing and offers a pause button - even though the play action is enabled and the pause action is disabled.
Once playback has been initiated (my workaround is to have the pause action toggle the play state, since otherwise you wouldn't be able to initiate playback from controls in a car without initiating it once from a device first).
I've created a simplified white-noise-player demo to illustrate the problem - simply build and deploy it, and then start the app, lock your device and look at the playback controls on the lock screen. It will show a pause button - same behavior I've described.
https://github.com/timboudreau/ios-play-pause-demo
I've tried a few things to narrow down the source of the issue - for example, thinking that not MPNowPlayingInfoPropertyPlaybackProgress and MPMediaItemPropertyPlaybackDuration might be the culprit (since the system interpolates elapsed time and it's recommended to update those properties infrequently) on startup might do the trick, but the result is the same, just without a duration or progress shown.
What governs this behavior, and is there some way to explicitly tell the media player API your current state is paused?
I have a SwiftUI app - (https://youtu.be/VbAfUk_eYl0?si=JxUBh0Bpb-vc1E1U) - which I thought was almost ready for release - a manager for airdropped audio files from Logic Pro or other music creation applications. It uses AVAudioEngine and AVAudioPlayerNode to play audio, and the MediaPlayer API to integrate with car audio and similar, all of which works well.
It does not currently have an explicit CarPlay integration (and I'm slightly horrified at the amount of work that is going to require).
I had the good or bad luck of getting a loaner car with carplay while mine is being repaired yesterday, and lo and behold, when connected to the vehicle via CarPlay, there is no audio output in the vehicle at all. The now playing panel correctly shows the information my app provides about the currently playing song; the player node believes it is playing, the AVAudioSession is configured as it should be. But there is no sound.
Obviously I cannot ship it in this state.
I've tried fiddling with the parameters the AVAudioSession is configured with, in case there was some parameter that was preventing audio output, to no avail - currently:
var options = AVAudioSession.CategoryOptions()
options.insert(.allowAirPlay)
options.insert(.allowBluetooth)
options.insert(.allowBluetoothA2DP)
try session.setCategory(.playback, mode: .default, options: options)
try? session.setPreferredIOBufferDuration(0.002) // ~96 samples at 44.1kHz
try? session.setPrefersNoInterruptionsFromSystemAlerts(true)
try? session.setPrefersInterruptionOnRouteDisconnect(false)
try session.setActive(true, options: [.notifyOthersOnDeactivation])
All diagnostics within the app show the player operating correctly - files are played and flushed; AVAudioPlayerNodeCompletionCallbacks are called when they should be. But the output is not audible in the vehicle.
I would much prefer to ship this app without full-blown CarPlay integration, but with working audio when connected via CarPlay, and work on full CarPlay integration for the next release.
Is there some secret handshake I am just missing to make this work?
Hi,
I am looking for a good way to play sounds at a high frequency.
At the moment I am using the AVAudioEngine, and create a couple AVAudioPlayerNode and for each sound I need to play I create a AVAudioPCMBuffer.
When the app needs to play a sound, I get the correct AVAudioPCMBuffer for the sound and use the first available AVAudioPlayerNode and feed it to the buffer.
The timing for a metronome app has to be very precise because if it's of by about 16ms the user can hear that it is not playing had the right interval. For low speeds this is working without any problems, but at high speeds it is getting worse.
Maybe anyone has an idea on how I can improve my method.
Its a Plugin for Flutter.
import AVFoundation
class FastSoundPlayer {
private var audioPlayers: [SoundPlayer?] = []
private var sounds: [String: Sound] = [:]
private var engine = AVAudioEngine()
let session = AVAudioSession.sharedInstance()
init() {
do {
try session.setCategory(AVAudioSession.Category.playback, mode: AVAudioSession.Mode.default, options: [AVAudioSession.CategoryOptions.mixWithOthers])
try session.setActive(true)
createSoundPlayers(count: 20)
try engine.start()
} catch {
print("Error starting audio engine: \(error.localizedDescription)")
}
}
// Selector method to handle applicationDidBecomeActiveNotification
func applicationDidBecomeActive() {
// Reinitialize AVAudioEngine and reattach all nodes
do {
engine.reset()
objc_sync_enter(audioPlayers)
audioPlayers.removeAll()
createSoundPlayers(count: 20)
objc_sync_exit(audioPlayers)
try engine.start()
} catch {
print("Error starting audio engine: \(error.localizedDescription)")
}
}
func createSoundPlayers(count: Int) {
for _ in 0..<count {
let player = SoundPlayer()
engine.attach(player.player)
engine.connect(player.player, to: engine.mainMixerNode, format: nil)
audioPlayers.append(player)
}
}
func load(sound: Data, name: String) {
let sound = Sound(soundData: sound)
sounds[name] = sound
}
func play(name: String) {
if !engine.isRunning {
applicationDidBecomeActive()
}
guard let sound = sounds[name] else {
print("Sound not found")
return
}
if let player = getAvailablePlayer() {
player.play(sound: sound)
}
}
func getAvailablePlayer() -> SoundPlayer? {
for player in audioPlayers {
if !player!.isPlaying {
return player
}
}
return nil
}
}
class SoundPlayer {
let player = AVAudioPlayerNode()
var isPlaying = false
init() {
player.volume = 1.0
}
func play(sound: Sound) {
player.scheduleBuffer(sound.sound!, at: nil, options: .interrupts, completionCallbackType: .dataPlayedBack) { _ in
self.complete()
}
if (player.engine != nil && player.engine!.isRunning) {
player.play()
isPlaying = true
}
}
func complete() {
isPlaying = false
}
}
class Sound {
var sound: AVAudioPCMBuffer?
init(soundData: Data) {
do {
let temporaryURL = FileManager.default.temporaryDirectory.appendingPathComponent("tempSound.wav")
try soundData.write(to: temporaryURL)
// Create AVAudioFile from the temporary file URL
let audioFile = try AVAudioFile(forReading: temporaryURL)
// Define the format for the PCM buffer (44100Hz, stereo)
let format = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: 44100, channels: 2, interleaved: false)
// Create AVAudioPCMBuffer
guard let pcmBuffer = AVAudioPCMBuffer(pcmFormat: format!, frameCapacity: AVAudioFrameCount(audioFile.length)) else {
// Failed to create PCM buffer
self.sound = nil
return
}
// Read audio file into PCM buffer
try audioFile.read(into: pcmBuffer)
// Assign the created AVAudioPCMBuffer to the sound property
self.sound = pcmBuffer
} catch {
print("Error loading sound file: \(error.localizedDescription)")
self.sound = nil
}
}
}
Thanks!
I have an app under development - demo here - https://youtu.be/VbAfUk_eYl0?si=s6EDBx-4G6P_QbZO - which is sort of an audio player for airdropped files - something useful to musicians who dump work in progress to their phone, make notes, revise and update.
I've been testing my handling of audio session interruption notifications, but seems to be a lot of inconsistency in how, when and why iOS delivers them, and I'm wondering if there is some rhyme or reason to it that I'm just not detecting.
For example, I am playing a song in my app. Switch to Apple Music and start playing a song there. My app gets an interruption began notification - this is consistent.
Switch back to my app, and about half the time, I will get an interruption ended notification (coupled often with a blast of the tail of whatever audio buffer was partially played when the interruption started, even though the engine was stopped - and followed by call to my AVAudioPlayerNodeCompletionCallback - is there some way to avoid this?). Half the time I don't get an interruption ended notification; my app can (as expected) end the interruption by activating the AVAudioSession and playing something.
I have not been able to determine any pattern to this behavior, other than that if my app started playing using AVAudioPlayerNode.scheduleSegment rather than scheduleFile I think the notification will be consistently delivered on app activation rather than when I activate the session programmatically.
I would like my app to behave deterministically, and would appreciate any help in deciphering what causes the inconsistent behavior in notifications from iOS.
Hello! I'm use AVFoundation for preview video and audio from selected device, and I try use AVAudioEngine for preview audio in real-time, but I can't or I don't understand how select input device? I can hear only my microphone in real-time
So far, I'm using AVCaptureAudioPreviewOutput for in real-time hear audio, but I think has delay.
On iOS works easy with AVAudioEngine, but on macOS bruh...
Topic:
Media Technologies
SubTopic:
Audio
Tags:
AudioToolbox
AVAudioSession
AVAudioEngine
AVFoundation
I am developing an iOS app that needs to play spoken audio on demand from a server, while ducking the audio of background music from another app (e.g., SoundtrackYourBrand or Apple Music). This must work even when the app is in the background, and the server dictates when and what audio is played. Ideally, the message should be played within a minute of the server requesting it.
Current Attempt & Observations
I initially tried using Firebase Cloud Messaging (FCM) silent notifications to send a URL to an audio file, which the app would then play using AVPlayer.
This works consistently when the app is active, but in the background, it only works about 60% of the time.
In cases where it fails, iOS ducks the background music (e.g., from SoundtrackYourBrand) but never plays the spoken audio.
Interestingly, when I play the audio without enabling audio ducking, it seems to work 100% of the time from my limited testing, even in the background.
The app has background modes enabled for Audio, Background Fetch, and Remote Notifications.
Best Approach to Achieve This?
I’d like guidance on the best Apple-compliant approach to reliably play audio on command from the server, even when the app is in the background. Some possible paths:
Ensuring the app remains active in the background – Are there recommended ways to prevent the app from getting suspended, such as background tasks, a special background mode, or a persistent connection to the server?
Alternative triggering mechanisms – Would something like VoIP, Push-to-Talk, or another background service be better suited for this use case?
Built-in iOS speech synthesis (AVSpeechSynthesizer) – If playing external audio is unreliable, would generating speech dynamically from text be a more robust approach?
Streaming audio instead of sending a URL – Could continuous streaming from the server keep the app active and allow playback at the right moment?
I want to ensure the solution is reliable and works 100% of the time when needed. Any recommendations on the best approach for this would be greatly appreciated.
Thank you for your time and guidance.
Hi all, I have spent a lot of time reading the tech note and watching the WDDC video that introduce the PTTFramework on iOS. I currently have a custom setup where I am using AVAudioEngine to schedule and play buffers that are being streamed through a call.
I am looking to use the PTTFramework to allow a user to trigger this push to talk behavior from the lock screen and the various places with the system UI it provides.
However I am unsure what the correct behavior is regarding the handling of the audio session. Right now I am using .playback when there is no active voice transmission so that devices such as AirPods can be in AD2P mode where applicable, and then transitioning to .playbackAndRecord category only when the mic input should become active. Following this change in my AVAudioEngine manager I am then manually activating and deactivating the audio session manually when the engine is either playing/recording or idle.
In the documentation it states that you should not attempt to activate or deactivate your audio session directly, but allow the framework to handle it.
Does that mean that I need to either call the request to transmit delegate function or set an active participant on the channel manager first, and then wait for the didBecomeActive delegate method to trigger before I actually attempt to play or record any audio? (I am using the fullDuplex mode currently.) I noticed that that delegate method will only trigger if the audio session wasn't active before doing one of the above (setting active participant, requesting transmit).
Lastly, when using the PTTFramework it also mentions that we get support for PTT devices and I notice on the didBeginTransmittingFrom property we have a handsfreeButton case. Is there any documentation or resources for what is actually supported out of the box for this? I am currently working on handling a lot of the push to talk through bluetooth LE, and wanted to make sure there wasn't overlap with what the system provides.
Thank you!
Hi all!
I have been experiencing some issues when using the AVAudioEngine to play audio and record input while doing a voice chat (through the PTT Interface).
I noticed if I connect any players to the AudioGraph OR call start that the audio session becomes active (this is on iOS).
I don't see anything in the docs or the header files in the AVFoundation, but is it possible that calling the stop method on an engine deactivates the audio session too?
In a normal app this behavior seems logical, but when using PTT all activation and deactivation of the audio session must go through the framework and its delegate methods.
The issue I am debugging is that when the engine with the input node tapped gets stopped, and there is a gap between the input and when the server replies with inbound audio to be played and something seems to be getting the hardware/audio session into a jammed state.
Thanks for any feedback and/or confirmation on this behavior!
I'm trying to implement Ambisonic B-Format audio playback on Vision Pro with head tracking. So far audio plays, head tracking works, and the sound appears to be stereo. The problem is that it is not a proper binaural playback when compared to playing back the audiofile with a DAW. Has anyone successfully implemented B-Format playback on Vision Pro? Any suggestions on my current implementation:
func playAmbiAudioForum() async {
do {
try AVAudioSession.sharedInstance().setCategory(.playback)
try AVAudioSession.sharedInstance().setActive(true)
// AudioFile laoding/preperation
guard let testFileURL = Bundle.main.url(forResource: "audiofile", withExtension: "wav") else {
print("Test file not found")
return
}
let audioFile = try AVAudioFile(forReading: testFileURL)
let audioFileFormat = audioFile.fileFormat
// create AVAudioFormat with Ambisonics B Format
guard let layout = AVAudioChannelLayout(layoutTag: kAudioChannelLayoutTag_Ambisonic_B_Format) else {
print("layout failed")
return
}
let format = AVAudioFormat(
commonFormat: audioFile.processingFormat.commonFormat,
sampleRate: audioFile.fileFormat.sampleRate,
interleaved: false,
channelLayout: layout
)
// write audiofile to buffer
guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: UInt32(audioFile.length)) else {
print("buffer failed")
return
}
try audioFile.read(into: buffer)
playerNode.renderingAlgorithm = .HRTF
// connecting nodes
audioEngine.attach(playerNode)
audioEngine.connect(playerNode, to: audioEngine.outputNode, format: format)
audioEngine.prepare()
playerNode.scheduleBuffer(buffer, at: nil) {
print("File finished playing")
}
try audioEngine.start()
playerNode.play()
} catch {
print("Setup error:", error)
}
}
I’m currently developing an iOS metronome app using DispatchSourceTimer as the timer. The interval is set very small, around 50 milliseconds, and I’m using CFAbsoluteTimeGetCurrent to calculate the elapsed time to ensure the beat is played within a ±0.003-second margin.
The problem is that once the app goes to the background, the timing becomes unstable—it slows down noticeably, then recovers after 1–2 seconds.
When coming back to the foreground, it suddenly speeds up, and again, it takes 1–2 seconds to return to normal. It feels like the app is randomly “powering off” and then “overclocking.” It’s super frustrating.
I’ve noticed that some metronome apps in the App Store have similar issues, but there’s one called “Professional Metronome” that’s rock solid with no such problems. What kind of magic are they using? Any experts out there who can help? Thanks in advance!
P.S. I’ve already enabled background audio permissions.
The professional metronome that has no issues: https://link.zhihu.com/?target=https%3A//apps.apple.com/cn/app/pro-metronome-%25E4%25B8%2593%25E4%25B8%259A%25E8%258A%2582%25E6%258B%258D%25E5%2599%25A8/id477960671
I'm experiencing audio issues while developing for visionOS when playing PCM data through AVAudioPlayerNode.
Issue Description:
Occasionally, the speaker produces loud popping sounds or distorted noise
This occurs during PCM audio playback using AVAudioPlayerNode
The issue is intermittent and doesn't happen every time
Technical Details:
Platform: visionOS
Device: vision pro / simulator
Audio Framework: AVFoundation
Audio Node: AVAudioPlayerNode
Audio Format: PCM
I would appreciate any insights on:
Common causes of audio distortion with AVAudioPlayerNode
Recommended best practices for handling PCM playback in visionOS
Potential configuration issues that might cause this behavior
Has anyone encountered similar issues or found solutions? Any guidance would be greatly helpful.
Thank you in advance!
I followed this guide, and added com.apple.developer.spatial-audio.profile-access as an entitlement to the app (via the + Capability button – Spatial Audio Profile). I have a audio graph that outputs to AVAudioEngine.
However, the Xcode Cloud build ended up with this error:
Invalid Code Signing Entitlements. Your application bundle's signature contains code signing entitlements that are not supported on iOS. Specifically, key 'com.apple.developer.spatial-audio.profile-access' in 'Payload/…' is not supported.
This guide says it's available on iOS. Does it mean not on iOS 17? In which case how can I provide fallback for iOS 17?