It sounds simple, but searching for the name "Favorite Songs" is a non-starter because the playlist has a different name in each country, even if I specify "&l=en_us" on the query.
So is there another property, relationship or combination thereof which I can use to tell me when I've found the right playlist?
Properties I've looked at so far:
canEdit: will always be false, so it narrows things down a little
inFavorites: depends on whether the user has favourited the Favorites playlist itself, so not helpful
hasCatalog: seems to always be true, so again it may narrow things down a bit
isPublic: doesn't help
Adding the catalog relationship doesn't seem to show anything immediately useful either.
Can anyone help?
Ideally I'd like to see this as a "kind" or "type" as it has different properties to other playlists, but frankly I'll take anything at this point.
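For context, here's roughly the heuristic I'm left with, sketched against the Apple Music API (the tokens are placeholders, and the canEdit/hasCatalog combination is exactly the guesswork I'd like to replace with a proper "kind"/"type"):
struct LibraryPlaylistsResponse: Decodable {
    struct Playlist: Decodable {
        struct Attributes: Decodable {
            let name: String
            let canEdit: Bool
            let hasCatalog: Bool?
        }
        let id: String
        let attributes: Attributes
    }
    let data: [Playlist]
}

var request = URLRequest(url: URL(string: "https://api.music.apple.com/v1/me/library/playlists?l=en_us")!)
request.setValue("Bearer <developer token>", forHTTPHeaderField: "Authorization")   // placeholder
request.setValue("<music user token>", forHTTPHeaderField: "Music-User-Token")      // placeholder

let (data, _) = try await URLSession.shared.data(for: request)
let playlists = try JSONDecoder().decode(LibraryPlaylistsResponse.self, from: data).data

// Best guess so far: the playlist the user cannot edit but which still has a catalog
// counterpart. Not a guaranteed identifier, just the narrowing described above.
let candidates = playlists.filter { !$0.attributes.canEdit && ($0.attributes.hasCatalog ?? false) }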
Audio
Dive into the technical aspects of audio on your device, including codecs, format support, and customization options.
Hi everyone 👋
I’m building an iOS app in Swift where I want to do the following:
Record the user’s voice
Transcribe the spoken sentence (speech-to-text)
Auto-detect the spoken language
Translate it to another language selected by the user (e.g., English → Spanish or Hindi → English)
Speak back (text-to-speech) the translated text on the same device
Is it possible to record via the phone mic and play the translated speech back through the headphones?
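A rough sketch of how the capture, transcription, and speak-back steps could hang together using the existing SFSpeechRecognizer and AVSpeechSynthesizer APIs; the translation step is only a placeholder here, and permission requests/error handling are omitted. Routing playback to the headphones is then mostly an AVAudioSession question (.playAndRecord with suitable options).
import Speech
import AVFoundation

let audioEngine = AVAudioEngine()
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
let request = SFSpeechAudioBufferRecognitionRequest()
let synthesizer = AVSpeechSynthesizer()

// 1. Record the user's voice: tap the mic and feed buffers to the recognizer.
let input = audioEngine.inputNode
input.installTap(onBus: 0, bufferSize: 1024, format: input.outputFormat(forBus: 0)) { buffer, _ in
    request.append(buffer)
}
audioEngine.prepare()
try audioEngine.start()

// 2./3. Transcribe. (Auto-detecting the language would need a separate step,
// e.g. NLLanguageRecognizer on the transcribed text.)
recognizer.recognitionTask(with: request) { result, _ in
    guard let result, result.isFinal else { return }
    let sentence = result.bestTranscription.formattedString

    // 4. Translate -- placeholder for the Translation framework or a server-side service.
    let translated = sentence

    // 5. Speak the translated text back on the same device.
    let utterance = AVSpeechUtterance(string: translated)
    utterance.voice = AVSpeechSynthesisVoice(language: "es-ES")
    synthesizer.speak(utterance)
}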
I'm streaming mp3 audio data using URLSession/AudioFileStream/AVAudioConverter and getting occasional silent buffers and glitches (little bleeps and whoops as opposed to clicks). The issues are present in an offline test, so this isn't an issue of underruns.
Doing some buffering on the input coming from the URLSession (URLSessionDataTask) reduces the glitches/silent buffers to rather infrequent, but they do still happen occasionally.
var bufferedData = Data()

func parseBytes(data: Data) {
    bufferedData.append(data)
    // XXX: this buffering reduces glitching
    // to rather infrequent. But why?
    if bufferedData.count > 32768 {
        bufferedData.withUnsafeBytes { (bytes: UnsafeRawBufferPointer) in
            guard let baseAddress = bytes.baseAddress else { return }
            let result = AudioFileStreamParseBytes(audioStream!,
                                                   UInt32(bufferedData.count),
                                                   baseAddress,
                                                   [])
            if result != noErr {
                print("❌ error parsing stream: \(result)")
            }
        }
        bufferedData = Data()
    }
}
No errors are returned by AudioFileStream or AVAudioConverter.
func handlePackets(data: Data,
                   packetDescriptions: [AudioStreamPacketDescription]) {
    guard let audioConverter else {
        return
    }
    var maxPacketSize: UInt32 = 0
    for packetDescription in packetDescriptions {
        maxPacketSize = max(maxPacketSize, packetDescription.mDataByteSize)
        if packetDescription.mDataByteSize == 0 {
            print("EMPTY PACKET")
        }
        if Int(packetDescription.mStartOffset) + Int(packetDescription.mDataByteSize) > data.count {
            print("❌ Invalid packet: offset \(packetDescription.mStartOffset) + size \(packetDescription.mDataByteSize) > data.count \(data.count)")
        }
    }
    let bufferIn = AVAudioCompressedBuffer(format: inFormat!, packetCapacity: AVAudioPacketCount(packetDescriptions.count), maximumPacketSize: Int(maxPacketSize))
    bufferIn.byteLength = UInt32(data.count)
    for i in 0 ..< Int(packetDescriptions.count) {
        bufferIn.packetDescriptions![i] = packetDescriptions[i]
    }
    bufferIn.packetCount = AVAudioPacketCount(packetDescriptions.count)
    _ = data.withUnsafeBytes { ptr in
        memcpy(bufferIn.data, ptr.baseAddress, data.count)
    }
    if verbose {
        print("handlePackets: \(data.count) bytes")
    }
    // Setup input provider closure
    var inputProvided = false
    let inputBlock: AVAudioConverterInputBlock = { packetCount, statusPtr in
        if !inputProvided {
            inputProvided = true
            statusPtr.pointee = .haveData
            return bufferIn
        } else {
            statusPtr.pointee = .noDataNow
            return nil
        }
    }
    // Loop until converter runs dry or is done
    while true {
        let bufferOut = AVAudioPCMBuffer(pcmFormat: outFormat, frameCapacity: 4096)!
        bufferOut.frameLength = 0
        var error: NSError?
        let status = audioConverter.convert(to: bufferOut, error: &error, withInputFrom: inputBlock)
        switch status {
        case .haveData:
            if verbose {
                print("✅ convert returned haveData: \(bufferOut.frameLength) frames")
            }
            if bufferOut.frameLength > 0 {
                if bufferOut.isSilent {
                    print("(haveData) SILENT BUFFER at frame \(totalFrames), pending: \(pendingFrames), inputPackets=\(bufferIn.packetCount), outputFrames=\(bufferOut.frameLength)")
                }
                outBuffers.append(bufferOut)
                totalFrames += Int(bufferOut.frameLength)
            }
        case .inputRanDry:
            if verbose {
                print("🔁 convert returned inputRanDry: \(bufferOut.frameLength) frames")
            }
            if bufferOut.frameLength > 0 {
                if bufferOut.isSilent {
                    print("(inputRanDry) SILENT BUFFER at frame \(totalFrames), pending: \(pendingFrames), inputPackets=\(bufferIn.packetCount), outputFrames=\(bufferOut.frameLength)")
                }
                outBuffers.append(bufferOut)
                totalFrames += Int(bufferOut.frameLength)
            }
            return // wait for next handlePackets
        case .endOfStream:
            if verbose {
                print("✅ convert returned endOfStream")
            }
            return
        case .error:
            if verbose {
                print("❌ convert returned error")
            }
            if let error = error {
                print("error converting: \(error.localizedDescription)")
            }
            return
        @unknown default:
            fatalError()
        }
    }
}
Hello,
I am wondering if it is possible to have audio from my AirPods sent to my speech-to-text service while, at the same time, the built-in mic's audio is recorded into a video?
I ask because I want my users to be able to say "CAPTURE" and I start recording a video (with audio from the built in mic) and then when the user says "STOP" I stop the recording.
I've got a setup using AVAudioEngine with several tone generator nodes, each with a chain of processing nodes, the chains then mixed into the main output.
Generator ➡️ Effect ➡️... ➡️ .mainMixerNode ➡️ .outputNode
Generator ➡️ Effect ➡️... ⤴️
...
Generator ➡️ Effect ➡️... ⤴️
The user should be able to mute any chain individually. I've found several potential approaches to muting, but I'm not terribly happy with any of them:
1. Adjust the amplitudes directly in my tone generators. Issue: consumes CPU even when completely muted; 4 generators add ~15% CPU, even when all chains are muted.
2. Detach/attach chains that are muted/unmuted. Issue: causes loud clicking/popping sounds whenever muted/unmuted.
3. Fade the mixer output volume while detaching/attaching a chain (just cutting the volume immediately to 0 doesn't get rid of the clicking/popping). Issue: causes all channels to fade during the transition, so not ideal.
The rest of these ideas are variations on making volume control + detach/attach work for individual chains, since approach #3 worked well.
4. Add an AVAudioMixerNode to the end of each chain (just for volume control). Issue: only the mixer on the final chain functions -- the others block all output. Not sure what's going on there.
5. Use a matrix mixer (for multi-input volume control), plus detach/attach to reduce CPU if necessary. Not yet attempted, due to perceived complexity and reports of fragility in the order of wiring in. A bunch of effort before I even know if it's going to work.
6. Develop my own fader node to put at the end of each channel. Unlike the tone generator (a simple AVAudioSourceNode), developing an effect node seems complex and time-consuming. Might not even fix CPU use.
I'm not completely averse to the learning curve of either 5 or 6, but would rather get some guidance on best approach before diving in. They both seem likely to take more effort than I'd like for the simple behavior I'm trying to achieve.
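For reference, the wiring I mean by #4 is roughly the following (lastEffectNode is a placeholder for whatever node currently ends a chain); a short manual ramp on outputVolume is what avoids the click before any detach:
let chainMixer = AVAudioMixerNode()            // one per chain
engine.attach(chainMixer)
engine.connect(lastEffectNode, to: chainMixer, format: nil)
engine.connect(chainMixer, to: engine.mainMixerNode, format: nil)

// Mute a single chain: ramp its mixer volume down over ~10 ms, then optionally
// detach the chain's nodes to reclaim CPU.
func mute(_ mixer: AVAudioMixerNode, steps: Int = 10) {
    let start = mixer.outputVolume
    for i in 1...steps {
        DispatchQueue.main.asyncAfter(deadline: .now() + Double(i) * 0.001) {
            mixer.outputVolume = start * Float(steps - i) / Float(steps)
        }
    }
}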
Hi,
I'm trying to record audio on the iPhone with AVAudioRecorder and Xcode 26.0.1.
Maybe the problem is that I cannot record audio in the simulator, but the simulator does have an audio input menu.
In the plist I added 'Privacy - Microphone Usage Description' and I ask for permission before recording.
if await AVAudioApplication.requestRecordPermission() {
    print("permission granted")
    recordPermission = true
} else {
    print("permission denied")
}
Permission is granted.
let settings: [String : Any] = [
    AVFormatIDKey: kAudioFormatMPEG4AAC,
    AVSampleRateKey: 12000,
    AVNumberOfChannelsKey: 1,
    AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
]
recorder = try AVAudioRecorder(url: filename, settings: settings)
let prepared = recorder.prepareToRecord()
print("prepared started: \(prepared)")
let started = recorder.record()
print("recording started: \(started)")
started is always false and I tried many settings.
Error messages
AddInstanceForFactory: No factory registered for id <CFUUID 0x600000211480> F8BB1C28-BAE8-11D6-9C31-00039315CD46
AudioConverter.cpp:1052 Failed to create a new in process converter -> from 0 ch, 12000 Hz, .... (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame to 1 ch, 12000 Hz, aac (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame, with status -50
AudioQueueObject.cpp:1892 BuildConverter: AudioConverterNew returned -50
from: 0 ch, 12000 Hz, .... (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame
to: 1 ch, 12000 Hz, aac (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
prepared started: true
AudioQueueObject.cpp:7581 ConvertInput: aq@0x10381be00: AudioConverterFillComplexBuffer returned -50, packetCount 5
recording started: false
All the examples I find look the same as this, but apparently something must be different.
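One step not shown in the snippets above is activating a record-capable AVAudioSession before creating the recorder; a minimal sketch of that, in case it's relevant on a real device:
let session = AVAudioSession.sharedInstance()
try session.setCategory(.playAndRecord, mode: .default)
try session.setActive(true)

// ...then create and start the recorder as above
recorder = try AVAudioRecorder(url: filename, settings: settings)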
I have an AUv3 plugin which uses an FFT - which requires n samples before it can produce any output - so, depending on the relation between the host's buffer size and the FFT window size, it may receive several buffers of samples, producing no output, and then dump out what it has once a sufficient number of samples has been received.
This means that output is produced in fits and starts, in batches that match the FFT size (modulo oversampling) - e.g. if being fed buffers of 256 samples with an fft size of 1024, the output buffer sizes will be 0 for the first 3 buffers, and upon the fourth, the first 256 processed samples are returned and the remaining 768 cached; the next three buffers will return the remaining cached samples while processing and buffering subsequent ones, and so forth.
The internal mechanics of that I have solved, caching output if the current output buffer is too small, and so forth - so it all works as advertised, and the plugin reports its latency correctly. And when run as an app in demo-mode, playback works as expected.
In the plugin's render block, it captures the number of frames written, and if it is less than the number of frames passed in, adjusts the mDataByteSize of the output buffers to match the actual quantity of data being returned:
unsigned int framesWritten = (unsigned int) processHelper->processWithEvents(inAudioBufferList, outAudioBufferList, timestamp, frameCount, realtimeEventListHead);
if (framesWritten < frameCount) {
    for (UInt32 i = 0; i < outAudioBufferList->mNumberBuffers; ++i) {
        outAudioBufferList->mBuffers[i].mDataByteSize = framesWritten * 4; // assume 4 byte floats
    }
}
However, there are a couple of serious issues:
auval -v fails it with - Render Test at 64 frames, sample rate: 22050 Hz ERROR: Output Buffer Size does not match requested
When connected to Logic Pro, it appears that mDataByteSize is ignored and the entire allocated buffer is read - the audio has sections of silence snipped into it which correspond to the number of empty buffers being returned
If I set Logic's buffer size to 1024 and use a 1024-sample FFT window, the plugin works correctly - but of course a plugin cannot dictate buffer size, and 1024 is too small a window size to be useful for anything but filtering very high frequencies
This seems like it has to be a solvable problem, and most likely the issue is in how my code reports the number of usable samples in the returned buffer.
So, what is the correct way for a plugin to report that it has no samples to return, but will, uh, real soon now?
I know I could convert this plugin to be one that does offline rendering of the entire input, but this is real-time processing, just with a fixed amount of latency, so that should not be necessary.
Hi Apple Engineers and fellow developers,
I'm experiencing a critical regression with ShazamKit's background operation on iOS 18. ShazamKit's SHManagedSession stops identifying songs in the background after approximately 20 seconds on iOS 18, while the exact same code works perfectly on iOS 17.
The behavior is consistent: the app works perfectly in the foreground, but when backgrounded or the device is locked, it initially works for about 20 seconds and then stops identifying new songs. The microphone indicator remains active, suggesting audio access is maintained, but ShazamKit doesn't report identified songs in the background until you open the app again. Detection immediately resumes when bringing the app to the foreground.
My technical setup uses SHManagedSession for continuous matching with background modes properly configured in Info.plist including audio mode, and Background App Refresh enabled. I've tested this on physical devices running iOS 18.0 through 18.5 with the same results across all versions. The exact same code running on iOS 17 devices works flawlessly in the background.
To reproduce: initialize SHManagedSession and start matching, begin song identification in foreground, background the app or lock device, play different songs which are initially detected for about 20 seconds, then after the timeout period new songs are no longer identified until you bring the app to foreground.
This regression has impacted my production app as users who rely on continuous background music identification are experiencing a broken feature. I submitted this as Feedback ID FB15255903 last September with no solution so far.
I've created a minimal demo project that reproduces this issue: https://github.com/tfmart/ShazamKitBackground
Has anyone else experienced this ShazamKit background regression on iOS 18? Are there any known workarounds or alternative approaches? Given the time this issue has persisted, could we please get acknowledgment of this regression, expected timeline for a fix, or any recommended workarounds?
Testing environment is Xcode 16.0+ on iOS 18.0-18.5 across multiple physical device models.
Any guidance would be greatly appreciated.
I developed an educational app that implements audio-video communication through RTC, while using WebView to display course materials during classes. However, some users are experiencing an issue where the audio playback from WebView is very quiet. I've checked that the AVAudioSessionCategory is set by RTC to AVAudioSessionCategoryPlayAndRecord, and the AVAudioSessionCategoryOption also includes AVAudioSessionCategoryOptionMixWithOthers. What could be causing the WebView audio to be suppressed, and how can this be resolved?
I'm trying to use the new Speech framework for streaming transcription on macOS 26.3, and I can reproduce a failure with SpeechAnalyzer.start(inputSequence:).
What is working:
SpeechAnalyzer + SpeechTranscriber
offline path using start(inputAudioFile:finishAfterFile:)
same Spanish WAV file transcribes successfully and returns a coherent final result
What is not working:
SpeechAnalyzer + SpeechTranscriber
stream path using start(inputSequence:)
same WAV, replayed as AnalyzerInput(buffer:bufferStartTime:)
fails once replay starts with:
_GenericObjCError domain=Foundation._GenericObjCError code=0 detail=nilError
I also tried:
DictationTranscriber instead of SpeechTranscriber
no realtime pacing during replay
Both still fail in stream mode with the same error.
So this does not currently look like a ScreenCaptureKit issue or a Python integration issue. I reduced it to a pure Swift CLI repro.
Environment:
macOS 26.3 (25D122)
Xcode 26.3
Swift 6.2.4
Apple Silicon Mac
Has anyone here gotten SpeechAnalyzer.start(inputSequence:) working reliably on macOS 26.x?
If so, I'd be interested in any workaround or any detail that differs from the obvious setup:
prepareToAnalyze(in:)
bestAvailableAudioFormat(...)
AnalyzerInput(buffer:bufferStartTime:)
replaying a known-good WAV in chunks
I already filed Feedback Assistant:
FB22149971
I’m developing a macOS audio monitoring app using AVAudioEngine, and I’ve run into a critical issue on macOS 26 beta where AVFoundation fails to detect any input devices, and AVAudioEngine.start() throws the familiar error 10877.
FB#: FB19024508
Strange Behavior:
AVAudioEngine.inputNode shows no channels or input format on bus 0.
AVAudioEngine.start() fails with -10877 (AudioUnit connection error).
AVCaptureDevice.DiscoverySession returns zero audio devices.
Microphone permission is granted (authorized), and the app is properly signed and sandboxed with com.apple.security.device.audio-input.
However, CoreAudio HAL does detect all input/output devices:
Using AudioObjectGetPropertyDataSize and AudioObjectGetPropertyData with kAudioHardwarePropertyDevices, I can enumerate 14+ devices, including AirPods, USB DACs, and BlackHole.
This suggests the lower-level audio stack is functional.
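The HAL enumeration that does work looks roughly like this:
import CoreAudio

var address = AudioObjectPropertyAddress(
    mSelector: kAudioHardwarePropertyDevices,
    mScope: kAudioObjectPropertyScopeGlobal,
    mElement: kAudioObjectPropertyElementMain)

var dataSize: UInt32 = 0
AudioObjectGetPropertyDataSize(AudioObjectID(kAudioObjectSystemObject), &address, 0, nil, &dataSize)

var deviceIDs = [AudioDeviceID](repeating: 0, count: Int(dataSize) / MemoryLayout<AudioDeviceID>.size)
AudioObjectGetPropertyData(AudioObjectID(kAudioObjectSystemObject), &address, 0, nil, &dataSize, &deviceIDs)
// deviceIDs lists 14+ devices here, while AVCaptureDevice.DiscoverySession returns none.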
I have tried:
Resetting CoreAudio with sudo killall coreaudiod
Rebuilding and re-signing the app
Clearing TCC with tccutil reset Microphone
Running on Apple Silicon and testing Rosetta/native detection via sysctl.proc_translated
Using a fallback mechanism that logs device info from HAL and rotates logs for submission via Feedback Assistant
I have submitted logs and a reproducible test case via Feedback Assistant: FB19024508.
There appears to be no method of going forward or backward in Get Info in the Music application.
Dear Sirs,
I've written a virtual audio driver based on AudioDriverKit, running as a dext in my macOS app. Sometimes, when waking up from a sleep state, the recording side of my driver extension seems to hang and I don't see any calls to my io_operation callback. The recording app, such as a DAW, then seems to hang when trying to start a recording. This doesn't happen after short sleep states or after a completely fresh start of my MacBook.
I already opened a case in Feedback-Assistant on 5th of May (FB17503622) which also includes a sysdiagnose and a ktrace but I didn't get any feedback so far. Meanwhile some of our customers are getting angry and I'd like to know if there's anything I could do to fix this problem on my side.
We're not sure whether this worked in previous macOS versions; we think we didn't observe it before 15.3.1, but at least since 15.3.1 we've seen this problem.
Best regards,
Johannes
I prefer to use the album fetched from the library instead of the catalog, since this is faster. If doing so, how can I check whether all tracks of an album have been added to the library? If they haven't, I'd like to fetch the catalog version or throw an error (for example, when offline).
Using .with(.tracks) on the library album fetches the tracks added to the library.
The trackCount property is referring to the tracks that can be fetched from the library.
The isComplete property is always nil when fetching from the library.
One possible way is checking the trackNumber and discCount properties. However, this only detects a missing track when a song that isn't in the library comes before one that is. I'd like to be able to handle this edge case as well.
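A sketch of that trackNumber gap check (assuming the library album was fetched with .with([.tracks]) and that trackNumber is populated for library songs; multi-disc albums would also need discNumber handled):
import MusicKit

func hasObviousGaps(in album: Album) -> Bool {
    guard let tracks = album.tracks else { return true }
    let numbers = tracks.compactMap { track -> Int? in
        if case .song(let song) = track { return song.trackNumber }
        return nil
    }.sorted()
    // A missing leading track or a gap in the numbering means at least one earlier
    // track isn't in the library. Trailing missing tracks still go undetected.
    guard numbers.first == 1 else { return true }
    return zip(numbers, numbers.dropFirst()).contains { $1 != $0 + 1 }
}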
Is there currently a way to do this? I'd prefer to not rely on the apple music catalog for this since this is supposed to work offline as well. Fetching and storing all trackIDs when connected and later comparing against these would work, but this would potentially mean storing tens of thousands of track ids.
Thank you
I ran 5.1 audio tests in both YouTube and Apple Music, and I noticed that when sound is supposed to play from the rear or front surround speakers, it’s also duplicated in the front left and right channels. I’m absolutely sure the issue is with the Apple TV, because I played the same video directly through my TV’s native system, and the channel separation was correct.
Everything used to work perfectly before, so this must be a software issue. I’m currently on tvOS 26 Developer Beta 5, but I’m certain the problem also existed on the stable tvOS 18.5.
I’ve already reset and updated my Apple TV, and I also tried switching the audio format to forced Dolby Atmos 5.1. On the forums, I mostly see complaints about Dolby Atmos not working at all — in my case, everything technically works, but not the way it’s supposed to.
Thread 5 Crashed:
0 libobjc.A.dylib 0x19af7b038 objc_msgSend + 56
1 CoreFoundation 0x19dfdb618 cow_cleanup + 135
2 CoreFoundation 0x19dfdb6fc -[__NSDictionaryM dealloc] + 147
3 MediaToolbox 0x1b167636c FigRemotePropertyCacheTeardown + 31
4 MediaToolbox 0x1b1c5b648 remoteXPCAsset_Finalize + 107
5 CoreMedia 0x1b1e9166c FigBaseObjectFinalize + 275
6 CoreFoundation 0x19dfcc5ec _CFRelease + 295
7 AVFCore 0x1b1054d64 -[AVFigAssetTrackInspector dealloc] + 151
8 AVFCore 0x1b0f818d8 -[AVAssetTrack dealloc] + 63
9 CoreFoundation 0x19dfdba28 RELEASE_OBJECTS_IN_THE_ARRAY + 115
10 CoreFoundation 0x19dfdb7e0 -[__NSArrayM dealloc] + 147
11 AVFCore 0x1b0f52e04 -[AVURLAsset dealloc] + 167
12 libobjc.A.dylib 0x19af821f8 object_cxxDestructFromClass(objc_object*, objc_class*) + 115
13 libobjc.A.dylib 0x19af7df20 objc_destructInstance_nonnull_realized(objc_object*) + 75
14 libobjc.A.dylib 0x19af7d4a4 _objc_rootDealloc + 71
15 AVFCore 0x1b0fef988 -[AVAssetReaderOutput dealloc] + 415
16 AVFCore 0x1b0ff11ec -[AVAssetReaderTrackOutput dealloc] + 127
17 CoreFoundation 0x19dfe20a4 -[__NSSingleObjectArrayI dealloc] + 63
18 libobjc.A.dylib 0x19af7d3f8 AutoreleasePoolPage::releaseUntil(objc_object**) + 203
Using the official SwiftTranscriptionSampleApp from WWDC 2025, speech transcription takes 14+ seconds from audio input to first result, making it unusable for real-time applications.
Environment
iOS: 26.0 Beta
Xcode: Beta 5
Device: iPhone 16 pro
Sample App: Official Apple SwiftTranscriptionSampleApp from WWDC 2025
Configuration Tested
Locale: en-US (properly allocated with AssetInventory.allocate(locale:)) and es-ES
Setup: All optimizations applied (preheating, high priority, model retention)
I started testing in my own app to replace the SFSpeech API and include speech detection, but after long fights with the documentation (this part is quite terrible, TBH) I tested the example (https://developer.apple.com/documentation/speech/bringing-advanced-speech-to-text-capabilities-to-your-app) and saw the same results.
I added some logs to check the specific time:
🎙️ [20:30:41.532] ✅ Analyzer started successfully - ready to receive audio!
🎙️ [20:30:41.532] Listening for transcription results...
🎙️ [20:30:56.342] 🚀 FIRST TRANSCRIPTION RESULT after 14.810s: 'Hello' (isFinal: false)
Questions
Is this expected performance for the iOS 26 beta? The old SFSpeech API is far faster.
Are there additional optimization steps for SpeechTranscriber?
Should we expect significant performance improvements in later betas?
I've tried SpeechTranscriber with a lot of my devices (from iPhone 12 series ~ iPhone 17 series) without issues. However, SpeechTranscriber.isAvailable value is false for my iPhone 11 Pro.
https://developer.apple.com/documentation/speech/speechtranscriber/isavailable
I'm curious why the iPhone 11 Pro device is not supported. Are all iPhone 11 series devices unsupported intentionally? Or is there a problem with my specific device?
I've also checked the supportedLocales, and the value is an empty array.
https://developer.apple.com/documentation/speech/speechtranscriber/supportedlocales
After updating to iOS 18.5, we’ve observed that outgoing audio from our app intermittently stops being transmitted during VoIP calls using AVAudioSession configured with .playAndRecord and .voiceChat. The session is set active without errors, and interruptions are handled correctly, yet audio capture suddenly ceases mid-call. This was not observed in earlier iOS versions (≤ 18.4). We’d like to confirm if there have been any recent changes in AVAudioSession, CallKit, or related media handling that could affect audio input behavior during long-running calls.
func configureForVoIPCall() throws {
    try setCategory(
        .playAndRecord, mode: .voiceChat,
        options: [.allowBluetooth, .allowBluetoothA2DP, .defaultToSpeaker])
    try setActive(true)
}
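One addition worth trying alongside this (an assumption, not something from the original setup): observing a media-services reset, after which the session and the capture path have to be rebuilt mid-call:
NotificationCenter.default.addObserver(
    forName: AVAudioSession.mediaServicesWereResetNotification,
    object: AVAudioSession.sharedInstance(),
    queue: .main
) { _ in
    // Media services were reset: reconfigure the session and restart audio I/O.
    try? AVAudioSession.sharedInstance().setCategory(
        .playAndRecord, mode: .voiceChat,
        options: [.allowBluetooth, .allowBluetoothA2DP, .defaultToSpeaker])
    try? AVAudioSession.sharedInstance().setActive(true)
    // ...then tear down and recreate the engine / audio units used for the call.
}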
Hi,
I'm still stuck getting a basic record-with-playthrough pipeline to work.
Has anyone a sample of setting up a AVAudioEngine pipeline for recording with playthrough?
Playthrough works with AVPlayerNode as input, but not with any microphone input. The docs mention the "enabled state" of the engine's outputNode without explaining the concept, i.e. how to enable an output.
When the engine renders to and from an audio device, the AVAudioSession category and the availability of hardware determines whether an app performs output. Check the output node’s output format (specifically, the hardware format) for a nonzero sample rate and channel count to see if output is in an enabled state.
Well, in my setup the output is NOT enabled, and any attempt to switch (e.g. audioEngine.outputNode.auAudioUnit.setDeviceID(deviceID)), attach a dedicated device, etc. results in exceptions/errors.
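For reference, a simplified sketch of the kind of playthrough wiring I'm attempting (assuming mic permission is granted and, on iOS, an active .playAndRecord session); the enabled-state check from the docs is the nonzero sample rate and channel count on the node's hardware format:
let engine = AVAudioEngine()
let input = engine.inputNode
let format = input.inputFormat(forBus: 0)   // zero sample rate/channels here means input isn't enabled
engine.connect(input, to: engine.mainMixerNode, format: format)
try engine.start()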