Hi guys,
I'm using ShazamKit in my iOS app and successfully capturing the currently playing track details when using the device's (iPhone) built-in mic.
When I test with AirPods, though, my app cannot both route the output through the AirPods and capture that same output with the AirPods' mic for ShazamKit recognition.
I believe this must be possible, because the ShazamKit widget on iOS can do this.
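For context, my capture path looks roughly like this (a simplified sketch; the session options are the part I'm unsure about for the AirPods case):
import AVFoundation
import ShazamKit

let shazamSession = SHSession()
let engine = AVAudioEngine()

func startListening() throws {
    let session = AVAudioSession.sharedInstance()
    // Assumption: .allowBluetooth is what makes the AirPods mic selectable (which forces the HFP route).
    try session.setCategory(.playAndRecord, mode: .default, options: [.allowBluetooth, .defaultToSpeaker])
    try session.setActive(true)

    // Feed microphone buffers to ShazamKit for matching.
    let input = engine.inputNode
    input.installTap(onBus: 0, bufferSize: 2048, format: input.outputFormat(forBus: 0)) { buffer, time in
        shazamSession.matchStreamingBuffer(buffer, at: time)
    }
    try engine.start()
}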
Is it restricted in some way for third party apps?
If not, I'd appreciate some guidance on how to achieve this in Swift code.
Thanks in advance.
Hey folks, I'm suddenly running into an odd issue with an app that previously had a working MusicKit integration.
I'm using ApplicationMusicPlayer to play Apple Music albums and songs. I'm testing on a physical device, signed in to Apple ID, and with a valid subscription. Apple Music via the first-party app works entirely fine on this device.
Attempting to play back any content at all gives the log:
<ICUserIdentityStoreACAccountBackend: 0x1070bf3e0> Failed to initialize primary apple account, error=Error Domain=ICError Code=-7013 "Client is not entitled to access account store" UserInfo={NSDebugDescription=Client is not entitled to access account store}
[ICUserIdentityStore] - initializing account histories with activeAccountDSID = nil, activeLockerAccountDSID = nil, timestamp = 14605951908
[ICUserIdentityStore] Failed to fetch local store account with error: Error Domain=ICError Code=-7013 "Client is not entitled to access account store" UserInfo={NSDebugDescription=Client is not entitled to access account store}.
The album artwork, track names, etc, all appear in the control center playback controls, but the music doesn't play. Trying to trigger playback with control center just results in it skipping to the next track, which doesn't play either.
This exact code used to work. I have the MusicKit service selected in App Store Connect. Since this isn't entitlement-based, I'm not sure how else to check that I'm set up correctly.
I've tried deleting/reinstalling the app, restarting the device, cleaning/rebuilding, and deleting DerivedData, to no avail.
Any help?
Running Xcode 16.4 (16F6), testing on iOS 18.5 (22F76)
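For reference, playback is triggered roughly like this (a simplified sketch; the album comes from a catalog request elsewhere):
import MusicKit

func play(_ album: Album) async throws {
    // Make sure MusicKit authorization has been granted before playing.
    guard await MusicAuthorization.request() == .authorized else { return }

    let player = ApplicationMusicPlayer.shared
    player.queue = ApplicationMusicPlayer.Queue(for: [album])
    try await player.play()   // The ICError -7013 logging appears around this call.
}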
We are using a VoiceProcessingIO audio unit in our VoIP application on Mac. In certain scenarios, the AudioComponentInstanceNew call blocks for up to five seconds (at least two). We are using the following code to initialize the audio unit:
OSStatus status;
AudioComponentDescription desc;
AudioComponent inputComponent;
desc.componentType = kAudioUnitType_Output;
desc.componentSubType = kAudioUnitSubType_VoiceProcessingIO;
desc.componentFlags = 0;
desc.componentFlagsMask = 0;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;
inputComponent = AudioComponentFindNext(NULL, &desc);
status = AudioComponentInstanceNew(inputComponent, &unit);
We are seeing the issue with current macOS versions on a range of different Macs (Intel and Apple silicon alike). It takes two to three seconds until AudioComponentInstanceNew returns.
We also see the following errors in the log multiple times:
AUVPAggregate.cpp:2560 AggInpStreamsChanged wait failed
and these right after (I don't know whether they matter to this issue):
KeystrokeSuppressorCore.cpp:44 ERROR: KeystrokeSuppressor initialization was unsuccessful. Invalid or no plist was provided. AU will be bypassed.
vpStrategyManager.mm:486 Error code 2003332927 reported at GetPropertyInfo
Hi,
our CoreAudio server plugin uses the SystemConfiguration.framework to store and restore specific shared, system-wide settings.
While our application can authenticate to gain write access to the shared configuration settings via the SystemConfiguration.framework, the CoreAudio server plugin obviously can't present any user interaction and therefore does not authenticate.
Is it possible to authenticate the CoreAudio server plugin to gain write permissions? Are there any entitlements or other means that would allow this?
Thanks!
Topic: Media Technologies
SubTopic: Audio
Tags: System Configuration, Core Audio, Inter-process communication, Service Management
iOS 26.0 (23A5276f) – Bluetooth Call Audio Issue
I’m experiencing a Bluetooth audio issue on iOS 26.0 (build 23A5276f). I cannot make or receive phone calls properly using Bluetooth devices — this affects both my car’s Bluetooth system and my AirPods Pro (2nd generation).
Notably:
Regular phone calls have no audio (either I can’t hear the other person, or they can’t hear me).
WhatsApp and other VoIP apps work fine with the same Bluetooth devices.
Media playback (music, video, etc.) works without issues over Bluetooth.
It seems this bug is limited to the native Phone app or the system audio routing for regular cellular calls. Please advise if this is a known issue or if a fix is expected in upcoming beta releases.
Hello.
My team and I believe our app is being asked to shut down gracefully, followed by a SIGTERM. As we've learned, this is normally not an issue. However, it also seems to be happening while our app (an audio streamer) is actively playing in the background.
From our perspective, starting playback indicates strong user intent. We understand that there can be extreme circumstances where background audio needs to be killed, but should that be considered part of normal operation? We hope that's not the case.
All we see in the logs is the graceful shutdown request. We can say with high certainty that it’s happening though, as we know that playback is running within 0.5 seconds of the crash, without any other tracked user interaction.
Can you verify whether this is intended behavior, and whether there's something we can do about it on our end? From our logs it doesn't look to be related to memory usage, either within the app or in the system as a whole.
Best,
John
Hi,
I’m an iOS developer building an app with an use case that needs advanced playback on Apple Music subscription streams, specifically:
• Real-time tempo change (BPM) during playback — i.e., time-stretch with key-lock, not just crossfade.
• Beat-matched transitions between tracks.
From what I can tell, this capability seems to exist only for approved partners and isn’t available through public MusicKit.
Question: What’s the official request path to be evaluated for that restricted partner entitlement (application form, questionnaire, NDA, or internal team/BD contact)? If the entitlement identifier is internal, how can I get my account routed to the right Apple Music team?
For reference, publicly announced partners include Algoriddim djay, Serato DJ Pro, rekordbox (AlphaTheta), and Engine DJ—all of which appear to implement mixing features that imply advanced playback (tempo/beat-matching) on Apple Music content. I’d prefer not to share product details publicly for the moment and can provide specifics privately if needed.
Thanks in advance!
Topic: Media Technologies
SubTopic: Audio
Tags: Apple Music API, FairPlay Streaming, MusicKit, AVFoundation
Hello,
I have been running into issues setting nowPlayingInfo, specifically with updating information for CarPlay and the CPNowPlayingTemplate.
When I start playback for an item, I see lock screen information update as expected, along with the CarPlay now playing information.
However, the playing items are books made up of collections of tracks. When I select a new track (chapter) within a book, I set MPMediaItemPropertyTitle to the new chapter name. This change is reflected correctly on the lock screen, but almost never appears correctly on the CarPlay CPNowPlayingTemplate; the previous chapter title remains set and never updates.
I see "Application exceeded audio metadata throttle limit." in the debug console fairly frequently.
From that, I figured I need to minimize updates to the nowPlayingInfo dictionary. What I did:
I store the metadata dictionary in a local dictionary and only set values in the main nowPlayingInfo dictionary when they are different from the current value.
I kick off the nowPlayingInfo update via a task that initially sleeps for around 2 seconds (not a final value, just for my current testing). If a previous Task is active, it gets cancelled, so that only one update can happen within that time window.
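Roughly, the debouncing looks like this (a simplified sketch; the names and the 2-second window are placeholders):
import MediaPlayer

final class NowPlayingUpdater {
    private var cachedInfo: [String: Any] = [:]
    private var pendingUpdate: Task<Void, Never>?

    func set(_ value: Any, for key: String) {
        // Skip writes that would not change the stored value.
        if let current = cachedInfo[key], "\(current)" == "\(value)" { return }
        cachedInfo[key] = value

        // Coalesce rapid changes into a single write per window.
        pendingUpdate?.cancel()
        pendingUpdate = Task { [info = cachedInfo] in
            try? await Task.sleep(for: .seconds(2))   // not a final value, just for testing
            guard !Task.isCancelled else { return }
            MPNowPlayingInfoCenter.default().nowPlayingInfo = info
        }
    }
}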
Neither of these changes has been sufficient. When I switch between different titles entirely, the information updates correctly (including cover art).
But when I switch chapters within a title, the MPMediaItemPropertyTitle continues to get dropped. I know the value is getting set, because it updates on the lock screen correctly.
In total, I have 12 keys I update for info, though with the above changes, usually 2-4 of them actually get updated with high frequency.
I am running out of ideas to satisfy the throttling thresholds to accurately display metadata. I could use some advice.
Thanks.
I'm working on adding CarPlay support to an audio app and am running into an issue. Occasionally, when a user opens the app from CarPlay while the main app scene is either not connected or is currently in the background, I will receive an error when attempting to activate the audio session. The code below mimics my setup:
do {
try AVAudioSession.sharedInstance().setCategory(.playback, mode: .spokenAudio)
try AVAudioSession.sharedInstance().setActive(true)
} catch {
print(error) // NSOSStatusErrorDomain - 560557684: Session activation failed
}
That error code maps to AVAudioSession.ErrorCode.cannotInterruptOthers.
Once in this state, all subsequent attempts to play different pieces of content will fail. However, things will start working normally if the user opens the app on their phone and tries again from CarPlay (while the app is in the foreground on their phone).
I'm not sure why it would behave this way and want to note that I do have the audio background mode capability enabled.
Has anyone else encountered this? Are there any workarounds or changes I could make to prevent this from happening?
I was trying to set a custom audio output device for generated audio on Mac Catalyst.
While using:
let status = AudioUnitSetProperty(outputUnit,
                                  kAudioOutputUnitProperty_CurrentDevice,
                                  kAudioUnitScope_Global,
                                  0,
                                  &outputDeviceID,
                                  UInt32(MemoryLayout<AudioDeviceID>.size))
kAudioOutputUnitProperty_CurrentDevice is reported as invalid, and status = -10879, indicating an error.
STEPS TO REPRODUCE
1. Set the run destination to macOS and run the program. "AudioUnitSetProperty: 0" should be printed, indicating it works fine.
2. Set the run destination to Mac Catalyst and run the program. "Error setting output device: -10879" should be printed, indicating an error.
I noticed that while playing back the same tracks via MusicKit on different OSes I get different results regarding the audio files being streamed.
Playing back a lossless file (24-bit, 48 kHz) and watching the Console for RemotePlayerService, I get:
on iPadOS: Lossless; groupID: audio-alac-stereo-48000-24; bitDepth: 24-bit; sampleRate: 48khz; codec: alac; channels: 2; layout: Stereo;
on macOS: Creating AudioQueue with format:'paac', framesPerPacket:1024, sampleRate:44100
While the iPad looks perfect, the Mac does not. Is there a way to fix this issue on macOS?
By the way, I changed the Audio MIDI Setup settings before, after, and while the macOS app was launched. I also switched to different output devices. I wasn't able to change the poor audio output on the Mac. I tested this under Sequoia 15.5 and Tahoe beta 1, with Xcode 16.4 and 26 beta 1.
The AudioVariants of the Album/Tracks are .dolbyAtmos, .lossless, .lossyStereo
Apple Music displays Lossless 24-bit/48 kHz ALAC when clicking the player-control icon on macOS.
I hope there are only some missing or misconfigured properties to get macOS up to par.
Thanks :-)
I’m an amateur developer working on a free utility for composers/producers, for which the macOS release needs to create and name RTP-MIDI sessions in Audio MIDI Setup from the command line (so I can ship a small C helper instead of telling users to click through the UI). Here’s what I’ve tried so far, without luck:
• Plist hacks: Injecting entries into ~/Library/Audio/MIDI Configurations/*.mcfg works when AMS is closed, but AMS immediately locks and reverts my changes when it’s open.
• CoreMIDI C API: I can create virtual ports with MIDISourceCreate, but attempting MIDIObjectGetDataProperty on the apple.midirtp.session plugin always returns err –10836.
• Obj-C & Swift: Loading MIDINetworkSession and calling defaultSession, init, setNetworkName: and setting enabled = YES doesn’t produce a new session object in the Network panel.
• dlopen/dlsym: I extracted the real CoreMIDI binary out of the dyld shared cache and tried binding _MIDINetworkSessionCreate, _SetName, _SetEnabled, etc., but all the symbols come back null or my tool segfaults.
• Plugin registration: I’ve pulled the factory UUID (70C9C5EA-7C65-11D8-B317-000393A34B5A) from /System/Library/Extensions/AppleMIDIRTPDriver.plugin/Contents/Info.plist and called CFPlugInRegisterFactories, but it still never exposes the session-creation calls.
At this point I’m convinced I’m either loading the wrong binary or missing one critical step in registering the RTP-MIDI plugin’s private API. Can anyone point me to:
• The exact path of the dylib or bundle that actually exports the MIDINetworkSessionCreate / MIDINetworkSessionSetName / MIDINetworkSessionSetEnabled symbols?
• A minimal working snippet (C or Obj-C) that reliably creates and names a Network-MIDI session?
Any pointers, sample code, or even ideas about where Apple hides this functionality on macOS 15 would be hugely appreciated. Thanks!
I'm using an AVAudioConverter object to decode an OPUS stream for VoIP. The decoding itself works well; however, whenever the stream stalls (no more audio packets are available to decode because of network instability), this is audible as crackling or an abrupt stop in the decoded audio. OPUS can mitigate this by indicating packet loss, by passing a null pointer in the C library to
int opus_decode_float (OpusDecoder * st, const unsigned char * data, opus_int32 len, float * pcm, int frame_size, int decode_fec), see https://opus-codec.org/docs/opus_api-1.2/group__opus__decoder.html#ga9c554b8c0214e24733a299fe53bb3bd2.
However, with AVAudioConverter using Swift I'm constructing an AVAudioCompressedBuffer like so:
let compressedBuffer = AVAudioCompressedBuffer(
    format: VoiceEncoder.Constants.networkFormat,
    packetCapacity: 1,
    maximumPacketSize: data.count
)
compressedBuffer.byteLength = UInt32(data.count)
compressedBuffer.packetCount = 1
compressedBuffer.packetDescriptions!.pointee.mDataByteSize = UInt32(data.count)
data.copyBytes(
    to: compressedBuffer.data.assumingMemoryBound(to: UInt8.self),
    count: data.count
)
where data: Data contains the raw OPUS frame to be decoded.
How can I specify data loss in this context and cause the AVAudioConverter to output PCM data whenever no more input data is available?
More context:
I'm specifying the audio format like this:
static let frameSize: UInt32 = 960
static let sampleRate: Float64 = 48000.0
static var networkFormatStreamDescription = AudioStreamBasicDescription(
    mSampleRate: sampleRate,
    mFormatID: kAudioFormatOpus,
    mFormatFlags: 0,
    mBytesPerPacket: 0,
    mFramesPerPacket: frameSize,
    mBytesPerFrame: 0,
    mChannelsPerFrame: 1,
    mBitsPerChannel: 0,
    mReserved: 0
)
static let networkFormat = AVAudioFormat(streamDescription: &networkFormatStreamDescription)!
I've tried (1) setting byteLength and packetCount to zero and (2) returning nil while still reporting .haveData in the AVAudioConverterInputBlock I'm using, with no success.
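For clarity, the decode call is shaped roughly like this (simplified; nextOpusFrame() stands in for my jitter buffer, and makeCompressedBuffer(_:) wraps the construction shown above):
let pcmFormat = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 1)!
let converter = AVAudioConverter(from: VoiceEncoder.Constants.networkFormat, to: pcmFormat)!

func decodeNextFrame() -> AVAudioPCMBuffer? {
    let output = AVAudioPCMBuffer(pcmFormat: pcmFormat, frameCapacity: 960)!
    var conversionError: NSError?
    let status = converter.convert(to: output, error: &conversionError) { _, outStatus in
        if let frame = nextOpusFrame() {
            outStatus.pointee = .haveData
            return makeCompressedBuffer(frame)
        } else {
            // The spot in question: .noDataNow, .endOfStream, and returning nil while
            // reporting .haveData all fail to produce concealment output here.
            outStatus.pointee = .noDataNow
            return nil
        }
    }
    return status == .haveData ? output : nil
}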
I’m using the shared instance of AVAudioSession. After activating it with .setActive(true), I observe the outputVolume, and it correctly reports the device’s volume.
However, after deactivating the session using .setActive(false), changing the volume, and then reactivating it again, the outputVolume returns the previous volume (before deactivation), not the current device volume. The correct volume is only reported after the user manually changes it again using physical buttons or Control Center, which triggers the observer.
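For reference, the observation is set up roughly like this (a minimal sketch):
import AVFoundation

final class VolumeMonitor {
    private var observation: NSKeyValueObservation?

    func start() throws {
        let session = AVAudioSession.sharedInstance()
        try session.setActive(true)

        // KVO fires when the user changes the volume via the buttons or Control Center, but after
        // a deactivate/reactivate cycle the property itself still reports the old value.
        observation = session.observe(\.outputVolume, options: [.initial, .new]) { session, _ in
            print("outputVolume:", session.outputVolume)
        }
    }
}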
What I need is a way to retrieve the actual current device volume immediately after reactivating the audio session, even on the second and subsequent activations.
Disabling and re-enabling the audio session is essential to how my application functions.
I’ve tested this behavior with my colleagues, and the issue is consistently reproducible on iOS 18.0.1, iOS 18.1, iOS 18.3, iOS 18.5 and iOS 18.6.2. On devices running iOS 17.6.1 and iOS 16.0.3, outputVolume correctly reflects the current volume immediately after calling .setActive(true) multiple times.
I have a simple AVAudioEngine graph as follows:
AVAudioPlayerNode -> AVAudioUnitEQ -> AVAudioUnitTimePitch -> AVAudioUnitReverb -> Main mixer node of AVAudioEngine.
Whenever I have AVAudioUnitTimePitch or AVAudioUnitVarispeed in the graph, I notice a very distinct crackling/popping sound in my AirPods Pro 2 when starting up the engine and playing the AVAudioPlayerNode, and I've been unable to find the reason why this is happening. When I remove the node, the crackling goes away completely. How do I fix this, since I need the user to be able to control the pitch and rate of the audio during playback?
import AVKit
@Observable @MainActor
class AudioEngineManager {
nonisolated private let engine = AVAudioEngine()
private let playerNode = AVAudioPlayerNode()
private let reverb = AVAudioUnitReverb()
private let pitch = AVAudioUnitTimePitch()
private let eq = AVAudioUnitEQ(numberOfBands: 10)
private var audioFile: AVAudioFile?
private var fadePlayPauseTask: Task<Void, Error>?
private var playPauseCurrentFadeTime: Double = 0
init() {
setupAudioEngine()
}
private func setupAudioEngine() {
guard let url = Bundle.main.url(forResource: "Song name goes here", withExtension: "mp3") else {
print("Audio file not found")
return
}
do {
audioFile = try AVAudioFile(forReading: url)
} catch {
print("Failed to load audio file: \(error)")
return
}
reverb.loadFactoryPreset(.mediumHall)
reverb.wetDryMix = 50
pitch.pitch = 0 // No pitch shift by default; adjusted later via setPitch(_:)
engine.attach(playerNode)
engine.attach(pitch)
engine.attach(reverb)
engine.attach(eq)
// Connect: player -> eq -> pitch -> reverb -> main mixer
engine.connect(playerNode, to: eq, format: audioFile?.processingFormat)
engine.connect(eq, to: pitch, format: audioFile?.processingFormat)
engine.connect(pitch, to: reverb, format: audioFile?.processingFormat)
engine.connect(reverb, to: engine.mainMixerNode, format: audioFile?.processingFormat)
}
func prepare() {
guard let audioFile else { return }
playerNode.scheduleFile(audioFile, at: nil)
}
func play() {
DispatchQueue.global().async { [weak self] in
guard let self else { return }
engine.prepare()
try? engine.start()
DispatchQueue.main.async { [weak self] in
guard let self else { return }
playerNode.play()
fadePlayPauseTask?.cancel()
playPauseCurrentFadeTime = 0
fadePlayPauseTask = Task { [weak self] in
guard let self else { return }
while true {
let volume = updateVolume(for: playPauseCurrentFadeTime / 0.1, rising: true)
// Ramp up volume until 1 is reached
if volume >= 1 { break }
engine.mainMixerNode.outputVolume = volume
try await Task.sleep(for: .milliseconds(10))
playPauseCurrentFadeTime += 0.01
}
engine.mainMixerNode.outputVolume = 1
}
}
}
}
func pause() {
fadePlayPauseTask?.cancel()
playPauseCurrentFadeTime = 0
fadePlayPauseTask = Task { [weak self] in
guard let self else { return }
while true {
let volume = updateVolume(for: playPauseCurrentFadeTime / 0.1, rising: false)
// Ramp down volume until 0 is reached
if volume <= 0 { break }
engine.mainMixerNode.outputVolume = volume
try await Task.sleep(for: .milliseconds(10))
playPauseCurrentFadeTime += 0.01
}
engine.mainMixerNode.outputVolume = 0
playerNode.pause()
// Shut down engine once ramp down completes
DispatchQueue.global().async { [weak self] in
guard let self else { return }
engine.pause()
}
}
}
private func updateVolume(for x: Double, rising: Bool) -> Float {
if rising {
// Fade in
return Float(pow(x, 2) * (3.0 - 2.0 * (x)))
} else {
// Fade out
return Float(1 - (pow(x, 2) * (3.0 - 2.0 * (x))))
}
}
func setPitch(_ value: Float) {
pitch.pitch = value
}
func setReverbMix(_ value: Float) {
reverb.wetDryMix = value
}
}
struct ContentView: View {
@State private var audioManager = AudioEngineManager()
@State private var pitch: Float = 0
@State private var reverb: Float = 0
var body: some View {
VStack(spacing: 20) {
Text("🎵 Audio Player with Reverb & Pitch")
.font(.title2)
HStack {
Button("Prepare") {
audioManager.prepare()
}
Button("Play") {
audioManager.play()
}
.padding()
.background(Color.green)
.foregroundColor(.white)
.cornerRadius(10)
Button("Pause") {
audioManager.pause()
}
.padding()
.background(Color.red)
.foregroundColor(.white)
.cornerRadius(10)
}
VStack {
Text("Pitch: \(Int(pitch)) cents")
Slider(value: $pitch, in: -2400...2400, step: 100) { _ in
audioManager.setPitch(pitch)
}
}
VStack {
Text("Reverb Mix: \(Int(reverb))%")
Slider(value: $reverb, in: 0...100, step: 1) { _ in
audioManager.setReverbMix(reverb)
}
}
}
.padding()
}
}
I am trying to debug the AAX version of my plugin (a MIDI effect) in Pro Tools, but I am getting the following error (in the macOS Console) when attempting to load it:
dlsym cannot find symbol g_dwILResult in CFBundle etc..
I used Xcode 16.4 to build the plugin.
Has anybody come across the same or a similar message?
Best,
Achillefs
Axart Labs
Hi!
I fetch a MusicItemCollection of personal recommendations using this code:
func getRecommendations() async throws -> MusicItemCollection<MusicPersonalRecommendation> {
let request = MusicPersonalRecommendationsRequest()
let response = try await request.response()
let recommendations = response.recommendations
return recommendations
}
However, all recommendations contain no more than 12 MusicItems, while the Music app provides many more for some recommendations; for example, for the "You recently listened" recommendation, the Music app displays 40 items. Each recommendation has an items property that contains a collection of music items, MusicItemCollection<MusicPersonalRecommendation.Item>, and the hasNextBatch property for these collections is always false. I expected that for some collections, loading of additional items would be available. Please tell me if I'm doing something wrong, or is this a MusicKit bug?
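For reference, this is roughly how I'm checking for further batches (simplified, inside an async context):
for recommendation in recommendations {
    let items = recommendation.items
    print(recommendation.title ?? "", items.count, items.hasNextBatch)
    if items.hasNextBatch {
        // Never reached in my testing: hasNextBatch is always false.
        let nextItems = try await items.nextBatch()
        print("next batch count:", nextItems?.count ?? 0)
    }
}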
Thank you!
Context:
I am currently developing an app using the Push-to-Talk (PTT) framework. I have reviewed both the PTT framework documentation and the CallKit demo project to better understand how to properly manage audio session activation and AVAudioEngine setup.
I am not activating the audio session manually. The audio session configuration is handled in the incomingPushResult or didBeginTransmitting callbacks from the PTChannelManagerDelegate.
I am using a single AVAudioEngine instance for both input and playback. The engine is started in the didActivate callback from the PTChannelManagerDelegate. When I receive a push in full duplex mode, I set the active participant to the user who is speaking.
Issue
When I attempt to talk while the other participant is already speaking, my input tap on the input node takes a few seconds to return valid PCM audio data. Initially, it returns an empty PCM audio block.
Details:
The audio session is already active and configured with .playAndRecord.
The input tap is already installed when the engine is started.
When I talk from a neutral state (no one is speaking), the system plays the standard "microphone activation" tone, which covers this initial delay. However, this does not happen when I am already receiving audio.
Assumptions / Current Setup
Because the audio session is active in play and record, I assumed that microphone input would be available immediately, even while receiving audio.
However, there seems to be a delay before valid input is delivered to the tap, only occurring when switching from a receive state to simultaneously talking.
Questions
Is this expected behavior when using the PTT framework in full duplex mode with a shared AVAudioEngine?
Should I be restarting or reconfiguring the engine or audio session when beginning to talk while receiving audio?
Is there a recommended pattern for managing microphone readiness in this scenario to avoid the initial empty PCM buffer?
Would using separate engines for input and output improve responsiveness?
I would like to confirm the correct approach to handling simultaneous talk and receive in full duplex mode using PTT framework and AVAudioEngine. Specifically, I need guidance on ensuring the microphone is ready to capture audio immediately without the delay seen in my current implementation.
Relevant Code Snippets
Engine Setup
func setup() {
let input = audioEngine.inputNode
do {
try input.setVoiceProcessingEnabled(true)
} catch {
print("Could not enable voice processing \(error)")
return
}
input.isVoiceProcessingAGCEnabled = false
let output = audioEngine.outputNode
let mainMixer = audioEngine.mainMixerNode
audioEngine.connect(pttPlayerNode, to: mainMixer, format: outputFormat)
audioEngine.connect(beepNode, to: mainMixer, format: outputFormat)
audioEngine.connect(mainMixer, to: output, format: outputFormat)
// Initialize converters
converter = AVAudioConverter(from: inputFormat, to: outputFormat)!
f32ToInt16Converter = AVAudioConverter(from: outputFormat, to: inputFormat)!
audioEngine.prepare()
}
Input Tap Installation
func installTap() {
guard AudioHandler.shared.checkMicrophonePermission() else {
print("Microphone not granted for recording")
return
}
guard !isInputTapped else {
print("[AudioEngine] Input is already tapped!")
return
}
let input = audioEngine.inputNode
let microphoneFormat = input.inputFormat(forBus: 0)
let microphoneDownsampler = AVAudioConverter(from: microphoneFormat, to: outputFormat)!
let desiredFormat = outputFormat
let inputFramesNeeded = AVAudioFrameCount((Double(OpusCodec.DECODED_PACKET_NUM_SAMPLES) * microphoneFormat.sampleRate) / desiredFormat.sampleRate)
input.installTap(onBus: 0, bufferSize: inputFramesNeeded, format: input.inputFormat(forBus: 0)) { [weak self] buffer, when in
guard let self = self else { return }
// Output buffer: 1920 frames at 16kHz
guard let outputBuffer = AVAudioPCMBuffer(pcmFormat: desiredFormat, frameCapacity: AVAudioFrameCount(OpusCodec.DECODED_PACKET_NUM_SAMPLES)) else { return }
outputBuffer.frameLength = outputBuffer.frameCapacity
let inputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
outStatus.pointee = .haveData
return buffer
}
var error: NSError?
let converterResult = microphoneDownsampler.convert(to: outputBuffer, error: &error, withInputFrom: inputBlock)
if converterResult != .haveData {
DebugLogger.shared.print("Downsample error \(converterResult)")
} else {
self.handleDownsampledBuffer(outputBuffer)
}
}
isInputTapped = true
}
Hi, I'm new to the forum.
I'm planning an app just for Apple Watch, and I would like to use Bluetooth audio in the background. How can I do that?
The messages I send via Bluetooth stop as soon as the watch display turns off.
Thank you!
Nax
So,
I've been wondering how fast an offline STT -> ML prompt -> TTS round trip would be.
Interestingly, in many tests the SpeechTranscriber (STT) takes the bulk of the time, compared to generating a FoundationModel response and creating the audio using TTS.
E.g.
InteractionStatistics:
- listeningStarted: 21:24:23 4480 2423
- timeTillFirstAboveNoiseFloor: 01.794
- timeTillLastNoiseAboveFloor: 02.383
- timeTillFirstSpeechDetected: 02.399
- timeTillTranscriptFinalized: 04.510
- timeTillFirstMLModelResponse: 04.938
- timeTillMLModelResponse: 05.379
- timeTillTTSStarted: 04.962
- timeTillTTSFinished: 11.016
- speechLength: 06.054
- timeToResponse: 02.578
- transcript: This is a test.
- mlModelResponse: Sure! I'm ready to help with your test. What do you need help with?
Here, between my audio input ending and the text-to-speech starting to play (using AVSpeechUtterance), the total response time was 2.5 s.
Of that time, the SpeechAnalyzer took 2.1 s to finalize the transcript, while the FoundationModel took only 0.4 s to respond (and TTS started playing nearly instantly).
I'm already using reportingOptions: [.volatileResults, .fastResults] so it's probably as fast as possible right now?
I'm just surprised the STT takes so much longer than the other parts (they're all Core ML based, aren't they?).