Post not yet marked as solved
I have scheduled two player nodes with the same file on an audio engine, but the sound waves don't look right in Audacity: there is noise or silence in between. I don't understand what is going on. If I schedule two different files, or a single file, it works perfectly.
Use case: two player nodes are connected to the same audio engine, and each player node is scheduled with the same audio file. This does not work as expected. The sound waves show errors such as SILENCE, and one of the player nodes is muted.
Below is a code snippet for reference:
self.audioEngine = AVAudioEngine()
self.mainMixerNode = AVAudioMixerNode()
self.audioEngine.attach(self.mainMixerNode)
self.audioEngine.connect(self.mainMixerNode, to: self.audioEngine.outputNode, format: nil)
// The two player nodes are attached and connected to mainMixerNode (not shown)
self.audioEngine.prepare()
try! self.audioEngine.start()
// Scheduling the same file (done for each of the two player nodes)
playerNode.prepare(withFrameCount: AVAudioFrameCount(segmentFrameCount))
playerNode.scheduleSegment(audioFile, startingFrame: 0, frameCount: AVAudioFrameCount(segmentFrameCount), at: playerTime, completionHandler: nil)
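One plausible cause, offered as an assumption rather than a confirmed diagnosis: an AVAudioFile object keeps a single read position (framePosition), so scheduling the same instance on two player nodes may leave both nodes reading through one shared position. A minimal sketch that opens a separate AVAudioFile for each player node; the helper names openReaders and scheduleSameFile are hypothetical.

```swift
import AVFoundation

/// Opens `count` independent AVAudioFile readers for the same URL.
/// Each reader has its own framePosition, so reading from one does not move the others.
func openReaders(for url: URL, count: Int) throws -> [AVAudioFile] {
    try (0..<count).map { _ in try AVAudioFile(forReading: url) }
}

/// Schedules the same segment on every player, giving each player its own AVAudioFile.
/// Assumes the players are already attached to a running engine.
func scheduleSameFile(_ url: URL, on players: [AVAudioPlayerNode],
                      frameCount: AVAudioFrameCount, at time: AVAudioTime?) throws {
    let readers = try openReaders(for: url, count: players.count)
    for (player, file) in zip(players, readers) {
        // Never ask for more frames than the file contains.
        let frames = min(frameCount, AVAudioFrameCount(file.length))
        player.scheduleSegment(file, startingFrame: 0, frameCount: frames,
                               at: time, completionHandler: nil)
    }
}
```

If the glitches disappear with one file object per node, that points to the shared read position as the culprit.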
This is on a Mac mini M1 with macOS Monterey.
I am trying to build an audio network using AVAudioEngine instead of AUGraph (which I understand is deprecated in favor of AVAudioEngine). My code works properly with AUGraph.
The input is a microphone with a sample rate of 8 kHz. In the render proc, the data is written to a ring buffer. Debugging shows that the render proc is called every 0.064 seconds and writes 512 samples (8000 * 0.064 = 512).
The program creates an AVAudioSourceNode. The render block for that node pulls data from the ring buffer above. But debugging shows that it tries to take 512 samples about every 0.0107 seconds. That works out to 48,000 samples per second, which is the output device's sample rate. Obviously the ring buffer can't keep up.
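The mismatch can be checked with the numbers from the post: the callback interval is frames divided by sample rate, and the ring buffer's shortfall is the consumer's rate minus the producer's rate. A small sketch of that arithmetic (the function names are illustrative):

```swift
/// Seconds between render callbacks when each callback moves `frames` frames at `sampleRate`.
func callbackInterval(frames: Int, sampleRate: Double) -> Double {
    Double(frames) / sampleRate
}

/// Frames per second by which a consumer outpaces a producer (negative: the producer is faster).
func shortfallPerSecond(producerRate: Double, consumerRate: Double) -> Double {
    consumerRate - producerRate
}

let micInterval = callbackInterval(frames: 512, sampleRate: 8_000)      // 0.064 s
let outputInterval = callbackInterval(frames: 512, sampleRate: 48_000)  // about 0.0107 s
let shortfall = shortfallPerSecond(producerRate: 8_000, consumerRate: 48_000) // 40000 frames/s
print(micInterval, outputInterval, shortfall)
```

A shortfall of 40,000 frames per second means the ring buffer empties almost immediately. As far as I know, the usual remedy is to have the source node itself produce 8 kHz audio (AVAudioSourceNode has an init(format:renderBlock:) initializer) so that the mixer does the rate conversion; treat this as something to verify rather than a confirmed fix.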
In the statement connecting the above source node to the AVEngine's mixer node, I specify (at least I think I am) a sample rate of 8000, but it still seems to be running at 48000.
let inputFormat = AVAudioFormat(
commonFormat: outputFormat.commonFormat,
sampleRate: 8000,
channels: 1,
interleaved: outputFormat.isInterleaved)
engine.connect(srcNode, to: mixerNode, fromBus: 0, toBus: 0, format: inputFormat)
Also, looking at the microphone input using Audio MIDI Setup shows that microphone format is 8000 Hz, 1 channel 16-bit integer, but when I examine the input format of the AudioNode it is reported as 8000 Hz, 1 channel 32-bit float. The input node is using HAL. Obviously, somewhere in the internals of the node the samples are being converted from 16-bit ints to 32-bit floats. Is there a way to also have the sample rate changed?
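AVAudioConverter can change the sample rate as well as the sample format. A sketch, assuming the audio is already in an AVAudioPCMBuffer (resample is a hypothetical helper), that converts 8 kHz audio to 48 kHz Float32. It uses the input-block form of convert because, as far as I know, the simpler convert(to:from:) does not handle rate changes.

```swift
import AVFoundation

/// Resamples `input` (any PCM format) to Float32 at `outputRate` with the same channel count.
func resample(_ input: AVAudioPCMBuffer, to outputRate: Double) -> AVAudioPCMBuffer? {
    guard let outFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: outputRate,
                                        channels: input.format.channelCount, interleaved: false),
          let converter = AVAudioConverter(from: input.format, to: outFormat)
    else { return nil }

    // Room for the rate-scaled frames plus some slack for the converter's latency.
    let ratio = outputRate / input.format.sampleRate
    let capacity = AVAudioFrameCount(Double(input.frameLength) * ratio) + 1_024
    guard let output = AVAudioPCMBuffer(pcmFormat: outFormat, frameCapacity: capacity) else { return nil }

    var delivered = false
    var error: NSError?
    let status = converter.convert(to: output, error: &error) { _, inputStatus in
        if delivered {
            // No more input: let the converter flush what it has buffered.
            inputStatus.pointee = .endOfStream
            return nil
        }
        delivered = true
        inputStatus.pointee = .haveData
        return input
    }
    return status == .error ? nil : output
}
```

For continuous streaming you would keep one converter alive and feed it successive buffers rather than signalling end of stream after each one.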
Am I doing this wrong? The HAL node was used with AUGraph. Is there a different node that should be used with AVAudioEngine? I see that AVAudioEngine has an input node, but it seems that if I connect it to the microphone, the input goes straight to the hardware output without going through the mixer node (where I want to mix in other audio sources).
The original AUGraph code was modeled after the code in "Learning Core Audio" by Adamson & Avila, which, although it is old (pre-dating Swift and AVAudioEngine), is the only detailed reference on CoreAudio that I have been able to find. Is there a newer reference?
Thanks,
Mark
I updated Xcode to Xcode 13 and iPadOS to 15.0.
Now my previously working application using SFSpeechRecognizer fails to start, regardless of whether I'm using on-device recognition or not.
I use the delegate approach, and although the plist appears to be set up correctly (authorization succeeds and I get the orange indicator showing the microphone is on), the delegate method speechRecognitionTask(_:didFinishSuccessfully:) always reports false, with no error message to go along with it.
I also downloaded the official example from Apple's documentation pages:
SpokenWord sample project (SFSpeechRecognizer)
Unfortunately, it no longer works either.
I'm working on a time-sensitive project and don't know where to go from here. How can we troubleshoot this? If it's an issue with Apple's API update, or something has changed in the initial setup, I really need to know as soon as possible.
Thanks.
I've been given a task that requires evaluating whether it is possible, as the title says, to record via the mic on a BLE headset while playing sound through the built-in speaker at the same time on iOS.
I've implemented forcing the audio device to the built-in speaker whenever the BLE headset is connected or disconnected. That works if both mic and speaker are set to the built-in ones. But after days of searching and experimenting, I found that it is not possible to route the mic and the speaker separately. Even specifying the input device on AVAudioEngine is supported only on macOS, not iOS.
Can anyone, including Apple engineers, give me a definitive answer on whether it is possible to record via the mic on a BLE headset and play sound through the built-in speaker at the same time?
How can I add a live audio player to my app in Xcode, with an interactive UI to control the audio, that keeps playing when users leave the app or turn their device off? Is there a framework or API that will work for this? Thanks! I really need help with this…. 🤩 I have looked everywhere and haven't found anything that works….
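For background playback the usual ingredients, as far as I know, are: the "audio" entry under UIBackgroundModes in Info.plist, an AVAudioSession with the .playback category, and MPRemoteCommandCenter handlers for the lock-screen and Control Center buttons. A minimal sketch, assuming an AVPlayer (which can also play a stream URL) drives the app's UI; the BackgroundPlayback class is hypothetical.

```swift
import AVFoundation
import MediaPlayer

/// Minimal background-playback setup. Assumes Info.plist lists "audio" under UIBackgroundModes.
final class BackgroundPlayback {
    let player: AVPlayer
    private(set) var commandTargets: [Any] = []

    init(player: AVPlayer) {
        self.player = player
    }

    func configure() throws {
        #if os(iOS)
        // .playback keeps audio running when the app is backgrounded or the screen locks.
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.playback, mode: .default)
        try session.setActive(true)
        #endif
        // Lock-screen / Control Center play and pause buttons.
        let center = MPRemoteCommandCenter.shared()
        commandTargets.append(center.playCommand.addTarget { [weak self] _ in
            self?.player.play()
            return .success
        })
        commandTargets.append(center.pauseCommand.addTarget { [weak self] _ in
            self?.player.pause()
            return .success
        })
    }
}
```

The in-app UI can call the same player.play() and player.pause() methods, so both control paths share one player.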
I'm unclear on how to access the inward-facing microphone in the AirPods Pro (not the outward-facing one). If this is possible, can you point me in the right direction?
For more context: there is a ticking noise, caused by a spasm inside someone's ear, that I'd like to try cancelling for them.
The standard AirPods Pro noise cancellation modes don't have any effect on the sound.
I know latency may be too high to do this on the phone with a custom app, but I thought that if I got to the point where latency was the problem, I could experiment with predictive algorithms.
Thank you in advance for ideas or recommendations.
Having the sample rate available in the render callback just seems like a useful thing when rendering audio. For example, say you have an effect that pitches audio up or down. That typically requires knowing the sample rate of the incoming audio. Right now I save the sample rate after the AUAudioUnit's render resources have been allocated, but being given this information on a per-render-callback basis seems more useful.
Another use case is AUAudioUnits on the input chain. Since the format for connections must match the hardware format, you can no longer explicitly set the format you expect the audio to arrive in. You can check the sample rate on AVAudioEngine's input node or on the AVAudioSession singleton, but when you are working with the audio inside the render callback, you don't want to call those methods, because they may block. This is especially true with AVAudioSinkNode, where you can't set the sample rate before the underlying node's render resources are allocated.
Am I missing something here, or does this actually seem useful?
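A sketch of the workaround described above: read the sample rate once, outside the real-time thread (in an AUAudioUnit, typically in allocateRenderResources()), and capture derived constants in the render closure so nothing on the render path queries AVAudioSession or node formats. The sine generator is only a stand-in for real processing, and makeSineRenderer is a hypothetical name.

```swift
import Foundation

/// Builds a render closure with the sample rate baked in at construction time.
/// The closure writes into a caller-provided buffer and does not allocate.
func makeSineRenderer(frequency: Double, sampleRate: Double)
    -> (UnsafeMutableBufferPointer<Float>) -> Void
{
    // Computed once, off the render thread.
    let increment = 2.0 * Double.pi * frequency / sampleRate
    var phase = 0.0
    return { output in
        for i in output.indices {
            output[i] = Float(sin(phase))
            phase += increment
            if phase >= 2.0 * Double.pi { phase -= 2.0 * Double.pi }
        }
    }
}
```

When the format changes, the render resources are reallocated, which is the natural point to build a new closure with the new rate.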
When using the VoiceProcessingIO audio unit with the voiceChat audio session mode to get echo cancellation, I can't play audio in stereo; only mono works.
How can I enable stereo playback with echo cancellation?
Is this some kind of limitation? It isn't mentioned anywhere in the documentation.
I'm developing a voice communication app for the iPad with both playback and recording, using an AudioUnit of type kAudioUnitSubType_VoiceProcessingIO for echo cancellation.
When I play audio before initializing the recording audio unit, the volume is high. But if I play audio after initializing the audio unit, or after switching to RemoteIO and then back to VPIO, the playback volume is low.
It seems like a bug in iOS. Is there any solution or workaround? Searching the web, I only found this post, without any solution: https://developer.apple.com/forums/thread/671836
I receive a buffer from [AVSpeechSynthesizer writeUtterance:toBufferCallback:] and want to schedule it on an AVAudioPlayerNode.
The player node's output format needs to be something the next node can handle, and as far as I understand most nodes can handle a canonical format.
The format provided by AVSpeechSynthesizer is not something that AVAudioMixerNode supports.
So the following:
AVAudioEngine *engine = [[AVAudioEngine alloc] init];
self.playerNode = [[AVAudioPlayerNode alloc] init];
// Format taken from the voice's audio file settings
AVAudioFormat *format = [[AVAudioFormat alloc]
    initWithSettings:utterance.voice.audioFileSettings];
[engine attachNode:self.playerNode];
[engine connect:self.playerNode to:engine.mainMixerNode format:format];
Throws an exception:
Thread 1: "[[busArray objectAtIndexedSubscript:(NSUInteger)element] setFormat:format error:&nsErr]: returned false, error Error Domain=NSOSStatusErrorDomain Code=-10868 \"(null)\""
I am looking for a way to obtain the canonical format for the platform so that I can use AVAudioConverter to convert the buffer.
Since different platforms have different canonical formats, I imagine there should be some library way of doing this. Otherwise each developer will have to redefine it for each platform the code will run on (OSX, iOS etc) and keep it updated when it changes.
I could not find any constant or function that produces such a format, ASBD, or settings dictionary.
The smartest way I could think of, which does not work:
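AudioStreamBasicDescription toDesc;
// FillOutASBDForLPCM is a C++ inline function (needs Objective-C++); its trailing
// arguments are bools (isFloat, isBigEndian, optional isNonInterleaved), not format flags.
// A 16-bit float sample is likely not a usable LPCM layout, which may be why this fails.
FillOutASBDForLPCM(toDesc, [AVAudioSession sharedInstance].sampleRate,
                   2, 16, 16, kAudioFormatFlagIsFloat, kAudioFormatFlagsNativeEndian);
AVAudioFormat *toFormat = [[AVAudioFormat alloc] initWithStreamDescription:&toDesc];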
Even the provided example for iPhone, in the documentation linked above, uses kAudioFormatFlagsAudioUnitCanonical and AudioUnitSampleType, which are deprecated.
So what is the correct way to do this?
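As far as I know, the format AVAudioMixerNode expects is what AVAudioFormat(standardFormatWithSampleRate:channels:) returns (deinterleaved Float32), and the mixer performs sample rate conversion itself, so only the sample format needs converting. A Swift sketch, assuming the synthesizer delivers Int16 PCM (worth checking against utterance.voice.audioFileSettings); toStandardFormat is a hypothetical helper.

```swift
import AVFoundation

/// Converts a PCM buffer (e.g. Int16 from a speech synthesizer) to the standard
/// format at the same sample rate and channel count. No rate change is involved,
/// so the simple convert(to:from:) call is enough.
func toStandardFormat(_ input: AVAudioPCMBuffer) throws -> AVAudioPCMBuffer? {
    guard let standard = AVAudioFormat(standardFormatWithSampleRate: input.format.sampleRate,
                                       channels: input.format.channelCount),
          let converter = AVAudioConverter(from: input.format, to: standard),
          let output = AVAudioPCMBuffer(pcmFormat: standard, frameCapacity: input.frameLength)
    else { return nil }
    try converter.convert(to: output, from: input)
    return output
}
```

The player node would then be connected to the mixer with the standard format (the buffer's format property), and each converted buffer scheduled on it.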
I'm having trouble understanding AVAudioEngine's behaviour when switching audio input sources.
Expected Behaviour
When switching input sources, AVAudioEngine's inputNode should adopt the new input source seamlessly.
Actual Behaviour
When switching from AirPods to the iPhone speaker, AVAudioEngine stops working: no audio is routed through anymore, yet engine.isRunning still returns true.
When subsequently switching back to AirPods, it still isn't working, but now engine.isRunning returns false.
Stopping and starting the engine on a route change does not help. Neither does calling reset(). Disconnecting and reconnecting the input node does not help, either. The only thing that reliably helps is discarding the whole engine and creating a new one.
OS
This is on iOS 14 beta 5. I'm afraid I can't test this on previous versions; I only have one device available.
Code to Reproduce
Here is a minimal code example. Create a simple app project in Xcode (it doesn't matter whether you choose SwiftUI or Storyboard), and give it permission to access the microphone in Info.plist. Create the following file, Conductor.swift:
import AVFoundation
class Conductor {
    static let shared: Conductor = Conductor()

    private let _engine = AVAudioEngine()

    init() {
        // Session
        let session = AVAudioSession.sharedInstance()
        try? session.setActive(false)
        try! session.setCategory(.playAndRecord, options: [.defaultToSpeaker,
                                                          .allowBluetooth,
                                                          .allowAirPlay])
        try! session.setActive(true)
        // Route the microphone input straight to the output
        _engine.connect(_engine.inputNode, to: _engine.mainMixerNode, format: nil)
        _engine.prepare()
    }
    // AVAudioEngine.start() throws, so the call needs `try`
    func start() { try! _engine.start() }
}
And in AppDelegate, call:
Conductor.shared.start()
This example will route the input straight to the output. If you don't have headphones, it will trigger a feedback loop.
Question
What am I missing here? Is this expected behaviour? If so, it does not seem to be documented anywhere.
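Since the post reports that only a brand-new engine reliably recovers, one approach is to automate exactly that: observe the AVAudioEngineConfigurationChange notification, which the engine posts when its I/O configuration changes, and rebuild the graph on a fresh engine. A sketch under that assumption; EngineRebuilder and its configure hook are hypothetical names.

```swift
import AVFoundation

/// Recreates the engine whenever it posts AVAudioEngineConfigurationChange.
/// `configure` is a hypothetical hook that attaches and connects nodes on each new engine.
final class EngineRebuilder {
    private(set) var engine = AVAudioEngine()
    private(set) var rebuildCount = 0
    private let configure: (AVAudioEngine) throws -> Void
    private var observer: NSObjectProtocol?

    init(configure: @escaping (AVAudioEngine) throws -> Void) throws {
        self.configure = configure
        try configure(engine)
        observeCurrentEngine()
    }

    deinit {
        if let observer = observer { NotificationCenter.default.removeObserver(observer) }
    }

    func start() throws {
        engine.prepare()
        try engine.start()
    }

    private func observeCurrentEngine() {
        // queue: nil runs the handler on whichever thread posts the notification,
        // which may be a background thread; hop to your own queue if needed.
        observer = NotificationCenter.default.addObserver(
            forName: .AVAudioEngineConfigurationChange, object: engine, queue: nil
        ) { [weak self] _ in
            self?.rebuild()
        }
    }

    private func rebuild() {
        let wasRunning = engine.isRunning
        if let observer = observer { NotificationCenter.default.removeObserver(observer) }
        observer = nil
        engine.stop()
        engine = AVAudioEngine()  // discard the old engine entirely
        rebuildCount += 1
        do {
            try configure(engine)
            observeCurrentEngine()
            if wasRunning { try start() }
        } catch {
            print("Engine rebuild failed: \(error)")
        }
    }
}
```

In the Conductor example this would be created with a closure such as `{ engine in engine.connect(engine.inputNode, to: engine.mainMixerNode, format: nil) }` and started with `try rebuilder.start()`.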
I tried to run multiple demos that use spatial audio. However, no matter what I do, I only get 2-channel output, which is also confirmed by calling:
let numHardwareOutputChannels = gameView.audioEngine.outputNode.outputFormat(forBus: 0).channelCount
My Apple TV is connected to a Dolby Atmos-capable audio system, which works just fine.
So my question is, more or less: how do I convince a tvOS app that my Apple TV has multichannel output?
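One thing worth checking, offered as an assumption rather than a confirmed answer: AVAudioSession may keep the route at two output channels until the app asks for more, and the engine's outputNode reflects what the session grants. A sketch that requests the route's maximum (maximumOutputNumberOfChannels) and returns the count actually granted; requestMaximumOutputChannels is a hypothetical helper.

```swift
import AVFoundation

#if os(tvOS) || os(iOS)
/// Requests as many output channels as the current route reports, then returns
/// the channel count the session actually granted.
func requestMaximumOutputChannels() throws -> Int {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playback, mode: .default)
    try session.setActive(true)
    // maximumOutputNumberOfChannels depends on the active route (e.g. HDMI to an AV receiver).
    let maximum = session.maximumOutputNumberOfChannels
    try session.setPreferredOutputNumberOfChannels(maximum)
    return session.outputNumberOfChannels
}
#endif
```

If this still returns 2 while the receiver is connected, the route itself is probably reporting stereo, and the engine is unlikely to see more channels either.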
I'm working on a recording app. I started from scratch and basically jump right into recording. I made sure to add the Privacy - Microphone Usage Description string.
What strikes me as odd is that the app launches straight into recording. No alert comes up the first time asking the user for permission, which I thought was the norm.
Have I misunderstood something?
override func viewDidLoad() {
    super.viewDidLoad()
    record3()
}
func record3() {
    print("recording")
    let node = audioEngine.inputNode
    // A tap on the input node should use that node's output format on bus 0
    let recordingFormat = node.outputFormat(forBus: 0)
    var silencish = 0   // not used yet in this excerpt
    var wordsish = 0    // not used yet in this excerpt
    makeFile(format: recordingFormat)   // creates audioFile (not shown)
    node.installTap(onBus: 0, bufferSize: 8192, format: recordingFormat) { [self] buffer, _ in
        do {
            try audioFile!.write(from: buffer)
            x += 1   // x: buffer counter declared elsewhere (not shown)
            if x > 300 {
                print("it's over sergio")
                endThis()
            }
        } catch {
            return
        }
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        print("oh catch \(error)")
    }
}
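Requesting access explicitly before starting the engine makes the permission flow predictable instead of relying on the engine to trigger the alert. A sketch using AVCaptureDevice's audio authorization API (available on iOS and macOS); withMicrophoneAccess is a hypothetical helper, and calling it from viewDidLoad as `withMicrophoneAccess({ self.record3() })` is an assumption about how it fits this app.

```swift
import AVFoundation

/// Runs `start` once microphone access is granted, asking the user first if needed.
func withMicrophoneAccess(_ start: @escaping () -> Void,
                          denied: @escaping () -> Void = {}) {
    switch AVCaptureDevice.authorizationStatus(for: .audio) {
    case .authorized:
        start()
    case .notDetermined:
        // Shows the system alert with the NSMicrophoneUsageDescription string.
        AVCaptureDevice.requestAccess(for: .audio) { granted in
            DispatchQueue.main.async { granted ? start() : denied() }
        }
    default:  // .denied, .restricted, or any future case
        denied()
    }
}
```

If the status is already .authorized on first launch, the permission may have been granted earlier (for example by a previous install), which would also explain the missing alert.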
I'm using an AVAudioConverter object to decode an Opus stream for VoIP. The decoding itself works well; however, whenever the stream stalls (no audio packet is available to decode because of network instability), this can be heard as crackling or an abrupt stop in the decoded audio. In the C library, Opus can mitigate this: packet loss is signalled by passing a null pointer as the data argument to int opus_decode_float (OpusDecoder * st, const unsigned char * data, opus_int32 len, float * pcm, int frame_size, int decode_fec), see https://opus-codec.org/docs/opus_api-1.2/group__opus__decoder.html#ga9c554b8c0214e24733a299fe53bb3bd2.
However, with AVAudioConverter using Swift I'm constructing an AVAudioCompressedBuffer like so:
let compressedBuffer = AVAudioCompressedBuffer(
format: VoiceEncoder.Constants.networkFormat,
packetCapacity: 1,
maximumPacketSize: data.count
)
compressedBuffer.byteLength = UInt32(data.count)
compressedBuffer.packetCount = 1
compressedBuffer.packetDescriptions!
.pointee.mDataByteSize = UInt32(data.count)
data.copyBytes(
to: compressedBuffer.data
.assumingMemoryBound(to: UInt8.self),
count: data.count
)
where data: Data contains the raw OPUS frame to be decoded.
How can I specify data loss in this context and cause the AVAudioConverter to output PCM data whenever no more input data is available?
More context:
I'm specifying the audio format like this:
static let frameSize: UInt32 = 960
static let sampleRate: Float64 = 48000.0
static var networkFormatStreamDescription =
AudioStreamBasicDescription(
mSampleRate: sampleRate,
mFormatID: kAudioFormatOpus,
mFormatFlags: 0,
mBytesPerPacket: 0,
mFramesPerPacket: frameSize,
mBytesPerFrame: 0,
mChannelsPerFrame: 1,
mBitsPerChannel: 0,
mReserved: 0
)
static let networkFormat =
AVAudioFormat(
streamDescription:
&networkFormatStreamDescription
)!
I've tried 1) setting byteLength and packetCount to zero and 2) returning nil but setting .haveData in the AVAudioConverterInputBlock I'm using with no success.
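AVAudioConverter does not appear to expose Opus's null-packet concealment, so a substitute technique (not Opus PLC) is to generate a fill frame yourself when a packet is missing: repeat the last decoded 960-frame block while fading it to silence, which avoids the abrupt stop that produces clicks. A sketch of that fill on plain Float samples; concealmentFrame is a hypothetical name.

```swift
/// Builds a stand-in frame for a lost packet by fading the last decoded frame to silence.
/// frameSize defaults to 960, matching the 20 ms packets at 48 kHz above.
func concealmentFrame(repeating last: [Float], frameSize: Int = 960) -> [Float] {
    guard !last.isEmpty, frameSize > 0 else {
        return [Float](repeating: 0, count: max(frameSize, 0))
    }
    var out = [Float](repeating: 0, count: frameSize)
    let span = Float(max(frameSize - 1, 1))
    for i in 0..<frameSize {
        let gain = 1 - Float(i) / span   // linear ramp from 1 down to 0
        out[i] = last[i % last.count] * gain
    }
    return out
}
```

For a second consecutive lost packet, plain silence (an all-zero frame) is the simpler choice, since repeating the same audio again tends to sound buzzy.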
I am working on a music application where multiple WAV files are scheduled within a time frame. Everything works perfectly except one scenario: a small beep occurs when scheduling the player node again.
For example, one.wav is playing on PlayerNode1, and after 2 seconds I reschedule it with second.wav; at that point there is a small beep. I have tried stopping the node after checking isPlaying, but it still doesn't work. Am I doing anything wrong here?
if playerNode.isPlaying {
playerNode.stop()
}
playerNode.scheduleFile(audioFile, at: nil, completionHandler: nil)
playerNode.play()
I am reusing the same player node for performance: 24 WAV files need to be played within 1 minute, so there is no point in keeping a separate player node for each.
How can I stop the beep when scheduling a new audio file on the same player node?
I have shared a link below that demonstrates the issue.
https://drive.google.com/file/d/1FjZtLUj_wUp0LQPyjIwfJNy67HWUlt0I/view?usp=sharing
Expected result: the song should play continuously.
Is there any way to build a system for my macOS program that will:
-Allow the user to pick whether my program will output its audio either to the system output or to an AirPlay destination?
-While doing so, offer the ability to control the volume of the asset currently playing? (AVPlayer 'volume' setter stops responding when connected to an AirPlay endpoint)
-Also allow me to attach a 10-band EQ to the output?
I tried to do this five years ago, in 2017, and expected the ecosystem to have improved by now. While AVRoutePickerView and AVPlayer are user-friendly and convenient, the fact that basic functionality like volume control stops working over AirPlay is quite frustrating. AVAudioPlayer seems like it might offer this functionality, but only on iOS and not on macOS!
I am basically only trying to offer the same AirPlay controls that Music.app does. Is this really so difficult?
I am unable to get AVSpeechSynthesizer to write to a buffer or to call its delegate methods.
I was told this had been resolved in macOS 11.
I thought that was a lot to ask, but I am now running macOS 11.4 (Big Sur).
My goal is to produce speech faster than real time and drive the output through AVAudioEngine.
First, I need to know why the write doesn't happen and why the delegates aren't called, whether I use write or simply speak to the default speakers in "func speak(_ string: String)".
What am I missing?
Is there a workaround?
Reference: https://developer.apple.com/forums/thread/678287
let sentenceToSpeak = "This should write to buffer and also call 'didFinish' and 'willSpeakRangeOfSpeechString' delegates."
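// Keep a strong reference: a temporary SpeakerTest (and its synthesizer) is
// deallocated right after the call, before any audio or delegate callbacks arrive.
let speaker = SpeakerTest()
speaker.writeToBuffer(sentenceToSpeak)
speaker.speak(sentenceToSpeak)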
class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
let synth = AVSpeechSynthesizer()
override init() {
super.init()
synth.delegate = self
}
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
print("Utterance didFinish")
}
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
willSpeakRangeOfSpeechString characterRange: NSRange,
utterance: AVSpeechUtterance)
{
print("speaking range: \(characterRange)")
}
func speak(_ string: String) {
let utterance = AVSpeechUtterance(string: string)
var usedVoice = AVSpeechSynthesisVoice(language: "en") // should be the default voice
let voices = AVSpeechSynthesisVoice.speechVoices()
let targetVoice = "Allison"
for voice in voices {
// print("\(voice.identifier) \(voice.name) \(voice.quality) \(voice.language)")
if (voice.name.lowercased() == targetVoice.lowercased())
{
usedVoice = AVSpeechSynthesisVoice(identifier: voice.identifier)
break
}
}
utterance.voice = usedVoice
print("utterance.voice: \(utterance.voice)")
synth.speak(utterance)
}
func writeToBuffer(_ string: String)
{
print("entering writeToBuffer")
let utterance = AVSpeechUtterance(string: string)
synth.write(utterance) { (buffer: AVAudioBuffer) in
print("executing synth.write")
guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
fatalError("unknown buffer type: \(buffer)")
}
if pcmBuffer.frameLength == 0 {
print("buffer is empty")
} else {
print("buffer has content \(buffer)")
}
}
}
}
My Swift app records audio in chunks across multiple files; each M4A file is approximately 1 minute long. I would like to go through those files and detect silence, or the lowest level.
While I am able to read a file into a buffer, my problem is deciphering it. Even with Google, all I find are "audio players" rather than sites that describe the header and the data.
Where can I find what to look for? Should I even be converting it to a WAV file? Even then, I can't seem to find a tool or a site that tells me how to decipher what I am reading.
Obviously it exists, since Siri knows when you've stopped speaking. I'm just trying to find the key.
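AVAudioFile decodes M4A (AAC) itself, so there should be no need to parse the container or convert to WAV: its processingFormat is deinterleaved Float32, with samples nominally in -1...1. A sketch that reads a file in chunks and reports each chunk's peak level; chunkPeaks and the 4096-frame chunk size are illustrative choices.

```swift
import AVFoundation

/// Returns the peak absolute sample value (across channels) of each chunk of the file.
func chunkPeaks(of url: URL, chunkFrames: AVAudioFrameCount = 4096) throws -> [Float] {
    let file = try AVAudioFile(forReading: url)
    guard let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                        frameCapacity: chunkFrames) else { return [] }
    var peaks: [Float] = []
    while file.framePosition < file.length {
        try file.read(into: buffer)          // decodes up to chunkFrames frames
        if buffer.frameLength == 0 { break }
        guard let channels = buffer.floatChannelData else { break }
        var peak: Float = 0
        for ch in 0..<Int(buffer.format.channelCount) {
            for i in 0..<Int(buffer.frameLength) {
                peak = max(peak, abs(channels[ch][i]))
            }
        }
        peaks.append(peak)
    }
    return peaks
}
```

Chunks whose peak stays near zero are candidates for silence; the chunk index times chunkFrames divided by the file's sample rate gives the position in seconds.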
Hello,
I am starting to work with, and learn, AVAudioEngine.
Currently I am at the point where I would like to read an audio file of speech and determine whether it contains any moments of silence.
Does this framework provide any properties, such as power level or decibels, that I can use to find sufficiently long moments of silence?
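As far as I know, AVAudioEngine has no built-in level meter for this, but once samples are in a buffer the usual approach is windowed RMS converted to dBFS (20 · log10(rms)) and compared against a threshold. A sketch, assuming mono Float32 samples (for example copied from floatChannelData); silentRanges is hypothetical, and the -40 dB threshold, 1024-sample window and 0.5 s minimum are starting values to tune.

```swift
import Foundation

/// Returns ranges (in seconds) where windowed RMS stays below `thresholdDB` dBFS
/// for at least `minDuration` seconds.
func silentRanges(in samples: [Float], sampleRate: Double,
                  windowSize: Int = 1024, thresholdDB: Float = -40,
                  minDuration: Double = 0.5) -> [ClosedRange<Double>] {
    guard windowSize > 0, sampleRate > 0 else { return [] }
    var ranges: [ClosedRange<Double>] = []
    var silenceStart: Int? = nil   // first sample index of the current quiet run

    func finishRun(at end: Int) {
        if let start = silenceStart, Double(end - start) / sampleRate >= minDuration {
            ranges.append(Double(start) / sampleRate ... Double(end) / sampleRate)
        }
        silenceStart = nil
    }

    var index = 0
    while index < samples.count {
        let end = min(index + windowSize, samples.count)
        var sumOfSquares: Float = 0
        for i in index..<end { sumOfSquares += samples[i] * samples[i] }
        let rms = (sumOfSquares / Float(end - index)).squareRoot()
        let db = 20 * log10(max(rms, 1e-9))   // floor avoids log10(0) = -infinity
        if db < thresholdDB {
            if silenceStart == nil { silenceStart = index }
        } else {
            finishRun(at: index)
        }
        index = end
    }
    finishRun(at: samples.count)
    return ranges
}
```

Speech recordings usually have some background noise, so measuring the RMS of a known pause first and setting the threshold a few dB above it tends to work better than a fixed value.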