AVAudioEngine


Use a group of connected audio node objects to generate and process audio signals and perform audio input and output.

AVAudioEngine Documentation

Posts under AVAudioEngine tag

70 Posts
Post not yet marked as solved
1 Reply
350 Views
Hello, I am trying to use AVAudioFile to save an audio buffer to a .wav file. The buffer is of type [Float]. Currently I can successfully create the .wav files and even play them, but they are blank: I cannot hear any sound.

```swift
private func saveAudioFile(using buffer: [Float]) {
    let fileUrl = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!
        .appendingPathComponent("\(UUID().uuidString).wav")
    let fileSettings = [
        AVFormatIDKey: Int(kAudioFormatLinearPCM),
        AVSampleRateKey: 15600,
        AVNumberOfChannelsKey: 1,
        AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
    ]
    guard let file = try? AVAudioFile(forWriting: fileUrl, settings: fileSettings,
                                      commonFormat: .pcmFormatInt16, interleaved: true) else {
        print("Cannot create AudioFile")
        return
    }
    guard let bufferFormat = AVAudioFormat(settings: fileSettings) else {
        print("Cannot create buffer format")
        return
    }
    guard let outputBuffer = AVAudioPCMBuffer(pcmFormat: bufferFormat,
                                              frameCapacity: AVAudioFrameCount(buffer.count)) else {
        print("Cannot create output buffer")
        return
    }
    for i in 0..<buffer.count {
        outputBuffer.int16ChannelData!.pointee[i] = Int16(buffer[i])
    }
    outputBuffer.frameLength = AVAudioFrameCount(buffer.count)
    do {
        try file.write(from: outputBuffer)
    } catch {
        print(error.localizedDescription)
        print("Write to file failed")
    }
}
```

Where should I be looking first for the problem? Is it a format issue? I am getting the data from the microphone with AVAudioEngine. Its format is created like this:

```swift
let outputFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                 sampleRate: Double(15600),
                                 channels: 1,
                                 interleaved: true)!
```

And here is the installTap implementation with the buffer callback:

```swift
input.installTap(onBus: 0, bufferSize: AVAudioFrameCount(sampleRate * 2), format: inputFormat) { (incomingBuffer, time) in
    DispatchQueue.global(qos: .background).async {
        let pcmBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat,
                                         frameCapacity: AVAudioFrameCount(outputFormat.sampleRate * 2.0))
        var error: NSError? = nil
        let inputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
            outStatus.pointee = AVAudioConverterInputStatus.haveData
            return incomingBuffer
        }
        formatConverter.convert(to: pcmBuffer!, error: &error, withInputFrom: inputBlock)
        if error != nil {
            print(error!.localizedDescription)
        } else if let pcmBuffer = pcmBuffer, let channelData = pcmBuffer.int16ChannelData {
            let channelDataPointer = channelData.pointee
            self.buffer = stride(from: 0, to: self.windowLengthSamples, by: 1).map {
                Float(channelDataPointer[$0]) / 32768.0
            }
            onBufferUpdated(self.buffer)
        }
    }
}
```

onBufferUpdated is the block that provides the [Float] for the saveAudioFile method above. I have tried some experiments with different output formats, but those ended up as unplayable audio files.
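The silence is most likely the Float-to-Int16 cast: normalized samples in the -1.0...1.0 range truncate to 0 when cast directly. A minimal sketch of the likely fix (the helper name and scaling constant are my own, not from the post):

```swift
import AVFoundation

// Hypothetical fix sketch: scale normalized Float samples (-1.0...1.0)
// up to the Int16 range before writing. A direct `Int16(buffer[i])`
// truncates values like 0.37 to 0, producing a silent file.
func fill(_ outputBuffer: AVAudioPCMBuffer, from samples: [Float]) {
    guard let channelData = outputBuffer.int16ChannelData else { return }
    for (i, sample) in samples.enumerated() {
        // Clamp first, then scale to the full 16-bit range.
        let clamped = max(-1.0, min(1.0, sample))
        channelData.pointee[i] = Int16(clamped * Float(Int16.max))
    }
    outputBuffer.frameLength = AVAudioFrameCount(samples.count)
}
```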
Posted by nemecek_f.
Post not yet marked as solved
0 Replies
245 Views
Hello, I would like to create a VoIP app, and I want to build a sound visualizer for the other party's voice. Example: https://medium.com/swlh/swiftui-create-a-sound-visualizer-cadee0b6ad37 The call function was implemented using CallKit, based on Twilio's quick-start: https://jp.twilio.com/docs/voice/sdks/ios/get-started https://github.com/twilio/voice-quickstart-ios After importing AVFoundation, I need to route the call audio through an AVAudioEngine mixer, but I have no idea how to get the voice into the mixer. Any guidance is appreciated! Thank you!
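One hedged approach, assuming the remote party's audio can be routed into an AVAudioEngine graph at all (how Twilio's SDK exposes its audio is SDK-specific and not shown here): tap the engine's main mixer and compute an RMS level to drive the visualizer.

```swift
import AVFoundation
import Accelerate

// Sketch: assumes the remote audio reaches the engine's main mixer.
let engine = AVAudioEngine()
let mixer = engine.mainMixerNode

mixer.installTap(onBus: 0, bufferSize: 1024,
                 format: mixer.outputFormat(forBus: 0)) { buffer, _ in
    guard let channelData = buffer.floatChannelData else { return }
    // Root-mean-square level of the first channel, for the visualizer.
    var rms: Float = 0
    vDSP_rmsqv(channelData[0], 1, &rms, vDSP_Length(buffer.frameLength))
    let level = 20 * log10(max(rms, .leastNonzeroMagnitude)) // dBFS
    DispatchQueue.main.async {
        // update the visualizer bars with `level` here
        _ = level
    }
}
```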
Posted Last updated
.
Post marked as solved
1 Reply
262 Views
I am trying to save the buffer from my installTap to a file. I do it in chunks of 10 so I'll get a bigger file. When I try to play the written file (from the simulator's directory), QuickTime says that it's not compatible. I have examined the bad m4a file and a working one: there are a lot of zeros at the beginning of the bad file followed by a lot of data, but both files appear to have the same header. A lot of people mention that I have to nil the AVAudioFile, but audioFile = nil is not valid syntax here, nor can I find a close method on AVAudioFile. Here's the complete code, edited into one working file:

```swift
import UIKit
import AVFoundation

class ViewController: UIViewController {
    let audioEngine = AVAudioEngine()
    var audioFile = AVAudioFile()
    var x = 0

    override func viewDidLoad() {
        super.viewDidLoad()
        record()
    }

    func makeFile(format: AVAudioFormat) {
        let paths = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first
        do {
            _ = try FileManager.default.contentsOfDirectory(at: paths!, includingPropertiesForKeys: nil)
        } catch { print("error") }
        let destinationPath = paths!.appendingPathComponent("audioT.m4a")
        print("\(destinationPath)")
        do {
            audioFile = try AVAudioFile(forWriting: destinationPath, settings: format.settings)
            print("file created")
        } catch { print("error creating file") }
    }

    func record() {
        let node = audioEngine.inputNode
        let recordingFormat = node.inputFormat(forBus: 0)
        makeFile(format: recordingFormat)
        node.installTap(onBus: 0, bufferSize: 8192, format: recordingFormat) { [self] (buffer, _) in
            do {
                try audioFile.write(from: buffer)
                print("buffer filled")
                x += 1
                print("wrote \(x)")
                if x > 9 { endThis() }
            } catch { return }
        }
        audioEngine.prepare()
        do {
            try audioEngine.start()
        } catch {
            print("oh catch")
        }
    }

    func endThis() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
    }
}
```
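A sketch of the usual fix, under the assumption that the m4a header is only finalized when the AVAudioFile is released: declare the property as an Optional so that `audioFile = nil` compiles, and nil it after stopping the engine.

```swift
import AVFoundation

// Sketch: an Optional property lets `audioFile = nil` compile;
// releasing the last reference is what flushes and finalizes the file.
final class Recorder {
    let audioEngine = AVAudioEngine()
    var audioFile: AVAudioFile?   // Optional, so it can be set to nil

    func stop() {
        audioEngine.inputNode.removeTap(onBus: 0)
        audioEngine.stop()
        audioFile = nil           // releases the file so the header is written
    }
}
```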
Posted by SergioDCQ.
Post not yet marked as solved
2 Replies
532 Views
I recently released my first ShazamKit app, but there is one thing that still bothers me. When I started, I followed the steps as documented by Apple here: https://developer.apple.com/documentation/shazamkit/shsession/matching_audio_using_the_built-in_microphone However, when I ran this on iPad I received a lot of high-pitched feedback noise with this configuration. I got it to work by commenting out the output node and format and only using the input. But now I want to be able to recognise the song playing on the device that has my app open, and I was wondering if I need the output nodes for that, or if I can do something else to prevent the mic feedback. In short: What can I do to prevent feedback from happening? Can I use the output of a device to recognise songs, or do I just need to make sure that the microphone can run at the same time as playing music? Other than that I really love the ShazamKit API and can highly recommend having a go with it! This is the code as documented in the above link (I just added comments marking what broke it for me):

```swift
func configureAudioEngine() {
    // Get the native audio format of the engine's input bus.
    let inputFormat = audioEngine.inputNode.inputFormat(forBus: 0)

    // THIS CREATES FEEDBACK ON IPAD PRO
    let outputFormat = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 1)

    // Create a mixer node to convert the input.
    audioEngine.attach(mixerNode)

    // Attach the mixer to the microphone input and the output of the audio engine.
    audioEngine.connect(audioEngine.inputNode, to: mixerNode, format: inputFormat)
    // THIS CREATES FEEDBACK ON IPAD PRO
    audioEngine.connect(mixerNode, to: audioEngine.outputNode, format: outputFormat)

    // Install a tap on the mixer node to capture the microphone audio.
    mixerNode.installTap(onBus: 0, bufferSize: 8192, format: outputFormat) { buffer, audioTime in
        // Add captured audio to the buffer used for making a match.
        self.addAudio(buffer: buffer, audioTime: audioTime)
    }
}
```
Post not yet marked as solved
0 Replies
201 Views
I am trying to go from installTap straight to AVAudioFile(forWriting:). I call let recordingFormat = node.outputFormat(forBus: 0) and get back: <AVAudioFormat 0x60000278f750: 1 ch, 48000 Hz, Float32> But AVAudioFile has a settings parameter of type [String : Any], and I am curious how to express those values as the settings dictionary for recording in the required format. Hopefully these are the values I need?
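A minimal sketch, assuming the goal is simply a file in the tap's native format: AVAudioFormat already exposes a matching `settings` dictionary, so it can be handed straight to AVAudioFile. The file name below is hypothetical.

```swift
import AVFoundation

// Sketch: derive the AVAudioFile settings directly from the tap format.
let engine = AVAudioEngine()
let node = engine.inputNode
let recordingFormat = node.outputFormat(forBus: 0)  // e.g. 1 ch, 48000 Hz, Float32

let url = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
    .appendingPathComponent("capture.caf")          // hypothetical file name
let file = try! AVAudioFile(forWriting: url,
                            settings: recordingFormat.settings)  // reuse the format's settings

node.installTap(onBus: 0, bufferSize: 4096, format: recordingFormat) { buffer, _ in
    try? file.write(from: buffer)
}
```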
Posted by SergioDCQ.
Post marked as solved
5 Replies
389 Views
Expanding a speech-to-text demo, and while it works, I am still trying to learn Swift. Is .installTap the Swift version of a C callback function? From what I interpret here, every time the buffer becomes full, the code between the last { } runs, and the code below it also runs. It almost feels like a callback combined with a GOTO line from BASIC. Yes, it works, but I'd like to confirm that I am getting the flow of the code correctly.

```swift
func startSpeechRecognition() {
    let node = audioEngine.inputNode
    let recordingFormat = node.outputFormat(forBus: 0)
    node.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, _) in
        self.request.append(buffer)
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch let error { ... }
    guard let myRecognition = SFSpeechRecognizer() else {
        ...
        return
    }
    if !myRecognition.isAvailable { ... }
    task = speechRecognizer?.recognitionTask(with: request, resultHandler: { (response, error) in
        guard let response = response else {
            if error != nil {
                print("\(String(describing: error.debugDescription))")
            } else {
                print("problem in response")
            }
            return
        }
        let message = response.bestTranscription.formattedString
        print("\(message)")
    })
}
```
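Essentially yes: the trailing closure is Swift's analogue of a C callback. installTap stores the closure and the engine invokes it repeatedly from its render path as buffers fill, while the code after installTap continues to run immediately. A toy illustration of the pattern (not real AVFoundation code; the names are made up):

```swift
// Minimal illustration: a stored trailing closure acts like a C
// callback that the "engine" invokes later, repeatedly.
func installTap(onBus bus: Int, handler: ([Float]) -> Void) {
    // a real engine would call `handler` from its render thread,
    // once per filled buffer; here we just call it once
    handler([0.0, 0.5, -0.5])
}

installTap(onBus: 0) { samples in
    print("got \(samples.count) samples")   // runs each time a buffer fills
}
print("this line runs without waiting for the tap")
```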
Posted by SergioDCQ.
Post not yet marked as solved
3 Replies
1.3k Views
I have been using AVAudioEngine to take audio from the mic and send it out over a WebRTC connection. When I use the iPhone device mic, this works as expected. But if I run the app with Bluetooth headphones connected, the engine reports this error when trying to start:

```
[avae]  AVAudioEngine.mm:160   Engine@0x2833e1790: could not initialize, error = -10868
[avae]  AVAEInternal.h:109   [AVAudioEngineGraph.mm:1397:Initialize: (err = AUGraphParser::InitializeActiveNodesInInputChain(ThisGraph, *GetInputNode())): error -10868
Error starting audio engine: The operation couldn’t be completed. (com.apple.coreaudio.avfaudio error -10868.)
```

I see that error code -10868 is:

```
@constant kAudioUnitErr_FormatNotSupported
    Returned if an input or output format is not supported
...
kAudioUnitErr_FormatNotSupported = -10868
```

but that doesn't seem like it can be quite correct. I know that the output format is supported, because the same format works correctly when my headphones are not attached. And I am fairly sure that the input format is supported, because I am able to simply hook up Headphones InputNode -> Mixer -> Headphones OutputNode and correctly hear the audio from the mic. So I can only assume that this means the format conversion is not supported. My questions: Is this a bug? Is there any way I can work around this?

Notes: My full audio graph looks like this, where all the "mixers" are just AVAudioMixerNodes:

```
// InputNode (Mic)  -> Mic Mixer ---\
//                                   >-> WebRTC Mixer -> Tap -> WebRTC Framework
// AudioPlayer 1 -> Player Mixer ---/
//
// AudioPlayer 2 -> Player Mixer -----> LocalOutputMixer -> OutputNode (Device Speakers/Headphones)
```

but the issue still happens even if I simplify down to this: InputNode (Mic) -> Mixer -> Tap -> WebRTC Framework. Specifically, it happens when a single mixer node is connected with the following input and output formats. The input format is:

```
(lldb) po audioEngine.inputNode.inputFormat(forBus: 0).streamDescription.pointee
▿ AudioStreamBasicDescription
  - mSampleRate : 16000.0
  - mFormatID : 1819304813
  - mFormatFlags : 41
  - mBytesPerPacket : 4
  - mFramesPerPacket : 1
  - mBytesPerFrame : 4
  - mChannelsPerFrame : 1
  - mBitsPerChannel : 32
  - mReserved : 0
```

The output format WebRTC expects is:

```
▿ AudioStreamBasicDescription
  - mSampleRate : 48000.0
  - mFormatID : 1819304813
  - mFormatFlags : 12
  - mBytesPerPacket : 2
  - mFramesPerPacket : 1
  - mBytesPerFrame : 2
  - mChannelsPerFrame : 1
  - mBitsPerChannel : 16
  - mReserved : 0
```

My headphones are Jaybird Freedom 2.
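One possible workaround, assuming the engine refuses the implicit connection-time conversion but an explicit one works: tap the input in its hardware format and convert with AVAudioConverter yourself. The buffer sizes and the `webRTCFormat` values below are illustrative.

```swift
import AVFoundation

// Workaround sketch: tap in the hardware's native format and convert
// manually, instead of asking the mixer connection to do the conversion.
let engine = AVAudioEngine()
let input = engine.inputNode
let hwFormat = input.outputFormat(forBus: 0)            // e.g. 16 kHz Float32 on BT
let webRTCFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                 sampleRate: 48000,
                                 channels: 1,
                                 interleaved: true)!
let converter = AVAudioConverter(from: hwFormat, to: webRTCFormat)!

input.installTap(onBus: 0, bufferSize: 4096, format: hwFormat) { buffer, _ in
    // Size the output buffer for the sample-rate ratio.
    let ratio = webRTCFormat.sampleRate / hwFormat.sampleRate
    let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio)
    guard let out = AVAudioPCMBuffer(pcmFormat: webRTCFormat,
                                     frameCapacity: capacity) else { return }
    var error: NSError?
    let status = converter.convert(to: out, error: &error) { _, outStatus in
        outStatus.pointee = .haveData
        return buffer
    }
    guard status != .error else { return }
    // hand `out` to the WebRTC framework here
}
```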
Post not yet marked as solved
0 Replies
195 Views
I understand that both the AirPods Pro and AirPods Max have an outer microphone whose purpose is to record the environment and replay processed audio under "transparency mode." I wish to access the samples recorded by this microphone and run them through processing via AVAudioEngine. Is the audio from this microphone accessible from Xcode, or is it only internally wired? If the former, how does one access these samples?
Posted by FoeHammar.
Post marked as solved
1 Reply
515 Views
I'm developing a game that will use speech recognition to execute various commands. I am using code from Apple's Recognizing Speech in Live Audio documentation page. When I run this in a Swift Playground, it works just fine. However, when I make a SpriteKit game application (basic setup from Xcode's "New Project" menu option), I get the following error: required condition is false: IsFormatSampleRateAndChannelCountValid(hwFormat) Upon further research, it appears that my input node has no channels. The following is the relevant portion of my code, along with debug output:

```swift
let inputNode = audioEngine.inputNode
print("Number of inputs: \(inputNode.numberOfInputs)") // 1
print("Input Format: \(inputNode.inputFormat(forBus: 0))")
// <AVAudioFormat 0x600001bcf200: 0 ch, 0 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved>
let channelCount = inputNode.inputFormat(forBus: 0).channelCount
print("Channel Count: \(channelCount)") // 0 <== Agrees with the inputFormat output listed previously

// Configure the microphone input.
print("Number of outputs: \(inputNode.numberOfOutputs)") // 1
let recordingFormat = inputNode.outputFormat(forBus: 0)
print("Output Format: \(recordingFormat)")
// <AVAudioFormat 0x600001bf3160: 2 ch, 44100 Hz, Float32, non-inter>
inputNode.installTap(onBus: 0, bufferSize: 256, format: recordingFormat, block: audioTap)
// <== This is where the error occurs.
// NOTE: 'audioTap' is a function defined in this class, used instead of an inline, anonymous function.
```

The code snippet is included in the game's AppDelegate class (which has import statements for Cocoa, AVFoundation, and Speech), and executes during its applicationDidFinishLaunching function. I'm having trouble understanding why the Playground works but a game app doesn't. Do I need to do something specific to get the application to recognize the microphone? NOTE: This is for macOS, NOT iOS. While the "How To" documentation cited earlier indicates iOS, Apple stated at WWDC19 that it is now supported on macOS. NOTE: I have included the NSSpeechRecognitionUsageDescription key in the application's plist, and successfully acknowledged the authorization request for the microphone.
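A defensive sketch of what may differ between an app target and a Playground: explicit microphone authorization and, on macOS, the App Sandbox "Audio Input" capability plus the NSMicrophoneUsageDescription plist key (the speech-recognition key alone does not grant mic access). Guarding against the 0-channel format also avoids the crash:

```swift
import AVFoundation

// Sketch: request mic access and refuse to install a tap on a
// 0 Hz / 0-channel format, which is what a denied or missing input
// device reports.
AVCaptureDevice.requestAccess(for: .audio) { granted in
    guard granted else { return }
    let engine = AVAudioEngine()
    let input = engine.inputNode
    let format = input.outputFormat(forBus: 0)
    guard format.channelCount > 0, format.sampleRate > 0 else {
        print("No usable input device; installing a tap would crash the engine")
        return
    }
    input.installTap(onBus: 0, bufferSize: 256, format: format) { buffer, _ in
        // forward `buffer` to the recognition request here
    }
}
```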
Post marked as solved
1 Reply
298 Views
I am using the demo code below to flesh out an audio recording app in Swift 5.x. I would like to monitor certain aspects of the AVAudioRecorder while it is recording, such as frequency, power, volume, etc., in live time. I found an example in Swift 3 where the user sets up a callback timer for 0.5 sec. I was wondering whether that is still the approach, or whether, in the latest version of Swift, there might be a callback function in AVAudioEngine that gets called at a regular frequency.

```swift
do {
    audioRecorder = try AVAudioRecorder(url: audioFilename!, settings: settings)
    audioRecorder.delegate = self
    audioRecorder.record()
    recordButton.setTitle("Tap to Stop", for: .normal)
} catch {
    finishRecording(success: false)
}
```
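For what it's worth, AVAudioRecorder's metering is still polling-based in Swift 5, so the Swift 3 timer pattern remains the usual approach for power/volume; sample-level analysis (frequency etc.) would need an AVAudioEngine tap instead. A sketch (recorder setup assumed elsewhere):

```swift
import AVFoundation

// Sketch: poll AVAudioRecorder's built-in metering on a repeating timer.
var recorder: AVAudioRecorder!          // assumed configured and started elsewhere
recorder.isMeteringEnabled = true       // must be enabled before reading levels

let meterTimer = Timer.scheduledTimer(withTimeInterval: 0.1, repeats: true) { _ in
    recorder.updateMeters()                              // refresh the level values
    let average = recorder.averagePower(forChannel: 0)   // dBFS
    let peak = recorder.peakPower(forChannel: 0)         // dBFS
    print("avg \(average) dB, peak \(peak) dB")
}
// invalidate `meterTimer` when recording stops
```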
Posted by SergioDCQ.
Post not yet marked as solved
4 Replies
1.1k Views
Hi all, I'm using AVAudioEngine to play multiple nodes at various times (like GarageBand, for example). So far I have managed to play the various files at the right time using this code:

```swift
DispatchQueue.global(qos: .background).async {
    AudioManager.instance.audioEngine.attach(AudioManager.instance.mixer)
    AudioManager.instance.audioEngine.connect(AudioManager.instance.mixer,
                                              to: AudioManager.instance.audioEngine.outputNode,
                                              format: nil)
    // !important - start the engine *before* setting up the player nodes
    try! AudioManager.instance.audioEngine.start()

    for audioFile in data {
        // Create and attach the audioPlayer node for this file
        let audioPlayer = AVAudioPlayerNode()
        AudioManager.instance.audioEngine.attach(audioPlayer)
        AudioManager.instance.nodes.append(audioPlayer)
        // Notice the output is the mixer in this case
        AudioManager.instance.audioEngine.connect(audioPlayer,
                                                  to: AudioManager.instance.mixer,
                                                  format: nil)
        let fileUrl = audioFile.audio.fileUrl
        if let file: AVAudioFile = try? AVAudioFile(forReading: fileUrl) {
            let time = audioFile.start > 0
                ? AudioManager.instance.secondsToAVAudioTime(hostTime: mach_absolute_time(),
                                                             time: Double(audioFile.start / CGFloat.secondsToPoints))
                : nil
            audioPlayer.scheduleFile(file, at: time, completionHandler: nil)
            audioPlayer.play(at: time)
        }
    }
}
```

Basically my data object contains structs that have a reference to an audio fileURL and the startPosition at which it should begin. That works great. Now I would like to export all these tracks, mixed into a single file, and save it to the user's Documents directory. How can I achieve this? Thanks for your help.
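One way this is commonly done, sketched under the assumption that the same graph can be re-scheduled in manual rendering mode; the `duration` parameter and the re-scheduling step are placeholders for the loop already shown above:

```swift
import AVFoundation

// Sketch: render the existing graph offline and write each rendered
// buffer to a file. `duration` (in frames) is an assumed parameter.
func exportMix(engine: AVAudioEngine, duration: AVAudioFramePosition,
               to url: URL) throws {
    let format = engine.mainMixerNode.outputFormat(forBus: 0)
    engine.stop()
    try engine.enableManualRenderingMode(.offline, format: format,
                                         maximumFrameCount: 4096)
    try engine.start()
    // (re-schedule the player nodes here, as in the realtime version)

    let output = try AVAudioFile(forWriting: url, settings: format.settings)
    let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                  frameCapacity: engine.manualRenderingMaximumFrameCount)!
    while engine.manualRenderingSampleTime < duration {
        let frames = min(AVAudioFrameCount(duration - engine.manualRenderingSampleTime),
                         buffer.frameCapacity)
        if try engine.renderOffline(frames, to: buffer) == .success {
            try output.write(from: buffer)
        }
    }
    engine.stop()
    engine.disableManualRenderingMode()   // return to realtime if needed
}
```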
Posted by radada.
Post not yet marked as solved
0 Replies
779 Views
I'm trying to change the device of the inputNode of AVAudioEngine. To do so, I'm calling setDeviceID on its auAudioUnit. Although this call doesn't fail, something goes wrong with the output busses: when I ask for their format, it shows 0 Hz and 0 channels, and the app crashes when I try to connect the node to the mainMixerNode. Can anyone explain what's wrong with this code?

```swift
avEngine = AVAudioEngine()
print(avEngine.inputNode.auAudioUnit.inputBusses[0].format)
// <AVAudioFormat 0x1404b06e0: 2 ch, 44100 Hz, Float32, non-inter>
print(avEngine.inputNode.auAudioUnit.outputBusses[0].format)
// <AVAudioFormat 0x1404b0a60: 2 ch, 44100 Hz, Float32, inter>

// Now, let's change the device from the headphones' mic to the built-in mic.
try! avEngine.inputNode.auAudioUnit.setDeviceID(inputDevice.deviceID)

print(avEngine.inputNode.auAudioUnit.inputBusses[0].format)
// <AVAudioFormat 0x1404add50: 2 ch, 44100 Hz, Float32, non-inter>
print(avEngine.inputNode.auAudioUnit.outputBusses[0].format)
// <AVAudioFormat 0x1404adff0: 0 ch, 0 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved> // !!!

// Interestingly, 'inputNode' shows a different format than `auAudioUnit`
print(avEngine.inputNode.inputFormat(forBus: 0))
// <AVAudioFormat 0x1404af480: 1 ch, 44100 Hz, Float32>
print(avEngine.inputNode.outputFormat(forBus: 0))
// <AVAudioFormat 0x1404ade30: 1 ch, 44100 Hz, Float32>
```

Edit: Further debugging reveals another puzzling thing:

```swift
avEngine.inputNode.auAudioUnit == avEngine.outputNode.auAudioUnit // this is true ?!
```

inputNode and outputNode share the same AUAudioUnit, and its deviceID is by default set to the speakers. It's so confusing to me... why would the inputNode's device be a speaker?
Posted by smialek.
Post not yet marked as solved
0 Replies
198 Views
I have a simple audio graph: input, effect audio unit, mixer, and output. When I include my effect audio unit, I lose my audio output. I believe the graph dump shows the problem; I just don't know how to interpret it.

```
________ GraphDescription ________
AVAudioEngineGraph 0x7fd2b7c08c90: initialized = 1, running = 1, number of nodes = 6

******** output chain ********

node 0x6000016b3f00 {'auou' 'rioc' 'appl'}, 'I'
  inputs = 1
    (bus0, en1) <- (bus0) 0x6000016b7580, {'aumx' 'mcmx' 'appl'}, [ 2 ch, 44100 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved]

node 0x6000016b7580 {'aumx' 'mcmx' 'appl'}, 'I'
  inputs = 1
    (bus0, en1) <- (bus0) 0x600001680d80, {'aufx' 'vpio' 0x00000000}, [ 2 ch, 44100 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved]
  outputs = 1
    (bus0, en1) -> (bus0) 0x6000016b3f00, {'auou' 'rioc' 'appl'}, [ 2 ch, 44100 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved]

node 0x600001680d80 {'aufx' 'vpio' 0x00000000}, 'I'
  inputs = 1
    (bus0, en1) <- (bus0) 0x600000abb330, {'aufc' 'conv' 'appl'}, [ 2 ch, 44100 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved]
  outputs = 1
    (bus0, en1) -> (bus0) 0x6000016b7580, {'aumx' 'mcmx' 'appl'}, [ 2 ch, 44100 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved]

node 0x600000abb330 {'aufc' 'conv' 'appl'}, 'I'
  outputs = 1
    (bus0, en1) -> (bus0) 0x600001680d80, {'aufx' 'vpio' 0x00000000}, [ 2 ch, 44100 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved]

******** input chain ********

node 0x600001681100 {'auou' 'rioc' 'appl'}, 'I'

******** other nodes ********

node 0x600001685080 {'aumx' 'mcmx' 'appl'}, 'U'
```

I'm guessing these zeros are telling me something: node 0x600001680d80 {'aufx' 'vpio' 0x00000000}, 'I'. Any assistance is really appreciated. I encountered this, thought it would be simple to diagnose and fix, but I find myself spinning my wheels.
Posted by FredTr.
Post not yet marked as solved
0 Replies
432 Views
I have an AVMutableAudioMix and use MTAudioProcessingTap to process the audio data. But after I pass the buffer to AVAudioEngine and render it with renderOffline, the audio has no effects at all. How can I do this? Any ideas? Here is the code for the MTAudioProcessingTap process callback:

```swift
var callback = MTAudioProcessingTapCallbacks(
    version: kMTAudioProcessingTapCallbacksVersion_0,
    clientInfo: UnsafeMutableRawPointer(Unmanaged.passUnretained(self.engine).toOpaque()),
    init: tapInit,
    finalize: tapFinalize,
    prepare: tapPrepare,
    unprepare: tapUnprepare) { tap, numberFrames, flags, bufferListInOut, numberFramesOut, flagsOut in

    guard MTAudioProcessingTapGetSourceAudio(tap, numberFrames, bufferListInOut, flagsOut, nil, numberFramesOut) == noErr else {
        preconditionFailure()
    }
    let storage = MTAudioProcessingTapGetStorage(tap)
    let engine = Unmanaged<Engine>.fromOpaque(storage).takeUnretainedValue()
    // render the audio with effect
    engine.render(bufferPtr: bufferListInOut, numberOfFrames: numberFrames)
}
```

And here is the Engine code:

```swift
class Engine {
    let engine = AVAudioEngine()
    let player = AVAudioPlayerNode()
    let pitchEffect = AVAudioUnitTimePitch()
    let reverbEffect = AVAudioUnitReverb()
    let rateEffect = AVAudioUnitVarispeed()
    let volumeEffect = AVAudioUnitEQ()
    let format = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 44100,
                               channels: 2, interleaved: false)!

    init() {
        engine.attach(player)
        engine.attach(pitchEffect)
        engine.attach(reverbEffect)
        engine.attach(rateEffect)
        engine.attach(volumeEffect)

        engine.connect(player, to: pitchEffect, format: format)
        engine.connect(pitchEffect, to: reverbEffect, format: format)
        engine.connect(reverbEffect, to: rateEffect, format: format)
        engine.connect(rateEffect, to: volumeEffect, format: format)
        engine.connect(volumeEffect, to: engine.mainMixerNode, format: format)

        try! engine.enableManualRenderingMode(.offline, format: format, maximumFrameCount: 4096)

        reverbEffect.loadFactoryPreset(AVAudioUnitReverbPreset.largeRoom2)
        reverbEffect.wetDryMix = 100
        pitchEffect.pitch = 2100

        try! engine.start()
        player.play()
    }

    func render(bufferPtr: UnsafeMutablePointer<AudioBufferList>, numberOfFrames: CMItemCount) {
        let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: 4096)!
        buffer.frameLength = AVAudioFrameCount(numberOfFrames)
        buffer.mutableAudioBufferList.pointee = bufferPtr.pointee
        self.player.scheduleBuffer(buffer) {
            try! self.engine.renderOffline(AVAudioFrameCount(numberOfFrames), to: buffer)
        }
    }
}
```
Posted by luckysmg.
Post not yet marked as solved
1 Reply
593 Views
I have a kind of trivial request: I need to seek within sound playback. The problem is I don't have a local sound file; I get a pointer to the sound instead (as well as other params) from (my internal) native lib. This is the method I use to convert an UnsafeRawPointer to an AVAudioPCMBuffer:

```swift
...
var byteCount: Int32 = 0
var buffer: UnsafeMutableRawPointer?

defer {
    buffer?.deallocate()
    buffer = nil
}

if audioReader?.getAudioByteData(byteCount: &byteCount, data: &buffer) ?? false && buffer != nil {
    let audioFormat = AVAudioFormat(standardFormatWithSampleRate: Double(audioSampleRate),
                                    channels: AVAudioChannelCount(audioChannels))!
    if let pcmBuf = AVAudioPCMBuffer(pcmFormat: audioFormat, frameCapacity: AVAudioFrameCount(byteCount)) {
        let monoChannel = pcmBuf.floatChannelData![0]
        pcmFloatData = [Float](repeating: 0.0, count: Int(byteCount))

        //>>> Convert UnsafeMutableRawPointer to [Int16] array
        let int16Ptr: UnsafeMutablePointer<Int16> = buffer!.bindMemory(to: Int16.self, capacity: Int(byteCount))
        let int16Buffer: UnsafeBufferPointer<Int16> = UnsafeBufferPointer(start: int16Ptr,
                                                                          count: Int(byteCount) / MemoryLayout<Int16>.size)
        let int16Arr: [Int16] = Array(int16Buffer)
        //<<<

        // Int16 ranges from -32768 to 32767 -- we want to convert and scale
        // these to Float values between -1.0 and 1.0
        var scale = Float(Int16.max) + 1.0
        vDSP_vflt16(int16Arr, 1, &pcmFloatData[0], 1, vDSP_Length(int16Arr.count)) // Int16 to Float
        vDSP_vsdiv(pcmFloatData, 1, &scale, &pcmFloatData[0], 1, vDSP_Length(int16Arr.count)) // divide by scale

        memcpy(monoChannel, pcmFloatData, MemoryLayout<Float>.size * Int(int16Arr.count))
        pcmBuf.frameLength = UInt32(int16Arr.count)

        usagePlayer.setupAudioEngine(with: audioFormat)
        audioClip = pcmBuf
    }
}
...
```

So, at the end of the method you can see the line audioClip = pcmBuf, where the prepared pcmBuf is assigned to a local variable. Then all I need to start playback is:

```swift
/* player is an AVAudioPlayerNode */
player.scheduleBuffer(buf, at: nil, options: .loops)
player.play()
```

and that is it; now I can hear the sound. But let's say I need to seek forward 10 seconds. To do this I need to stop() the player node and set a new AVAudioPCMBuffer, this time with an offset of 10 seconds. The problem is there is no method for an offset, neither on the player node side nor on AVAudioPCMBuffer. For example, if I were working with a file (instead of a buffer), I could use this method:

```swift
player.scheduleSegment(file,
                       startingFrame: seekFrame,
                       frameCount: frameCount,
                       at: nil)
```

There, at least, you can use the startingFrame: and frameCount: params. But in my case I use a buffer, not a file, and there are no such params in the buffer implementation. It looks like I can't implement seek logic if I use AVAudioPCMBuffer. What am I doing wrong?
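A hedged sketch of one workaround: since there is no buffer-segment API, copy the tail of the existing buffer from the seek frame into a new AVAudioPCMBuffer and schedule that instead. This assumes a deinterleaved Float32 buffer, as created above; the function name is mine.

```swift
import AVFoundation

// Sketch: build a new buffer starting at the seek offset.
func seek(into source: AVAudioPCMBuffer, seconds: Double) -> AVAudioPCMBuffer? {
    let format = source.format
    let startFrame = Int(seconds * format.sampleRate)
    let remaining = Int(source.frameLength) - startFrame
    guard remaining > 0,
          let slice = AVAudioPCMBuffer(pcmFormat: format,
                                       frameCapacity: AVAudioFrameCount(remaining)),
          let src = source.floatChannelData,
          let dst = slice.floatChannelData else { return nil }
    for ch in 0..<Int(format.channelCount) {
        // copy the samples from the seek point to the end
        memcpy(dst[ch], src[ch] + startFrame, remaining * MemoryLayout<Float>.size)
    }
    slice.frameLength = AVAudioFrameCount(remaining)
    return slice
}

// Usage sketch:
// player.stop()
// if let tail = seek(into: audioClip, seconds: 10) {
//     player.scheduleBuffer(tail, at: nil, options: .loops)
//     player.play()
// }
```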
Post not yet marked as solved
5 Replies
1k Views
Is the format description AVSpeechSynthesizer reports for the speech buffer correct? When I attempt to convert it, I get back noise from two different conversion methods. I am seeking to convert the speech buffer provided by the AVSpeechSynthesizer "func write(_ utterance: AVSpeechUtterance...)" method. The goal is to convert the sample type, change the sample rate, and change from a mono to a stereo buffer. I later manipulate the buffer data and pass it through AVAudioEngine. For testing purposes, I have kept the sample rate at the original 22050.0. What have I tried? I have a method that I've been using for years named "resampleBuffer" that does this. When I apply it to the speech buffer, I get back noise. When I attempt to manually convert the format and to stereo with "convertSpeechBufferToFloatStereo", I am getting back clipped output. I tested flipping the samples, addressing the big-endian, signed-integer format, but that didn't work. The speech buffer description is: inBuffer description: <AVAudioFormat 0x6000012862b0: 1 ch, 22050 Hz, 'lpcm' (0x0000000E) 32-bit big-endian signed integer>

```swift
import Cocoa
import AVFoundation

class SpeakerTest: NSObject, AVSpeechSynthesizerDelegate {
    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
    }

    func resampleBuffer(inSource: AVAudioPCMBuffer, newSampleRate: Double) -> AVAudioPCMBuffer? {
        // resample and convert mono to stereo
        var error: NSError?
        let kChannelStereo = AVAudioChannelCount(2)
        let convertRate = newSampleRate / inSource.format.sampleRate
        let outFrameCount = AVAudioFrameCount(Double(inSource.frameLength) * convertRate)
        let outFormat = AVAudioFormat(standardFormatWithSampleRate: newSampleRate, channels: kChannelStereo)!
        let avConverter = AVAudioConverter(from: inSource.format, to: outFormat)
        let outBuffer = AVAudioPCMBuffer(pcmFormat: outFormat, frameCapacity: outFrameCount)!
        let inputBlock: AVAudioConverterInputBlock = { (inNumPackets, outStatus) -> AVAudioBuffer? in
            outStatus.pointee = AVAudioConverterInputStatus.haveData // very important, must have
            let audioBuffer: AVAudioBuffer = inSource
            return audioBuffer
        }
        avConverter?.sampleRateConverterAlgorithm = AVSampleRateConverterAlgorithm_Mastering
        avConverter?.sampleRateConverterQuality = .max
        if let converter = avConverter {
            let status = converter.convert(to: outBuffer, error: &error, withInputFrom: inputBlock)
            // print("\(status): \(status.rawValue)")
            if ((status != .haveData) || (error != nil)) {
                print("\(status): \(status.rawValue), error: \(String(describing: error))")
                return nil // conversion error
            }
        } else {
            return nil // converter not created
        }
        // print("success!")
        return outBuffer
    }

    func writeToFile(_ stringToSpeak: String, speaker: String) {
        var output: AVAudioFile?
        let utterance = AVSpeechUtterance(string: stringToSpeak)
        let desktop = "~/Desktop"
        let fileName = "Utterance_Test.caf" // not in sandbox
        var tempPath = desktop + "/" + fileName
        tempPath = (tempPath as NSString).expandingTildeInPath

        let usingSampleRate = 22050.0 // 44100.0
        let outSettings = [
            AVFormatIDKey: kAudioFormatLinearPCM, // kAudioFormatAppleLossless
            AVSampleRateKey: usingSampleRate,
            AVNumberOfChannelsKey: 2,
            AVEncoderAudioQualityKey: AVAudioQuality.max.rawValue
        ] as [String: Any]

        // temporarily ignore the speaker and use the default voice
        let curLangCode = AVSpeechSynthesisVoice.currentLanguageCode()
        utterance.voice = AVSpeechSynthesisVoice(language: curLangCode)
        // utterance.volume = 1.0
        print("Int32.max: \(Int32.max), Int32.min: \(Int32.min)")

        synth.write(utterance) { (buffer: AVAudioBuffer) in
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
                fatalError("unknown buffer type: \(buffer)")
            }
            if (pcmBuffer.frameLength == 0) {
                // done
            } else {
                // append buffer to file
                var outBuffer: AVAudioPCMBuffer
                outBuffer = self.resampleBuffer(inSource: pcmBuffer, newSampleRate: usingSampleRate)! // doesn't work
                // outBuffer = self.convertSpeechBufferToFloatStereo(pcmBuffer) // doesn't work
                // outBuffer = pcmBuffer // original format does work
                if (output == nil) {
                    // var bufferSettings = utterance.voice?.audioFileSettings
                    // Audio files cannot be non-interleaved.
                    var outSettings = outBuffer.format.settings
                    outSettings["AVLinearPCMIsNonInterleaved"] = false
                    let inFormat = pcmBuffer.format
                    print("inBuffer description: \(inFormat.description)")
                    print("inBuffer settings: \(inFormat.settings)")
                    print("inBuffer format: \(inFormat.formatDescription)")
                    print("outBuffer settings: \(outSettings)\n")
                    print("outBuffer format: \(outBuffer.format.formatDescription)")
                    output = try! AVAudioFile(forWriting: URL(fileURLWithPath: tempPath), settings: outSettings)
                }
                try! output?.write(from: outBuffer)
                print("done")
            }
        }
    }
}

class ViewController: NSViewController {
    let speechDelivery = SpeakerTest()

    override func viewDidLoad() {
        super.viewDidLoad()
        let targetSpeaker = "Allison"
        var sentenceToSpeak = ""
        for indx in 1...10 {
            sentenceToSpeak += "This is sentence number \(indx). [[slnc 3000]] \n"
        }
        speechDelivery.writeToFile(sentenceToSpeak, speaker: targetSpeaker)
    }
}
```

Three tests can be performed; the only one that works is writing the buffer directly to disk. Is this really "32-bit big-endian signed integer"? Am I addressing this correctly, or is this a bug? I'm on macOS 11.4.
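If the reported format is accurate (format flags 0x0E decode to big-endian | signed-integer | packed), a converter that assumes host-endian input would indeed produce noise. A hypothetical sketch that byte-swaps and normalizes manually before any further conversion; the function name is mine, and the premise that the samples really are big-endian Int32 is an assumption taken from the printed format description:

```swift
import AVFoundation

// Hypothetical sketch: swap big-endian Int32 samples to host order and
// normalize to Float before handing the buffer to AVAudioConverter.
func nativeFloatBuffer(from speech: AVAudioPCMBuffer) -> AVAudioPCMBuffer? {
    let frames = Int(speech.frameLength)
    guard let raw = speech.audioBufferList.pointee.mBuffers.mData else { return nil }
    let bigEndian = raw.bindMemory(to: Int32.self, capacity: frames)

    let outFormat = AVAudioFormat(standardFormatWithSampleRate: speech.format.sampleRate,
                                  channels: 1)!
    guard let out = AVAudioPCMBuffer(pcmFormat: outFormat,
                                     frameCapacity: AVAudioFrameCount(frames)),
          let dst = out.floatChannelData else { return nil }
    for i in 0..<frames {
        let sample = Int32(bigEndian: bigEndian[i])      // swap to host order
        dst[0][i] = Float(sample) / Float(Int32.max)     // normalize to ±1.0
    }
    out.frameLength = AVAudioFrameCount(frames)
    return out
}
```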
Posted by MisterE.
Post not yet marked as solved
0 Replies
321 Views
After upgrading to iOS 15, I discovered that calling stopSpeaking on AVSpeechSynthesizer no longer correctly triggers speechSynthesizer(_:didCancel:); instead it triggers speechSynthesizer(_:didFinish:), which led to some errors in my business logic. I have since worked around it.
Posted by yuxutao.