Can anyone please guide me on how to use SFCustomLanguageModelData.CustomPronunciation?
I am following the below example from WWDC23
https://wwdcnotes.com/documentation/wwdcnotes/wwdc23-10101-customize-ondevice-speech-recognition/
When using this kind of custom pronunciation, we need the X-SAMPA string of the specific word.
There are tools available on the web to do this:
Word to IPA: https://openl.io/
IPA to X-SAMPA: https://tools.lgm.cl/xsampa.html
But these tools do not seem to produce the same kind of X-SAMPA strings used in the demo. For example, in the demo "Winawer" is converted to "w I n aU @r", while the online tools give "/wI"nA:w@r/".
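For context, this is roughly the shape of the WWDC23 sample code I am following (the identifier, version, and export URL are placeholders here; the phoneme string is the one from the demo, not from the online tools):
import Speech

// Build the custom language model data with a custom pronunciation (sketch).
let data = SFCustomLanguageModelData(
    locale: Locale(identifier: "en_US"),
    identifier: "com.example.sampleApp",   // placeholder identifier
    version: "1.0"
) {
    // X-SAMPA phonemes for "Winawer" as shown in the WWDC23 demo
    SFCustomLanguageModelData.CustomPronunciation(
        grapheme: "Winawer",
        phonemes: ["w I n aU @r"]
    )
}
// Called from an async context; modelURL is a placeholder destination.
try await data.export(to: modelURL)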
Context:
I am currently developing an app using the Push-to-Talk (PTT) framework. I have reviewed both the PTT framework documentation and the CallKit demo project to better understand how to properly manage audio session activation and AVAudioEngine setup.
I am not activating the audio session manually. The audio session configuration is handled in the incomingPushResult or didBeginTransmitting callbacks from the PTChannelManagerDelegate.
I am using a single AVAudioEngine instance for both input and playback. The engine is started in the didActivate callback from the PTChannelManagerDelegate. When I receive a push in full duplex mode, I set the active participant to the user who is speaking.
Issue
When I attempt to talk while the other participant is already speaking, my input tap on the input node takes a few seconds to return valid PCM audio data. Initially, it returns an empty PCM audio block.
Details:
The audio session is already active and configured with .playAndRecord.
The input tap is already installed when the engine is started.
When I talk from a neutral state (no one is speaking), the system plays the standard "microphone activation" tone, which covers this initial delay. However, this does not happen when I am already receiving audio.
Assumptions / Current Setup
Because the audio session is active in .playAndRecord, I assumed that microphone input would be available immediately, even while receiving audio.
However, there seems to be a delay before valid input is delivered to the tap, and it only occurs when switching from a receive-only state to talking while receiving.
Questions
Is this expected behavior when using the PTT framework in full duplex mode with a shared AVAudioEngine?
Should I be restarting or reconfiguring the engine or audio session when beginning to talk while receiving audio?
Is there a recommended pattern for managing microphone readiness in this scenario to avoid the initial empty PCM buffer?
Would using separate engines for input and output improve responsiveness?
I would like to confirm the correct approach to handling simultaneous talk and receive in full duplex mode using PTT framework and AVAudioEngine. Specifically, I need guidance on ensuring the microphone is ready to capture audio immediately without the delay seen in my current implementation.
Relevant Code Snippets
Engine Setup
func setup() {
    let input = audioEngine.inputNode
    do {
        try input.setVoiceProcessingEnabled(true)
    } catch {
        print("Could not enable voice processing \(error)")
        return
    }
    input.isVoiceProcessingAGCEnabled = false

    let output = audioEngine.outputNode
    let mainMixer = audioEngine.mainMixerNode

    audioEngine.connect(pttPlayerNode, to: mainMixer, format: outputFormat)
    audioEngine.connect(beepNode, to: mainMixer, format: outputFormat)
    audioEngine.connect(mainMixer, to: output, format: outputFormat)

    // Initialize converters
    converter = AVAudioConverter(from: inputFormat, to: outputFormat)!
    f32ToInt16Converter = AVAudioConverter(from: outputFormat, to: inputFormat)!

    audioEngine.prepare()
}
Input Tap Installation
func installTap() {
    guard AudioHandler.shared.checkMicrophonePermission() else {
        print("Microphone not granted for recording")
        return
    }
    guard !isInputTapped else {
        print("[AudioEngine] Input is already tapped!")
        return
    }

    let input = audioEngine.inputNode
    let microphoneFormat = input.inputFormat(forBus: 0)
    let microphoneDownsampler = AVAudioConverter(from: microphoneFormat, to: outputFormat)!
    let desiredFormat = outputFormat
    let inputFramesNeeded = AVAudioFrameCount((Double(OpusCodec.DECODED_PACKET_NUM_SAMPLES) * microphoneFormat.sampleRate) / desiredFormat.sampleRate)

    input.installTap(onBus: 0, bufferSize: inputFramesNeeded, format: input.inputFormat(forBus: 0)) { [weak self] buffer, when in
        guard let self = self else { return }

        // Output buffer: 1920 frames at 16kHz
        guard let outputBuffer = AVAudioPCMBuffer(pcmFormat: desiredFormat, frameCapacity: AVAudioFrameCount(OpusCodec.DECODED_PACKET_NUM_SAMPLES)) else { return }
        outputBuffer.frameLength = outputBuffer.frameCapacity

        let inputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
            outStatus.pointee = .haveData
            return buffer
        }

        var error: NSError?
        let converterResult = microphoneDownsampler.convert(to: outputBuffer, error: &error, withInputFrom: inputBlock)
        if converterResult != .haveData {
            DebugLogger.shared.print("Downsample error \(converterResult)")
        } else {
            self.handleDownsampledBuffer(outputBuffer)
        }
    }
    isInputTapped = true
}
Hi!
I am creating an aumi AUv3 extension and I am trying to achieve simultaneous connections to multiple other AVAudioNodes. I would like to know whether it is possible to route the MIDI to different outputs inside the render process in the AUv3.
I am using connectMIDI(_:to:format:eventListBlock:) to connect the output of the AUv3 to multiple AVAudioNodes. However, when I send MIDI out of the AUv3, it gets sent to all the AudioNodes connected to it. I can't seem to find any documentation on how to route the MIDI to only one of the connected nodes. Is this possible?
AVCaptureSession's startRunning method is thread blocking and seems to be slow. What is this method doing behind the scenes?
For context: I'm working on Simulator Camera support and I have a 'fake' AVCaptureDevice that might be causing this. My hypothesis is that AVCaptureSession tries to connect to the device and waits for a notification to be posted back.
I'd love to find a way to let my fake device message AVCaptureSession that it's connected.
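For reference, a minimal sketch of keeping the blocking call off the main thread while investigating the underlying cause (the queue label and session property are placeholders):
import AVFoundation

// Sketch: startRunning() blocks the calling thread, so run it on a dedicated queue.
let sessionQueue = DispatchQueue(label: "capture.session.queue")

sessionQueue.async {
    session.startRunning()              // blocks this background queue, not the UI
    DispatchQueue.main.async {
        // update the UI once the session reports that it is running
    }
}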
I'm using AVAudioEngine to play AVAudioPCMBuffers. I'd like to synchronize some events with the playback. For example if the audio's frame position is >= some point && less than some point trigger some code.
So I'm looking at - (void)installTapOnBus:(AVAudioNodeBus)bus bufferSize:(AVAudioFrameCount)bufferSize format:(AVAudioFormat * __nullable)format block:(AVAudioNodeTapBlock)tapBlock;
Now I have the frame positions calculated (predetermined before the audio is scheduled; I already made all the necessary computations). So I just need to fire code at certain points during playback:
[playerNode installTapOnBus:bus
                 bufferSize:bufferSize
                     format:format
                      block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
    // Inspect current audio here and fire...
}];

[playerNode scheduleBuffer:fullbuffer
                    atTime:startTime
                   options:0
    completionCallbackType:AVAudioPlayerNodeCompletionDataPlayedBack
         completionHandler:^(AVAudioPlayerNodeCompletionCallbackType callbackType) {
    // some code is here, not important to this question.
}];
The problem I'm having is figuring out where in the full buffer I am within the tap block. The tap block passes chunks (not the full audio buffer). I tried using the when parameter of the block to calculate the frame position relative to the entire audio, but have been unsuccessful so far. I'm assuming the when parameter is relative to the buffer passed in the tap block (not my entire audio buffer that I scheduled).
Not installing a tap and just using a timer before scheduling my fullBuffer has given me good results but I'd rather avoid using a timer if possible and use sample time.
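For reference, a minimal Swift sketch of the conversion I believe is needed (playerNode and triggerFrame stand in for your own objects): the tap's when parameter is in node time, so it has to be translated into the player's timeline with playerTime(forNodeTime:) before comparing against the precomputed frame positions.
// Sketch: check whether a precomputed trigger frame falls inside this tap chunk.
playerNode.installTap(onBus: 0, bufferSize: 4096, format: nil) { buffer, when in
    // `when` is in node (engine) time; convert it to time relative to playback start.
    guard let playerTime = playerNode.playerTime(forNodeTime: when) else { return }

    let chunkStart = playerTime.sampleTime                         // frames since the player started
    let chunkEnd = chunkStart + AVAudioFramePosition(buffer.frameLength)

    if (chunkStart..<chunkEnd).contains(triggerFrame) {
        // fire the code associated with `triggerFrame`
    }
}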
Please update the Android MusicKit SDK: version 1.1.2 fails to compile. The error message references SDKUriHandlerActivity: "Apps targeting Android 12 and higher are required to specify an explicit value for android:exported when the corres…"
When shooting with an iPhone 15 or later, it's possible to capture HEIC or JPEG images that include gain map information conforming to the ISO 21496-1 standard. However, during image format transcoding, the HEIC codec is able to preserve the ISO 21496-1 gain map, but when converting from HEIC to JPEG, the gain map is transformed into the Apple Gain Map format instead. Is there any solution to this issue?
I'm working on an app where a user needs to select a video from their Photos library, and I need to get the original, unmodified HEVC (H.265) data stream to preserve its encoding.
The Problem
I have confirmed that my source videos are HEVC. I can record a new video with my iPhone 15 Pro Max camera set to "High Efficiency," export the "Unmodified Original" from Photos on my Mac, and verify that the codec is MPEG-H Part2/HEVC (H.265).
However, when I select that exact same video in my app using PHPickerViewController, the itemProvider does not list public.hevc as an available type identifier. This forces me to fall back to a generic movie type, which results in the system providing me with a transcoded H.264 version of the video.
Here is the debug output from my app after selecting a known HEVC video:
⚠️ 'public.hevc' not found. Falling back to generic movie type (likely H.264).
What I've Tried
My code explicitly checks for the public.hevc identifier in the registeredTypeIdentifiers array. Since it's not found, my HEVC-specific logic is never triggered.
Here is a minimal version of my PHPickerViewControllerDelegate implementation:
import UniformTypeIdentifiers

// ... inside the Coordinator class ...

func picker(_ picker: PHPickerViewController, didFinishPicking results: [PHPickerResult]) {
    picker.dismiss(animated: true)
    guard let result = results.first else { return }

    let itemProvider = result.itemProvider
    let hevcIdentifier = "public.hevc"
    let identifiers = itemProvider.registeredTypeIdentifiers
    print("Available formats from itemProvider: \(identifiers)")

    if identifiers.contains(hevcIdentifier) {
        print("✅ HEVC format found, requesting raw data...")
        itemProvider.loadDataRepresentation(forTypeIdentifier: hevcIdentifier) { (data, error) in
            // ... process H.265 data ...
        }
    } else {
        print("⚠️ 'public.hevc' not found. Falling back to generic movie type (likely H.264).")
        itemProvider.loadFileRepresentation(forTypeIdentifier: UTType.movie.identifier) { url, error in
            // ... process H.264 fallback ...
        }
    }
}
My Environment
Device: iPhone 15 Pro Max
iOS Version: iOS 18.5
Xcode Version: 16.2
My Questions
Are there specific conditions (e.g., the video being HDR/Dolby Vision, Cinematic, or stored in iCloud) under which PHPickerViewController's itemProvider would intentionally not offer the public.hevc type identifier, even for an HEVC video?
What is the definitive, recommended API sequence to guarantee that I receive the original, unmodified data stream for a video asset, ensuring that no transcoding to H.264 occurs during the process?
Any insight into why public.hevc might be missing from the registeredTypeIdentifiers for a known HEVC asset would be greatly appreciated. Thank you.
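For reference, a minimal sketch of a picker configuration that, as I understand it, asks Photos for the asset as stored rather than a transcoded compatibility version (the context.coordinator reference assumes a SwiftUI representable; adapt to however you present the picker). Whether this also restores the public.hevc identifier is part of my question.
import PhotosUI

// Sketch: request the current (as-stored) representation to avoid automatic transcoding.
var config = PHPickerConfiguration(photoLibrary: .shared())
config.filter = .videos
config.preferredAssetRepresentationMode = .current

let picker = PHPickerViewController(configuration: config)
picker.delegate = context.coordinator   // the Coordinator from the snippet above (assumed)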
Hi,
I’m building a PPG-based heart rate feature where the user places their finger over the rear telephoto camera. On iPhone 16 Pro Max, I'm explicitly selecting the telephoto lens like this:
videoDevice = AVCaptureDevice.default(.builtInTelephotoCamera, for: .video, position: .back)
And trying to lock it:
if #available(iOS 15.0, *),
   device.activePrimaryConstituentDeviceSwitchingBehavior != .unsupported {
    try? device.lockForConfiguration()
    device.setPrimaryConstituentDeviceSwitchingBehavior(.locked, restrictedSwitchingBehaviorConditions: [])
    device.unlockForConfiguration()
}
I also lock everything else to prevent dynamic changes:
try device.lockForConfiguration()
device.focusMode = .locked
device.exposureMode = .locked
device.whiteBalanceMode = .locked
device.videoZoomFactor = 1.0
device.automaticallyEnablesLowLightBoostWhenAvailable = false
device.automaticallyAdjustsVideoHDREnabled = false
device.unlockForConfiguration()
Despite this, the camera still switches to another lens, especially under different lighting, even though the user’s finger fully covers the lens.
Questions:
How can I completely prevent lens switching in this scenario?
Would using videoZoomFactor = 3.0 or 5.0 better enforce use of the telephoto lens?
Thanks!
Gal
I am using ImageCaptureCore to access and (sometimes) download media files from a digital camera connected via USB (either to a Mac or to an iOS device with the Apple Lightning to USB 3 Camera Adapter).
This works very well in general, but what puzzles me is that the ICCameraFile's EXIF creation/modification date always returns nil.
I can access the ICCameraItem's creation/modification date instead, which, as the documentation says, is "usually the same as its EXIF creation date", but not always. Generally the EXIF tags are more reliable than the file dates; the modification date in particular is easily messed up when copying files.
As for my cameras, they show the stable EXIF date on their display, so for consistency I would prefer to use the same in my app. Is there a way to get it without downloading the image from the camera and reading it from the file?
Does it possibly depend on the brand of camera (I mostly have Canon) whether ICCameraFile.exifCreationDate is ever populated or always nil?
For a thumb drive with DCIM folder, which is treated just like a camera, it is also nil.
In the latest production release of our iOS app (deployed via the App Store), we've observed a significant increase in AVCaptureSessionWasInterrupted notifications where the interruption reason has a rawValue of 4. The session does not automatically recover, even after returning from the background or deleting and reinstalling the app. An employee ran into this and was able to get a recording. The interruption causes the camera preview to remain black, and any attempt to capture an image fails with the following error:
"Error Domain=AVFoundationErrorDomain Code=-11803 \"Cannot Record\" UserInfo={AVErrorRecordingFailureDomainKey=3, NSLocalizedDescription=Cannot Record, NSLocalizedRecoverySuggestion=Try recording again.}"
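For reference, a minimal sketch of the kind of observer that surfaces this state (the notification names and userInfo key are the standard AVFoundation ones; session is a placeholder for our capture session):
import AVFoundation

// Sketch: log the interruption reason when the capture session is interrupted.
NotificationCenter.default.addObserver(
    forName: AVCaptureSession.wasInterruptedNotification,
    object: session,
    queue: .main
) { note in
    if let raw = note.userInfo?[AVCaptureSessionInterruptionReasonKey] as? Int,
       let reason = AVCaptureSession.InterruptionReason(rawValue: raw) {
        // rawValue 4 == .videoDeviceNotAvailableWithMultipleForegroundApps
        print("Capture session interrupted: \(reason)")
    }
}

NotificationCenter.default.addObserver(
    forName: AVCaptureSession.interruptionEndedNotification,
    object: session,
    queue: .main
) { _ in
    // In the reports above, the session does not reliably resume at this point.
}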
Some questions from our team:
What common system conditions or foreground app behaviors can cause .videoDeviceNotAvailableWithMultipleForegroundApps (reason 4) to become persistent? Our team is under the impression that interruption reason 4 is mostly associated with iPad and PiP, but neither of these applies in the logs we see.
Is manual recovery of the session required?
Is there a recommended strategy to detect that the session is unrecoverable and gracefully notify the user or rebuild the session?
Are there any instruments in Xcode you would recommend for evaluating the increase in reason 4?
Best,
Ben
I take it that MusicKit JS is built with TypeScript, based on the attributions in the script: https://js-cdn.music.apple.com/musickit/v3/musickit.js
In the script it points to https://js-cdn.music.apple.com/musickit/v1/acknowledgements.txt – I assume this should be the v3 URL for the v3 version? It returns the same content nonetheless.
This contains attributions for TypeScript.
Currently there's a third-party effort with DefinitelyTyped, which publishes the NPM package @types/musickit-js. The latest supported version available is v1.
However, there is no version compatible with v3.
This makes it hard to use MusicKit JS v3 in a TypeScript project.
Please publish the types, ideally on the CDN along with the musickit.js file. Also consider publishing an officially Apple supported DefinitelyTyped package, or help to maintain the existing @types/musickit-js to make consuming this even easier.
Hi, my daughter was given an Apple Watch due to her grandfather's passing. It is not GPS/cellular, and we cannot connect it to her Apple ID because of this. It doesn't seem right to leave her logged in to his Apple ID, but we are currently out of options. Is there a workaround to this? When I try to set her up from my phone, it tells me that the watch must have GPS/cellular to set it up. Why? Am I missing something?
I'm working on an application that uses the iPhone camera for scientific purposes and, as a result, would like to receive sensor data in as unprocessed a format as possible.
I'm using AVCapturePhotoOutput to take Bayer RAW stills and receiving data in kCVPixelFormatType_14Bayer_RGGB format.
However, I'm puzzled as to the content of the bits. I simply demosaic the image by taking each 2x2 square:
RG
GB
and use R, (G+G)/2, B to get 16-bit RGB values - and this indeed works.
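For concreteness, a minimal Swift sketch of that naive demosaic (it assumes the buffer delivers one 16-bit sample per pixel, which matches what we observe, and it ignores edge handling and the black-level question below):
import CoreVideo

// Sketch of the naive 2x2 RGGB demosaic described above.
func demosaicRGGB(_ pixelBuffer: CVPixelBuffer) -> [(r: UInt16, g: UInt16, b: UInt16)] {
    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    let rowBytes = CVPixelBufferGetBytesPerRow(pixelBuffer)
    guard let base = CVPixelBufferGetBaseAddress(pixelBuffer) else { return [] }

    var out: [(r: UInt16, g: UInt16, b: UInt16)] = []
    out.reserveCapacity((width / 2) * (height / 2))

    for y in stride(from: 0, to: height - 1, by: 2) {
        let row0 = (base + y * rowBytes).assumingMemoryBound(to: UInt16.self)
        let row1 = (base + (y + 1) * rowBytes).assumingMemoryBound(to: UInt16.self)
        for x in stride(from: 0, to: width - 1, by: 2) {
            let r = row0[x]                                        // R
            let g = (UInt32(row0[x + 1]) + UInt32(row1[x])) / 2    // average of the two greens
            let b = row1[x + 1]                                    // B
            out.append((r: r, g: UInt16(g), b: b))
        }
    }
    return out
}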
However, I am puzzled as to the values we are getting as they seem to be approximately in the range 2048 - 16383. The top value is understandable - the maximum that you can fit in 14-bits (as implied by the pixel format type).
However we don't seem to be able to get lower than ~2048 no matter how black/dark we make the sensor.
I'm aware that the sensor is probably not 14-bits (we're using the iPhone 16e camera) and that maybe this is to do with the way the sensor data is packaged.
The Advances in iOS Photography video (https://developer.apple.com/videos/play/wwdc2016/501/) describes it as "10-bit sensor RAW packaged in 14 bits per pixel instead of eight."
Is there any documentation describing what is going on here? It's vital for our use that we get as close to the raw camera sensor light readings as possible, so any pointers as to the mapping (e.g. decompanding?) being used would be extremely useful.
Many thanks in advance for your help.
I’ve tried both AVCaptureVideoDataOutputSampleBufferDelegate (captureOutput) and AVCaptureDataOutputSynchronizerDelegate (dataOutputSynchronizer), but the number of depth frames and saved timestamps is significantly lower than the number of frames in the .mp4 file written by AVAssetWriter.
In my code, I save:
Timestamps for each frame to a metadata file
Depth frames to a binary file
Video to an .mp4 file
If I record a 4-second video at 30fps, the .mp4 file correctly plays for 4 seconds, but the number of stored timestamps and depth frames is much lower—around 70 frames instead of the expected 120.
Does anyone know why this mismatch happens?
func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer,
                            didOutput synchronizedDataCollection: AVCaptureSynchronizedDataCollection) {
    // Read all outputs
    guard let syncedDepthData: AVCaptureSynchronizedDepthData =
            synchronizedDataCollection.synchronizedData(for: depthDataOutput) as? AVCaptureSynchronizedDepthData,
          let syncedVideoData: AVCaptureSynchronizedSampleBufferData =
            synchronizedDataCollection.synchronizedData(for: videoDataOutput) as? AVCaptureSynchronizedSampleBufferData else {
        // only work on synced pairs
        return
    }

    if syncedDepthData.depthDataWasDropped || syncedVideoData.sampleBufferWasDropped {
        return
    }

    let depthData = syncedDepthData.depthData
    let depthPixelBuffer = depthData.depthDataMap
    let sampleBuffer = syncedVideoData.sampleBuffer

    guard let videoPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer),
          let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer) else {
        return
    }

    addToPreviewStream?(CIImage(cvPixelBuffer: videoPixelBuffer))

    if !canWrite() {
        return
    }

    // Extract the presentation timestamp (PTS) from the sample buffer
    let timestamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)

    // sessionAtSourceTime is the first buffer we will write to the file
    if self.sessionAtSourceTime == nil {
        // Make sure we don't start recording until the buffer reaches the correct time
        // (the buffer is always behind; this will fix the difference in time)
        guard sampleBuffer.presentationTimeStamp >= self.recordFromTime! else { return }
        self.sessionAtSourceTime = sampleBuffer.presentationTimeStamp
        self.videoWriter!.startSession(atSourceTime: sampleBuffer.presentationTimeStamp)
    }

    if self.videoWriterInput!.isReadyForMoreMediaData {
        self.videoWriterInput!.append(sampleBuffer)
        self.videoTimestamps.append(
            Timestamp(
                frame: videoTimestamps.count,
                value: timestamp.value,
                timescale: timestamp.timescale
            )
        )
        let ddm = depthData.depthDataMap
        depthCapture.addDepthData(pixelBuffer: ddm, timestamp: timestamp)
    }
}
We are using a VoiceProcessingIO audio unit in our VoIP application on Mac. In certain scenarios, the AudioComponentInstanceNew call blocks for up to five seconds (at least two). We are using the following code to initialize the audio unit:
OSStatus status;
AudioComponentDescription desc;
AudioComponent inputComponent;
desc.componentType = kAudioUnitType_Output;
desc.componentSubType = kAudioUnitSubType_VoiceProcessingIO;
desc.componentFlags = 0;
desc.componentFlagsMask = 0;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;
inputComponent = AudioComponentFindNext(NULL, &desc);
status = AudioComponentInstanceNew(inputComponent, &unit);
We are seeing the issue on current macOS versions on a host of different Macs (x86 and x64 alike). It takes two to three seconds until AudioComponentInstanceNew returns.
We also see the following errors in the log multiple times:
AUVPAggregate.cpp:2560 AggInpStreamsChanged wait failed
and those right after (which I don't know if they matter to this issue):
KeystrokeSuppressorCore.cpp:44 ERROR: KeystrokeSuppressor initialization was unsuccessful. Invalid or no plist was provided. AU will be bypassed.
vpStrategyManager.mm:486 Error code 2003332927 reported at GetPropertyInfo
I'm capturing a video stream from a GoPro camera (I demux the UDP MPEG-TS packets) and create CMSampleBuffers from them; this works fine when I display them using AVSampleBufferDisplayLayer.
However, when I dump them to disk using AVAssetWriter and then play the result back with AVPlayer, AVPlayer has problems with scrubbing; it also cannot render previous frames and needs to go back to keyframes. Also, thumbnails generated with AVAssetImageGenerator are mostly distorted and green, even though I set requestedTimeToleranceAfter to be longer than the keyframe interval.
When I re-encode saved video once again with AVAssetExportSession and play it back then I can scrub the video just fine.
Is it because re-transcoding adds additional metadata to enable generating frames when rewinding the video and scrubbing?
If so is there a way to achieve it with AVAssetWriter without much time penalty? I need the dump/save operation to be very fast.
I also considered the following: instead of demuxing the video and creating CMSampleBuffers, maybe I could directly dump the stream to disk and somehow add moov atoms with timing information. Would this approach work? If so, where can I find information on how to do it?
Thank you!
I'm using an AVCaptureSession to send video and audio samples to an AVAssetWriter. When I play back the resultant video, sometimes there is a significant lag between the audio compared with the video, so they're just not in sync. But sometimes they are, with the same code.
If I look at the very first presentation time stamps of the buffers being sent to the delegate, via
func captureOutput(_: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection)
I see something like this:
Adding audio samples for pts time 227711.0855328798,
Adding video samples for pts time 227710.778785374
That is, the audio and video timestamps don't line up: the first audio sample I receive is at 11.08-something, while the first video sample is earlier in time, at 10.778-something. The times are the presentation time stamps of the buffers, and the outputPresentationTimeStamp is the exact same number.
It feels like "video" vs the "audio" clock are just mismatched.
This doesn't always happen: sometimes they're synced. Sometimes they're not.
Any ideas? The device I'm recording is a webcam, on iPadOS, connected via the usb-c port.
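For reference, a minimal sketch of one common pattern for anchoring the writer's timeline (writer, videoInput, and audioInput are placeholders, not my exact code): the session is started at the PTS of whichever buffer arrives first, so both tracks share the same origin.
import AVFoundation

// Sketch: start the writer session at the first PTS actually delivered,
// whichever output (audio or video) it comes from.
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    let pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)

    if writer.status == .unknown {
        writer.startWriting()
        writer.startSession(atSourceTime: pts)   // anchor the timeline to the first buffer seen
    }
    guard writer.status == .writing else { return }

    if output is AVCaptureVideoDataOutput, videoInput.isReadyForMoreMediaData {
        videoInput.append(sampleBuffer)
    } else if output is AVCaptureAudioDataOutput, audioInput.isReadyForMoreMediaData {
        audioInput.append(sampleBuffer)
    }
}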
I tried adding watermarks to the recorded video. Appending sample buffers using AVAssetWriterInput's append method fails and when I inspect the AVAssetWriter's error property, I get the following:
Error Domain=AVFoundationErrorDomain Code=-11800 "This operation cannot be completed" UserInfo={NSLocalizedFailureReason=An unknown error occurred (-12780), NSLocalizedDescription=This operation cannot be completed, NSUnderlyingError=0x302399a70 {Error Domain=NSOSStatusErrorDomain Code=-12780 "(null)"}}
As far as I can tell, -11800 indicates AVErrorUnknown; however, I have not been able to find information about the -12780 error code, which as far as I can tell is undocumented.
Thanks!
Here is the code
Is it possible to stream video from a UVC (USB Video Class) camera on an iPhone 15? If so, are there any specific hardware or software requirements to enable this functionality?