Hello,
I’m experiencing a severe performance degradation when running CoreML models on a live AVFoundation video feed compared to offline or synthetic inference. This happens across multiple models I've converted (including SCI, RTMPose, and RTMW) and affects multiple devices.
The Environment
OS: macOS 26.3, iOS 26.3, iPadOS 26.3
Hardware: Mac14,6 (M2 Max), iPad Pro 11 M1, iPhone 13 mini
Compute Units: cpuAndNeuralEngine
The Numbers
When testing my SCI_output_image_int8.mlpackage model, the inference timings are drastically different:
Synthetic/Offline Inference: ~1.34 ms
Live Camera Inference: ~15.96 ms
Preprocessing is completely ruled out as the bottleneck. My profiling shows total preprocessing (nearest-neighbor resize + feature provider creation) takes only ~0.4 ms in camera mode. Furthermore, no frames are being dropped.
What I've Tried
I am building a latency-critical app and have implemented almost every recommended optimization to try to fix this, but the camera-feed penalty remains (a condensed configuration sketch follows the list):
Matched the AVFoundation camera output format exactly to the model input (640x480 at 30/60fps).
Used IOSurface-backed pixel buffers for everything (camera output, synthetic buffer, and resize buffer).
Enabled outputBackings.
Loaded the model once and reused it for all predictions.
Configured MLModelConfiguration with reshapeFrequency = .frequent and specializationStrategy = .fastPrediction.
Wrapped inference in ProcessInfo.processInfo.beginActivity(options: .latencyCritical, reason: "CoreML_Inference").
Set DispatchQueue to qos: .userInteractive.
Disabled the idle timer and enabled iOS Game Mode.
Exported models using coremltools 9.0 (deployment target iOS 26) with ImageType inputs/outputs and INT8 quantization.
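For reference, a condensed sketch of the configuration described in the list above, as I understand the API; the model and feature names ("SCI_output_image_int8", "input_image", "output_image") are placeholders:
import Foundation
import CoreML
import CoreVideo

// Sketch only: load once with ANE + optimization hints, reuse for every prediction.
func makeModel() throws -> MLModel {
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine

    var hints = MLOptimizationHints()
    hints.reshapeFrequency = .frequent
    hints.specializationStrategy = .fastPrediction
    config.optimizationHints = hints

    let url = Bundle.main.url(forResource: "SCI_output_image_int8", withExtension: "mlmodelc")!
    return try MLModel(contentsOf: url, configuration: config)
}

// Sketch only: feed an IOSurface-backed camera buffer and reuse a pre-allocated output backing.
func predict(model: MLModel, pixelBuffer: CVPixelBuffer, outputBacking: CVPixelBuffer) throws -> MLFeatureProvider {
    let provider = try MLDictionaryFeatureProvider(
        dictionary: ["input_image": MLFeatureValue(pixelBuffer: pixelBuffer)])
    let options = MLPredictionOptions()
    options.outputBackings = ["output_image": outputBacking]  // pre-allocated, IOSurface-backed
    return try model.prediction(from: provider, options: options)
}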
Reproduction
To completely rule out UI or rendering overhead, I wrote a standalone Swift CLI script that isolates the AVFoundation and CoreML pipeline. The script clearly demonstrates the ~15ms latency on live camera frames versus the ~1ms latency on synthetic buffers.
(I have attached camera_coreml_benchmark.swift and the Core ML model (a very light low-light enhancement model) to this GitHub repo: https://github.com/pzoltowski/apple-coreml-camera-latency-repro.)
My Question:
Is this massive overhead expected behavior for AVFoundation + Core ML on live feeds, or is this a framework/runtime bug? If expected, what is the Apple-recommended pattern to bypass this camera-only inference slowdown?
One thing I found interesting: when running in debug mode, inference was faster (not as fast as in the performance benchmark, but faster than 16 ms). Also, if I ran some dummy calculation on a different DispatchQueue, the model seemed to get slightly faster. So maybe this is related to ANE power-state issues (jitter / SoC wake), with the ANE going to sleep too quickly and taking a long time to wake up? Doing dummy calculations on a background thread is probably not a real solution, though.
Thanks in advance for any insights!
Posts under the AVFoundation tag: work with audiovisual assets, control device cameras, process audio, and configure system audio interactions using AVFoundation.
In iOS 26, AVSpeechSynthesizer reads Mandarin text with Cantonese pronunciation.
No matter how I set the language, or change my phone's system settings, it doesn't work.
let utterance = AVSpeechUtterance(string: "你好啊")
//let voice = AVSpeechSynthesisVoice(language: "zh-CN") // does not work
let voice = AVSpeechSynthesisVoice(language: "zh-Hans") // does not work either
utterance.voice = voice
let synth = AVSpeechSynthesizer()
synth.speak(utterance)
Please include the line below in follow-up emails for this request.
Case-ID: 11089799
When using AVSpeechUtterance and setting it to play in Mandarin, if Siri is set to Cantonese on iOS 18, it will be played in Cantonese. There is no such issue on iOS 17 and 16.
1. let utterance = AVSpeechUtterance(string: textView.text)
   let voice = AVSpeechSynthesisVoice(language: "zh-CN")
   utterance.voice = voice
2. In the phone settings, Siri is set to Cantonese.
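Not a fix, but a small sketch for inspecting which Mandarin voices the device actually has installed, and pinning one explicitly instead of relying on a bare language code:
import AVFoundation

// List the installed zh-CN voices and pick one explicitly.
let mandarinVoices = AVSpeechSynthesisVoice.speechVoices().filter { $0.language == "zh-CN" }
for voice in mandarinVoices {
    print(voice.identifier, voice.name, voice.language)
}

let utterance = AVSpeechUtterance(string: "你好啊")
utterance.voice = mandarinVoices.first   // explicit voice object instead of a language code
let synthesizer = AVSpeechSynthesizer()  // keep a strong reference while speaking
synthesizer.speak(utterance)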
My code that streams buffers into AVAudioPlayerNode is stuttering when the buffer is finished and before the next one is played.
while engine.isRunning {
    let framesToCopy = min(buffer.frameLength - framePosition, Self.BufferSize)
    let srcRaw = UnsafeRawPointer(srcPtr)

    let playbackBuffer = AVAudioPCMBuffer(pcmFormat: buffer.format, frameCapacity: Self.BufferSize)!
    let playbackPtr = playbackBuffer.floatChannelData![0]
    let destRaw = UnsafeMutableRawPointer(mutating: playbackPtr)
    memcpy(destRaw, srcRaw, Int(framesToCopy) * MemoryLayout<Float>.stride)

    srcPtr = srcPtr.advanced(by: Int(framesToCopy))
    playbackBuffer.frameLength = framesToCopy

    await player.scheduleBuffer(playbackBuffer,
                                at: nil,
                                options: [],
                                completionCallbackType: .dataRendered)
}
I've tried to schedule multiple buffers at once, using a combination of both the synchronous and async versions of scheduleBuffer, because I thought the scheduling delay might be the cause, but it still stutters, and the data copied into the playbackBuffer matches the source buffer. I've tried all combinations of options and completionCallbackType, but no luck.
I've tried increasing the buffer size but that just spaces out the stutters because the buffer is larger.
What am I missing about this API?
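For what it's worth, my understanding of "scheduling multiple buffers at once" is roughly the pattern below, where makeNextBuffer() stands in for the slicing logic in the loop above; I may well be misusing it:
import AVFoundation

// Keep a few buffers queued ahead of the play head so the node never runs dry.
func stream(player: AVAudioPlayerNode, engine: AVAudioEngine,
            makeNextBuffer: () -> AVAudioPCMBuffer?) async {
    // Prime the queue before relying on completions.
    for _ in 0..<3 {
        guard let buffer = makeNextBuffer() else { return }
        player.scheduleBuffer(buffer, completionHandler: nil)
    }
    player.play()

    // Each await resumes once a previously scheduled buffer's data has been
    // consumed, so roughly three buffers stay in flight at any time.
    while engine.isRunning, let buffer = makeNextBuffer() {
        await player.scheduleBuffer(buffer, at: nil, options: [],
                                    completionCallbackType: .dataConsumed)
    }
}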
Hi everyone,
We’re encountering an issue where AudioQueueNewOutput blocks indefinitely and never returns, and we’re hoping to get some insight, or confirmation of whether this is a known behavior/regression on newer iOS versions.
Issue Description
When triggering audio playback, we create an output AudioQueue using AudioQueueNewOutput.
On some devices, the call hangs inside AudioQueueNewOutput and never returns, with no OSStatus error and no subsequent logs.
This behavior is reproducible mainly on iOS 18.3.
Earlier iOS versions do not show this issue under the same code path.
if (audioDes)
{
    mAudioDes.mSampleRate       = audioDes->mSampleRate;
    mAudioDes.mBitsPerChannel   = audioDes->mBitsPerChannel;
    mAudioDes.mChannelsPerFrame = audioDes->mChannelsPerFrame;
    mAudioDes.mFormatID         = audioDes->mFormatID;
    mAudioDes.mFormatFlags      = audioDes->mFormatFlags;
    mAudioDes.mFramesPerPacket  = audioDes->mFramesPerPacket;
    mAudioDes.mBytesPerFrame    = audioDes->mBytesPerFrame;
    mAudioDes.mBytesPerPacket   = audioDes->mBytesPerFrame;
    mAudioDes.mReserved         = 0;
}

// Create AudioQueue for output
OSStatus status = AudioQueueNewOutput(
    &mAudioDes,
    AQOutputCallback,
    this,
    NULL,
    NULL,
    0,
    &audioQueue
);
The thread blocks inside AudioQueueNewOutput, and execution never reaches the next line.
Additional Notes / Observations
ASBD is confirmed to be valid
Standard PCM output
Sample rate, channels, bytes per frame/packet all consistent
Same ASBD works correctly on earlier iOS versions
AudioQueue is created on a background thread
Not on the main thread
Not inside the AudioQueue callback
On first creation, AVAudioSession may not yet be active
setCategory and setActive:YES may be called shortly before creating the AudioQueue
There may be a timing window where the session is still activating
Issue is reported mainly on iOS 18.3
Multiple user reports point to iOS 18.3 devices
Same code path works on iOS 17.x and earlier
No OSStatus error is returned — the call simply never returns.
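To make the timing-window point concrete, here is a sketch (Swift for brevity, placeholder callback) of the ordering in question: fully activating the session before creating the queue.
import AVFoundation
import AudioToolbox

// Sketch: activate the session first, then create the queue.
// `outputCallback` is a placeholder for the real AQOutputCallback.
func makeOutputQueue(format: inout AudioStreamBasicDescription,
                     outputCallback: AudioQueueOutputCallback) throws -> AudioQueueRef? {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playback)
    try session.setActive(true)           // returns only after activation completes

    var queue: AudioQueueRef?
    let status = AudioQueueNewOutput(&format, outputCallback, nil, nil, nil, 0, &queue)
    guard status == noErr else {
        throw NSError(domain: NSOSStatusErrorDomain, code: Int(status))
    }
    return queue
}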
Questions
Is it expected that AudioQueueNewOutput can block indefinitely while waiting for AVAudioSession / audio route / HAL readiness?
Have there been any behavior changes in iOS 18.3 regarding AudioQueue creation or AudioSession synchronization?
Is it unsafe to call AudioQueueNewOutput before AVAudioSession is fully active on recent iOS versions?
Are there recommended patterns (or delays / callbacks) to ensure AudioQueue creation does not hang?
Any insight or confirmation would be greatly appreciated.
Thanks in advance!
I read somewhere that the frames are returned in decode order instead of presentation order when using AVAssetReader. The documentation seems sparse on the subject. I have so far failed to find a video file where the frames are not returned in presentation order.
Can anyone confirm the frames are actually returned in decode order?
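For anyone who wants to verify, a quick sketch of one way to check (assuming a local file URL): with B-frames present, decode order would show presentation timestamps going backwards.
import AVFoundation

// Walk the video track and flag any non-monotonic presentation timestamps.
func checkPresentationOrder(url: URL) throws {
    let asset = AVURLAsset(url: url)
    guard let track = asset.tracks(withMediaType: .video).first else { return }

    let reader = try AVAssetReader(asset: asset)
    let output = AVAssetReaderTrackOutput(track: track, outputSettings: nil)
    reader.add(output)
    reader.startReading()

    var lastPTS = CMTime.negativeInfinity
    while let sample = output.copyNextSampleBuffer() {
        let pts = CMSampleBufferGetPresentationTimeStamp(sample)
        if pts < lastPTS {
            print("out of presentation order: \(pts.seconds) after \(lastPTS.seconds)")
        }
        lastPTS = pts
    }
}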
Hello,
I am currently developing a live streaming application using AVPlayer to play LL-HLS (Low-Latency HLS) content.
During our testing phase, we consistently encountered the following error in the logs:
CoreMediaErrorDomain Code=-15517
The challenge we are facing is that the error description is quite vague. It only provides cryptic messages such as "Key not found" or "No value information," which makes it extremely difficult to identify the root cause or perform a deep-dive analysis.
I have searched through the official Apple Developer documentation and technical notes, but I couldn’t find any specific reference to what Code -15517 signifies in the context of LL-HLS or CoreMedia.
Regarding this issue, I have the following questions:
What is the specific meaning of this error code (-15517)? Does it relate to missing tags in the HLS manifest, or is it an internal state issue within the AVPlayer stack?
Specifically, I would like to know if this is a critical error that disrupts playback, or if it is just a warning that can be safely ignored.
Is there any additional logging or debugging tool you would recommend to further investigate "Key not found" issues in LL-HLS?
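On the logging question, one thing that may surface more detail is AVPlayerItem's built-in error log; a minimal sketch, assuming an item already playing the LL-HLS stream:
import AVFoundation

// Print every new error-log entry the player item records.
func observeErrorLog(on item: AVPlayerItem) -> NSObjectProtocol {
    NotificationCenter.default.addObserver(
        forName: .AVPlayerItemNewErrorLogEntry,
        object: item,
        queue: .main
    ) { _ in
        guard let log = item.errorLog() else { return }
        for event in log.events {
            print("error log:",
                  "domain=\(event.errorDomain)",
                  "status=\(event.errorStatusCode)",
                  "comment=\(event.errorComment ?? "-")",
                  "uri=\(event.uri ?? "-")")
        }
    }
}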
Any insights or guidance from the community or Apple engineers would be greatly appreciated.
Thank you in advance for your help.
Hello everyone,
I’m looking for more detailed information regarding UVC (USB Video Class) over MFi within the Apple ecosystem and would appreciate some clarification.
I’m interested in developing (or interfacing with) an accessory that transmits video over USB using the UVC standard, and I’d like to better understand how this works within the MFi (Made for iPhone) program.
Here are my main questions:
1. Do iOS devices provide native support for UVC over USB-C or Lightning within the MFi framework?
2. Are there any specific firmware or authentication requirements when the accessory is MFi-certified?
3. Does UVC support depend solely on the hardware interface (USB-C vs Lightning), or are there additional software-level requirements?
4. Is there any official documentation outlining the recommended flow for implementing UVC-based video capture accessories on iOS?
From what I understand, USB-C iPads appear to offer more direct support for standard UVC devices, but it's not entirely clear how this integrates with the MFi ecosystem on iOS, especially for commercial product development.
If anyone has gone through this process or can point me to relevant technical documentation, I would greatly appreciate the guidance.
Thank you!
I just bought an S2725QC monitor from Dell Technologies and it isn't fully integrated with macOS, even though the website says it is compatible with macOS.
https://www.dell.com/en-us/shop/all-monitors/sac/monitors/all-monitors/macos-compatible?appliedRefinements=51765
The screen brightness and volume control buttons don't work with the monitors (I have two). What can I do in terms of writing code with the Dell Monitor SDK and macOS frameworks/technologies?
I want to:
Run ARKit on the main rear camera, and while it's running shoot high resolution pictures on the wide camera, without disturbing the AR tracking.
Is this possible?
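Not exactly the two-camera setup described, but possibly relevant (iOS 16+): ARKit can capture a higher-resolution still from the session's own camera without interrupting tracking. A sketch; whether the secondary wide camera can be opened separately is a different question.
import ARKit

// Run world tracking with the format recommended for high-res capture,
// then request a high-resolution frame while the session keeps tracking.
func runAndCapture(session: ARSession) {
    let config = ARWorldTrackingConfiguration()
    if let format = ARWorldTrackingConfiguration.recommendedVideoFormatForHighResolutionFrameCapturing {
        config.videoFormat = format
    }
    session.run(config)

    session.captureHighResolutionFrame { frame, error in
        if let frame {
            print("high-res frame:",
                  CVPixelBufferGetWidth(frame.capturedImage), "x",
                  CVPixelBufferGetHeight(frame.capturedImage))
        } else if let error {
            print("capture failed:", error)
        }
    }
}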
I'm relatively new to Swift development (and native iOS development for that matter)
I've got an iOS app that uses the iPhone / iPad built in cameras, and am looking to make this more compatible with macOS.
Using the normal AVCaptureDevice.DiscoverySession I seem to get the iPhone Continuity Camera and the in-built MacBook Pro camera but I don't see other input devices that I see in QuickTime Player (for example) such as connected external cameras or Virtual Inputs provided by NDI Virtual Input and OBS.
Is there a way to access these without a specific Mac build? (The rest of the functionality works great, and I'd rather not diverge the codebase too much, as it's easier to update one app than two!)
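For the Mac side, a sketch of a discovery session with the device types that should surface more than the built-in camera; my assumption is that virtual cameras (OBS, NDI) only appear if they are implemented as modern Camera Extensions:
import AVFoundation

// Discovery session that also asks for external (UVC) and Continuity cameras.
func availableCameras() -> [AVCaptureDevice] {
    var types: [AVCaptureDevice.DeviceType] = [.builtInWideAngleCamera]
    #if os(macOS)
    types.append(.external)           // external USB/UVC cameras (macOS 14+)
    types.append(.continuityCamera)   // iPhone used as a webcam
    #endif
    let discovery = AVCaptureDevice.DiscoverySession(
        deviceTypes: types,
        mediaType: .video,
        position: .unspecified)
    return discovery.devices
}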
I have a question regarding the behavior of AVAudioSession.sharedInstance().outputVolume.
Observed behavior:
When the app is in the foreground, I read audioSession.outputVolume (for example, 0.1).
The app is then moved to the background.
While the app is in the background, the user changes the system volume using the hardware buttons (for example, to 0.5).
When the app returns to the foreground, audioSession.outputVolume still reports the previous value (0.1).
From my testing, outputVolume only seems to update when the system volume is changed while the app is in the foreground. Volume changes made while the app is in the background are not reflected when the app returns to the foreground.
According to Apple’s documentation for AVAudioSession.outputVolume:
“The systemwide output volume set by the user.”
https://developer.apple.com/documentation/avfaudio/avaudiosession/outputvolume
However, based on our testing on iOS 18.6.2 and iOS 18.1, the observed behavior seems to differ from this description.
Questions:
The documentation states that outputVolume represents the system-wide volume set by the user. In our testing, the value does not reflect volume changes made while the app is in the background and only updates while the app is in the foreground. Is this the expected behavior of AVAudioSession.outputVolume?
Is there any other recommended way in Swift to retrieve the current system volume that reflects user changes made both while the app is in the foreground and while it is in the background?
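For reference, a sketch of the monitoring pattern in question: KVO on outputVolume while active, plus a re-read after re-activating the session when returning to the foreground (whether that re-read picks up background changes is exactly what is unclear). Names are illustrative.
import AVFoundation
import UIKit

final class VolumeMonitor {
    private var kvo: NSKeyValueObservation?
    private var foregroundObserver: NSObjectProtocol?

    func start() {
        let session = AVAudioSession.sharedInstance()
        try? session.setActive(true)

        // Fires for volume changes made while the app is in the foreground.
        kvo = session.observe(\.outputVolume, options: [.new]) { _, change in
            print("outputVolume changed:", change.newValue ?? -1)
        }

        // Re-read when coming back to the foreground.
        foregroundObserver = NotificationCenter.default.addObserver(
            forName: UIApplication.didBecomeActiveNotification,
            object: nil, queue: .main
        ) { _ in
            try? session.setActive(true)
            print("outputVolume on foreground:", session.outputVolume)
        }
    }
}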
Any clarification on the intended behavior or recommended handling would be greatly appreciated.
I’m building a teleprompter-style app that relies on Picture in Picture.
PiP starts correctly on device.
Everything works — until another app (e.g. TikTok / Instagram) starts active video recording.
When camera capture begins in the foreground app, iOS terminates my PiP session.
Some teleprompter apps appear to keep PiP active while recording in other apps, so I’m trying to understand the recommended architectural pattern for this scenario.
Is there a documented approach or best practice to keep PiP stable during third-party camera capture?
Looking specifically for guidance on the correct AVKit / AVAudioSession configuration for this use case.
According to the documentation (https://developer.apple.com/documentation/avfoundation/avcontentkeyrequest/originatingrecipient?changes=_3&language=objc), starting with iOS 18.4 I can get the AVContentKeyRecipient from an AVContentKeyRequest. But when I try to access it, I get a crash. What could be the issue?
I want to note that I add the asset to the AVContentKeySession using the addContentKeyRecipient method (https://developer.apple.com/documentation/avfoundation/avcontentkeysession/addcontentkeyrecipient(_:)?changes=_3&language=objc).
VideoMaterial Black Screen on Vision Pro Device (Works in Simulator)
App Overview
App Name: Extn Browser
Bundle ID: ai.extn.browser
Purpose: A visionOS web browser that plays 360°/180° VR videos in an immersive sphere environment
Development Environment & SDK Versions
Xcode: 26.2
Swift: 6.2
visionOS Deployment Target: 26.2
Swift Concurrency: MainActor isolation enabled
The app is released on TestFlight.
Frameworks Used
SwiftUI - UI framework
RealityKit - 3D rendering, MeshResource, ModelEntity, VideoMaterial
AVFoundation - AVPlayer, AVAudioSession
WebKit - WKWebView for browser functionality
Network - NWListener for local proxy server
Sphere Video Mechanism
The app creates an immersive 360° video experience using the following approach:
// 1. Create sphere mesh (10 meter radius for immersive viewing)
let mesh = MeshResource.generateSphere(radius: 10.0)

// 2. Create initial transparent material
var material = UnlitMaterial()
material.color = .init(tint: .clear)

// 3. Create entity and invert sphere (negative X scale)
let sphere = ModelEntity(mesh: mesh, materials: [material])
sphere.scale = SIMD3<Float>(-1, 1, 1)      // Inverts normals for inside-out viewing
sphere.position = SIMD3<Float>(0, 1.5, 0)  // Eye level

// 4. Create AVPlayer with video URL
let player = AVPlayer(url: videoURL)

// 5. Configure audio session for visionOS
let audioSession = AVAudioSession.sharedInstance()
try audioSession.setCategory(.playback, mode: .moviePlayback, options: [.mixWithOthers])
try audioSession.setActive(true)

// 6. Create VideoMaterial and apply to sphere
let videoMaterial = VideoMaterial(avPlayer: player)
if var modelComponent = sphere.components[ModelComponent.self] {
    modelComponent.materials = [videoMaterial]
    sphere.components.set(modelComponent)
}

// 7. Start playback
player.play()
ImmersiveSpace Configuration
// browserApp.swift
ImmersiveSpace(id: appModel.immersiveSpaceID) {
    ImmersiveView()
        .environment(appModel)
}
.immersionStyle(selection: .constant(.mixed), in: .mixed)
Entitlements
<!-- browser.entitlements -->
<key>com.apple.security.app-sandbox</key>
<true/>
<key>com.apple.security.network.client</key>
<true/>
<key>com.apple.security.network.server</key>
<true/>
Info.plist Network Configuration
<key>NSAppTransportSecurity</key>
<dict>
<key>NSAllowsArbitraryLoads</key>
<true/>
</dict>
The Issue
Behavior in Simulator: Video plays correctly on the inverted sphere surface - 360° video is visible and wraps around the user as expected.
Behavior on Physical Vision Pro: The sphere displays a black screen. No video content is visible, though the sphere entity itself is present.
Important: Not a DRM/Licensing Issue
This issue is NOT related to Digital Rights Management (DRM) or FairPlay. I have tested with:
Unlicensed raw MP4 video files (no DRM protection)
Self-hosted video content with no copy protection
Direct MP4 URLs from CDN without any licensing requirements
The same black screen behavior occurs with all unprotected video sources, ruling out DRM as the cause.
(Plain H.264 MP4, no DRM)
Screen Recording: Working in Simulator
The following screen recording demonstrates playing a 360° YouTube video in the immersive sphere on the visionOS Simulator:
https://cdn.commenda.kr/screen-001.mov
This confirms that the VideoMaterial and sphere rendering work correctly in the simulator, but the same setup shows a black screen on the physical Vision Pro device.
Observations
AVPlayer status reports .readyToPlay - The video appears to load successfully
VideoMaterial is created without errors - No exceptions thrown
Sphere entity renders - The geometry is visible (black surface)
Audio session is configured - No errors during audio session setup
Network requests succeed - The video URL is accessible from the device
Same result with local/unprotected content - DRM is not a factor
Console Logs (Device)
The logging shows:
Sphere created and added to scene
AVPlayer created with correct URL
VideoMaterial created and applied
Player status transitions to .readyToPlay
player.play() called successfully
Rate shows 1.0 (playing)
Despite all success indicators, the rendered output is black.
Questions for Apple
Are there known differences in VideoMaterial behavior between the visionOS Simulator and physical Vision Pro hardware?
Does VideoMaterial(avPlayer:) require specific video codec/format requirements that differ on device? (The test video is a standard H.264 MP4)
Is there a required Metal capability or GPU feature for VideoMaterial that may not be available in certain contexts on device?
Does the immersion style (.mixed) affect VideoMaterial rendering on hardware?
Are there additional entitlements required for video texture rendering in RealityKit on physical hardware?
Attempted Solutions
Configured AVAudioSession with .playback category
Added delay before player.play() to ensure material is applied
Verified sphere scale inversion (-1, 1, 1)
Tested multiple video URLs (including raw, unlicensed MP4 files)
Confirmed network connectivity on device
Ruled out DRM/FairPlay issues by testing unprotected content
Environment Details
Device: Apple Vision Pro
visionOS Version: 26.2
Xcode Version: 26.2
macOS Version: Darwin 25.2.0
Dear Support Team,
I am writing to seek technical assistance regarding a persistent issue with Dolby Vision exporting in DaVinci Resolve 20 on my iPad Pro 12.9-inch (2021, M1 chip) running iPadOS 26.0.1.
The Issue:
Despite correctly configuring the project for a Dolby Vision workflow and successfully completing the dynamic metadata analysis, the "Dolby Vision Profile" dropdown menu (and related embedding options) is completely missing from the Advanced Settings in the Deliver page.
My Current Configuration & Steps Taken:
Software Version: DaVinci Resolve Studio 20 (Studio features like Dolby Vision analysis are active and functional).
Project Settings: Color Science: DaVinci YRGB Color Managed.
Dolby Vision: Enabled (Version 4.0) with Mastering Display set to 1000 nits.
Output Color Space: Rec.2100 ST2084.
Color Page: Dynamic metadata analysis has been performed, and "Trim" controls are functional.
Export Settings:
Format: QuickTime / MP4.
Codec: H.265 (HEVC).
Encoding Profile: Main 10.
The Problem: Under "Advanced Settings," there is no option to select a Dolby Vision Profile (e.g., Profile 8.4) or to "Embed Dolby Vision Metadata."
Potential Variables:
System Version: I am currently running iPadOS 26.
Apple ID: My iPad is currently not logged into an Apple ID. I suspect this might be preventing the app from accessing certain system-level AVFoundation frameworks or Dolby DRM/licensing certificates required for metadata embedding.
Could you please clarify if the "Dolby Vision Profile" option is dependent on a signed-in Apple ID for hardware-level encoding authorization, or if this is a known compatibility issue with the current iPadOS 26 build?
I look forward to your guidance on how to resolve this.
Best regards,
INSOFT_Fred
I am unable to find any clear-cut documentation on configuring an AVCaptureSession pipeline to capture video with the ProRes RAW codec type, which is a 16-bit format. Is it supported only with AVCaptureMovieFileOutput, or can AVCaptureVideoDataOutput emit 16-bit sample buffers that can be vended to AVAssetWriter?
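Not an answer, but a quick way to see what the current device/session actually offers: listing the movie file output's available codec types is where any ProRes (and ProRes RAW, if exposed) identifiers would show up. A sketch:
import AVFoundation

// Attach a movie file output and print the codecs it can write for the video connection.
func logAvailableCodecs(session: AVCaptureSession) {
    let movieOutput = AVCaptureMovieFileOutput()
    guard session.canAddOutput(movieOutput) else { return }
    session.addOutput(movieOutput)

    print("available codecs:", movieOutput.availableVideoCodecTypes)
    if let connection = movieOutput.connection(with: .video) {
        print("default settings:", movieOutput.outputSettings(for: connection))
    }
}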
Hello,
I have implemented Low-Latency Frame Interpolation using the VTFrameProcessor framework, based on the sample code from https://developer.apple.com/kr/videos/play/wwdc2025/300. It is currently working well for both LIVE and VOD streams.
However, I have a few questions regarding the lifecycle management and synchronization of this feature:
1. Common Questions (Applicable to both Frame Interpolation & Super Resolution)
1.1 Dynamic Toggling
Do you recommend enabling/disabling these features dynamically during playback?
Or is it better practice to configure them only during the initial setup/preparation phase?
If dynamic toggling is supported, are there any recommended patterns for managing VTFrameProcessor session lifecycle (e.g., startSession / endSession timing)?
1.2 Synchronization Method
I am currently using CADisplayLink to fetch frames from AVPlayerItemVideoOutput and perform processing.
Is CADisplayLink the recommended approach for real-time frame acquisition with VTFrameProcessor?
If the feature needs to be toggled on/off during active playback, are there any concerns or alternative approaches you would recommend?
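For reference, a rough sketch of the CADisplayLink + AVPlayerItemVideoOutput acquisition described in 1.2 (the VTFrameProcessor call itself is omitted):
import AVFoundation
import UIKit

final class FrameTap {
    private let output: AVPlayerItemVideoOutput
    private var displayLink: CADisplayLink?

    init(item: AVPlayerItem) {
        output = AVPlayerItemVideoOutput(pixelBufferAttributes: [
            kCVPixelBufferIOSurfacePropertiesKey as String: [:]
        ])
        item.add(output)
    }

    func start() {
        displayLink = CADisplayLink(target: self, selector: #selector(tick(_:)))
        displayLink?.add(to: .main, forMode: .common)
    }

    @objc private func tick(_ link: CADisplayLink) {
        let itemTime = output.itemTime(forHostTime: CACurrentMediaTime())
        guard output.hasNewPixelBuffer(forItemTime: itemTime),
              let pixelBuffer = output.copyPixelBuffer(forItemTime: itemTime,
                                                       itemTimeForDisplay: nil) else { return }
        // Hand `pixelBuffer` to the frame-interpolation / super-resolution session here.
        _ = pixelBuffer
    }
}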
1.3 Supported Resolution/Quality Range
What are the minimum and maximum video resolutions supported for each feature?
Are there any aspect ratio restrictions (e.g., does it support 1:1 square videos)?
Is there a recommended resolution range for optimal performance and quality?
2. Frame Interpolation Specific Questions
2.1 LIVE Stream Support
Is Low-Latency Frame Interpolation suitable for LIVE streaming scenarios where latency is critical?
Are there any special considerations for LIVE vs VOD?
3. Super Resolution Specific Questions
3.1 Adaptive Bitrate (ABR) Stream Support
In ABR (HLS/DASH) streams, the video resolution can change dynamically during playback.
Is VTLowLatencySuperResolutionScaler compatible with ABR streams where resolution changes mid-playback?
If resolution changes occur, should I recreate the VTLowLatencySuperResolutionScalerConfiguration and restart the session, or does the API handle this automatically?
3.2 Small/Square Resolution Issue
I observed that 144x144 (1:1 square) videos fail with error:
"VTFrameProcessorErrorDomain Code=-19730: processWithSourceFrame within VCPFrameSuperResolutionProcessor failed"
However, 480x270 (16:9) videos work correctly.
minimumDimensions reports 96x96, but 144x144 still fails. Is there an undocumented restriction on aspect ratio or a practical minimum resolution?
3.3 Scale Factor Selection
supportedScaleFactors returns [2.0, 4.0] for most resolutions.
Is there a recommended scale factor for balancing quality and performance?
Are there scenarios where 4.0x should be avoided?
The documentation on this specific topic seems limited, so I would appreciate any insights or advice.
Thank you.
On iPhone 16 Pro Max (not tested other devices) there's a noticeable jump in the framing of the preview video when you record in the iOS AVCam Sample App. The same jump in camera framing can be observed by switching to the front facing camera and then back to the rear one.
It looks roughly consistent with switching between the 0.5x and 1x camera (but not quite a match for the same viewable area in the Camera app). It only happens when the session is initially loaded; once recording is started, it retains the 'closer' framing no matter how many times recording is stopped and started thereafter.
I'm relatively new to Swift and haven't done anything with the camera before, so odd 'buggy' behaviour in the sample code isn't helping me understand it! :-)
Is there any way to fix this?
At which point in the image processing pipeline does iOS apply the white balance gains which can be set via AVCaptureDevice.setWhiteBalanceModeLocked(with:completionHandler:)?
Are those gains applied in the analog part of the camera pipeline, before the pixel voltage gets converted via the ADC to digital values? Or does the camera first convert the pixel voltages to digital values and then the gains are applied to the digital values?
Is this consistent across devices or can the behavior vary from device to device?
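For clarity, a sketch of the call in question: locking white balance with explicit device gains derived from a temperature/tint pair.
import AVFoundation

// Lock white balance using explicit gains; the question above is about where
// in the pipeline these gains are actually applied.
func lockWhiteBalance(device: AVCaptureDevice, temperature: Float, tint: Float) throws {
    try device.lockForConfiguration()
    defer { device.unlockForConfiguration() }

    let tempTint = AVCaptureDevice.WhiteBalanceTemperatureAndTintValues(
        temperature: temperature, tint: tint)
    var gains = device.deviceWhiteBalanceGains(for: tempTint)

    // Clamp each channel to the device-supported range [1.0, maxWhiteBalanceGain].
    let maxGain = device.maxWhiteBalanceGain
    gains.redGain   = min(max(gains.redGain, 1.0), maxGain)
    gains.greenGain = min(max(gains.greenGain, 1.0), maxGain)
    gains.blueGain  = min(max(gains.blueGain, 1.0), maxGain)

    device.setWhiteBalanceModeLocked(with: gains) { syncTime in
        // The first frame carrying these gains has this capture timestamp.
        print("white balance locked at", syncTime.seconds)
    }
}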