Apple provides an API for rendering TTS speech to a file (AVSpeechUtterance / AVSpeechSynthesizer).
Alternatively, a user could record a video of the TTS playback and use that video.
I'd like to know the permitted scope of use if I use this TTS voice to make YouTube, TikTok, or commercial videos.
Is it impossible to use it commercially at all?
Can I use it commercially if I credit the source?
Can I use it commercially without crediting the source?
Is there a difference in commercial use license between Siri voices and regular TTS voices?
AVFoundation
Work with audiovisual assets, control device cameras, process audio, and configure system audio interactions using AVFoundation.
Hi!
I'd like to share a technical sample app, SKRenderer Demo.
This app demonstrates:
Setting up SKRenderer
Recording SpriteKit scenes to image sequences
Recording SpriteKit scenes to video using IOSurface and AVFoundation
Applying Core Image filters
Exploring SpriteKit's simulation timing and physics determinism
Use Case
Record SpriteKit simulations as video or images for sharing and creating content.
I explored several approaches, including the excellent view.texture(from:crop:) for live recording from SKView. The SKRenderer approach assumes recording happens asynchronously: you capture user interactions as commands during live interaction, then replay those commands through an offline render pass to generate the final output.
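The record-then-replay idea described above can be sketched roughly as follows. This is a minimal illustration, not the sample app's actual code: `Command`, `TimedCommand`, and `replay` are hypothetical names, and the `render` closure stands in for whatever SKRenderer update-and-render pass the app performs.

```swift
import Foundation

/// A user interaction captured during the live session.
/// The cases are hypothetical; a real app would define its own.
enum Command: Equatable {
    case tap(x: Double, y: Double)
    case applyImpulse(dx: Double, dy: Double)
}

/// A command tagged with the simulation frame at which it occurred.
struct TimedCommand {
    let frame: Int
    let command: Command
}

/// Replays recorded commands at a fixed timestep, calling `render` once per frame.
/// `apply` mutates the scene; `render` stands in for an offline render pass.
func replay(commands: [TimedCommand],
            totalFrames: Int,
            framesPerSecond: Double = 60,
            apply: (Command) -> Void,
            render: (_ frame: Int, _ time: TimeInterval) -> Void) {
    // Sort by frame, keeping the recorded order for commands on the same frame.
    let sorted = commands.enumerated().sorted {
        ($0.element.frame, $0.offset) < ($1.element.frame, $1.offset)
    }.map { $0.element }

    var next = 0
    for frame in 0..<totalFrames {
        // Apply every command recorded for this frame before rendering it.
        while next < sorted.count && sorted[next].frame <= frame {
            apply(sorted[next].command)
            next += 1
        }
        // Derive time from the frame index so the replay does not depend on wall-clock timing.
        render(frame, Double(frame) / framesPerSecond)
    }
}
```

Deriving the time from the frame index, rather than from a clock, is what keeps the offline pass independent of how long each frame takes to render.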
I hope this helps others working on replay systems, simulation capture, or SpriteKit projects in general!
Environment
Device: iPhone 15 Pro
iOS: iOS 18.0
Framework: AVFoundation
App type: Custom camera app using AVCaptureSession + AVCaptureVideoPreviewLayer
I’m seeing an intermittent but frequent issue where the camera preview layer briefly flashes empty after certain interruptions, even though the capture session reports itself as running and no errors are emitted.
This happens most often after:
Locking and unlocking the device
Switching cameras (back ↔ front)
The issue is not 100% reproducible, but occurs often enough to be noticeable in normal usage.
What happens
The preview layer briefly flashes as empty (sometimes just a “micro-frame”)
Duration: typically ~0.5–2 seconds before frames resume
session.isRunning == true throughout
No crash, no runtime error, no interruption end failure
Focus/exposure restore correctly once frames resume
Visually it looks like the preview layer loses frames temporarily, even though the session appears healthy.
Repro
Intermittent but frequent after:
Lock → unlock device
Switching camera (front/back)
Timing-dependent and non-deterministic
Happens multiple times per session, but not every time
Key observation
AVCaptureSession.isRunning == true does not guarantee that frames are actually flowing.
To verify this, I added an AVCaptureVideoDataOutput temporarily:
During the blank period, no sample buffers are delivered
Frames resume after ~1–2s without any explicit restart
Session state remains “running” the entire time
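For reference, the kind of frame-flow check described above could be sketched like this. It is a minimal illustration under assumptions: `FrameFlowMonitor` and the 0.25 s default threshold are hypothetical, and the timestamps are expected to come from a monotonic clock such as `ProcessInfo.processInfo.systemUptime`.

```swift
import Foundation

/// Tracks when the last video frame arrived and reports whether frames have stalled.
/// The type name and the default threshold are assumptions for illustration.
struct FrameFlowMonitor {
    private(set) var lastFrameTime: TimeInterval?
    let stallThreshold: TimeInterval

    init(stallThreshold: TimeInterval = 0.25) {
        self.stallThreshold = stallThreshold
    }

    /// Call from the sample buffer delegate callback with a monotonic timestamp.
    mutating func frameArrived(at time: TimeInterval) {
        lastFrameTime = time
    }

    /// True when no frame has arrived yet, or the gap since the last frame exceeds the threshold.
    func isStalled(at now: TimeInterval) -> Bool {
        guard let last = lastFrameTime else { return true }
        return now - last > stallThreshold
    }
}
```

The monitor only records timestamps, so the delegate does no per-frame image work. It would need to be read and written on the same queue as the delegate callback, or otherwise synchronized.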
What I’ve tried (did NOT fix it)
Adding delays before/after startRunning() (0.1–0.5s)
Calling startRunning() on different queues
Restarting the session in AVCaptureSessionInterruptionEnded
Verifying session.connections (all show isActive == true)
Rebuilding inputs/outputs during interruption recovery
Ensuring startRunning() is never called between beginConfiguration() / commitConfiguration()
(Hit the expected runtime warning when attempted)
None of the above removed the brief blank preview.
Workaround (works visually but expensive)
The workaround relies on keeping an AVCaptureVideoDataOutput running continuously. It visually fixes the issue, but:
Energy impact jumps from Low → High in Xcode Energy Gauge
AVCaptureVideoDataOutput processes 30–60 FPS continuously
The gap only lasts ~1–2s, but toggling the delegate on/off cleanly is difficult
Overall CPU and energy cost is not acceptable for production
Additional notes
CPU usage is already relatively high even without the workaround (this app is camera-heavy by nature)
With the workaround enabled, energy impact becomes noticeably worse
The issue feels like a timing/state desync between session state and actual frame delivery, not a UI issue
Questions
Is this a known behavior where AVCaptureSession.isRunning == true but frames are temporarily unavailable after interruptions?
Is there a recommended way to detect actual frame flow resumption (not just session state)?
Should the AVCaptureVideoPreviewLayer.connection (isActive / isEnabled) be explicitly checked or reset after interruptions?
Is there a lightweight, energy-efficient way to bridge this short “no frames” gap without using AVCaptureVideoDataOutput?
Is rebuilding the entire session the only reliable solution here, or is there a better pattern Apple recommends?
Hello,
I have an iOS camera app that captures exposure brackets and performs custom HDR processing.
On iOS 26, I’m observing a visual difference between:
a single photo captured at –2 EV, and the –2 EV frame from an exposure bracket (–2 / 0 / +2 EV).
On iOS 26:
The single –2 EV image looks natural and consistent.
The –2 EV image from the bracket appears clamped / distorted, most noticeably in high dynamic range scenes (highlight compression and loss of detail).
On iOS 18, both approaches produce visually identical and correct –2 EV images.
The issue only appears for bracketed captures on iOS 26.
Attachments (examples)
iOS 26
Single capture –2 EV (JPEG):
/Users/danilobudimir/Downloads/ios26SingleImage/JPEG image-4006-8B77-51-0.jpeg
Single capture –2 EV — Capture report (dumped settings):
/Users/danilobudimir/Downloads/ios26SingleImage/UnderExposureDebug_CaptureReport_2026-01-09T15-59-20Z.md
Bracket capture –2 EV frame (JPEG):
/Users/danilobudimir/Downloads/bracket_iOS26/JPEG image-45CE-9793-A5-0.jpeg
Bracket capture — Capture report (dumped settings):
/Users/danilobudimir/Downloads/bracket_iOS26/UnderExposureDebug_CaptureReport_2026-01-09T15-55-42Z.md
iOS 18
Single capture –2 EV (JPEG):
/Users/danilobudimir/Downloads/ios18SingleImage/JPEG image-47FD-AF73-28-0.jpeg
Single capture –2 EV — Capture report:
/Users/danilobudimir/Downloads/ios18SingleImage/UnderExposureDebug_CaptureReport_2026-01-09T16-25-27Z.md
Bracket capture — –2 EV frame (JPEG):
/Users/danilobudimir/Downloads/bracket_iOS18/JPEG image-4A4C-9E93-46-0.jpeg
Bracket capture — Capture report:
/Users/danilobudimir/Downloads/bracket_iOS18/UnderExposureDebug_CaptureReport_2026-01-09T16-27-23Z.md
Question
Is there any new behavior in iOS 26 AVFoundation related to:
AVCapturePhotoBracketSettings,
tone mapping / HDR preprocessing,
or internal image processing applied specifically to bracketed frames?
Is there a new flag, format requirement or opt-out mechanism required to preserve linear underexposed frames in exposure brackets?
I'm looking to implement UI in our app that tells the user to clean their lens.
I implemented KVO for cameraLensSmudgeDetectionStatus, but I'm having trouble reliably triggering it, both in our app and in the main Camera app. I tried to get inventive by putting Tupperware over the lens, but I think the model driving this, or the LiDAR sensor, might be smart enough to detect that there is something close to the lens.
Is there any way to trigger this change in a similar way we can trigger thermal changes in debug?
Thanks.
Hello,
I am currently considering developing a Full Space app that enables a shared visionOS experience with nearby users.
Intended Features
A Mixed Full Space app in which dozens of 3D models are placed in the space.
These 3D models may play embedded animations when tapped, be programmatically moved or rotated, or be controlled via Reality Composer Pro timelines.
The app also includes audio, spatial audio, videos with audio, and videos without audio, which are rendered as VideoTextures on planes and played back in the space.
Some media elements play automatically, while others are triggered by user interaction.
However, it is unclear whether AVPlaybackCoordinator supports shared playback across multiple types of media, such as:
audio only
spatial audio
video without audio
video with audio
I am also unsure whether there are alternative or recommended approaches for synchronizing playback in this scenario.
Questions
Is it technically possible to implement the experience described above using visionOS?
Are there any important implementation considerations or limitations that should be taken into account?
For example, when two participants experience the app simultaneously, how is the content positioned for each participant?
Is the spatial placement of content shared across participants, or is it positioned relative to each participant’s viewpoint?
For nearby participants, is it necessary to register a spatial Persona? My understanding is that spatial Personas are not visible for nearby users during the experience; is this correct?
When experiencing SharePlay with nearby users, is it possible to share the experience without registering the other participant’s contact information?
I have watched the following session, but I was unable to fully understand the feasibility of the above use case or the concrete implementation details:
https://developer.apple.com/videos/play/wwdc2025/318/
Thank you.
Hi,
I understand that AVPlayer/AVFoundation doesn’t natively play MPEG-DASH manifests (.mpd) today, while HLS is supported and widely documented by Apple.
I’m not asking for roadmap commitments, but I’d like to understand whether there is any publicly documented rationale for not supporting DASH/MPD in AVFoundation (e.g., technical constraints, platform integration, DRM ecosystem, power/performance considerations, etc.).
Questions:
Is there any Apple statement / documentation explaining why DASH (MPD) isn’t supported in AVFoundation?
Is Apple’s recommended approach still “provide HLS for Apple clients” (potentially sharing CMAF segments and generating separate manifests)?
If there’s no public rationale, is filing Feedback Assistant the best channel for requesting MPD playback support?
Thanks!
I made a CMIOExtension (a virtual camera) which generates its own output, for use in our in-house software testing. I wanted to make a video source with 29.97, 30, 59.94 and 60fps output.
To this end, I created a CMIOExtensionDeviceSource which creates a CMIOExtensionDevice with one CMIOExtensionStreamSource with various stream formats contained in [CMIOExtensionStreamFormat], including one with both maxFrameDuration and minFrameDuration = CMTimeMake(value: 1000, timescale: 30000) and another with both maxFrameDuration and minFrameDuration = CMTimeMake(value: 1001, timescale: 30000).
I've held off on the creation of the 59.94/60fps source for now until this problem is resolved.
My virtual camera works and produces a signal, but when I examine its associated AVCaptureDevice in the debugger, I find:
(lldb) po self.captureDevice?.formats[0].videoSupportedFrameRateRanges[0].maxFrameDuration
▿ Optional<CMTime>
▿ some : CMTime
- value : 1000000
- timescale : 30000000
▿ flags : CMTimeFlags
- rawValue : 1
- epoch : 0
I get the same value, 1000000/30000000, or exactly 30fps, for all the formats of my AVCaptureDevice.
Is there something I'm doing wrong, or do CMIOExtensionDevices always round the frame rates?
I can't force CoreMediaIO to produce frames at exactly my desired frame interval, but I'd like to ensure that the average frame rate is my desired rate. How can I do that? Frame emission is governed by a repeating DispatchSourceTimer with a repeat time specified in nanoseconds with the TimerFlags set to 'strict'.
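One common way to keep the average rate exact is to schedule each frame against an absolute deadline computed from its index, rather than relying on a fixed repeat interval. The sketch below shows only that arithmetic; `frameDeadlineNanos` is a hypothetical helper, and it does not address the frame durations that AVCaptureDevice reports.

```swift
import Foundation

/// Nanoseconds from stream start at which frame `index` is due, for a frame duration of
/// `durationValue / durationTimescale` seconds (for example 1001 / 30000 for 29.97 fps).
/// Integer arithmetic is used so rounding error does not accumulate from frame to frame.
func frameDeadlineNanos(index: UInt64,
                        durationValue: UInt64,
                        durationTimescale: UInt64) -> UInt64 {
    let ticks = index * durationValue
    // Split into whole seconds and a remainder to keep the intermediate products small.
    let wholeSeconds = ticks / durationTimescale
    let remainderTicks = ticks % durationTimescale
    return wholeSeconds * 1_000_000_000
        + remainderTicks * 1_000_000_000 / durationTimescale
}
```

With a dispatch timer, one option is to re-arm a one-shot timer for each frame at the stream's start time plus this offset, so a late frame does not push back every frame after it.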
Hello,
As far as I know and in all of my testing there is no way for a user or a developer to change the frame rate of the video output on iPadOS. If you connect an iPad via a USB Hub or a USB to HDMI Adaptor and then connect it to an external monitor it will output at 59.94fps.
I have a video app where users monitor live video at 25fps and 30fps. They often output to an external display, and at times the external display stutters due to the frame rate mismatch, e.g., monitoring at 25fps while outputting at 59.94fps.
I thought it was impossible to change the video output frame rate, then in V3.1 of the Blackmagic Camera App I saw an interesting change in their release notes:
‘Support for HDMI Monitoring at Sensor Rate and Resolution’
This means there is some way to modify it, but I'm not sure whether this is done via a private API that Apple has allowed Blackmagic to use. If so, how can we access it, or is there an undocumented way to enable this?
Thanks!
Hello, I am developing a custom player SDK using AVPlayer to support HLS and LL-HLS live streaming. I have some questions about the internal logic of AVPlayer regarding ABR, as this information is not explicitly covered in the documentation.
ABR Switching Logic: Does AVPlayer trigger bitrate switching primarily based on stall occurrences (buffer starvation)? I am curious if the switching logic is reactive to stalls or if it proactively switches to prevent them based on throughput estimation.
Developer Controls for ABR: To influence or control the ABR selection, are preferredPeakBitRate and preferredForwardBufferDuration the only properties available to developers? Are there any other recommended APIs to assist with ABR decisions?
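For context, this is roughly how the two properties mentioned above are applied to a player item. It is a sketch only: `ABRHints` is a hypothetical wrapper, and clamping negative values to 0 is an assumption made here, not documented player behavior.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation
#endif

/// Hypothetical container for the two hints named above.
struct ABRHints {
    /// Bits per second; 0 means no limit.
    var peakBitRate: Double
    /// Seconds of media to buffer ahead; 0 lets the player choose.
    var forwardBufferDuration: TimeInterval

    /// Negative values are treated as invalid in this sketch and clamped to 0.
    var sanitized: ABRHints {
        ABRHints(peakBitRate: max(0, peakBitRate),
                 forwardBufferDuration: max(0, forwardBufferDuration))
    }
}

#if canImport(AVFoundation)
extension ABRHints {
    /// Copies the sanitized hints onto the player item.
    func apply(to item: AVPlayerItem) {
        let hints = sanitized
        item.preferredPeakBitRate = hints.peakBitRate
        item.preferredForwardBufferDuration = hints.forwardBufferDuration
    }
}
#endif
```

Both values are hints to the player rather than hard guarantees, which is part of why the switching behavior itself remains opaque.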
Thank you for your help.
Hi everyone, does anybody have any resources I could check out regarding the 48->12mp binning behavior on supported sensors? I know the 48mp sensor on iPhone can automatically bin pixels for better low light performance. But not sure how to reliably make this happen in practice.
On iPhone 14 Pro+ with a 48MP sensor, I want the best of both worlds for ProRAW:
∙ Bright light: 48MP full resolution
∙ Low light: 12MP pixel-binned for better noise
photoOutput.maxPhotoDimensions = CMVideoDimensions(width: 8064, height: 6048)
let settings = AVCapturePhotoSettings(rawPixelFormatType: proRawFormat, processedFormat: [...])
settings.photoQualityPrioritization = .quality
// NOT setting settings.maxPhotoDimensions: always get 12MP
When I omit maxPhotoDimensions, iOS always returns 12MP regardless of lighting. When I set it to 48MP, I always get 48MP.
Is there an API to let iOS automatically choose the optimal resolution based on conditions, or should I detect low light myself (via device.iso / exposureDuration) and set maxPhotoDimensions accordingly?
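If manual detection turns out to be necessary, the decision could look something like the sketch below. Everything here is an assumption for illustration: `PhotoDimensions` stands in for CMVideoDimensions so the snippet compiles on its own, and both thresholds are hypothetical values that would need tuning on real devices. It is not a confirmed substitute for any automatic behavior.

```swift
/// Stand-in for CMVideoDimensions so the sketch compiles without CoreMedia.
struct PhotoDimensions: Equatable {
    let width: Int32
    let height: Int32
}

let fullResolution = PhotoDimensions(width: 8064, height: 6048)   // 48MP, as in the snippet above
let binnedResolution = PhotoDimensions(width: 4032, height: 3024) // 12MP

/// Picks a capture resolution from the current exposure readings.
/// Both thresholds are hypothetical defaults, not values recommended by Apple.
func preferredDimensions(iso: Float,
                         exposureSeconds: Double,
                         isoThreshold: Float = 400,
                         exposureThreshold: Double = 1.0 / 30.0) -> PhotoDimensions {
    // Treat either a high ISO or a long exposure as a sign of low light.
    let isLowLight = iso >= isoThreshold || exposureSeconds >= exposureThreshold
    return isLowLight ? binnedResolution : fullResolution
}
```

In an app, the inputs would come from device.iso and the seconds value of device.exposureDuration, and the result would be converted to CMVideoDimensions before assigning settings.maxPhotoDimensions.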
Any help or direction would be much appreciated!
Hi everyone,
I'm developing a camera application that requires precise, predictable control over the focus system. I'm encountering unexpected behavior with face-driven autofocus in continuous autofocus mode.
Issue:
When using AVCaptureDevice.FocusMode.continuousAutoFocus, the system continues to prioritize faces for focus even after attempting to disable face-driven autofocus with:
device.automaticallyAdjustsFaceDrivenAutoFocusEnabled = false
device.isFaceDrivenAutoFocusEnabled = false
Observations:
The behavior is inconsistent across different scenes.
In well-lit/properly exposed scenes: focus persistently locks onto faces, ignoring my configuration.
In underexposed scenes: the intended focus behavior is more consistently respected.
Has anyone tried to make an ILPD-based AIME file?
When I try, the resulting AIME switches to a USDZ mesh instead of saving the ILPD data.
Hi everyone,
I’m seeing recurring internal AVFoundation camera logs on iOS 26.2 and I’m trying to understand whether this is expected behavior or a regression in the capture pipeline.
These logs appear shortly after starting an AVCaptureSession, while video frames are being delivered, and also when the camera is stopped or the capture session is torn down.
<<<< FigXPCUtilities >>>> signalled err=-17281 at <>:302
<<<< FigCaptureSourceRemote >>>> Fig assert: "err == 0 " at bail (FigCaptureSourceRemote.m:569) - (err=-17281)
Even in this clean, minimal setup, the same logs appear on iOS 26.2
The exact same logic did not produce these logs on iOS 18.x.
To rule out issues caused by my own code, GPT created a minimal SwiftUI example from scratch.
My primary interest is to perform real-time processing on the video frames delivered by the camera (via AVCaptureVideoDataOutput), for tasks such as analysis, computer vision, or custom frame handling, while simultaneously displaying the live preview.
Thanks in advance for any insight.
Example Code
I am developing an iOS camera app that can record video directly to external storage connected to an iPhone.
To detect whether an external USB storage device is connected and to obtain its URL, I am considering using AVExternalStorageDeviceDiscoverySession.
However, when checking support using AVExternalStorageDeviceDiscoverySession.isSupported, I observe that it returns true only on Pro model iPhones, and false on non-Pro models in my environment.
I have reviewed Apple’s official documentation, but I could not find any clear description of the supported devices or requirements (for example, whether this API is limited to Pro models or requires specific hardware capabilities).
I would appreciate any information regarding the following points:
① The actual requirements for AVExternalStorageDeviceDiscoverySession to be supported
Device limitations (Pro vs non-Pro models)
Hardware requirements (USB controller, external recording capability, etc.)
iOS version dependencies
② Whether support for non-Pro models is planned in the future
Tested environments
iPhone 16 Pro (iOS 18.7.1) → isSupported == true
iPhone 16e (iOS 26.2) → isSupported == false
iPhone 17 (iOS 26.2) → isSupported == false
iPhone Air (iOS 26.2) → isSupported == false
If anyone has observed similar behavior or has official information from Apple regarding this API, I would greatly appreciate your insights.
I'm receiving output from AVCaptureSession and capturing an image using Vision, but the image comes out in landscape orientation instead of portrait.
Even when I set the orientation to up on the CIImage, CGImage, and UIImage, the image is still output in landscape orientation.
On iPhone 16 and earlier models, the image is output in portrait orientation.
But on iPhone 17 and later models, the image is output in landscape orientation.
Please help.
Hi everyone,
I'm running into an issue with AVAudioRecorder when handling interruptions such as phone calls or alarms.
Problem:
When the app is recording audio and an interruption occurs:
I handle the interruption with audioRecorder?.pause() inside AVAudioSession.interruptionNotification (on .began).
On .ended, I check for .shouldResume and call audioRecorder?.record() again.
The recorder resumes successfully, but only the audio recorded after the interruption is saved. The audio recorded before the interruption is lost, even though I'm using the same file URL and not recreating the recorder.
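The handling described above corresponds roughly to the sketch below. `InterruptionAction`, `interruptionAction`, and `RecorderInterruptionHandler` are hypothetical names. The raw values mirror AVAudioSession.InterruptionType (ended = 0, began = 1) and the shouldResume option (bit 0). This only illustrates the flow and does not by itself explain or fix the missing audio.

```swift
import Foundation

/// What the recorder should do in response to an interruption notification.
enum InterruptionAction: Equatable {
    case pause
    case resume
    case stayPaused
}

/// Maps the raw values carried in the notification's userInfo to an action.
func interruptionAction(typeRaw: UInt, optionsRaw: UInt) -> InterruptionAction {
    let began: UInt = 1          // AVAudioSession.InterruptionType.began
    let shouldResume: UInt = 1   // AVAudioSession.InterruptionOptions.shouldResume
    if typeRaw == began { return .pause }
    return (optionsRaw & shouldResume) != 0 ? .resume : .stayPaused
}

#if os(iOS)
import AVFoundation

/// Pauses and resumes a single recorder instance around session interruptions.
final class RecorderInterruptionHandler {
    private let recorder: AVAudioRecorder
    private var token: NSObjectProtocol?

    init(recorder: AVAudioRecorder) {
        self.recorder = recorder
        token = NotificationCenter.default.addObserver(
            forName: AVAudioSession.interruptionNotification,
            object: AVAudioSession.sharedInstance(),
            queue: .main
        ) { [weak self] note in
            guard let info = note.userInfo,
                  let type = info[AVAudioSessionInterruptionTypeKey] as? UInt else { return }
            let options = info[AVAudioSessionInterruptionOptionKey] as? UInt ?? 0
            switch interruptionAction(typeRaw: type, optionsRaw: options) {
            case .pause: self?.recorder.pause()
            case .resume: _ = self?.recorder.record()
            case .stayPaused: break
            }
        }
    }

    deinit {
        if let token = token { NotificationCenter.default.removeObserver(token) }
    }
}
#endif
```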
Repro:
Start a recording with AVAudioRecorder
Simulate a system interruption (e.g., incoming call)
Resume recording after the interruption
Stop and inspect the output audio file
Expected: Full audio (before and after interruption) should be saved.
Actual: Only the audio after interruption is saved; the earlier part is missing
Notes:
According to the documentation, calling .record() after .pause() should resume recording into the same file.
I confirmed that the file URL does not change, and I do not recreate the recorder instance.
No error is thrown by the system during this process.
This behavior happens consistently when the app is interrupted and resumed.
Question:
Is this a known issue? Is there a recommended workaround for preserving the full recording when interruptions happen?
Thanks in advance!
We are facing an issue with audio playback using AVPlayerViewController in our iOS application. We are using the native player to play recorded audio files.
When the AVPlayerViewController appears, the native user interface is displayed correctly, including the playback controls and the volume slider.
However, when the user interacts with the volume slider:
The slider UI moves and responds to touch events.
The actual audio output volume does not change. The audio continues playing at the initial volume level regardless of the slider position.
We initialize the player and present it modally using the following code:
AVPlayerViewController *avController = [[AVPlayerViewController alloc] init];
avController.player = [AVPlayer playerWithURL:videoURL];
// Setting initial volume
avController.player.volume = 1.0f;
avController.modalPresentationStyle = UIModalPresentationOverFullScreen;
avController.allowsPictureInPicturePlayback = NO;
// Present the controller
[self presentViewController:avController animated:YES completion:nil];
I'm encountering errors while using AVAudioEngine with voice processing enabled (setVoiceProcessingEnabled(true)) in scenarios where the input and output audio devices are not the same. This issue arises specifically with mismatched devices, preventing the application from functioning as expected.
Works: Paired devices (e.g., MacBook Pro mic → MacBook Pro speakers)
Fails: Mismatched devices (e.g., AirPods mic → MacBook Pro speakers)
When using paired input and output devices:
The setup works as expected.
Example: MacBook Pro microphone → MacBook Pro speakers.
When using mismatched devices:
AVAudioEngine setup fails during aggregate device construction.
Example: AirPods microphone → MacBook Pro speakers.
Error logs indicate a channel count mismatch.
Here are the partial logs. Due to the content limit, I cannot post the entire logs.
AUVPAggregate.cpp:1000 client-side input and output formats do not match (err=-10875)
AUVPAggregate.cpp:1036 err=-10875
AVAEInternal.h:109 [AVAudioEngineGraph.mm:1344:Initialize: (err = PerformCommand(*outputNode, kAUInitialize, NULL, 0)): error -10875
AggregateDevice.mm:329 Failed expectation of constructed aggregate (312): mInput.streamChannelCounts == inputStreamChannelCounts
AggregateDevice.mm:331 Failed expectation of constructed aggregate (312): mInput.totalChannelCount == std::accumulate(inputStreamChannelCounts.begin(), inputStreamChannelCounts.end(), 0U)
AggregateDevice.mm:182 error fetching default pair
AggregateDevice.mm:329 Failed expectation of constructed aggregate (336): mInput.streamChannelCounts == inputStreamChannelCounts
AggregateDevice.mm:331 Failed expectation of constructed aggregate (336): mInput.totalChannelCount == std::accumulate(inputStreamChannelCounts.begin(), inputStreamChannelCounts.end(), 0U)
AUHAL.cpp:1782 ca_verify_noerr: [AudioDeviceSetProperty(mDeviceID, NULL, 0, isInput, kAudioDevicePropertyIOProcStreamUsage, theSize, theStreamUsage), 560227702]
AudioHardware-mac-imp.cpp:3484 AudioDeviceSetProperty: no device with given ID
AUHAL.cpp:1782 ca_verify_noerr: [AudioDeviceSetProperty(mDeviceID, NULL, 0, isInput, kAudioDevicePropertyIOProcStreamUsage, theSize, theStreamUsage), 560227702]
AggregateDevice.mm:182 error fetching default pair
AggregateDevice.mm:329 Failed expectation of constructed aggregate (348): mInput.streamChannelCounts == inputStreamChannelCounts
AggregateDevice.mm:331 Failed expectation of constructed aggregate (348): mInput.totalChannelCount == std::accumulate(inputStreamChannelCounts.begin(), inputStreamChannelCounts.end(), 0U)
Is it possible to use voice processing with different input/output devices?
If yes, are there any specific configurations required to handle mismatched devices?
How can we resolve channel count mismatch errors during aggregate device construction?
Are there settings or API adjustments to enforce compatibility between input/output devices?
Are there any workarounds or alternative approaches to achieve voice processing functionality with mismatched devices?
For instance, can we force an intermediate channel configuration or downmix input/output formats?
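To make the downmix idea concrete, here is what averaging channels down to mono means at the sample level. This is purely illustrative and operates on a plain interleaved array. `downmixToMono` is a hypothetical helper, and nothing here is known to resolve the aggregate device errors above. In a real engine, the conversion would more likely go through a mixer node or AVAudioConverter.

```swift
/// Averages interleaved multichannel samples down to a single mono channel.
/// `interleaved` holds frames laid out as [ch0, ch1, ..., ch0, ch1, ...].
func downmixToMono(interleaved: [Float], channelCount: Int) -> [Float] {
    precondition(channelCount > 0 && interleaved.count % channelCount == 0,
                 "sample count must be a multiple of the channel count")
    let frameCount = interleaved.count / channelCount
    var mono = [Float](repeating: 0, count: frameCount)
    for frame in 0..<frameCount {
        // Sum the samples of one frame across all channels, then average.
        var sum: Float = 0
        for channel in 0..<channelCount {
            sum += interleaved[frame * channelCount + channel]
        }
        mono[frame] = sum / Float(channelCount)
    }
    return mono
}
```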