Hi all,
The use of setVoiceProcessingEnabled increases the channel count of my microphone audio from 1 to 5. This has downstream effects, because when I use AVAudioConverter to convert between PCM buffer types the output buffer contains only silence.
Here is a reproduction showing the channel growth from 1 to 5:
let avAudioEngine: AVAudioEngine = AVAudioEngine()
let inputNode = avAudioEngine.inputNode
print(inputNode.inputFormat(forBus: 0))
// Prints <AVAudioFormat 0x600002f7ada0: 1 ch, 48000 Hz, Float32>
do {
try inputNode.setVoiceProcessingEnabled(true)
} catch {
print("Could not enable voice processing \(error)")
return
}
print(inputNode.inputFormat(forBus: 0))
// Prints <AVAudioFormat 0x600002f7b020: 5 ch, 44100 Hz, Float32, deinterleaved>
If it helps, the reason I'm using setVoiceProcessingEnabled because I don't want the mic to pick up output from the speakers. Per wwdc
When enabled, extra signal processing is applied on the incoming audio, and any audio that is coming from the device is taken
Here is my conversion logic from the input PCM format (which in the case above is 5ch, 44.1kHZ, Float 32, deinterleaved) to the target format PCM16 with a single channel:
let outputFormat = AVAudioFormat(
commonFormat: .pcmFormatInt16,
sampleRate: inputPCMFormat.sampleRate,
channels: 1,
interleaved: false
)
guard let converter = AVAudioConverter(
from: inputPCMFormat,
to: outputFormat) else {
fatalError("Demonstration")
}
let newLength = AVAudioFrameCount(outputFormat.sampleRate * 2.0) guard let outputBuffer = AVAudioPCMBuffer(
pcmFormat: outputFormat,
frameCapacity: newLength) else {
fatalError("Demonstration")
}
outputBuffer.frameLength = newLength
try! converter.convert(to: outputBuffer, from: inputBuffer)
// Use the PCM16 outputBuffer
The outputBuffer contains only silence. But if I comment out inputNode.setVoiceProcessingEnabled(true) in the first snippet, the outputBuffer then plays exactly how I would expect it to.
So I have two questions:
Why does setVoiceProcessingEnabled increase the channel count to 5?
How should I convert the resulting format to a single channel PCM16 format?
Thank you,
Lou
AVFoundation
RSS for tagWork with audiovisual assets, control device cameras, process audio, and configure system audio interactions using AVFoundation.
Posts under AVFoundation tag
115 Posts
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
At present, I am using the avfoundation external device API to connect my iPad to a DSLR camera for data collection. On my end, I am using AVCapture Video Data Output to obtain raw data for processing and rendering. However, the pixelbuf returned from the system layer is incomplete, with only a portion cropped in the middle. But using the Mac API is normal. I would like to ask how to obtain the complete pixelbuf of the image on iPad
Hello,
I am trying to read video frames using AVAssetReaderTrackOutput. Here is the sample code:
//prepare assets
let asset = AVURLAsset(url: some_url)
let assetReader = try AVAssetReader(asset: asset)
guard let videoTrack = try await asset.loadTracks(withMediaCharacteristic: .visual).first else {
throw SomeErrorCode.error
}
var readerSettings: [String: Any] = [
kCVPixelBufferIOSurfacePropertiesKey as String: [String: String]()
]
//check if HDR video
var isHDRDetected: Bool = false
let hdrTracks = try await asset.loadTracks(withMediaCharacteristic: .containsHDRVideo)
if hdrTracks.count > 0 {
readerSettings[AVVideoAllowWideColorKey as String] = true
readerSettings[kCVPixelBufferPixelFormatTypeKey as String] =
kCVPixelFormatType_420YpCbCr10BiPlanarFullRange
isHDRDetected = true
}
//add output to assetReader
let output = AVAssetReaderTrackOutput(track: videoTrack, outputSettings: readerSettings)
guard assetReader.canAdd(output) else {
throw SomeErrorCode.error
}
assetReader.add(output)
guard assetReader.startReading() else {
throw SomeErrorCode.error
}
//add writer ouput settings
let videoOutputSettings: [String: Any] = [
AVVideoCodecKey: AVVideoCodecType.hevc,
AVVideoWidthKey: 1920,
AVVideoHeightKey: 1080,
]
let finalPath = "//some URL oath"
let assetWriter = try AVAssetWriter(outputURL: finalPath, fileType: AVFileType.mov)
guard assetWriter.canApply(outputSettings: videoOutputSettings, forMediaType: AVMediaType.video)
else {
throw SomeErrorCode.error
}
let assetWriterInput = AVAssetWriterInput(mediaType: .video, outputSettings: videoOutputSettings)
let sourcePixelAttributes: [String: Any] = [
kCVPixelBufferPixelFormatTypeKey as String: isHDRDetected
? kCVPixelFormatType_420YpCbCr10BiPlanarFullRange : kCVPixelFormatType_32ARGB,
kCVPixelBufferWidthKey as String: 1920,
kCVPixelBufferHeightKey as String: 1080,
]
//create assetAdoptor
let assetAdaptor = AVAssetWriterInputTaggedPixelBufferGroupAdaptor(
assetWriterInput: assetWriterInput, sourcePixelBufferAttributes: sourcePixelAttributes)
guard assetWriter.canAdd(assetWriterInput) else {
throw SomeErrorCode.error
}
assetWriter.add(assetWriterInput)
guard assetWriter.startWriting() else {
throw SomeErrorCode.error
}
assetWriter.startSession(atSourceTime: CMTime.zero)
//prepare tranfer session
var session: VTPixelTransferSession? = nil
guard
VTPixelTransferSessionCreate(allocator: kCFAllocatorDefault, pixelTransferSessionOut: &session)
== noErr, let session
else {
throw SomeErrorCode.error
}
guard let pixelBufferPool = assetAdaptor.pixelBufferPool else {
throw SomeErrorCode.error
}
//read through frames
while let nextSampleBuffer = output.copyNextSampleBuffer() {
autoreleasepool {
guard let imageBuffer = CMSampleBufferGetImageBuffer(nextSampleBuffer) else {
return
}
//this part copied from (https://developer.apple.com/videos/play/wwdc2023/10181) at 23:58 timestamp
let attachment = [
kCVImageBufferYCbCrMatrixKey: kCVImageBufferYCbCrMatrix_ITU_R_2020,
kCVImageBufferColorPrimariesKey: kCVImageBufferColorPrimaries_ITU_R_2020,
kCVImageBufferTransferFunctionKey: kCVImageBufferTransferFunction_SMPTE_ST_2084_PQ,
]
CVBufferSetAttachments(imageBuffer, attachment as CFDictionary, .shouldPropagate)
//now convert to CIImage with HDR data
let image = CIImage(cvPixelBuffer: imageBuffer)
let cropped = "" //here perform some actions like cropping, flipping, etc. and preserve this changes by converting the extent to CGImage first:
//this part copied from (https://developer.apple.com/videos/play/wwdc2023/10181) at 24:30 timestamp
guard
let cgImage = context.createCGImage(
cropped, from: cropped.extent, format: .RGBA16,
colorSpace: CGColorSpace(name: CGColorSpace.itur_2100_PQ)!)
else {
continue
}
//finally convert it back to CIImage
let newScaledImage = CIImage(cgImage: cgImage)
//now write it to a new pixelBuffer
let pixelBufferAttributes: [String: Any] = [
kCVPixelBufferCGImageCompatibilityKey as String: true,
kCVPixelBufferCGBitmapContextCompatibilityKey as String: true,
]
var pixelBuffer: CVPixelBuffer?
CVPixelBufferCreate(
kCFAllocatorDefault, Int(newScaledImage.extent.width), Int(newScaledImage.extent.height),
kCVPixelFormatType_420YpCbCr10BiPlanarFullRange, pixelBufferAttributes as CFDictionary,
&pixelBuffer)
guard let pixelBuffer else {
continue
}
context.render(newScaledImage, to: pixelBuffer) //context is a CIContext reference
var pixelTransferBuffer: CVPixelBuffer?
CVPixelBufferPoolCreatePixelBuffer(kCFAllocatorDefault, pixelBufferPool, &pixelTransferBuffer)
guard let pixelTransferBuffer else {
continue
}
// Transfer the image to the pixel buffer.
guard
VTPixelTransferSessionTransferImage(session, from: pixelBuffer, to: pixelTransferBuffer)
== noErr
else {
continue
}
//finally append to taggedBuffer
}
}
assetWriterInput.markAsFinished()
await assetWriter.finishWriting()
The result video is not in correct color as the original video. It turns out too bright. If I play around with attachment values, it can be either too dim or too bright but not exactly proper as the original video. What am I missing in my setup? I did find that kCVPixelFormatType_4444AYpCbCr16 can produce proper video output but then I can't convert it to CIImage and so I can't do the CIImage operations that I need. Mainly cropping and resizing the CIImage
I want to understand the utility of using AsyncStream when iOS 17 introduced @Observable macro where we can directly observe changes in the value of any variable in the model(& observation tracking can happen even outside SwiftUI view). So if I am observing a continuous stream of values, such as download progress of a file using AsyncStream in a SwiftUI view, the same can be observed in the same SwiftUI view using onChange(of:initial) of download progress (stored as a property in model object). I am looking for benefits, drawbacks, & limitations of both approaches.
Specifically, my question is with regards to AVCam sample code by Apple where they observe few states as follows. This is done in CameraModel class which is attached to SwiftUI view.
// MARK: - Internal state observations
// Set up camera's state observations.
private func observeState() {
Task {
// Await new thumbnails that the media library generates when saving a file.
for await thumbnail in mediaLibrary.thumbnails.compactMap({ $0 }) {
self.thumbnail = thumbnail
}
}
Task {
// Await new capture activity values from the capture service.
for await activity in await captureService.$captureActivity.values {
if activity.willCapture {
// Flash the screen to indicate capture is starting.
flashScreen()
} else {
// Forward the activity to the UI.
captureActivity = activity
}
}
}
Task {
// Await updates to the capabilities that the capture service advertises.
for await capabilities in await captureService.$captureCapabilities.values {
isHDRVideoSupported = capabilities.isHDRSupported
cameraState.isVideoHDRSupported = capabilities.isHDRSupported
}
}
Task {
// Await updates to a person's interaction with the Camera Control HUD.
for await isShowingFullscreenControls in await captureService.$isShowingFullscreenControls.values {
withAnimation {
// Prefer showing a minimized UI when capture controls enter a fullscreen appearance.
prefersMinimizedUI = isShowingFullscreenControls
}
}
}
}
If we see the structure CaptureCapabilities, it is a small structure with two Bool members. These changes could have been directly observed by a SwiftUI view. I wonder if there is a specific advantage or reason to use AsyncStream here & continuously iterate over changes in a for loop.
/// A structure that represents the capture capabilities of `CaptureService` in
/// its current configuration.
struct CaptureCapabilities {
let isLivePhotoCaptureSupported: Bool
let isHDRSupported: Bool
init(isLivePhotoCaptureSupported: Bool = false,
isHDRSupported: Bool = false) {
self.isLivePhotoCaptureSupported = isLivePhotoCaptureSupported
self.isHDRSupported = isHDRSupported
}
static let unknown = CaptureCapabilities()
}
I’m trying to record videos with AvAssetWriter but sometimes my videos exempt audio buffers recorded and when audio is included, the stream of video buffers stops.
the gist below is my code.
https://gist.github.com/kwameaj67/70a3409c84d48cf758b3734c08a46244
Periodically when testing I am running into a situation where the app hangs and beach balls forever when using AVAudioEngine.
This seems to log out when this affect happens:
Now when this happens if I pause the debugger it's hanging at a call to:
[engine connect:playerNode
to:engine.mainMixerNode
format:buffer.format];
#0 0x000000019391ca9c in __psynch_mutexwait ()
#1 0x0000000104d49100 in _pthread_mutex_firstfit_lock_wait ()
#2 0x0000000104d49014 in _pthread_mutex_firstfit_lock_slow ()
#3 0x00000001938928ec in std::__1::recursive_mutex::lock ()
#4 0x00000001ef80e988 in CADeprecated::RealtimeMessenger::_PerformPendingMessages ()
#5 0x00000001ef818868 in AVAudioNodeTap::Uninitialize ()
#6 0x00000001ef7fdc68 in AUGraphNodeBase::Uninitialize ()
#7 0x00000001ef884f38 in AVAudioEngineGraph::PerformCommand ()
#8 0x00000001ef88e780 in AVAudioEngineGraph::_Connect ()
#9 0x00000001ef8b7e70 in AVAudioEngineImpl::Connect ()
#10 0x00000001ef8bc05c in -[AVAudioEngine connect:to:format:] ()
Current all my audio engine related calls are on the main queue (though I am curious about this https://forums.developer.apple.com/forums/thread/123540?answerId=816827022#816827022).
In any case, anyone know where I'm going wrong here?
In an m3u8 manifest, audio EXT-X-MEDIA tags usually contain CHANNELS tag containing the audio channels count like so:
#EXT-X-MEDIA:TYPE=AUDIO,URI="audio_clear_eng_stereo.m3u8",GROUP-ID="default-audio-group",LANGUAGE="en",NAME="stream_5",AUTOSELECT=YES,CHANNELS="2"
Is it possible to get this info from AVPlayer, AVMediaSelectionOption or some related API?
Hi all,
we are in the business of scanning documents and barcodes with the camera system of mobile devices. Since there is a wide variety of use cases, from scanning tiniest barcodes and small business cards to scanning barcodes or large documents from far distances we preferably rely on the triple camera devices, if available, with automatic constituent device switching.
This approach used to be working perfectly fine. Depending on the zoom level (we prefer to use an initial zoom value of 2.0) and the focusing distance the iPhone Pro models switched through the different camera systems at light speed: from ultra-wide to wide, tele and back. No issues at all.
Unfortunately the new iPhone 16 Pro models behave very different when it comes to constituent device switching based on focus distance. The switching is slow and sometimes it does not happen at all when the focusing distance changes. Especially when aiming for a at a distant object for a longer time and then aiming at a very close object that is maybe 2" away. The iPhone 15 Pro here always switches immediately to the ultra-wide camera, while the iPhone 16 Pro takes at least 2-3 seconds, in rare cases up to 10 seconds and sometimes forever to switch to the ultra-wide camera.
Of course we assumed that our code is responsible for these issues. So we experimented with restricting the devices and so on. Then we stripped more and more configuration code but nothing we tried improved the situation.
So we ended up writing a minimal example app that demonstrates the problem. You can find the code below. Execute it on various iPhones and aim at far distance (> 10 feet) and then quickly to very close distance (<5 inches).
Here is a list of devices and our test results:
iPhone 15 Pro, iOS 17.6: very fast and reliable switching
iPhone 15 Pro, iOS 18.1: very fast and reliable switching
iPhone 13 Pro Max, iOS 15.3: very fast and reliable switching
iPhone 16 (dual-wide camera), iOS 18.1: very fast and reliable switching
iPhone 16 Pro, iOS 18.1: slow switching, unreliable
iPhone 16 Pro Max, iOS 18.1: slow switching, unreliable
Questions:
Does anyone else have seen this issue? And possibly found a workaround?
Is this behaviour intended on iPhone 16 Pro models? Can we somehow improve the switching speed?
Further the iPhone 16 Pro models also show a jumping preview in the preview layer when they switch the constituent active device. Not dramatic, but compared to the other phones it looks like a glitch.
Thank you very much!
Kind regards,
Sebastian
import UIKit
import AVFoundation
class ViewController: UIViewController {
var captureSession : AVCaptureSession!
var captureDevice : AVCaptureDevice!
var captureInput : AVCaptureInput!
var previewLayer : AVCaptureVideoPreviewLayer!
var activePrimaryConstituentToken: NSKeyValueObservation?
var zoomToken: NSKeyValueObservation?
override func viewDidLoad() {
super.viewDidLoad()
}
override func viewDidAppear(_ animated: Bool) {
super.viewDidAppear(animated)
checkPermissions()
setupAndStartCaptureSession()
}
func checkPermissions() {
let cameraAuthStatus = AVCaptureDevice.authorizationStatus(for: AVMediaType.video)
switch cameraAuthStatus {
case .authorized:
return
case .denied:
abort()
case .notDetermined:
AVCaptureDevice.requestAccess(for: AVMediaType.video, completionHandler:
{ (authorized) in
if(!authorized){
abort()
}
})
case .restricted:
abort()
@unknown default:
fatalError()
}
}
func setupAndStartCaptureSession() {
DispatchQueue.global(qos: .userInitiated).async{
self.captureSession = AVCaptureSession()
self.captureSession.beginConfiguration()
if self.captureSession.canSetSessionPreset(.photo) {
self.captureSession.sessionPreset = .photo
}
self.captureSession.automaticallyConfiguresCaptureDeviceForWideColor = true
self.setupInputs()
DispatchQueue.main.async {
self.setupPreviewLayer()
}
self.captureSession.commitConfiguration()
self.captureSession.startRunning()
self.activePrimaryConstituentToken = self.captureDevice.observe(\.activePrimaryConstituent, options: [.new], changeHandler: { (device, change) in
let type = device.activePrimaryConstituent!.deviceType.rawValue
print("Device type: \(type)")
})
self.zoomToken = self.captureDevice.observe(\.videoZoomFactor, options: [.new], changeHandler: { (device, change) in
let zoom = device.videoZoomFactor
print("Zoom: \(zoom)")
})
let switchZoomFactor = 2.0
DispatchQueue.main.async {
self.setZoom(CGFloat(switchZoomFactor), animated: false)
}
}
}
func setupInputs() {
if let device = AVCaptureDevice.default(.builtInTripleCamera, for: .video, position: .back) {
captureDevice = device
} else {
fatalError("no back camera")
}
guard let input = try? AVCaptureDeviceInput(device: captureDevice) else {
fatalError("could not create input device from back camera")
}
if !captureSession.canAddInput(input) {
fatalError("could not add back camera input to capture session")
}
captureInput = input
captureSession.addInput(input)
}
func setupPreviewLayer() {
previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
view.layer.addSublayer(previewLayer)
previewLayer.frame = self.view.layer.frame
}
func setZoom(_ value: CGFloat, animated: Bool) {
guard let device = captureDevice else { return }
let maxZoom: CGFloat = captureDevice.maxAvailableVideoZoomFactor
let minZoom: CGFloat = captureDevice.minAvailableVideoZoomFactor
let zoomValue = max(min(value, maxZoom), minZoom)
let deltaZoom = Float(abs(zoomValue - device.videoZoomFactor))
do {
try device.lockForConfiguration()
if animated {
device.ramp(toVideoZoomFactor: zoomValue, withRate: max(deltaZoom * 50.0, 50.0))
} else {
device.videoZoomFactor = zoomValue
}
device.unlockForConfiguration()
} catch {
return
}
}
}
Is there any way to change lens correction on iPad programatically using AVCapture?
Hello,
I'm writing a program to create CMAF compliant HLS files, with encryption.
I have a copy of ISO_IEC_23001-7_2023 to attempt to follow the spec.
I am following the 1:9 pattern encryption using CBCS, so for every 16 bytes of encrypted NAL unit data (of type 1 and 5), there's 144 bytes of clear data.
When testing my output in Safari with 'identity' keys Quickly Diagnosing Content Key and IV Issues, Safari will request the identity key from my test server and first few bytes of the CMAF renditions, but will not play and console gives away no clues to the error.
I am setting the subsample bytesofclear/protected data in the senc boxes. What I'm not sure of, is whether HLS/Safari/iOS acknowledges the senc/saiz/saio boxes of the MP4. There are other third party packagers Bento4, who suggest that they do not:
those clients ignore the explicit encryption
layout metadata found in saio/saiz boxes, and instead rely purely on the
video slice header size to determine the portions of the sample that is
encrypted
So now I'm fairly sure I need to decipher the video slice header size, and apply the protected blocks from that point on.
My question is, is that all there is to it? And is there a better way to debug my output? mediastreamvalidator will only work against unencrypted variants (which I'm outputting okay).
Thanks in advance!
Topic:
Media Technologies
SubTopic:
Streaming
Tags:
FairPlay Streaming
HTTP Live Streaming
AVFoundation
In our app we have implemented a AVAssetResourceLoaderDelegate to handle encrypted downloaded files. We have it working on all iOS versions but we are seeing issues on iOS 15 (15.8.3) with large files (> 1 Gb). We have so far seen two cases where either the load method on the AVURLAsset fails early and throws an unknown error code or starts requesting more data than the device has available RAM. The CPU usage is almost always over 100%, even after pausing playback. The memory issue can happen even though the player has successfully started playback.
When running this on devices running iOS 16 and above we set the isEntireLengthAvailableOnDemand to true on the AVAssetResourceLoadingContentInformationRequest. This seems to be key to solving the issue those devices that support it. If we set the property to false we see the same memory issue as on iOS 15.
So we have a solution for iOS 16 and upwards but are at a loss for how to handle iOS 15. Is there something we have overlooked or is it in fact an issue with that iOS version?
I'm trying to add metadata every second during video capture in the Swift sample App "AVMultiCamPiP". A simple string that changes every second with a write function triggered by a Timer. Can't get it to work, no matter how I arrange it, always ends up with the error "Cannot create a new metadata adaptor with an asset writer input that has already started writing".
This is the setup section:
// Add a metadata input
let assetWriterMetaDataInput = AVAssetWriterInput(mediaType: .metadata, outputSettings: nil, sourceFormatHint: AVTimedMetadataGroup().copyFormatDescription())
assetWriterMetaDataInput.expectsMediaDataInRealTime = true
assetWriter.add(assetWriterMetaDataInput)
self.assetWriterMetaDataInput = assetWriterMetaDataInput
This is the timed metadata creation which gets triggered every second:
let newNoteMetadataItem = AVMutableMetadataItem()
newNoteMetadataItem.value = "Some string" as (NSCopying & NSObjectProtocol)?
let metadataItemGroup = AVTimedMetadataGroup.init(items: [newNoteMetadataItem], timeRange: CMTimeRangeMake( start: CMClockGetTime( CMClockGetHostTimeClock() ), duration: CMTime.invalid ))
movieRecorder?.recordMetaData(meta: metadataItemGroup)
This function is supposed to add the metadata to the track:
func recordMetaData(meta: AVTimedMetadataGroup) {
guard isRecording,
let assetWriter = assetWriter,
assetWriter.status == .writing,
let input = assetWriterMetaDataInput,
input.isReadyForMoreMediaData else {
return
}
let metadataAdaptor = AVAssetWriterInputMetadataAdaptor(assetWriterInput: input)
metadataAdaptor.append(meta)
}
I have an older code example in objc which works OK, but it uses "AVCaptureMetadataInput appendTimedMetadataGroup" and writes to an identifier called "quickTimeMetadataLocationNote". I'd like to do something similar in the above Swift code ...
All suggestions are appreciated !
I'm using AVAudioEngine to play AVAudioPCMBuffers. I'd like to synchronize some events with the playback. For example if the audio's frame position is >= some point && less than some point trigger some code.
So I'm looking at - (void)installTapOnBus:(AVAudioNodeBus)bus bufferSize:(AVAudioFrameCount)bufferSize format:(AVAudioFormat * __nullable)format block:(AVAudioNodeTapBlock)tapBlock;
Now I have frame positions calculated (predetermined before audio is scheduled I already made all necessary computations) . So I just need to fire code at certain points during playback:
[playerNode installTapOnBus:bus
bufferSize:bufferSize
format:format
block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
//Inspect current audio here and fire...
}];
[playerNode scheduleBuffer:fullbuffer
atTime:startTime
options:0
completionCallbackType:AVAudioPlayerNodeCompletionDataPlayedBack
completionHandler:^(AVAudioPlayerNodeCompletionCallbackType callbackType)
{
// some code is here, not important to this question.
}];
The problem I'm having is figuring out at what point in full buffer I'm at within the tap block. The tap block passes chunks (not the full audio buffer). I tried using the when parameter of the block to calculate the frame position relative to the entire audio but have be unsuccessful so far. I'm assuming the when parameter is relative to the buffer passed in the tap block (not my entire audio buffer I scheduled).
Not installing a tap and just using a timer before scheduling my fullBuffer has given me good results but I'd rather avoid using a timer if possible and use sample time.
Topic:
Media Technologies
SubTopic:
Audio
Tags:
AVAudioNode
AVAudioSession
AVAudioEngine
AVFoundation
Hi everyone,
I am working on a 3D reconstruction project.
Recently I have been able to retrieve the intrinsics from the two cameras on the back of my iPhone.
One consideration is that I want this app to run regardless if there is no LiDAR, but at least two cameras on the back. IF there is a LiDAR that is something I have considered to work later on the course of the project.
I am using a AVCaptureSession with the two cameras AVCaptureDevice:
builtInWideAngleCamera
builtInUltraWideCamera
The intrinsic matrices seem to be correct. However, the when I retrieve the extrinsics, e.g., builtInWideAngleCamera w.r.t. builtInUltraWideCamera the matrix I get looks like this:
Extrinsic Matrix (Ultra-Wide to Wide):
[0.9999968, 0.0008149305, -0.0023960583, 0.0]
[-0.0008256607, 0.9999896, -0.0044807075, 0.0]
[0.002392382, 0.0044826716, 0.99998707, 0.0].
[-14.277955, -8.135408e-10, -0.3359985, 0.0]
The extrinsic matrix of the form: [R | t], seems to be correct for the rotational part, but the translational vector is ALL ZEROS. Which suggests that the cameras are physically overlapped as well the last element not being 1 (homogeneous coordinates).
Has anyone encountered this 'issue' before?
Is there a flaw in my reasoning or something I might be missing?
Any comments are very much appreciated.
I am creating an AVComposition and using it with an AVPlayer. The player works fine and doesn't consume much memory when I do not set playerItem.videoComposition. Here is the code that works without excessive memory usage:
func configurePlayer(composition: AVMutableComposition, videoComposition: AVVideoComposition) {
player.pause()
player.replaceCurrentItem(with: nil)
let playerItem = AVPlayerItem(asset: composition)
player.play()
}
However, when I add playerItem.videoComposition = videoComposition, as in the code below, the memory usage becomes excessive:
func configurePlayer(composition: AVMutableComposition, videoComposition: AVVideoComposition) {
player.pause()
player.replaceCurrentItem(with: nil)
let playerItem = AVPlayerItem(asset: composition)
playerItem.videoComposition = videoComposition
player.play()
}
Issue Details:
The memory usage seems to depend on the number of video tracks in the composition, rather than their duration. For instance, two videos of 30 minutes each consume less memory than 40 videos of just 2 seconds each.
The excessive memory usage is showing up in the Other Processes section of Xcode's debug panel.
For reference, 42 videos, each less than 30 seconds, are using around 1.4 GB of memory.
I'm struggling to understand why adding videoComposition causes such high memory consumption, especially since it happens even when no layer instructions are applied. Any insights on how to address this would be greatly appreciated. Before After
I initially thought the problem might be due to having too many layer instructions in the video composition, but this doesn't seem to be the case. Even when I set a videoComposition without any layer instructions, the memory consumption remains high.
I use a AVplayer in a window view, I found that when I move the window to different positions, the default behavior is that the sound will change according to the window position. However, in some cases, I don't need this default behavior. I hope the sound doesn't change.
I've seen the Multiview feature on tvOS that displays a small grid icon when available. However, I only see this functionality in VisionOS using the AVMultiviewManager. Does a different name refer to this feature on tvOS?
Relevant Links:
https://www.reddit.com/r/appletv/comments/12opy5f/handson_with_the_new_multiview_split_screen/
https://www.pocket-lint.com/how-to-use-multiview-apple-tv/#:~:text=You'll%20see%20a%20grid,running%20at%20the%20same%20time.
I’ve built a custom media player using AVSampleBufferAudioRenderer and AVSampleBufferRenderSynchronizer, and overall, it works great!
However, I’ve noticed some unusual logs popping up:
Domain: NSOSStatusErrorDomain
Error Codes: -16384, -16155, -16512
*That error -16512 keeps happening repeatedly for one of our users, preventing them from playing any media at all.
I’ve searched around but can’t find any documentation explaining what these errors mean.
Has anyone run into this issue or have any suggestions? Any help would be hugely appreciated!
Thanks!
We have developed a simple video player Swift application for macOS, which uses the AVFoundation Framework. A special feature of this app is the ability to play the video backward with speeds like -0.25x, -0.5x, and -1.0x. MP4 video file is played directly from the local file system, video codec is h.264, and audio AAC. Video files are huge, like 10 GB, and a length of 3 hours.
Playing video in reverse direction works well on a Macbook Air with M1 or M2 chip. When we run the same app with the same video on a Macbook Air with M3 chip the reverse playback is much worse. Playback might stutter badly, especially in the latter part of the video. This same behavior also happens in Apple's Quicktime video player when playing in the reverse direction with -1x speed. What's even more strange is that at one point of a time, the video playback is totally smooth, but again, after a while, the playback is stuttering. For example, this morning reverse playback worked 100 % smoothly, then I rebooted the Mac and tried again: the result was stuttering. After this the Mac stayed idle for several hours and I tried to reverse play video again: smooth performance! My conclusion: M3 playback works fine if the stars in the sky are aligned correctly. :-)
So it's not only our app, but also Quicktime player is having exactly the same behavior. And only with the M3 chip. The same symptom appears with another similar M3 Mac, so it can't be a single fault. At the same time, open-source video player iina can reverse play the video well on the same Mac.
All Macs have otherwise identical configuration: 16 GB RAM and macOS 15.1.1.
Have you experienced the same problem? Any chance to solve this problem?
I really hope that the M4 chip Mac is behaving better here.
Hello Developers,
I am working on an app where I need to capture 48MP high-resolution photos using the ultra-wide camera of the iPhone 16 Pro while an AR session is running. The goal is to take these photos without interrupting or impacting the AR session, which uses the main wide-angle camera. Despite extensive testing and various approaches, we have been unable to achieve the desired functionality.
What We Have Tried So Far
1. Using AVCaptureMultiCamSession:
• We attempted to leverage AVCaptureMultiCamSession to simultaneously use the wide-angle camera for ARKit and the ultra-wide camera for photo capture.
• However, this approach resulted in resource conflicts, with errors such as Cannot Record (OSStatus error -16409) and dropped frames.
Additionally, the ultra-wide camera feed would frequently freeze or stop.
2. Dedicated AVCaptureSession for the Ultra-Wide Camera:
• We separated the ultra-wide camera into its own AVCaptureSession while letting ARKit exclusively use the wide-angle camera.
• This setup showed initial promise, but the ultra-wide camera feed would still stop running after a very short time (under one second).
• Debugging logs indicated potential system-level interruptions, possibly due to resource prioritization by iOS.
3. Notification-Based Monitoring:
• We implemented monitoring for session interruptions (AVCaptureSession.wasInterruptedNotification), but this provided limited insights into the exact cause of the session stopping.
• We suspect iOS is de-prioritizing the ultra-wide camera session due to resource management policies or conflicts with ARKit.
4. Adjusting Camera Configurations:
• We attempted to simplify both ARKit and AVCaptureSession configurations by reducing features like depth data and by using lower session presets for video capture. However, the core issue persisted.
The Core Problem
• The ultra-wide camera session frequently stops or freezes when used alongside ARKit.
• Capturing high-resolution 48MP photos during the AR session is critical to the functionality of our app.
Question
Has anyone successfully implemented a similar setup? Specifically:
• Capturing 48MP photos with the ultra-wide camera while ARKit is actively using the main camera.
• Avoiding conflicts between ARKit and AVCaptureSession for the ultra-wide camera.
Any insights, suggestions, or alternative approaches would be greatly appreciated. Thank you in advance for your help! 😊