Hello,
First, some version and software details:
- Software: iOS 18.1
- Hardware: iPhone 14 Pro Max and later
- Xcode: 16.0
Summary: The opening video read with AVAssetReader is not being concatenated at the beginning of the output video. The output video should contain a scene of me introducing the content, followed by a blue screen while AVSpeechSynthesizer reads out the text I pasted above the "Generate Video" button.
Details:
Now, let's talk about the app.
Basically, I’m developing an app that generates a video with the following features:
- My app will create an output video that is split into an opening scene followed by a fully blue screen.
- The opening scene will be taken from a video I choose from my gallery.
- I will read the opening video using AVAssetReader as usual.
- After the opening scene, I will use the audio of a text synthesized with AVSpeechSynthesizer.write().
- This synthesized audio will start playing while the blue screen is displayed (a minimal sketch of this step follows the list).
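For illustration only, here is a minimal sketch of that synthesis step, under the same assumptions as the attached project; the names below are mine, not the project's:

import AVFoundation

// Minimal sketch (illustrative names): collect the PCM buffers that
// AVSpeechSynthesizer.write() produces for the pasted text.
let sketchSynthesizer = AVSpeechSynthesizer()
var synthesizedBuffers = [AVAudioPCMBuffer]()

func startSynthesizing(_ text: String) {
    let utterance = AVSpeechUtterance(string: text)
    utterance.voice = AVSpeechSynthesisVoice(language: "pt-BR")
    sketchSynthesizer.write(utterance) { buffer in
        // The callback is invoked repeatedly as audio becomes available.
        if let pcm = buffer as? AVAudioPCMBuffer, pcm.frameLength > 0 {
            synthesizedBuffers.append(pcm)
        }
    }
    // The attached project waits for AVSpeechSynthesizerDelegate's didFinish
    // callback (via a semaphore) before using the collected buffers.
}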
All of this is already defined in the attached project.
Each project file has a comment at the beginning introducing its content.
How to test:
- Write something in the field above the "Generate Video" button. For example, type "Hello, world!"
- Then, press the "Library" button and select a video from the gallery, about 30 seconds long.
- That’s it. Press the "Generate Video" button.
The result I’ve experienced is a crash or failure to generate the video.
Practical example of what I want to achieve:
Suppose I record a 30-second video where I say, "I’m going to tell you the story of Snow White."
Then, I paste the "Snow White" story into the field above the "Generate Video" button.
The output video should contain me saying, "I’m going to tell you the story of Snow White."
After that, the AVSpeechSynthesizer will read the story I pasted, while displaying a blue screen.
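To make the intended timeline concrete, here is a hypothetical helper (not part of the attached project) showing that the blue-screen segment is meant to begin where the opening clip ends, so its presentation times would be offset by the opening clip's duration:

import AVFoundation

// Hypothetical helper: presentation time of the Nth blue-screen frame (30 fps),
// offset by the duration of the opening clip chosen from the library.
func blueFramePresentationTime(frameIndex: Int64, openingClip: AVAsset) async throws -> CMTime {
    let openingDuration = try await openingClip.load(.duration)
    let frameOffset = CMTimeMake(value: frameIndex, timescale: 30)
    return CMTimeAdd(openingDuration, frameOffset)
}

(In the attached code, the synthesized audio buffers are currently converted with presentationTime: .zero.)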
I look forward to a solution.
Thank you very much!
// Convert AVAudioPCMBuffer to CMSampleBuffer
import AVFoundation

extension AVAudioPCMBuffer {
    func toCMSampleBuffer(presentationTime: CMTime) -> CMSampleBuffer? {
        var buffer: CMSampleBuffer?
        let duration = CMTimeMake(value: 1, timescale: Int32(format.sampleRate))
        var timing = CMSampleTimingInfo(duration: duration,
                                        presentationTimeStamp: presentationTime,
                                        decodeTimeStamp: .invalid)
        guard CMSampleBufferCreate(allocator: kCFAllocatorDefault,
                                   dataBuffer: nil,
                                   dataReady: false,
                                   makeDataReadyCallback: nil,
                                   refcon: nil,
                                   formatDescription: format.formatDescription,
                                   sampleCount: CMItemCount(frameLength),
                                   sampleTimingEntryCount: 1,
                                   sampleTimingArray: &timing,
                                   sampleSizeEntryCount: 0,
                                   sampleSizeArray: nil,
                                   sampleBufferOut: &buffer) == noErr,
              let buffer = buffer,
              CMSampleBufferSetDataBufferFromAudioBufferList(buffer,
                                   blockBufferAllocator: kCFAllocatorDefault,
                                   blockBufferMemoryAllocator: kCFAllocatorDefault,
                                   flags: 0,
                                   bufferList: mutableAudioBufferList) == noErr else {
            return nil
        }
        return buffer
    }
}
// Convert UIImage to CVPixelBuffer
import UIKit

extension UIImage {
    func toCVPixelBuffer() -> CVPixelBuffer? {
        var buffer: CVPixelBuffer?
        let width = Int(size.width)
        let height = Int(size.height)
        let dict = [
            kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
            kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue
        ] as CFDictionary
        guard CVPixelBufferCreate(kCFAllocatorDefault, width, height, kCVPixelFormatType_32ARGB, dict, &buffer) == kCVReturnSuccess,
              let buffer = buffer,
              CVPixelBufferLockBaseAddress(buffer, CVPixelBufferLockFlags(rawValue: 0)) == kCVReturnSuccess,
              let context = CGContext(data: CVPixelBufferGetBaseAddress(buffer),
                                      width: width,
                                      height: height,
                                      bitsPerComponent: 8,
                                      bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                                      space: CGColorSpaceCreateDeviceRGB(),
                                      bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) else {
            return nil
        }
        context.translateBy(x: 0, y: size.height)
        context.scaleBy(x: 1, y: -1)
        UIGraphicsPushContext(context)
        draw(in: CGRect(x: 0, y: 0, width: size.width, height: size.height))
        UIGraphicsPopContext()
        return CVPixelBufferUnlockBaseAddress(buffer, CVPixelBufferLockFlags(rawValue: 0)) == kCVReturnSuccess ? buffer : nil
    }
}
// Video settings input
import AVFoundation

func createAudioInput(sampleBuffer: CMSampleBuffer) -> AVAssetWriterInput {
    let audioSettings = [
        AVFormatIDKey: kAudioFormatMPEG4AAC,
        AVSampleRateKey: 48000,
        AVEncoderBitRatePerChannelKey: 64000,
        AVNumberOfChannelsKey: 2
    ] as [String: Any]
    let audioInput = AVAssetWriterInput(mediaType: .audio,
                                        outputSettings: audioSettings,
                                        sourceFormatHint: sampleBuffer.formatDescription)
    audioInput.expectsMediaDataInRealTime = false
    return audioInput
}

func createVideoInput(videoBuffers: CVPixelBuffer) -> AVAssetWriterInput {
    let videoSettings = [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoWidthKey: CVPixelBufferGetWidth(videoBuffers),
        AVVideoHeightKey: CVPixelBufferGetHeight(videoBuffers)
    ] as [String: Any]
    let videoInput = AVAssetWriterInput(mediaType: .video, outputSettings: videoSettings)
    videoInput.expectsMediaDataInRealTime = false
    return videoInput
}

func createPixelBufferAdaptor(videoInput: AVAssetWriterInput) -> AVAssetWriterInputPixelBufferAdaptor {
    let attributes = [
        kCVPixelBufferPixelFormatTypeKey: kCVPixelFormatType_32ARGB,
        kCVPixelBufferWidthKey: videoInput.outputSettings?[AVVideoWidthKey] as Any,
        kCVPixelBufferHeightKey: videoInput.outputSettings?[AVVideoHeightKey] as Any
    ] as [String: Any]
    return AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: videoInput,
                                                sourcePixelBufferAttributes: attributes)
}
// Writing output video
import AVFoundation

extension SampleProvider {
    func isFinished() -> Bool {
        return counter == audios.count
    }
    func isVideoFinished() -> Bool {
        return counter == infoVideo.count && qFrame == 0
    }
}

class WriteVideo: NSObject, ObservableObject {
    private let synthesizer = AVSpeechSynthesizer()
    var canExport = false
    var url: URL?
    private var infoVideo = [(photoIndex: Int, frameTime: Float64)]()
    var videoURL: URL?
    var counterImage = 0 // Each word = add one unit
    let semaphore = DispatchSemaphore(value: 0)

    override init() {
        super.init()
        Misc.obj.selectedPhotos.removeAll()
        Misc.obj.selectedPhotos.append(createBlueImage(CGSize(width: 1920, height: 1080)))
        Misc.obj.selectedPhotos.append(createBlueImage(CGSize(width: 1920, height: 1080)))
        synthesizer.delegate = self
    }

    func beginWriteVideo(_ texto: String) {
        Misc.obj.selectedPhotos.removeAll()
        Misc.obj.selectedPhotos.append(createBlueImage(CGSize(width: 1920, height: 1080)))
        Misc.obj.selectedPhotos.append(createBlueImage(CGSize(width: 1920, height: 1080)))
        DispatchQueue.global().async {
            do {
                try self.writingVideo(texto)
                print("Video created successfully!")
            } catch {
                print(error.localizedDescription)
            }
        }
    }

    func writingVideo(_ texto: String) throws {
        infoVideo.removeAll()
        var audioBuffers = [CMSampleBuffer]()
        var pixelBuffer = Misc.obj.selectedPhotos[0].toCVPixelBuffer()
        var audioReaderInput: AVAssetWriterInput?
        var audioReaderBuffers = [CMSampleBuffer]()
        var videoReaderBuffers = [(frame: CVPixelBuffer, time: CMTime)]()

        // Rest of the text
        let utterance = AVSpeechUtterance(string: texto)
        utterance.voice = AVSpeechSynthesisVoice(language: "pt-BR")
        // Write the remaining text
        synthesizer.write(utterance) { buffer in
            autoreleasepool {
                if let buffer = buffer as? AVAudioPCMBuffer,
                   let sampleBuffer = buffer.toCMSampleBuffer(presentationTime: .zero) {
                    audioBuffers.append(sampleBuffer)
                }
            }
            usleep(1000)
        }
        semaphore.wait()

        // File directory
        url = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0].appendingPathComponent("*****/output.mp4")
        guard let url = url else { return }
        try FileManager.default.createDirectory(at: url.deletingLastPathComponent(), withIntermediateDirectories: true)
        if FileManager.default.fileExists(atPath: url.path()) {
            try FileManager.default.removeItem(at: url)
        }

        // Get CMSampleBuffers of the gallery video asset
        if let videoURL = videoURL {
            let videoAsset = AVAsset(url: videoURL)
            Task {
                let videoAssetTrack = try await videoAsset.loadTracks(withMediaType: .video).first!
                let audioTrack = try await videoAsset.loadTracks(withMediaType: .audio).first!

                let reader = try AVAssetReader(asset: videoAsset)
                let videoSettings = [
                    kCVPixelBufferPixelFormatTypeKey: kCVPixelFormatType_32BGRA,
                    kCVPixelBufferWidthKey: videoAssetTrack.naturalSize.width,
                    kCVPixelBufferHeightKey: videoAssetTrack.naturalSize.height
                ] as [String: Any]
                let readerVideoOutput = AVAssetReaderTrackOutput(track: videoAssetTrack, outputSettings: videoSettings)

                let audioSettings = [
                    AVFormatIDKey: kAudioFormatLinearPCM,
                    AVSampleRateKey: 44100,
                    AVNumberOfChannelsKey: 2
                ] as [String: Any]
                let readerAudioOutput = AVAssetReaderTrackOutput(track: audioTrack, outputSettings: audioSettings)

                reader.add(readerVideoOutput)
                reader.add(readerAudioOutput)
                reader.startReading()

                // Video CMSampleBuffers
                while let sampleBuffer = readerVideoOutput.copyNextSampleBuffer() {
                    autoreleasepool {
                        if let imgBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
                            let pixBuf = imgBuffer as CVPixelBuffer
                            let pTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
                            videoReaderBuffers.append((frame: pixBuf, time: pTime))
                        }
                    }
                }
                // Audio CMSampleBuffers
                while let sampleBuffer = readerAudioOutput.copyNextSampleBuffer() {
                    audioReaderBuffers.append(sampleBuffer)
                }
                semaphore.signal()
            }
        }
        semaphore.wait()

        let audioProvider = SampleProvider(audios: audioBuffers)
        let videoProvider = SampleProvider(infoVideo: infoVideo)
        let audioInput = createAudioInput(sampleBuffer: audioReaderBuffers[0])
        let videoInput = createVideoInput(videoBuffers: pixelBuffer!)
        let adaptor = createPixelBufferAdaptor(videoInput: videoInput)

        let assetWriter = try AVAssetWriter(outputURL: url, fileType: .mp4)
        assetWriter.add(videoInput)
        assetWriter.add(audioInput)
        assetWriter.startWriting()
        assetWriter.startSession(atSourceTime: .zero)

        // Append the gallery video's buffers to the AVAssetWriter
        var frameCounter = Int64.zero
        let videoReaderProvider = SampleReaderProvider(frames: videoReaderBuffers)
        let audioReaderProvider = SampleReaderProvider(audios: audioReaderBuffers)

        while true {
            if videoReaderProvider.isFinished() && audioReaderProvider.isFinished() {
                break // The video continues below this loop
            }
            autoreleasepool {
                if videoInput.isReadyForMoreMediaData {
                    if let buffer = videoReaderProvider.getNextVideoBuffer() {
                        adaptor.append(buffer.frame, withPresentationTime: buffer.time)
                        frameCounter += 1 // Used in the next loop, but incremented here too
                    }
                }
            }
            // Audio buffer
            autoreleasepool {
                if audioInput.isReadyForMoreMediaData {
                    if let buffer = audioReaderProvider.getNextAudioBuffer() {
                        audioInput.append(buffer)
                    }
                }
            }
        }

        // Now, append the synthesized audioBuffers and the photo-index (infoVideo) content
        while true {
            if videoProvider.isVideoFinished() && audioProvider.isFinished() {
                videoInput.markAsFinished()
                audioInput.markAsFinished()
                break
            }
            autoreleasepool {
                if videoInput.isReadyForMoreMediaData {
                    if let frame = videoProvider.moreFrame() {
                        frameCounter += 1
                        while !videoInput.isReadyForMoreMediaData {
                            usleep(1000)
                        }
                        adaptor.append(frame, withPresentationTime: CMTimeMake(value: frameCounter, timescale: 30))
                    } else {
                        videoInput.markAsFinished()
                    }
                }
            }
            if audioInput.isReadyForMoreMediaData {
                autoreleasepool {
                    if let buffer = audioProvider.getNextAudio() {
                        audioInput.append(buffer)
                    }
                }
            }
            if let error = assetWriter.error {
                print(error.localizedDescription)
                fatalError()
            }
        }

        assetWriter.finishWriting {
            switch assetWriter.status {
            case .completed:
                print("Operation completed successfully: \(url.absoluteString)")
                self.canExport = true
            case .failed:
                if let error = assetWriter.error {
                    print("Error description: \(error.localizedDescription)")
                } else {
                    print("Error not found.")
                }
            default:
                print("Error not found.")
            }
        }
    }
}

extension WriteVideo: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, willSpeakRangeOfSpeechString characterRange: NSRange, utterance: AVSpeechUtterance) {
        infoVideo.append((photoIndex: 0, frameTime: 1.0))
    }
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        infoVideo.append((photoIndex: counterImage, frameTime: 1.0))
        self.semaphore.signal()
    }
}
// Contains the creation of the blue image
import UIKit

// Global variables
func createBlueImage(_ tamanho: CGSize) -> UIImage {
    UIGraphicsBeginImageContext(tamanho)
    if let contexto = UIGraphicsGetCurrentContext() {
        contexto.setFillColor(UIColor.blue.cgColor)
        contexto.fill(CGRect(origin: .zero, size: tamanho))
    }
    let imagemAzul = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()
    return imagemAzul ?? UIImage()
}
// Save output video
import Photos

func saveVideo(url: URL, completion: @escaping (Error?) -> Void) {
    PHPhotoLibrary.shared().performChanges({
        let assetChangeRequest = PHAssetChangeRequest.creationRequestForAssetFromVideo(atFileURL: url)
        let assetPlaceholder = assetChangeRequest?.placeholderForCreatedAsset
        guard let albumChangeRequest = PHAssetCollectionChangeRequest(for: PHAssetCollection.fetchAssetCollections(with: .album, subtype: .any, options: nil).firstObject!) else { return }
        albumChangeRequest.addAssets([assetPlaceholder] as NSArray)
    }, completionHandler: { success, error in
        if success {
            print("Video saved successfully")
            completion(nil)
        } else {
            print("Error: \(error?.localizedDescription ?? "")")
            completion(error)
        }
    })
}
import SwiftUI

@main
struct TaleTeller: App {
    var body: some Scene {
        WindowGroup {
            editingProject()
        }
    }
}
// Helper buttons and pickers
import SwiftUI
import PhotosUI

struct Movie: Transferable {
    let url: URL

    static var transferRepresentation: some TransferRepresentation {
        FileRepresentation(contentType: .movie) { movie in
            SentTransferredFile(movie.url)
        } importing: { received in
            let copy = URL.documentsDirectory.appending(path: "tmpmovie.mp4")
            if FileManager.default.fileExists(atPath: copy.path()) {
                try FileManager.default.removeItem(at: copy)
            }
            try FileManager.default.copyItem(at: received.file, to: copy)
            return Self.init(url: copy)
        }
    }
}

struct editingProject: View {
    @State private var text = ""
    @ObservedObject private var writeVideo = WriteVideo()
    @State private var selectedImage: PhotosPickerItem?
    @State private var image: UIImage?
    @State private var selectedVideo: PhotosPickerItem?
    @State private var selectedVideoURL: URL?

    enum LoadState {
        case unknown, loading, loaded(Movie), failed
    }
    @State private var loadState = LoadState.unknown

    var body: some View {
        VStack {
            TextEditor(text: $text)

            Button("Generate Video") {
                Task(priority: .background) {
                    if selectedVideoURL != nil {
                        writeVideo.videoURL = selectedVideoURL
                    } else {
                        writeVideo.videoURL = nil
                    }
                    await writeVideo.beginWriteVideo(text)
                }
            }
            .disabled(text == "")

            if writeVideo.canExport {
                Button("Save Video") {
                    if let url = writeVideo.url {
                        saveVideo(url: url) { error in
                            if let error = error {
                                print(error.localizedDescription)
                            } else {
                                print("Video exported successfully!")
                            }
                        }
                    }
                }
            }

            Button("Request Permission to Create Video") {
                PHPhotoLibrary.requestAuthorization(for: .readWrite) { status in
                    switch status {
                    case .authorized:
                        Text("OK, authorized.")
                    default:
                        Text("You are not authorized.")
                    }
                }
            }

            if writeVideo.canExport, let url = writeVideo.url {
                PhotosPicker("Library", selection: $selectedVideo, matching: .videos)

                switch loadState {
                case .unknown:
                    EmptyView()
                case .loading:
                    ProgressView()
                case .loaded(let movie):
                    Text("Video loaded: \(movie.url)")
                case .failed:
                    Text("Import failed.")
                }

                if let videoURL = selectedVideoURL {
                    Text("Video available.")
                }
            }
        } // End of VStack
        .onChange(of: selectedVideo) { newValue in
            Task {
                do {
                    loadState = .loading
                    if let movie = try await selectedVideo?.loadTransferable(type: Movie.self) {
                        selectedVideoURL = movie.url
                        loadState = .loaded(movie)
                    } else {
                        loadState = .failed
                    }
                } catch {
                    loadState = .failed
                }
            }
        }
    }
}
// Writing the gallery video into the output video
import AVFoundation

class SampleReaderProvider {
    private var videoBuffers: [(frame: CVPixelBuffer, time: CMTime)] = []
    private var audioBuffers: [CMSampleBuffer] = []
    private var counter = 0

    init(audios: [CMSampleBuffer]) {
        self.audioBuffers = audios
    }

    init(frames: [(frame: CVPixelBuffer, time: CMTime)]) {
        self.videoBuffers = frames
    }

    func getNextVideoBuffer() -> (frame: CVPixelBuffer, time: CMTime)? {
        if counter == videoBuffers.count { return nil }
        let buffer = (frame: videoBuffers[counter].frame, time: videoBuffers[counter].time)
        counter += 1
        return buffer
    }

    func getNextAudioBuffer() -> CMSampleBuffer? {
        if counter == audioBuffers.count { return nil }
        let buffer = audioBuffers[counter]
        counter += 1
        return buffer
    }

    func isFinished() -> Bool {
        if videoBuffers.count > 0 {
            return counter == videoBuffers.count
        } else if audioBuffers.count > 0 {
            return counter == audioBuffers.count
        } else {
            return false
        }
    }
}
import UIKit

class Misc: ObservableObject {
    @Published var selectedPhotos: [UIImage] = []
    static let obj = Misc()
    private init() {}
}
// Output video provider for the content produced by AVSpeechSynthesizer and Misc.selectedPhotos
import UIKit
import AVFoundation

class SampleProvider {
    var audios = [CMSampleBuffer]()
    var infoVideo = [(photoIndex: Int, frameTime: Float64)]()
    private var frame: CVPixelBuffer?
    var qFrame = 0
    let frameDuration = CMTime(value: 1, timescale: 30)
    private var frameTime = CMTime.zero
    var extraFrameTime = Float64.zero
    var counter = 0

    init(audios: [CMSampleBuffer]) {
        self.audios = audios
    }

    init(infoVideo: [(photoIndex: Int, frameTime: Float64)]) {
        self.infoVideo = infoVideo
    }

    func getNextAudio() -> CMSampleBuffer? {
        if audios.count == 0 { return nil }
        if counter == audios.count { return nil }
        let tmp = audios[counter]
        counter += 1
        return tmp
    }

    private func startNewFrame() -> Int? {
        if infoVideo.count == 0 { return nil }
        if counter == infoVideo.count { return nil }
        // Get the image index
        let photoIndex = infoVideo[counter].photoIndex
        let image = Misc.obj.selectedPhotos[photoIndex]
        frame = image.toCVPixelBuffer()
        counter += 1
        return 30 // 30 fps
    }

    func moreFrame() -> CVPixelBuffer? {
        if qFrame == 0 {
            // Get 30 fps of a blue image
            if let newQuantFrame = startNewFrame() {
                qFrame = newQuantFrame
            } else {
                return nil
            }
        }
        qFrame -= 1
        return frame
    }
}