Writing video using AVAssetWriter, AVAssetReader, and AVSpeechSynthesizer

Hello,

First, some version and software details:

  • Software: iOS 18.1
  • Hardware: iPhone 14 Pro Max and later
  • Xcode: 16.0

Summary: the video read with AVAssetReader is not being concatenated at the beginning of the output video. The output video should contain a scene of me introducing the content, followed by a blue screen while AVSpeechSynthesizer reads out the text I paste into the field above the "Generate Video" button.

Details:

Now, let's talk about the app.

Basically, I’m developing an app that generates a video with the following features:

  • My app will create an output video that is split into an opening scene followed by a fully blue screen.
  • The opening scene will be taken from a video I choose from my gallery.
  • I will read the opening video using AVAssetReader as usual.
  • After the opening scene, I will use the audio generated from the text by AVSpeechSynthesizer.write() (see the sketch below).
  • After the opening scene, the synthesized audio will start playing while the blue screen is displayed.

All of this is already defined in the attached project.

Each project file has a comment at the beginning introducing its content.
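
For orientation, the speech step from the list above boils down to the snippet below. It is condensed from the WriteVideo class further down in this post (texto, synthesizer, audioBuffers, and semaphore are the names used there); the full version, including the delegate that signals the semaphore, is in the attached project.

// Condensed sketch: synthesize the text into PCM buffers, convert each chunk to a
// CMSampleBuffer, and wait for the didFinish delegate callback before writing.
// Must run off the main thread because of the semaphore.
let utterance = AVSpeechUtterance(string: texto)
utterance.voice = AVSpeechSynthesisVoice(language: "pt-BR")

synthesizer.write(utterance) { buffer in
    if let buffer = buffer as? AVAudioPCMBuffer,
       let sampleBuffer = buffer.toCMSampleBuffer(presentationTime: .zero) {
        audioBuffers.append(sampleBuffer)
    }
}

semaphore.wait() // signalled by speechSynthesizer(_:didFinish:)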

How to test:

  1. Write something in the field above the "Generate Video" button. For example, type "Hello, world!"
  2. Then, press the "Library" button and select a video from the gallery, about 30 seconds long.
  3. That’s it. Press the "Generate Video" button.

What I get instead is either a crash or a failure to generate the video.

Practical example of what I want to achieve:

Suppose I record a 30-second video where I say, "I’m going to tell you the story of Snow White."
Then, I paste the "Snow White" story into the field above the "Generate Video" button.

The output video should contain me saying, "I’m going to tell you the story of Snow White."
After that, the AVSpeechSynthesizer will read the story I pasted, while displaying a blue screen.

I look forward to a solution.

Thank you very much!

// Convert AVAudioPCMBuffer to CMSampleBuffer

import AVFoundation

extension AVAudioPCMBuffer {
    func toCMSampleBuffer(presentationTime: CMTime) -> CMSampleBuffer? {
        var buffer: CMSampleBuffer?

        let duration = CMTimeMake(value: 1, timescale: Int32(format.sampleRate))
        var timing = CMSampleTimingInfo(duration: duration, presentationTimeStamp: presentationTime, decodeTimeStamp: .invalid)

        guard CMSampleBufferCreate(allocator: kCFAllocatorDefault,
                                   dataBuffer: nil,
                                   dataReady: false,
                                   makeDataReadyCallback: nil,
                                   refcon: nil,
                                   formatDescription: format.formatDescription,
                                   sampleCount: CMItemCount(frameLength),
                                   sampleTimingEntryCount: 1,
                                   sampleTimingArray: &timing,
                                   sampleSizeEntryCount: 0,
                                   sampleSizeArray: nil,
                                   sampleBufferOut: &buffer) == noErr,
              let buffer = buffer,
              CMSampleBufferSetDataBufferFromAudioBufferList(buffer,
                                                             blockBufferAllocator: kCFAllocatorDefault,
                                                             blockBufferMemoryAllocator: kCFAllocatorDefault,
                                                             flags: 0,
                                                             bufferList: mutableAudioBufferList) == noErr
        else { return nil }

        return buffer
    }
}


// Convert UIImage to CVPixelBuffer

import UIKit

extension UIImage {
    func toCVPixelBuffer() -> CVPixelBuffer? {
        var buffer: CVPixelBuffer?

        let width = Int(size.width)
        let height = Int(size.height)

        let dict = [
            kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
            kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue
        ] as CFDictionary

        guard CVPixelBufferCreate(kCFAllocatorDefault, width, height, kCVPixelFormatType_32ARGB, dict, &buffer) == kCVReturnSuccess,
              let buffer = buffer,
              CVPixelBufferLockBaseAddress(buffer, CVPixelBufferLockFlags(rawValue: 0)) == kCVReturnSuccess,
              let context = CGContext(data: CVPixelBufferGetBaseAddress(buffer),
                                      width: width,
                                      height: height,
                                      bitsPerComponent: 8,
                                      bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                                      space: CGColorSpaceCreateDeviceRGB(),
                                      bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue)
        else { return nil }

        context.translateBy(x: 0, y: size.height)
        context.scaleBy(x: 1, y: -1)

        UIGraphicsPushContext(context)
        draw(in: CGRect(x: 0, y: 0, width: size.width, height: size.height))
        UIGraphicsPopContext()

        return CVPixelBufferUnlockBaseAddress(buffer, CVPixelBufferLockFlags(rawValue: 0)) == kCVReturnSuccess ? buffer : nil
    }
}
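
For reference, this conversion is applied to the generated blue image elsewhere in the project; a condensed illustration (the constant names here are just for the example):

// Turn the 1920x1080 blue image into a pixel buffer for the writer.
let blueImage = createBlueImage(CGSize(width: 1920, height: 1080))
let bluePixelBuffer = blueImage.toCVPixelBuffer()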


// Asset writer input helpers (audio and video settings)

import AVFoundation

func createAudioInput(sampleBuffer: CMSampleBuffer) -> AVAssetWriterInput {
    let audioSettings = [
        AVFormatIDKey: kAudioFormatMPEG4AAC,
        AVSampleRateKey: 48000,
        AVEncoderBitRatePerChannelKey: 64000,
        AVNumberOfChannelsKey: 2
    ] as [String: Any]

    let audioInput = AVAssetWriterInput(mediaType: .audio, outputSettings: audioSettings, sourceFormatHint: sampleBuffer.formatDescription)
    audioInput.expectsMediaDataInRealTime = false

    return audioInput
}

func createVideoInput(videoBuffers: CVPixelBuffer) -> AVAssetWriterInput {
    let videoSettings = [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoWidthKey: CVPixelBufferGetWidth(videoBuffers),
        AVVideoHeightKey: CVPixelBufferGetHeight(videoBuffers)
    ] as [String: Any]

    let videoInput = AVAssetWriterInput(mediaType: .video, outputSettings: videoSettings)
    videoInput.expectsMediaDataInRealTime = false

    return videoInput
}

func createPixelBufferAdaptor(videoInput: AVAssetWriterInput) -> AVAssetWriterInputPixelBufferAdaptor {
    let attributes = [
        kCVPixelBufferPixelFormatTypeKey: kCVPixelFormatType_32ARGB,
        kCVPixelBufferWidthKey: videoInput.outputSettings?[AVVideoWidthKey] as Any,
        kCVPixelBufferHeightKey: videoInput.outputSettings?[AVVideoHeightKey] as Any
    ] as [String: Any]

    return AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: videoInput, sourcePixelBufferAttributes: attributes)
}
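
For reference, these helpers are attached to the writer inside writtingVideo further down; a condensed illustration, where outputURL, firstAudioSample, and bluePixelBuffer are placeholders for the real values:

// Condensed from writtingVideo: wire the inputs into an AVAssetWriter session.
let assetWriter = try AVAssetWriter(outputURL: outputURL, fileType: .mp4)

let audioInput = createAudioInput(sampleBuffer: firstAudioSample)  // CMSampleBuffer read from the gallery video
let videoInput = createVideoInput(videoBuffers: bluePixelBuffer)   // CVPixelBuffer of the blue image
let adaptor = createPixelBufferAdaptor(videoInput: videoInput)

assetWriter.add(videoInput)
assetWriter.add(audioInput)
assetWriter.startWriting()
assetWriter.startSession(atSourceTime: .zero)
// ...append the samples, then markAsFinished() and finishWriting { ... }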

// Writing output video

import AVFoundation

extension SampleProvider {
    func isFinished() -> Bool {
        return counter == audios.count
    }

    func isVideoFinished() -> Bool {
        return counter == infoVideo.count && qFrame == 0
    }
}

class WriteVideo: NSObject, ObservableObject {
    private let synthesizer = AVSpeechSynthesizer()
    var canExport = false
    var url: URL?
    private var infoVideo = [(photoIndex: Int, frameTime: Float64)]()
    var videoURL: URL?
    var counterImage = 0 // Each word = add one unit

    let semaphore = DispatchSemaphore(value: 0)

    override init() {
        super.init()
        Misc.obj.selectedPhotos.removeAll()
        Misc.obj.selectedPhotos.append(createBlueImage(CGSize(width: 1920, height: 1080)))
        Misc.obj.selectedPhotos.append(createBlueImage(CGSize(width: 1920, height: 1080)))
        synthesizer.delegate = self
    }

    func beginWriteVideo(_ texto: String) {
        Misc.obj.selectedPhotos.removeAll()
        Misc.obj.selectedPhotos.append(createBlueImage(CGSize(width: 1920, height: 1080)))
        Misc.obj.selectedPhotos.append(createBlueImage(CGSize(width: 1920, height: 1080)))
        DispatchQueue.global().async {
            do {
                try self.writtingVideo(texto)
                print("Video created successfully!")
            } catch {
                print(error.localizedDescription)
            }
        }
    }

    func writtingVideo(_ texto: String) throws {
        infoVideo.removeAll()
        var audioBuffers = [CMSampleBuffer]()
        var pixelBuffer = Misc.obj.selectedPhotos[0].toCVPixelBuffer()
        var audioReaderInput: AVAssetWriterInput?
        var audioReaderBuffers = [CMSampleBuffer]()
        var videoReaderBuffers = [(frame: CVPixelBuffer, time: CMTime)]()

        // The rest of the text
        let utterance = AVSpeechUtterance(string: texto)
        utterance.voice = AVSpeechSynthesisVoice(language: "pt-BR")

        // Write the remaining text
        synthesizer.write(utterance) { buffer in
            autoreleasepool {
                if let buffer = buffer as? AVAudioPCMBuffer,
                   let sampleBuffer = buffer.toCMSampleBuffer(presentationTime: .zero) {
                    audioBuffers.append(sampleBuffer)
                }
            }
            usleep(1000)
        }

        semaphore.wait()

        // File directory
        url = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0].appendingPathComponent("*****/output.mp4")

        guard let url = url else { return }

        try FileManager.default.createDirectory(at: url.deletingLastPathComponent(), withIntermediateDirectories: true)

        if FileManager.default.fileExists(atPath: url.path()) {
            try FileManager.default.removeItem(at: url)
        }

        // Get CMSampleBuffer of a video asset
        if let videoURL = videoURL {
            let videoAsset = AVAsset(url: videoURL)
            Task {
                let videoAssetTrack = try await videoAsset.loadTracks(withMediaType: .video).first!
                let audioTrack = try await videoAsset.loadTracks(withMediaType: .audio).first!
                let reader = try AVAssetReader(asset: videoAsset)

                let videoSettings = [
                    kCVPixelBufferPixelFormatTypeKey: kCVPixelFormatType_32BGRA,
                    kCVPixelBufferWidthKey: videoAssetTrack.naturalSize.width,
                    kCVPixelBufferHeightKey: videoAssetTrack.naturalSize.height
                ] as [String: Any]
                let readerVideoOutput = AVAssetReaderTrackOutput(track: videoAssetTrack, outputSettings: videoSettings)

                let audioSettings = [
                    AVFormatIDKey: kAudioFormatLinearPCM,
                    AVSampleRateKey: 44100,
                    AVNumberOfChannelsKey: 2
                ] as [String: Any]
                let readerAudioOutput = AVAssetReaderTrackOutput(track: audioTrack, outputSettings: audioSettings)

                reader.add(readerVideoOutput)
                reader.add(readerAudioOutput)
                reader.startReading()

                // Video CMSampleBuffers
                while let sampleBuffer = readerVideoOutput.copyNextSampleBuffer() {
                    autoreleasepool {
                        if let imgBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
                            let pixBuf = imgBuffer as CVPixelBuffer
                            let pTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
                            videoReaderBuffers.append((frame: pixBuf, time: pTime))
                        }
                    }
                }

                // Audio CMSampleBuffers
                while let sampleBuffer = readerAudioOutput.copyNextSampleBuffer() {
                    audioReaderBuffers.append(sampleBuffer)
                }

                semaphore.signal()
            }
        }

        semaphore.wait()

        let audioProvider = SampleProvider(audios: audioBuffers)
        let videoProvider = SampleProvider(infoVideo: infoVideo)

        let audioInput = createAudioInput(sampleBuffer: audioReaderBuffers[0])
        let videoInput = createVideoInput(videoBuffers: pixelBuffer!)
        let adaptor = createPixelBufferAdaptor(videoInput: videoInput)

        let assetWriter = try AVAssetWriter(outputURL: url, fileType: .mp4)

        assetWriter.add(videoInput)
        assetWriter.add(audioInput)
        assetWriter.startWriting()
        assetWriter.startSession(atSourceTime: .zero)

        // Add the video and audio buffers to the AVAssetWriter
        var frameCounter = Int64.zero
        let videoReaderProvider = SampleReaderProvider(frames: videoReaderBuffers)
        let audioReaderProvider = SampleReaderProvider(audios: audioReaderBuffers)

        while true {
            if videoReaderProvider.isFinished() && audioReaderProvider.isFinished() {
                break // The video continues below this while loop
            }

            autoreleasepool {
                if videoInput.isReadyForMoreMediaData {
                    if let buffer = videoReaderProvider.getNextVideoBuffer() {
                        adaptor.append(buffer.frame, withPresentationTime: buffer.time)
                        frameCounter += 1 // Used in the next while loop, but incremented here
                    }
                }
            }

            // Audio buffer
            autoreleasepool {
                if audioInput.isReadyForMoreMediaData {
                    if let buffer = audioReaderProvider.getNextAudioBuffer() {
                        audioInput.append(buffer)
                    }
                }
            }
        }

        // Now, append the audioBuffers and the infoVideo (photo index) content
        while true {
            if videoProvider.isVideoFinished() && audioProvider.isFinished() {
                videoInput.markAsFinished()
                audioInput.markAsFinished()
                break
            }

            autoreleasepool {
                if videoInput.isReadyForMoreMediaData {
                    if let frame = videoProvider.moreFrame() {
                        frameCounter += 1

                        while !videoInput.isReadyForMoreMediaData {
                            usleep(1000)
                        }
                        adaptor.append(frame, withPresentationTime: CMTimeMake(value: frameCounter, timescale: 30))
                    } else {
                        videoInput.markAsFinished()
                    }
                }
            }

            if audioInput.isReadyForMoreMediaData {
                autoreleasepool {
                    if let buffer = audioProvider.getNextAudio() {
                        audioInput.append(buffer)
                    }
                }
            }

            if let error = assetWriter.error {
                print(error.localizedDescription)
                fatalError()
            }
        }

        assetWriter.finishWriting {
            switch assetWriter.status {
            case .completed:
                print("Operation completed successfully: \(url.absoluteString)")
                self.canExport = true
            case .failed:
                if let error = assetWriter.error {
                    print("Error description: \(error.localizedDescription)")
                } else {
                    print("Unknown error.")
                }
            default:
                print("Unknown error.")
            }
        }
    }
}

extension WriteVideo: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, willSpeakRangeOfSpeechString characterRange: NSRange, utterance: AVSpeechUtterance) {
        infoVideo.append((photoIndex: 0, frameTime: 1.0))
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        infoVideo.append((photoIndex: counterImage, frameTime: 1.0))
        self.semaphore.signal()
    }
}

// Contains the creation of the blue image

import UIKit

// Global helper function
func createBlueImage(_ tamanho: CGSize) -> UIImage {
    UIGraphicsBeginImageContext(tamanho)
    if let contexto = UIGraphicsGetCurrentContext() {
        contexto.setFillColor(UIColor.blue.cgColor)
        contexto.fill(CGRect(origin: .zero, size: tamanho))
    }
    let imagemAzul = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()
    return imagemAzul ?? UIImage()
}

// Save output video

import Photos

func saveVideo(url: URL, completion: @escaping (Error?) -> Void) {
    PHPhotoLibrary.shared().performChanges({
        let assetChangeRequest = PHAssetChangeRequest.creationRequestForAssetFromVideo(atFileURL: url)
        let assetPlaceholder = assetChangeRequest?.placeholderForCreatedAsset
        guard let albumChangeRequest = PHAssetCollectionChangeRequest(for: PHAssetCollection.fetchAssetCollections(with: .album, subtype: .any, options: nil).firstObject!) else {
            return
        }
        albumChangeRequest.addAssets([assetPlaceholder] as NSArray)
    }, completionHandler: { success, error in
        if success {
            print("Video saved successfully")
            completion(nil)
        } else {
            print("Error: \(error?.localizedDescription ?? "")")
            completion(error)
        }
    })
}
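
For reference, the view further down calls it like this once the output file exists (writeVideo.url is the generated file's URL):

// Condensed from editingProject below: export the generated file to the photo library.
if let url = writeVideo.url {
    saveVideo(url: url) { error in
        if let error = error {
            print(error.localizedDescription)
        } else {
            print("Video exported successfully!")
        }
    }
}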

import SwiftUI

@main
struct TaleTeller: App {
    var body: some Scene {
        WindowGroup {
            editingProject()
        }
    }
}

// Helper buttons and views

import SwiftUI
import PhotosUI

struct Movie: Transferable {
    let url: URL

    static var transferRepresentation: some TransferRepresentation {
        FileRepresentation(contentType: .movie) { movie in
            SentTransferredFile(movie.url)
        } importing: { received in
            let copy = URL.documentsDirectory.appending(path: "tmpmovie.mp4")

            if FileManager.default.fileExists(atPath: copy.path()) {
                try FileManager.default.removeItem(at: copy)
            }

            try FileManager.default.copyItem(at: received.file, to: copy)

            return Self.init(url: copy)
        }
    }
}

struct editingProject: View {
    @State private var text = ""
    @ObservedObject private var writeVideo = WriteVideo()
    @State private var selectedImage: PhotosPickerItem?
    @State private var image: UIImage?
    @State private var selectedVideo: PhotosPickerItem?
    @State private var selectedVideoURL: URL?

    enum LoadState {
        case unknown, loading, loaded(Movie), failed
    }

    @State private var loadState = LoadState.unknown

    var body: some View {
        VStack {
            TextEditor(text: $text)

            Button("Generate Video") {
                Task(priority: .background) {
                    writeVideo.videoURL = selectedVideoURL
                    writeVideo.beginWriteVideo(text)
                }
            }
            .disabled(text == "")

            if writeVideo.canExport {
                Button("Save Video") {
                    if let url = writeVideo.url {
                        saveVideo(url: url) { error in
                            if let error = error {
                                print(error.localizedDescription)
                            } else {
                                print("Video exported successfully!")
                            }
                        }
                    }
                }
            }

            Button("Request Permission to Create Video") {
                PHPhotoLibrary.requestAuthorization(for: .readWrite) { status in
                    switch status {
                    case .authorized:
                        print("OK, authorized.")
                    default:
                        print("You are not authorized.")
                    }
                }
            }

            if writeVideo.canExport,
               let url = writeVideo.url {

                PhotosPicker("Library", selection: $selectedVideo, matching: .videos)

                switch loadState {
                case .unknown:
                    EmptyView()
                case .loading:
                    ProgressView()
                case .loaded(let movie):
                    Text("Video loaded: \(movie.url)")
                case .failed:
                    Text("Import failed.")
                }

                if let videoURL = selectedVideoURL {
                    Text("Video available.")
                }

                //----------
            }
        } // End of VStack

        .onChange(of: selectedVideo) { newValue in
            Task {
                do {
                    loadState = .loading
                    if let movie = try await selectedVideo?.loadTransferable(type: Movie.self) {
                        selectedVideoURL = movie.url
                        loadState = .loaded(movie)
                    } else {
                        loadState = .failed
                    }
                } catch {
                    loadState = .failed
                }
            }
        }
    }
}

// Writing the gallery video into the output video

import AVFoundation

class SampleReaderProvider {
    private var videoBuffers: [(frame: CVPixelBuffer, time: CMTime)] = []
    private var audioBuffers: [CMSampleBuffer] = []
    private var counter = 0

    init(audios: [CMSampleBuffer]) {
        self.audioBuffers = audios
    }

    init(frames: [(frame: CVPixelBuffer, time: CMTime)]) {
        self.videoBuffers = frames
    }

    func getNextVideoBuffer() -> (frame: CVPixelBuffer, time: CMTime)? {
        if counter == videoBuffers.count {
            return nil
        }

        let buffer = (frame: videoBuffers[counter].frame, time: videoBuffers[counter].time)
        counter += 1
        return buffer
    }

    func getNextAudioBuffer() -> CMSampleBuffer? {
        if counter == audioBuffers.count {
            return nil
        }

        let buffer = audioBuffers[counter]
        counter += 1
        return buffer
    }

    func isFinished() -> Bool {
        if videoBuffers.count > 0 {
            return counter == videoBuffers.count
        } else if audioBuffers.count > 0 {
            return counter == audioBuffers.count
        } else {
            return false
        }
    }
}

import UIKit

class Misc: ObservableObject {
    @Published var selectedPhotos: [UIImage] = []
    static let obj = Misc()

    private init() {}
}

// Output-video provider that serves the content collected from AVSpeechSynthesizer and Misc.obj.selectedPhotos

import UIKit
import AVFoundation

class SampleProvider {
    var audios = [CMSampleBuffer]()
    var infoVideo = [(photoIndex: Int, frameTime: Float64)]()
    private var frame: CVPixelBuffer?
    var qFrame = 0
    let frameDuration = CMTime(value: 1, timescale: 30)
    private var frameTime = CMTime.zero
    var extraFrameTime = Float64.zero
    var counter = 0

    init(audios: [CMSampleBuffer]) {
        self.audios = audios
    }

    init(infoVideo: [(photoIndex: Int, frameTime: Float64)]) {
        self.infoVideo = infoVideo
    }

    func getNextAudio() -> CMSampleBuffer? {
        if audios.count == 0 {
            return nil
        }

        if counter == audios.count {
            return nil
        }

        let tmp = audios[counter]
        counter += 1
        return tmp
    }

    private func startNewFrame() -> Int? {
        if infoVideo.count == 0 {
            return nil
        }

        if counter == infoVideo.count {
            return nil
        }

        // Get the image index
        let photoIndex = infoVideo[counter].photoIndex

        let image = Misc.obj.selectedPhotos[photoIndex]
        frame = image.toCVPixelBuffer()

        counter += 1

        return 30 // 30 fps
    }

    func moreFrame() -> CVPixelBuffer? {
        if qFrame == 0 {
            // Get 30 fps of a blue image
            if let newQuantFrame = startNewFrame() {
                qFrame = newQuantFrame
            } else {
                return nil
            }
        }

        qFrame -= 1
        return frame
    }
}

Hello @Paulo_DEV01,

Thanks for bringing this question to the community!

A couple of follow-up questions:

  1. Can you provide a link to download your example project? Assembling it from the Swift files would be quite cumbersome, and I might miss something essential.

  2. You mentioned that your app crashes; do you have a crash log or an error message?

Best regards,

Greg

Hello!

I have a few questions. I don't use GitHub, since I am deafblind and a Braille display user. I avoid overloading my setup with extra resources, as accessibility tends to complicate things. If I provide a link, should the project be temporary, or must I leave it there permanently?

As for a crash log, I don't have one, but the project is very small. I don't have an error message either.

Thank you!

Hello!

See if this link works.

Hello Paulo,

Unfortunately that link did not work, it says I am either not authorized, or the content has been removed.

I avoid overloading my setup with extra resources, as accessibility tends to complicate things. If I provide a link, should the project be temporary, or must I leave it there permanently?

I understand, feel free to submit this as a code-level support request. When you submit your request, please mention this thread as well! Once you submit your request, you will be able to attach your project in a follow-up.

Best regards,

Greg

Hello,
I tried to open the code-level support request twice, but I am not receiving the automatic email.

Hello,

I am not receiving automatic email when I send messages to the Developer Technical support.

Help me.

Hello,

I have already sent the attachment via email. Now I am just waiting for a response.

Hello,

I am deaf-blind and use a wheelchair. I received guidance from Apple Developer Support that is beyond my capacity. It involved viewing a panel in Xcode to fix a bug in my app, but this panel is not accessible. I need special attention from Apple Support, and I'm worried they may not assist me fully with this. I can follow instructions related to source code, but I cannot work well with Xcode panels. Please inform Support that I am deaf-blind, use a Braille display, and am mainly limited to working with source code text. Panels like Instruments > Allocations are challenging in Braille. Please help me!
