Streaming via H.264-encoded RTP

Hello,

I am attempting to simultaneously stream video to a remote client and run local inference on a neural network using the same video frames. I have done this on other platforms, using GStreamer on Linux and libstreaming on Android for compression and packetization. I've now attempted it on iPhone, using FFmpeg to stream and a capture session to feed the neural network, but I run into the problem of multiple camera access.

Most of the posts I see are concerned with receiving RTP streams on iOS, but I need to do the opposite. As I am new to iOS and Swift, I was hoping someone could suggest a method for RTP packetization. Any library recommendations or example code for something similar?

Best,

Accepted Reply

Replying to my own post since this doesn't seem to be getting much traction. I was able to successfully stream video to my client via FFmpeg, but it's a bit of a duct-tape solution. In the FFmpeg pipeline, I write my pixel buffers via FileHandle (forUpdatingAtPath:) to a temporary location, then use the -stream_loop -1 option in FFmpeg to constantly re-read that file. FFmpeg pretty much takes care of the rest (encoding, packetizing, etc.). Some other things to note: I write just the raw bytes to the file handle and read them back in the 'rawvideo' format. If anyone can think of a better way to do realtime video streaming, I am all ears. Right now my latency is on the order of ~170 ms.
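In case it helps anyone, the write side looks roughly like this (a rough sketch only; the path, pixel format, resolution, and exact ffmpeg invocation below are placeholders, so adjust them for your own capture setup):

```swift
import Foundation
import CoreVideo

// Rough sketch of the write side. Path, pixel format, and dimensions are
// placeholders; they must match whatever the capture session actually outputs.
let rawFramePath = NSTemporaryDirectory() + "frame.raw"

func writeRawFrame(_ pixelBuffer: CVPixelBuffer, to handle: FileHandle) {
    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

    guard let base = CVPixelBufferGetBaseAddress(pixelBuffer) else { return }
    let byteCount = CVPixelBufferGetDataSize(pixelBuffer)

    handle.seek(toFileOffset: 0)                 // overwrite the previous frame in place
    handle.write(Data(bytes: base, count: byteCount))
}

// Usage: create the file once, open the handle once, then call
// writeRawFrame(_:to:) from the capture output callback for every frame.
// let handle = FileHandle(forUpdatingAtPath: rawFramePath)!

// The ffmpeg side then re-reads that same file in a loop, roughly:
// ffmpeg -f rawvideo -pixel_format bgra -video_size 1280x720 -framerate 30 \
//        -stream_loop -1 -i frame.raw \
//        -c:v libx264 -preset ultrafast -tune zerolatency \
//        -f rtp rtp://CLIENT_IP:5004
```

One caveat: rawvideo needs the pixel format and frame size to match exactly what the capture session produces, or ffmpeg will misinterpret the bytes.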

Replies

Coming back after a few months to update: this solution was terrible and consumed entirely too much CPU. As I required my app to handle more tasks concurrently, it became evident I could not keep using FFmpeg. Instead, I take the compressed camera frames from a VTCompressionSession and manually package them into an H.264 byte stream; I could not find any built-in, path-of-least-resistance way of doing this. After a packet is assembled, I send it over a socket. It's quite simple and nowhere near as intensive as the FFmpeg approach was (since there I was essentially writing a file every frame). If anyone finds this post, let me know of other ways of doing this that might be more performant.
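For anyone who wants to try the same route, the core of it is converting the AVCC (length-prefixed) sample buffers that VideoToolbox hands you into an Annex B byte stream before putting the data on the wire. Roughly like this (a sketch only; the function name and structure are mine, and you still need to add your own RTP/socket framing on top):

```swift
import Foundation
import CoreMedia

/// Convert a compressed sample buffer (AVCC, length-prefixed NAL units) from a
/// VTCompressionSession callback into an Annex B byte stream, prepending
/// SPS/PPS on keyframes so a receiver can join the stream mid-way.
func annexBData(from sampleBuffer: CMSampleBuffer) -> Data? {
    let startCode = Data([0x00, 0x00, 0x00, 0x01])
    var output = Data()

    // Non-keyframes carry the kCMSampleAttachmentKey_NotSync attachment.
    let attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer,
                                                              createIfNecessary: false) as? [[CFString: Any]]
    let notSync = attachments?.first?[kCMSampleAttachmentKey_NotSync] as? Bool ?? false

    if !notSync, let format = CMSampleBufferGetFormatDescription(sampleBuffer) {
        // Prepend SPS (index 0) and PPS (index 1) before each keyframe.
        for index in 0..<2 {
            var paramSet: UnsafePointer<UInt8>?
            var paramSetSize = 0
            let status = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(
                format, parameterSetIndex: index,
                parameterSetPointerOut: &paramSet,
                parameterSetSizeOut: &paramSetSize,
                parameterSetCountOut: nil, nalUnitHeaderLengthOut: nil)
            if status == noErr, let paramSet = paramSet {
                output.append(startCode)
                output.append(paramSet, count: paramSetSize)
            }
        }
    }

    guard let blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) else { return nil }
    var totalLength = 0
    var dataPointer: UnsafeMutablePointer<Int8>?
    guard CMBlockBufferGetDataPointer(blockBuffer, atOffset: 0,
                                      lengthAtOffsetOut: nil,
                                      totalLengthOut: &totalLength,
                                      dataPointerOut: &dataPointer) == noErr,
          let base = dataPointer else { return nil }

    // Walk the length-prefixed NAL units, replacing each 4-byte big-endian
    // length field with an Annex B start code.
    var offset = 0
    while offset + 4 <= totalLength {
        var nalLength: UInt32 = 0
        memcpy(&nalLength, base + offset, 4)
        nalLength = CFSwapInt32BigToHost(nalLength)
        offset += 4
        guard offset + Int(nalLength) <= totalLength else { break }
        output.append(startCode)
        output.append(Data(bytes: base + offset, count: Int(nalLength)))
        offset += Int(nalLength)
    }
    return output
}
```

Call this from the compression session's output callback and hand the resulting Data to whatever socket code you're using.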