"Baking together" two audio tracks into one for drag-and-drop

Question

m.gansrigler OP

Created Jan ’25

Hi all,

With my app, ScreenFloat, you can record your screen along with system and microphone audio.
Those two audio feeds are recorded into separate audio tracks so that each can be removed or edited individually later on.

Now, the recordings you create with ScreenFloat can be dragged and dropped into other apps instantly. So far, so good, but some apps, like Slack or VLC, and even websites like YouTube, do not play back multiple audio tracks, just one.

So what I'm trying to do is this: when the video recording file is dragged out of ScreenFloat, instantly bake the two individual audio tracks together into one and offer that new file as the drag-and-drop file, so that all audio is played in the target app.

But it's slow. I mean, it's actually quite fast, but for drag and drop, it's slow.
My approach is this:

1. "Bake together" the two audio tracks into a one-track m4a audio file using AVMutableAudioMix and AVAssetExportSession
2. Take the video track, add the new audio file as an audio track to it, and render that out using AVAssetExportSession
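
For readers following along, the two steps above can be sketched roughly as follows. This is a minimal sketch, not the actual ScreenFloat code: the names `BakeError`, `runExport`, `add`, `mixAudio`, and `remux` are hypothetical, the `.mov` output type is an assumption, and it assumes macOS 12 or later for the async loading APIs (newer SDKs may flag `export()` as deprecated).

```swift
import AVFoundation

// Hypothetical error type for this sketch.
enum BakeError: Error {
    case missingTrack
    case exportFailed(Error?)
}

// Runs an export session and throws unless it completes.
// Note: the export fails if a file already exists at `url`.
func runExport(_ session: AVAssetExportSession, to url: URL, as type: AVFileType) async throws {
    session.outputURL = url
    session.outputFileType = type
    await session.export()
    guard session.status == .completed else { throw BakeError.exportFailed(session.error) }
}

// Copies `track` into a new composition track, keeping its original start time.
@discardableResult
func add(_ track: AVAssetTrack, to composition: AVMutableComposition) async throws -> AVMutableCompositionTrack {
    guard let target = composition.addMutableTrack(withMediaType: track.mediaType,
                                                   preferredTrackID: kCMPersistentTrackID_Invalid) else {
        throw BakeError.missingTrack
    }
    let range = try await track.load(.timeRange)
    try target.insertTimeRange(range, of: track, at: range.start)
    return target
}

// Step 1: mix every audio track of `source` into a single-track m4a file.
func mixAudio(of source: AVAsset, to m4aURL: URL) async throws {
    let audioTracks = try await source.loadTracks(withMediaType: .audio)
    guard !audioTracks.isEmpty else { throw BakeError.missingTrack }

    let composition = AVMutableComposition()
    var inputs: [AVMutableAudioMixInputParameters] = []
    for track in audioTracks {
        let copy = try await add(track, to: composition)
        let parameters = AVMutableAudioMixInputParameters(track: copy)
        parameters.setVolume(1.0, at: .zero) // full volume for each feed
        inputs.append(parameters)
    }
    let mix = AVMutableAudioMix()
    mix.inputParameters = inputs

    guard let session = AVAssetExportSession(asset: composition,
                                             presetName: AVAssetExportPresetAppleM4A) else {
        throw BakeError.exportFailed(nil)
    }
    session.audioMix = mix
    try await runExport(session, to: m4aURL, as: .m4a)
}

// Step 2: combine the untouched video track with the mixed audio, without re-encoding.
func remux(videoOf source: AVAsset, audioOf mixed: AVAsset, to movieURL: URL) async throws {
    guard let video = try await source.loadTracks(withMediaType: .video).first,
          let audio = try await mixed.loadTracks(withMediaType: .audio).first else {
        throw BakeError.missingTrack
    }
    let composition = AVMutableComposition()
    let videoCopy = try await add(video, to: composition)
    videoCopy.preferredTransform = try await video.load(.preferredTransform)
    try await add(audio, to: composition)

    guard let session = AVAssetExportSession(asset: composition,
                                             presetName: AVAssetExportPresetPassthrough) else {
        throw BakeError.exportFailed(nil)
    }
    try await runExport(session, to: movieURL, as: .mov) // .mov is an assumption
}
```

Calling `mixAudio(of:to:)` and then `remux(videoOf:audioOf:to:)` with an `AVURLAsset` of the intermediate m4a file mirrors the two export passes described above; only the first pass encodes anything.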

As a quick benchmark with a 3'40" movie: step 1 takes ~1.7 seconds and step 2 adds another ~1.5 seconds, so we're at ~3.2 seconds in total. That's an eternity for a drag and drop, where the user might cancel if there's no immediate feedback.
I could also do it in one step, but then I couldn't use the passthrough preset (AVAssetExportPresetPassthrough), and the export then takes around 32 seconds, I assume because it touches the video data. That's unnecessary in this case, so I think the two-step approach is the fastest here.

So, my question is, is there a faster way?
The best idea I can come up with right now is this: when initially recording the screen with system and microphone audio as separate tracks, also record both of them into a third, muted, "hidden" track that I could use later on. That would eliminate step 1: I'd just strip the two individual audio tracks out of the movie and keep only the video and the "hidden" track (then unmuted). But I'd still have a ~1.5 second delay there, plus the processing and data overhead (basically doubling the movie's audio data).

All this would be great for an export operation (where one expects it to take a little time), but for a drag-and-drop operation, it's not ideal.

I've discarded the idea of doing a promise file drag, because many apps do not accept those, and I want to keep wide compatibility with all sorts of apps.

I'd appreciate any ideas or pointers.

Thank you kindly,
Matthias

Answer 1

m.gansrigler OP

Feb ’25

Any pointers or ideas for different approaches would be much appreciated - thank you : )

Answer 2

m.gansrigler OP

5d

Or is there some metadata I could attach to the audio tracks to indicate that they're not alternates, but should all be played together?
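
As a starting point for investigating that, here is a small diagnostic sketch that lists how AVFoundation currently sees the audio tracks: whether each one is enabled, and whether it belongs to an alternate track group (tracks within such a group are treated as mutually exclusive). The function name `describeAudioTracks` is hypothetical, macOS 12 or later is assumed, and nothing here is claimed to change how Slack, VLC, or YouTube pick a track; it only inspects the file.

```swift
import AVFoundation

// Returns one description per audio track: its ID, enabled flag, and
// whether it is a member of any alternate track group of the asset.
func describeAudioTracks(of asset: AVAsset) async throws -> [String] {
    let groups = try await asset.load(.trackGroups)
    let audioTracks = try await asset.loadTracks(withMediaType: .audio)

    var lines: [String] = []
    for track in audioTracks {
        let enabled = try await track.load(.isEnabled)
        let id = NSNumber(value: track.trackID)
        let inAlternateGroup = groups.contains { $0.trackIDs.contains(id) }
        lines.append("track \(track.trackID): enabled=\(enabled), inAlternateGroup=\(inAlternateGroup)")
    }
    return lines
}
```

Running this on a recording, for example with `AVURLAsset(url:)`, shows whether the two audio tracks are already marked as alternates of each other or are independent, enabled tracks.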
