FxPlug - Turning a command line imaging app into a generator or filter plugin, how to access an image’s absolute pixel values?

Hi,

Am new to FxPlug development, and interested in turning a Rust command line image-generating app into either a generator or a filter.

Have been fiddling with creating different example filters, getting to know Swift, and have tested several working color filters... So far, these have been rather basic, e.g. using the FxBrightness template and adding custom code to edit pixel values 1:1 from the source.

Am trying to dive in a little deeper, and am finding the next steps a little overwhelming and challenging... First example:

In my CL app, one of the result options evaluates a new pixel output value based on the absolute pixel position in an image. That means using the X and Y position of a pixel to determine what value to generate and render. However, in my tests so far, have been unable to figure out how to get the current image pixel position for use in the Metal fragment shader, as have only managed to get the tiled image’s position, and not the image’s absolute position. This results in a rendered image containing the same tiled output four times, rather than a single image with the desired output. How can I access the absolute image pixel position, and not the tile’s absolute position, in the Metal fragment shader (since the whole-image framework is deprecated in favor of tiled images)?

In other words, if am looking at my HD output in Motion, and the cursor points to a pixel at position x: 200, y: 200, am trying to get that value for use in Metal, and not the value from the tiled image, which returns something like x: -280, y: -70. Any ideas?

Thank you for your suggestions.

Note: Am not a professional developer, more an experienced digital artist, trying to make tools others may find interesting to use...

In the FxImageTile object there is a property called imagePixelBounds and another called tilePixelBounds. These tell you the image's overall bounds and the bounds of the tile you're currently being asked to render. You should be able to use the difference between their lower-left corners as an offset for pixel positions in your fragment shader.
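
Roughly, the host-side setup could look something like this in Swift (the field names on the bounds and the destinationImage / commandEncoder variables are assumptions patterned after the SDK samples, so check them against your own render method):

```swift
// Rough sketch: compute the tile's offset within the full image and pass it
// to the fragment shader, which can add it to its tile-local coordinate to
// recover the absolute pixel position. Names are illustrative, not SDK API.
struct TileInfo {
    var tileOffset: SIMD2<Float>   // lower-left of this tile relative to the full image
    var imageSize:  SIMD2<Float>   // full image size in pixels
}

let imageBounds = destinationImage.imagePixelBounds   // whole-image bounds
let tileBounds  = destinationImage.tilePixelBounds    // bounds of the tile being rendered

var info = TileInfo(
    tileOffset: SIMD2<Float>(Float(tileBounds.left - imageBounds.left),
                             Float(tileBounds.bottom - imageBounds.bottom)),
    imageSize:  SIMD2<Float>(Float(imageBounds.right - imageBounds.left),
                             Float(imageBounds.top - imageBounds.bottom))
)

commandEncoder.setFragmentBytes(&info, length: MemoryLayout<TileInfo>.stride, index: 0)
```

In the shader, add tileOffset to whatever tile-local coordinate you already have (and watch out for the vertical flip, which depends on how your quad is set up); the result should line up with the absolute pixel position the cursor reports in Motion.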

Hi Darrin,

Thanks for your reply. Without going into too long-winded a reply, will just say that have only partly been able to achieve the desired results per your suggestion. In theory, this should work, of course. But in practice, have run into several issues, which am perhaps unqualified to address, that don’t produce the results am expecting. Will list a few of the issues have run into over the past weeks:

  1. The plugin am trying to write requires evaluating very high sin() function values (much greater than millions; more like 10 with dozens of zeros behind it) to produce the results am aiming to render. Alas, once the numbers go beyond 1,000,000 or so (don’t have an exact number to share with you), it starts rendering unexpected results when the sin() evaluation is called in the Metal fragment shader rather than in Swift or Rust... And I don’t know why. Perhaps sin() math in Metal prioritizes speed over accuracy?

So to test an alternative method, changed the structure of the plugin to generate the pixel data by calling sin() from Swift instead, and passing along the results as a texture into the fragment shader of an FxGenerator plugin. This then renders correctly, meaning it renders images as expected, seeming to match the output from the command line app developed in Rust. However, this method is incredibly slow to render and update (maybe 1-2 secs with each change of parameter and texture update, where the GPU is almost instantaneous), especially with higher resolution images, as the entire image is passed as a texture to the fragment shader, which then does nothing with it but sample and return the source as the output... Don’t have a clue how to bypass Metal altogether and render a pixel array straight to the plugin output, which would be the easiest option. To test this, added a parameter to the plugin to switch between evaluating sin() on the CPU or in Metal, see screenshots (1.a., 1.b.) below. In the Rust command line app, rendering output images as PNGs and EXRs takes tiny fractions of a second, but it does so without API overhead.

  2. This plugin would ideally be an FxGenerator rather than an FxFilter; however, have encountered an issue with evaluating the pixel position in the former, as there are no source image bounds in the FxGenerator (or did I miss this?) from which to evaluate which pixel to render, so am using the destination image/tile bounds instead, but these evaluate pixel positions differently when pixels go off screen. For example, create a solid color layer that’s 2100x1080 in a project that’s only 1920x1080: when you evaluate the destination bounds, the image comes out as 1920 wide, not 2100! This is a problem if my image needs to render an absolute coordinate pixel position based on a 2100 width and not 1920, especially if the user then changes the layer’s position in the properties. Have noticed, may need to evaluate this from 0,0 at the center of the image, rather than from the bottom left corner, in order for this to work properly... But haven’t gotten around to solving this yet, have been busy with other work these past few weeks too.

  3. Perhaps it’s my own lack of expertise, but have encountered what appeared to be render errors from evaluating vertex positions, am guessing because these are evaluated as triangles within a quad, rather than as a quad that matches an actual pixel... Would much prefer being able to evaluate a pixel position interpreted as a quad by default, but don’t know if that’s possible, as am used to handling pixels in arrays on the CPU. Even if it means a slight performance hit on the GPU, would imagine it’s probably worth it to get an absolutely correct position that matches the original quad. (This is speculative, as am by no means qualified to judge this... Working with pixel squares is just much easier, it seems.) Also mention this because, in testing, when passing the pixel position through as a pixel value and using the picker to evaluate the pixel value in the viewport, I get changing pixel values while dragging the picker across the image, rather than the absolute or fixed values expected. For example, across a pixel line, where I’d expect the pixel value to remain identical for each pixel, I observe variations of ±0.003 where it should be ±0.00000000. So am not sure what to do with that yet... Or what it means, but it’s unexpected...

If that doesn’t make sense... Understand, am needing absolute pixel position values to properly sample and evaluate in a sin() function, and if these are calculated as vertices and then “reconstituted” to a rounded value, it’s no good, I get errors (see screenshot 2), where instead of smooth gradient forms, these are broken by triangular rendering errors along the lines. Of course, this was from an early attempt at developing this, so may have made some programming errors that caused this... But this is what came out at the time.

For info: Am on a 2020 MacBook Pro, Intel quad-core i5 with Iris Plus Graphics 645, so not the latest M1 with lots of texture memory.

If you have any further insights, would greatly appreciate it. As am doing this during free time, and having to learn as I go along, am rather limited in my ability to try a lot of options... Responding takes time in between other projects... Thank you for your help!

Yes, Metal supports both "fast" and "precise" variants of its math functions. See page 139 of the Metal Shading Language Specification. You can write precise::sin(x) instead of just sin(x) to get the precise version. I suspect that will help with that part of the problem.
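
If you want to experiment with the two variants outside the plug-in, a small compute kernel compiled from source is an easy way to compare them; this is just a sketch (the kernel and option names below are illustrative, not part of the FxPlug SDK):

```swift
import Metal

// Compile a tiny kernel from source so sin() and precise::sin() can be
// compared for the same argument. fastMathEnabled affects the default sin().
let shaderSource = """
#include <metal_stdlib>
using namespace metal;

kernel void sinCompare(device float  *out [[buffer(0)]],
                       constant float &x  [[buffer(1)]],
                       uint id [[thread_position_in_grid]])
{
    if (id == 0) {
        out[0] = sin(x);            // fast variant (the default)
        out[1] = precise::sin(x);   // precise variant from the MSL spec
    }
}
"""

let device  = MTLCreateSystemDefaultDevice()!
let options = MTLCompileOptions()
options.fastMathEnabled = false     // another knob: disable fast math library-wide
let library = try device.makeLibrary(source: shaderSource, options: options)
```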

I'm not entirely sure I'm following everything you've asked. In Motion, when playing back in the canvas, if it's set so that the scene's size is bigger than the canvas's size (so only a portion of the output is shown), we'll only render the area in the canvas. We should tell your plug-in that's what we're doing by setting the output image's imagePixelBounds and tilePixelBounds appropriately. If we're not doing that, that sounds like a bug and you should file feedback about it. Final Cut Pro does not do this optimization and always renders the entire frame. (It has to because it's writing out the rendered frames to disk so that playback the next time around can use the cached frames.)

Your fragment shader should get properly interpolated fragment coordinates regardless of how many triangles make up the quad you're drawing. For example, if you set the texture coords for the vertices of a quad to be 1920x1080 and are displaying it at 1920x1080, the texture coordinates received by your fragment shader should correspond 1:1 to the pixel locations you'd expect. There's no weirdness because it's divided into triangles. We have hundreds of shaders that work this way, so if you're experiencing issues with that, maybe post some examples of what you're doing vs. what you're seeing and we can help debug it further.

Hi Darrin,

Thanks again for your reply, appreciate your help. Since the first part of your answer is the easiest to test quickly, recompiled the plugin that generated the 1A & 1B screenshots, this time changing the code in the fragment shader to use precise::sin() instead of sin(), and then took a screenshot (1C) of the result, which you can see below. Alas, the problem of rendering errors persists, causing the large black and white banding seen on the right. (Don’t know how to check whether the render settings are overriding precise with fast.)

Let me re-explain the process am using; perhaps there’s something else causing this issue that’s being overlooked?

CPU method: Generate an array of pixels at render time using Swift, where red values represent a horizontal gradient, green values a vertical gradient, and blue values are the result of sin(red * green), that’s it. The array is sent as a texture through the fragment shader, where it gets sampled, but otherwise nothing else changes. The output appears correct and matches the CL app.
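
For reference, a stripped-down sketch of that CPU path might look like this (the function and parameter names are made up for illustration; sin() is evaluated in Double on the CPU, then the whole buffer is uploaded into an rgba32Float texture):

```swift
import Foundation
import Metal

// Fill an RGBA Float buffer (R = horizontal gradient, G = vertical gradient,
// B = sin(R * G) computed on the CPU in double precision) and upload it as a texture.
func makeCPUTexture(device: MTLDevice, width: Int, height: Int, scale: Float) -> MTLTexture? {
    var pixels = [Float](repeating: 0, count: width * height * 4)
    for y in 0..<height {
        for x in 0..<width {
            let r = Float(x) / Float(width - 1) * scale
            let g = Float(y) / Float(height - 1) * scale
            let i = (y * width + x) * 4
            pixels[i + 0] = r
            pixels[i + 1] = g
            pixels[i + 2] = Float(sin(Double(r) * Double(g)))   // CPU, double precision
            pixels[i + 3] = 1.0
        }
    }

    let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba32Float,
                                                              width: width,
                                                              height: height,
                                                              mipmapped: false)
    guard let texture = device.makeTexture(descriptor: descriptor) else { return nil }
    pixels.withUnsafeBytes { buffer in
        texture.replace(region: MTLRegionMake2D(0, 0, width, height),
                        mipmapLevel: 0,
                        withBytes: buffer.baseAddress!,
                        bytesPerRow: width * 4 * MemoryLayout<Float>.stride)
    }
    return texture
}
```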

GPU method: Generate the same array as above, but this time, in the fragment shader, replace the blue values with the result of sin(red_sample * green_sample) as float values from the texture sample (screenshot 1B). Screenshot 1C uses the precise::sin method instead... The output doesn’t match the CPU-generated output, instead rendering large black and white bands.

The values being input into the sin equation are at least 1,000,000 and more before errors become visible. What am not sure how to test would be to simply compare the numerical output of Swift’s sin function with Metal’s fast and precise sin functions. Was able to compare values between Swift’s and Rust’s sin functions, and the numerical results appear to be a consistent match.

So, if Metal’s sin or precise::sin function accurately matches the results of Swift, then it must be something else, but then, what?

Will get back to testing the bounding area again when have more time to develop the plugin further; have to focus on other things for a while, unfortunately. Will test anything you suggest that can be done fairly quickly (without being a genius)... Thanks!

My apologies. I asked the Metal team about this, and only now discovered that Metal uses 24-bit floats, not 32-bit. I was not aware of that. That explains the problem.

So that leaves the question of what to do about it. Is it possible in your calculations to keep your values small by reducing them modulo 2 * pi where necessary? Like could they be reduced before they get to be over 1,000,000?
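
As a sketch of what that reduction could look like on the CPU side (the numbers and names here are illustrative), the point is to do the reduction while the value still fits comfortably in a Double, before it is handed over as a 32-bit Float:

```swift
import Foundation

// Reduce the sin() argument into roughly (-pi, pi] in Double, then hand the
// small value to the GPU as a Float, where sin() still has precision to spare.
func reducedAngle(_ x: Double) -> Float {
    Float(x.remainder(dividingBy: 2.0 * .pi))
}

let big = 12_345_678.9                   // already past the Float trouble zone
print(sin(big))                          // reference, full Double precision
print(sin(Double(reducedAngle(big))))    // agrees to roughly six decimal places
```

Of course this only helps if the values can be kept in (or reduced within) Double range before they blow up, which is why the reduction needs to happen as early in the calculation as possible.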

Alternatively, there are (CPU-based) libraries that use multiple floats or doubles to simulate higher precision. If you have the time and patience, you might be able to reimplement some of the basic operations of those libraries on the GPU. (For example, the quad_float type from NTL, the Number Theory Library, does this. I'd add a link, but the forum software is telling me I can't.) (And as always, make sure you understand the license of software like that before copying it.)
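
As a taste of what those libraries do under the hood (just an illustration of the basic building block, not a recommendation of any particular library): they keep a value as an unevaluated sum of two or more floats, using tricks like Knuth's TwoSum to capture the rounding error exactly.

```swift
// "Double-single" building block: hi + lo together carry more precision than
// a single Float. Operations like this are what you'd port to the GPU.
struct FloatFloat {
    var hi: Float
    var lo: Float
}

// Knuth's TwoSum: returns the rounded sum and its exact rounding error.
func twoSum(_ a: Float, _ b: Float) -> FloatFloat {
    let s = a + b
    let bb = s - a
    let err = (a - (s - bb)) + (b - bb)
    return FloatFloat(hi: s, lo: err)
}
```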

If none of those solutions are acceptable, then you'll have to resort to what you've done in your experiments above and do the implementation on the CPU. Note that you can speed up CPU implementations using vector instructions if you aren't already using them, and also by having multiple threads do your work. That can go a long way to improving the performance as well.
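
For example (an illustrative sketch, not the only way to do it), Accelerate's vvsin vectorizes the sin() evaluation over a whole buffer, and DispatchQueue.concurrentPerform spreads the chunks across CPU cores:

```swift
import Accelerate
import Foundation

// Evaluate sin() over a large Double buffer: vvsin handles the SIMD work,
// concurrentPerform splits the buffer into disjoint chunks across CPU cores.
func parallelSin(_ values: [Double], chunkSize: Int = 64_000) -> [Double] {
    var output = [Double](repeating: 0, count: values.count)
    let chunkCount = (values.count + chunkSize - 1) / chunkSize
    values.withUnsafeBufferPointer { src in
        output.withUnsafeMutableBufferPointer { dst in
            DispatchQueue.concurrentPerform(iterations: chunkCount) { chunk in
                let start = chunk * chunkSize
                var count = Int32(min(chunkSize, values.count - start))
                vvsin(dst.baseAddress! + start, src.baseAddress! + start, &count)
            }
        }
    }
    return output
}
```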

Sorry, I conflated two unrelated things in my response above. Metal uses full 32-bit floats, but all 32-bit floats have only 24 non-exponent (significand) bits, so you won't get as much precision past ~1,000,000 on any data that uses 32-bit floats. Sorry for the confusion. It's likely that Rust and Swift are using Doubles under the hood, and that's why you're seeing a difference.
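
You can see the effect entirely on the CPU in Swift; this little loop (purely illustrative) just rounds the argument to Float first, which is what happens implicitly when the value reaches the GPU as a 32-bit float:

```swift
import Foundation

// Compare sin() of the exact Double argument with sin() of the same argument
// after rounding it to Float. Once the rounding error on the argument is a
// noticeable fraction of 2*pi, the result is effectively arbitrary.
for exponent in [5, 6, 7, 8] {
    let x = pow(10.0, Double(exponent)) + 0.1
    let roundedToFloat = Float(x)
    print("10^\(exponent) + 0.1:",
          "Double sin =", sin(x),
          "Float-rounded sin =", sin(Double(roundedToFloat)),
          "argument rounding error =", x - Double(roundedToFloat))
}
```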

Hi Darrin,

Thank you for your answer(s). This probably explains a few observations encountered while first learning about writing FxPlug plugins and using Metal, such as the lack of double support (only float precision); had already wondered at the time whether it would be an issue.

The plugin am writing renders a simulation of a simple equation, with sin() input values tested all the way up to 10 with 46 zeros... Why so many? Because that’s where this simulation does something unusual: it suddenly reveals “the end” toward these values, and I don’t actually know whether it’s because that’s where the double precision ends, or whether that’s where the equation reaches its limit!

Unfortunately, as am not mathematically well-versed, am kind of stuck making use of what’s available, at least for the time being. Having taken a peek at other people’s attempts to implement a custom sin function, seems it’s a rabbit hole best ignored for me.

Am going to try other routes at improving performance; have already succeeded with using a single channel r32Float Metal texture instead of a four channel rgba32Float, as the result is essentially black & white, so don’t need color channels. Haven’t been able to compare speed results, but copying the red channel to green, blue, and 1.0 for alpha from/to the fragment shader sample feels a little faster than creating a four channel texture and round-tripping it. Although it’s still not nearly as fast as would like it to be... At least the result is accurate and predictable. Though for the life of me, still don’t get why it’s necessary to round-trip via the GPU.

Note: Tried sending a 16-bit UInt texture instead of float to see if this would speed things up too, but it only rendered black...

Had noticed in a reply to another forum thread, you mentioned using/creating an IOSurface; is this worth looking into customizing? In other words, is there a way to create an IOSurface directly that bypasses all the overhead and renders straight to Motion?

Ideally, would simply like to populate an array with float pixel values that’s instantly used as image data inside the app, without doing anything over the GPU, thus maximizing the CPU result without using additional vector libraries, etc. Is there a way to do this?

Again, simply creating and saving an image in the CL app takes microseconds, even at high resolution, so the CPU is already fast! What am trying to avoid is having to render out thousands of images to create video sequences that then have to be edited, losing the benefit of making changes on the fly without re-rendering everything, and also requiring many TBs of storage...

The whole idea of making a Motion plugin is to be able to edit in Motion/FCP, rendering results on the fly in realtime, on my laptop!

Hoping this idea doesn’t go poof!

What do you think? Thanks again for your feedback and answers, much appreciate the time you’re taking to help me with this now.

Well, floating point is strange, to say the least! While even doubles don't top out at 10^42 (the maximum double is closer to 10^308), they are not evenly distributed like integers. (There's an interesting website called "Float Exposed" that lets you experiment, if you're interested.) Near zero, the values are closer together, and the farther away you get from zero, the more spread out they become, such that around 2^53 (roughly 9 x 10^15) you can no longer represent consecutive integers with a double precision floating point value. So 10^42 is certainly plausible as a cutoff point for getting reasonable values out of the sin() function.
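
A quick way to see that spacing directly from Swift is the .ulp property, which is the gap between a value and the next representable one (approximate output in the comments):

```swift
// The gap between adjacent Doubles grows with magnitude.
print((1.0 as Double).ulp)      // ~2.2e-16
print((1.0e6 as Double).ulp)    // ~1.2e-10
print((1.0e16 as Double).ulp)   // 2.0  (consecutive integers no longer representable)
print((1.0e42 as Double).ulp)   // ~1.5e26, far larger than 2*pi, so sin() is meaningless there
```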

Regarding the GPU: internally Motion and FCP do almost all of their rendering on the GPU, so regardless of where you render your images, they need to end up on the GPU to go through the rendering pipeline in those apps. They should be fairly efficient in how they operate. For an FxPlug, they'll allocate an IOSurface (wrapped in an FxImageTile, as you've seen), and pass that over to the plug-in's process. You can write to it on the CPU by calling the IOSurface's -lockWithOptions:seed: method. You can then get the base address and bytes per row for writing your data to the image. Then, when you unlock the IOSurface, it will copy the data to the GPU for you. Just be aware that the data is expected to be in 16-bit per channel half-float format. There are many good references on the net for how to convert between single or double-precision float and half float.
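
A hedged sketch of that write path, using the C-level IOSurface calls from Swift (how you obtain the destination tile's IOSurface is not shown here; `surface` and `halfFloatRows` are placeholders, and the pixel data is assumed to already be 16-bit half-float per channel):

```swift
import IOSurface

// Lock the surface, copy half-float rows into it respecting bytesPerRow,
// then unlock so the data is pushed to the GPU.
func writePixels(to surface: IOSurfaceRef, halfFloatRows: [[UInt16]]) {
    var seed: UInt32 = 0
    _ = IOSurfaceLock(surface, [], &seed)

    let base        = IOSurfaceGetBaseAddress(surface)
    let bytesPerRow = IOSurfaceGetBytesPerRow(surface)

    for (y, row) in halfFloatRows.enumerated() {
        let destination = base.advanced(by: y * bytesPerRow)
        row.withUnsafeBytes { source in
            destination.copyMemory(from: source.baseAddress!, byteCount: source.count)
        }
    }

    _ = IOSurfaceUnlock(surface, [], &seed)
}
```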

Hi Darrin,

Thanks again for your answers, very appreciated. Must apologize for some of my earlier comments and attempts at optimizing this plugin; as am not a CS major, I approach the idea of optimization from a more app-user-centric view than the programmer’s reality. Forgive me if some of my suggestions are dead ends, foolish, ignorant, or not valid in the context of this developer API... For example, my trying to use an r32Float type for MTL instead of rgba32Float came from thinking fewer channels of data would be faster.

Thanks for validating that the number ranges am using in the sin() function are within an acceptable margin of floating-point precision; it means that as long as the texture data is populated with values within this range, I can feel more confident that the results are correct. What you describe about these numbers becoming ever more spread out farther from zero, and closer together near it, makes sense.

Understood, everything in Motion/FCP eventually has to end up on the GPU. Guess the big issue for me is how to get the data to the GPU as efficiently as possible. Current attempts at going by way of a texture are generally way too slow for fast, interactive use... Reminds me of the “old days” of working creatively on Silicon Graphics refrigerators, compositing video in 8-bit/channel with limited texture memory, a bus with limited bandwidth, etc. It meant interactivity was SLOW! Even though it was cutting-edge technology at the time!

Sure, everything’s gotten faster and the technology’s improved significantly, but this feels like am back in those days, decades later. Feels like a similar issue to back then: not having texture memory, or enough of it, to render what the CPU wants to send...

So have tried a few of your suggestions, but these raised more questions as well; off the top of my head, this is what’s coming up:

  • Am using a 2020 Intel MacBook Pro, and when looking into using Float16 data, get a message saying this type in Swift on macOS isn’t (yet) supported, meaning that I can’t directly generate half-float data unless using an Apple silicon chip. That’s a problem... Found a few solutions to the problem, which import the Accelerate framework and make use of vImageConvert_PlanarFtoPlanar16F to convert an array from float to half (a small sketch of this conversion appears after this list). (Other solutions, which directly convert from float to half at the byte level, are a little too much for me to work through at this time.) This seems to work, but then brings up another question entirely, of “why bother doing this?”

In the SDK GradientCheckerboard example, the plugin creates an MTLPixelFormat.rgba32Float texture and then replaces the texture data, which to me means that it’s replacing the IOSurface data inside the FxImageTile wrapper. If it’s doing this, isn’t it then having to translate the float data to half to be compatible with the IOSurface anyway? Q: “Is there any benefit to my doing this manually ahead of replacing the texture data?” This is how I approached my own plugin, by reinterpreting the example code from Obj-C into Swift.

In other words, would it really make any difference to go the route of replacing half-float data in the IOSurface directly vs. sending float?

  • Started fresh again with a new plugin template from the latest FxPlug SDK 4.2.4, which have modified to be an FxGenerator based on the code from the GradientCheckerboard example, and am trying new approaches from scratch, including sending half-float data by converting a float array (using the Accelerate framework) into an MTLPixelFormat.rgba16Float texture and replacing the texture. But no matter whether am sending half into rgba16Float or float into rgba32Float, am getting a short series of errors such as the following when I build and run the plugin with the scheme set to launch Motion instead of the wrapper app; no idea why (yet):

2022-05-03 10:35:47.677780-0700 G Word XPC Service[8941:670499] Got an unexpected pixel format in the IOSurface: 0x52476841

This error is generated from the MetalDeviceCache.swift file, where it checks the MTLPixelFormat and emits this error if the format is rgba32Float, for example; but then why am I getting this error if am creating an rgba16Float texture?

Confusing matters even more, went back to my original test generator plugin and tried building and running it with Motion set as the app in the scheme, to see if the same error was happening, and in this manner the plugin won’t even run; rather, it stops at breakpoint 1.1 on line 122, “let deviceCache = MetalDeviceCache.deviceCache”, when the plugin is used on a layer in Motion... But when I build and run it using the wrapper app itself and launch Motion separately, the plugin renders the image as expected!

Will continue experimenting... As it would be great to get this working, even if it means accepting the slower speed, which am sure would improve by upgrading to a faster computer with an M-series chip later; only, would rather get the code working optimally first.

On another note, a simpler question...

  • Is there any way to access the width and height parameter data inside the generator plugin? For example, having applied this generator to my composition, would like to be able to read the width and height data it uses, so as to manage the texture size independently, rather than relying on the destinationImage’s tile and image bounds and having to recalculate them, since generators don’t have sourceImage info?
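
Here is the float-to-half conversion sketch mentioned in the first bullet above (buffer names are illustrative; the call treats the data as a planar buffer, which also covers interleaved RGBA data if you count every channel value as a "pixel"):

```swift
import Accelerate

// Convert a Float array to 16-bit half floats (stored as UInt16) using
// vImageConvert_PlanarFtoPlanar16F on a one-row vImage buffer.
func floatToHalf(_ input: [Float]) -> [UInt16] {
    var output = [UInt16](repeating: 0, count: input.count)
    input.withUnsafeBufferPointer { src in
        output.withUnsafeMutableBufferPointer { dst in
            var srcBuffer = vImage_Buffer(data: UnsafeMutableRawPointer(mutating: src.baseAddress),
                                          height: 1,
                                          width: vImagePixelCount(input.count),
                                          rowBytes: input.count * MemoryLayout<Float>.stride)
            var dstBuffer = vImage_Buffer(data: UnsafeMutableRawPointer(dst.baseAddress),
                                          height: 1,
                                          width: vImagePixelCount(input.count),
                                          rowBytes: input.count * MemoryLayout<UInt16>.stride)
            _ = vImageConvert_PlanarFtoPlanar16F(&srcBuffer, &dstBuffer, vImage_Flags(kvImageNoFlags))
        }
    }
    return output
}
```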

Thanks again for your time and willingness to help. And apologies in advance for my lack of experience with development and this SDK.
