Core Image for depth maps & segmentation masks: numeric fidelity issues when rendering CIImage to CVPixelBuffer (looking for architecture suggestions)

Hello All,

I’m working on a computer-vision–heavy iOS application that uses the camera, LiDAR depth maps, and semantic segmentation to reason about the environment (object identification, localization and measurement - not just visualization).

Current architecture

I initially built the image pipeline around CIImage as a unifying abstraction. It seemed like a good idea because:

  • CIImage integrates cleanly with Vision, ARKit, AVFoundation, Metal, Core Graphics, etc.
  • It provides a rich set of out-of-the-box transforms and filters.
  • It is immutable and thread-safe, which significantly simplified concurrency in a multi-queue pipeline.

The LiDAR depth maps, semantic segmentation masks, etc. were treated as CIImages, with conversion to CVPixelBuffer or MTLTexture only at the edges when required.
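For concreteness, the "edges" conversion today looks roughly like this (function names are illustrative, not my actual API):

```swift
import CoreImage
import CoreVideo

// Depth maps and masks enter the pipeline as CIImages...
func wrap(_ buffer: CVPixelBuffer) -> CIImage {
    CIImage(cvPixelBuffer: buffer)   // e.g. a kCVPixelFormatType_DepthFloat32 LiDAR map
}

// ...and only become CVPixelBuffers again when a consumer needs raw values.
func render(_ image: CIImage, into output: CVPixelBuffer, using context: CIContext) {
    context.render(image, to: output)
}
```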

Problem

I’ve run into cases where Core Image transformations do not preserve numeric fidelity for non-visual data.

Example:

Rendering a CIImage-backed segmentation mask into a larger CVPixelBuffer can cause label values to change in predictable but incorrect ways. A minimal reproduction sketch follows the list below.

This occurs even when:

  • using nearest-neighbor sampling
  • disabling color management (workingColorSpace / outputColorSpace = NSNull)
  • applying identity or simple affine transforms
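Here is a stripped-down version of the problematic render path, plus the read-back helper I use to check values (pixel format, buffer sizes, and helper names are simplified for illustration):

```swift
import CoreImage
import CoreVideo

// Color management disabled, as described above.
let maskContext = CIContext(options: [
    .workingColorSpace: NSNull(),
    .outputColorSpace: NSNull()
])

// Render an 8-bit label mask into a larger single-channel buffer.
func renderMask(_ mask: CIImage, width: Int, height: Int) -> CVPixelBuffer? {
    var target: CVPixelBuffer?
    CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                        kCVPixelFormatType_OneComponent8, nil, &target)
    guard let target = target else { return nil }

    // Nearest-neighbor sampling and an identity transform; in principle
    // this should leave integer label values untouched.
    let resampled = mask.samplingNearest().transformed(by: .identity)
    maskContext.render(resampled, to: target)
    return target
}

// Read a single label value back out for comparison against the source mask.
func labelValue(in buffer: CVPixelBuffer, x: Int, y: Int) -> UInt8 {
    CVPixelBufferLockBaseAddress(buffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }
    let base = CVPixelBufferGetBaseAddress(buffer)!
    let bytesPerRow = CVPixelBufferGetBytesPerRow(buffer)
    return base.advanced(by: y * bytesPerRow + x)
               .assumingMemoryBound(to: UInt8.self).pointee
}
```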

I’ve confirmed via controlled tests that:

  • Metal → CVPixelBuffer paths preserve values correctly (a simplified sketch of this path follows the list)
  • CIImage → CVPixelBuffer paths can introduce value changes when resampling or expanding the render target
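For reference, the Metal-side path in my test was essentially a blit copy into a CVPixelBuffer-backed texture (simplified here to the same-size case; the texture-cache setup and integer pixel format are assumptions about the setup, not a prescription):

```swift
import Metal
import CoreVideo

// Copy raw bytes from a Metal texture into a Metal-compatible CVPixelBuffer.
// Both sides use an integer pixel format, so no sampling or color conversion
// can happen along the way.
func blitCopy(_ source: MTLTexture,
              into target: CVPixelBuffer,
              queue: MTLCommandQueue,
              textureCache: CVMetalTextureCache) {
    var cvTexture: CVMetalTexture?
    CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, textureCache, target, nil,
        .r8Uint,                      // must match the source texture's format
        CVPixelBufferGetWidth(target),
        CVPixelBufferGetHeight(target),
        0, &cvTexture)

    guard let cvTexture = cvTexture,
          let destination = CVMetalTextureGetTexture(cvTexture),
          let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else { return }

    blit.copy(from: source, to: destination)   // byte-for-byte copy
    blit.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Note: the CVPixelBuffer must be created with
    // kCVPixelBufferMetalCompatibilityKey for the texture cache to accept it.
}
```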

This makes CIImage unsafe as a source of numeric truth for segmentation masks and depth-based logic, even though it works well for visualization, and I should have realized this much sooner.

Direction I’m considering

I’m now considering refactoring toward more intent-based abstractions instead of a single image type, for example:

  • Visual images: CIImage (camera frames, overlays, debugging, UI)
  • Scalar fields: depth / confidence maps backed by CVPixelBuffer + Metal
  • Label maps: segmentation masks backed by integer-preserving buffers (no interpolation, no transforms)

In this model, CIImage would still be used extensively — but primarily for visualization and perceptual processing, not as the container for numerically sensitive data.
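As a rough sketch of what I mean (type names are mine and purely illustrative):

```swift
import CoreImage
import CoreVideo

// Each wrapper documents which operations are safe on the data it carries.
struct VisualImage {
    let image: CIImage              // camera frames, overlays, debug/UI: filters are fine
}

struct ScalarField {
    let buffer: CVPixelBuffer       // depth/confidence, e.g. kCVPixelFormatType_DepthFloat32
}

struct LabelMap {
    let buffer: CVPixelBuffer       // segmentation labels, e.g. kCVPixelFormatType_OneComponent8
    let labelCount: Int
}

extension LabelMap {
    // Visualization is a one-way trip out of the numeric domain.
    var preview: CIImage { CIImage(cvPixelBuffer: buffer) }
}
```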

Thread safety concern

One of the original advantages of CIImage was that it is thread-safe by design; that was my biggest incentive for building the pipeline around it.

For CVPixelBuffer / MTLTexture–backed data, I’m considering enforcing thread safety explicitly via:

  • Swift Concurrency (actor-owned data, explicit ownership), roughly as sketched below
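Something along these lines (an assumed shape, not working production code; strict concurrency would also flag CVPixelBuffer as non-Sendable, which I'd still have to address):

```swift
import CoreVideo

// The latest depth frame is only reachable through the actor, so access to the
// underlying CVPixelBuffer is serialized without manual locking.
actor DepthFrameStore {
    private var latest: CVPixelBuffer?

    func update(_ buffer: CVPixelBuffer) {
        latest = buffer
    }

    // Consumers pass work in rather than taking the buffer out of the actor.
    func withLatest<T>(_ body: (CVPixelBuffer) throws -> T) rethrows -> T? {
        guard let latest = latest else { return nil }
        return try body(latest)
    }
}
```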

Questions

For those who have experience with CV / AR / imaging-heavy iOS apps, I'd appreciate your thoughts on the following:

  • Is this separation of image intent (visual vs numeric vs categorical) a reasonable architectural direction?
  • Do you generally keep CIImage at the heart of your pipeline, or push it to the edges (visualization only)?
  • How do you manage thread safety and ownership when working heavily with CVPixelBuffer and Metal? Do you use actor-based abstractions, GCD, or something more ad hoc?
  • Are there any best practices or gotchas around using Core Image with depth maps or segmentation masks that I should be aware of?

I’d really appreciate any guidance or experience-based advice. I suspect I’ve hit a boundary of Core Image’s design, and I’m trying to refactor in a way that doesn’t create too much immediate tech debt and remains robust and maintainable long-term.

Thank you in advance!
