Can `MTLTexture` be used to store a 5-D input tensor?

I'm trying to implement the PyTorch custom layer [grid_sampler](https://pytorch.org/docs/1.9.1/generated/torch.nn.functional.grid_sample.html) on the GPU. Both of its inputs, `input` and `grid`, can be 5-D. My implementation of `encodeToCommandBuffer`, an `MLCustomLayer` protocol method, is shown below. In my attempts so far, neither `id<MTLTexture> input` nor `id<MTLTexture> grid` holds the values I expect. So I wonder: can `MTLTexture` be used to store a 5-D input tensor as an input to `encodeToCommandBuffer`? Or can anybody show me how to use `MTLTexture` correctly here? Thanks a lot!


- (BOOL)encodeToCommandBuffer:(id<MTLCommandBuffer>)commandBuffer
                       inputs:(NSArray<id<MTLTexture>> *)inputs
                      outputs:(NSArray<id<MTLTexture>> *)outputs
                        error:(NSError * _Nullable *)error {
  NSLog(@"Dispatching to GPU");
  NSLog(@"inputs count %lu", (unsigned long)inputs.count);
  NSLog(@"outputs count %lu", (unsigned long)outputs.count);

  id<MTLComputeCommandEncoder> encoder = [commandBuffer
      computeCommandEncoderWithDispatchType:MTLDispatchTypeSerial];
  if (encoder == nil) {
    return NO;
  }

  id<MTLTexture> input = inputs[0];
  id<MTLTexture> grid = inputs[1];
  id<MTLTexture> output = outputs[0];

  NSLog(@"input shape %lu, %lu, %lu, %lu", (unsigned long)input.width,
        (unsigned long)input.height, (unsigned long)input.depth,
        (unsigned long)input.arrayLength);
  NSLog(@"grid shape %lu, %lu, %lu, %lu", (unsigned long)grid.width,
        (unsigned long)grid.height, (unsigned long)grid.depth,
        (unsigned long)grid.arrayLength);

  [encoder setComputePipelineState:grid_sample_Pipeline];
  [encoder setTexture:input atIndex:0];
  [encoder setTexture:grid atIndex:1];
  [encoder setTexture:output atIndex:2];

  // One thread per output texel; round up so edge texels are covered.
  NSUInteger wd = grid_sample_Pipeline.threadExecutionWidth;
  NSUInteger ht = grid_sample_Pipeline.maxTotalThreadsPerThreadgroup / wd;
  MTLSize threadsPerThreadgroup = MTLSizeMake(wd, ht, 1);
  MTLSize threadgroupsPerGrid = MTLSizeMake((input.width + wd - 1) / wd,
                                            (input.height + ht - 1) / ht,
                                            input.arrayLength);

  [encoder dispatchThreadgroups:threadgroupsPerGrid
          threadsPerThreadgroup:threadsPerThreadgroup];
  [encoder endEncoding];

  // The error out-parameter may be NULL; guard before writing through it.
  if (error) {
    *error = nil;
  }
  return YES;
}

Replies

Hi stx,

The MLCustomLayer API is actually pretty old, and MPSGraph would be the preferred approach to solving this.

However, to answer your question: Metal supports 3-D and 2-D array textures. For the purpose of storing generic data like a tensor, 3-D and 2-D array textures have the same capabilities (3-D textures are better suited to image data). Metal's four-channel pixel formats, such as `MTLPixelFormatRGBA32Float`, can be used to store an extra dimension (so long as that dimension is no larger than 4).

You can combine the 1st and 2nd dimensions and increase the number of layers in the array texture accordingly. The height of each 2-D slice in the array would be the 3rd dimension and the width would be the 4th dimension. Finally, the four channels of each pixel can serve as the 5th dimension.

So, for instance, if you have 5-D data of shape 10x6x8x3x4, you can create a 2-D array texture using `MTLPixelFormatRGBA32Float` with 10x6 (60) slices, each 2-D slice being 8x3, and store the last dimension in the four channels of each pixel.

As mentioned, MPSGraph would probably be a better API for implementing this, so please let us know if you need help adopting that instead.

  • Thank you very much for your reply!

    Firstly, according to step 4 in the guide, the Swift implementation must provide the API endpoints specified in the MLCustomLayer interface, so I guess MPSGraph can't be used here? Meanwhile, the MPSGraph documentation offers little by way of introduction, so it seems a little hard to figure out how to use it.

    Secondly, the `id<MTLTexture>` inputs to `encodeToCommandBuffer` are pre-defined. Although Metal supports 3-D and 2-D array textures, I think we can only use 2-D arrays here, with pixelFormat = MTLPixelFormatRGBA16Float and depth = 1. I ran some tests, and the results show that `encodeToCommandBuffer` converts the 5-D input to 4-D data automatically. For instance, for a 5-D input of 16x8x8x32x32, the converted 4-D data inside the function has width = 32, height = 32, and arrayLength = 256. However, I don't fully understand how the conversion happens, except that the arrayLength seems to be calculated as 16x8x8/4. Besides, the grid_sampler calculation needs the 5-D input dimension information, according to the [official implementation](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/GridSampler.cu). I'm not sure how to use the converted 4-D data inside my kernel, since I don't know the relationship between the tensor before and after conversion.
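
    One mapping that is at least consistent with the numbers observed above (width = 32, height = 32, arrayLength = 256 = 16x8x8/4) is that the three leading dimensions are flattened and packed four-at-a-time into the RGBA channels. This is a guess from the arithmetic, not documented Core ML behavior, and a kernel built on it would need verification:

    ```c
    #include <assert.h>

    int main(void) {
        /* Observed case: 5-D input of 16x8x8x32x32 arriving as a
         * 2-D array texture of width 32, height 32, arrayLength 256. */
        unsigned long N = 16, C = 8, D = 8, H = 32, W = 32;
        unsigned long leading = N * C * D;   /* 1024 leading elements */
        assert(leading / 4 == 256);          /* matches observed arrayLength */

        /* Hypothetical mapping of a 5-D index (n,c,d,h,w) to
         * (slice, channel, y, x) under this assumed packing. */
        unsigned long n = 3, c = 5, d = 2, h = 7, w = 9;
        unsigned long linear  = (n * C + c) * D + d;  /* 234 */
        unsigned long slice   = linear / 4;           /* 58 */
        unsigned long channel = linear % 4;           /* 2 */
        assert(linear == 234 && slice == 58 && channel == 2);
        assert(slice < 256 && channel < 4);
        assert(h < H && w < W);  /* y = h, x = w map through unchanged */
        return 0;
    }
    ```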
