MPSGraph randomTensor works for inference but crashes when training

I'm trying to use the randomTensor function from MPSGraph to initialize the weights of a fully connected layer. I can create the graph and run inference using the randomly initialized values, but when I try to train and update those randomly initialized weights, I hit a crash:

Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 578.

I can train the graph if I instead initialize the weights myself on the CPU, but I thought using the randomTensor functions would be faster/allow initialization to occur on the GPU.

Here's my code for building the graph including both methods of weight initialization:

  func buildGraph(variables: inout [MPSGraphTensor]) -> (MPSGraphTensor, MPSGraphTensor, MPSGraphTensor, MPSGraphTensor) {
    let inputPlaceholder = graph.placeholder(shape: [2], dataType: .float32, name: nil)
    let labelPlaceholder = graph.placeholder(shape: [1], name: nil)
    
    // This works for inference but not training
    let descriptor = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)!
    let weightTensor = graph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil)

    // This works for inference and training
    // let weights = [Float](repeating: 1, count: 2)
    // let weightTensor = graph.variable(with: Data(bytes: weights, count: 2 * MemoryLayout<Float32>.size), shape: [2, 1], dataType: .float32, name: nil)

    variables += [weightTensor]
    let output = graph.matrixMultiplication(primary: inputPlaceholder, secondary: weightTensor, name: nil)
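    // Note: "reuctionType" (sic) below is the actual argument label in the MPSGraph API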
    let loss = graph.softMaxCrossEntropy(output, labels: labelPlaceholder, axis: -1, reuctionType: .sum, name: nil)
    return (inputPlaceholder, labelPlaceholder, output, loss)
  }

And to run the graph I have the following in my sample view controller:

  override func viewDidLoad() {
    super.viewDidLoad()
    var variables: [MPSGraphTensor] = []
    let (inputPlaceholder, labelPlaceholder, output, loss) = buildGraph(variables: &variables)
    let gradients = graph.gradients(of: loss, with: variables, name: nil)
    let learningRate = graph.constant(0.001, dataType: .float32)
    var updateOps: [MPSGraphOperation] = []
    for (key, value) in gradients {
      let updates = graph.stochasticGradientDescent(learningRate: learningRate, values: key, gradient: value, name: nil)
      let assign = graph.assign(key, tensor: updates, name: nil)
      updateOps += [assign]
    }
    
    let commandBuffer = MPSCommandBuffer(commandBuffer: Self.commandQueue.makeCommandBuffer()!)
    let executionDesc = MPSGraphExecutionDescriptor()
    executionDesc.completionHandler = { (resultsDictionary, _) in
      for (_, value) in resultsDictionary {
        var output: [Float] = [0]
        value.mpsndarray().readBytes(&output, strideBytes: nil)
        print(output)
      }
    }
    let inputDesc = MPSNDArrayDescriptor(dataType: .float32, shape: [2])
    let input = MPSNDArray(device: Self.device, descriptor: inputDesc)
    var inputArray: [Float] = [1, 2]
    input.writeBytes(&inputArray, strideBytes: nil)
    let source = MPSGraphTensorData(input)
    let labelMPSArray = MPSNDArray(device: Self.device, descriptor: MPSNDArrayDescriptor(dataType: .float32, shape: [1]))
    var labelArray: [Float] = [1]
    labelMPSArray.writeBytes(&labelArray, strideBytes: nil)
    let label = MPSGraphTensorData(labelMPSArray)
    

    // This runs inference and works
//    graph.encode(to: commandBuffer, feeds: [inputPlaceholder: source], targetTensors: [output], targetOperations: [], executionDescriptor: executionDesc)
//
//    commandBuffer.commit()
//    commandBuffer.waitUntilCompleted()
    
    // This is the training path, which crashes
    graph.encode(
      to: commandBuffer,
      feeds: [inputPlaceholder: source, labelPlaceholder: label], targetTensors: [], targetOperations: updateOps, executionDescriptor: executionDesc)

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
  }

And a few other relevant variables are created at the class scope:

  let graph = MPSGraph()
  static let device = MTLCreateSystemDefaultDevice()!
  static let commandQueue = device.makeCommandQueue()!

How can I use these randomTensor functions on MPSGraph to randomly initialize weights for training?

Accepted Reply

The assign op you are using is meant to be used with the output tensor of a variable op, and unfortunately MPSGraph does not provide any kind of helpful error message to indicate that. To make use of the assign op, you have to load your weights into a variable op and then apply the assign op to that result tensor; this is why your alternate path works. There are a couple of options to get this working.
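
In other words, assign must target the result tensor of a variable op, never the result of randomTensor directly. A minimal sketch of the two patterns (descriptor, update, and weightsData here stand in for values you already have):

// Crashes: the assign target is a randomTensor result, not a variable
let w = graph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil)
let bad = graph.assign(w, tensor: update, name: nil)

// Works: the assign target is the result tensor of a variable op
let v = graph.variable(with: weightsData, shape: [2, 1], dataType: .float32, name: nil)
let ok = graph.assign(v, tensor: update, name: nil)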

You can make use of control flow ops to take different paths through your graph on the first encode vs. subsequent encodes. Add a placeholder which indicates whether this is the first encode, and use the if control flow op. You'll also need to add a variable to your graph, which you can initialize with zeros or any dummy data. On the first encode, assign the result of the random op to this variable, then proceed with execution of the graph using the read variable op; at the end of graph execution you can assign to the variable as you are doing already. On subsequent encodes, skip the call to the random op and read directly from the variable to get your weights tensor.

Something like the following should hopefully get you started:

let encodePlaceholder = graph.placeholder(shape: [1], dataType: .bool, name: nil)

let weightsData = [Float](repeating: 1, count: 2) // Dummy data, not used in computation 
let weightsVariableTensor = graph.variable(with: Data(bytes: weightsData, count: 2 * MemoryLayout<Float32>.size), shape: [2, 1], dataType: .float32, name: nil)
let controlFlowOutputTensors = graph.if(
    encodePlaceholder,
    then: {
        // First encode: generate the random weights and store them in the variable
        let descriptor = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)!
        let randomWeights = graph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil)
        _ = graph.assign(weightsVariableTensor, tensor: randomWeights, name: nil)
        return [graph.read(weightsVariableTensor, name: nil)]
    },
    else: {
        // Subsequent encodes: just read the stored weights
        return [graph.read(weightsVariableTensor, name: nil)]
    },
    name: nil)
let weightsTensor = controlFlowOutputTensors[0]
variables += [weightsVariableTensor]
// Proceed with graph using weights tensor for your weights in computations
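
On the first encode, feed true for encodePlaceholder, and false on every encode after that. Here's a rough sketch of building that feed, assuming the MPSGraphTensorData(device:data:shape:dataType:) initializer (build it however you build your other tensor data):

var isFirstEncode = true
// MPSDataType.bool uses one byte per element
let flag: [UInt8] = [isFirstEncode ? 1 : 0]
let flagData = MPSGraphTensorData(
    device: MPSGraphDevice(mtlDevice: Self.device),
    data: Data(flag),
    shape: [1],
    dataType: .bool)
// Then include it in the feeds:
// feeds: [inputPlaceholder: source, labelPlaceholder: label, encodePlaceholder: flagData]
isFirstEncode = false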

Alternatively, you can avoid the random op entirely: generate the random data on the CPU, and use your alternate path with variables as you already have it. This is simpler, and is unlikely to change your performance by much, since it only affects how your first training iteration operates. The downside is that if you want to tweak how your data is initialized, that has to happen outside of the MPSGraph random op.

Or, if you would still like to use the MPSGraph random op APIs, you can first create a graph which just generates the random values and execute it once, then use that output to initialize your Data rather than relying on CPU-side random APIs. And once you have some initial random data, you can simply serialize it to a file and load it back into the Data, if that suits your use case, regenerating it as needed.
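
If you go the run-once route, a rough sketch (randGraph is just a throwaway name; MPSGraph.run(feeds:targetTensors:targetOperations:) executes synchronously and returns the results):

// Throwaway graph whose only job is to produce the initial random weights
let randGraph = MPSGraph()
let descriptor = MPSGraphRandomOpDescriptor(distribution: .uniform, dataType: .float32)!
let randomWeights = randGraph.randomTensor(withShape: [2, 1], descriptor: descriptor, seed: 2, name: nil)
let results = randGraph.run(feeds: [:], targetTensors: [randomWeights], targetOperations: nil)

// Read the values back and wrap them in Data for graph.variable(with:shape:dataType:name:)
var weights = [Float](repeating: 0, count: 2)
results[randomWeights]!.mpsndarray().readBytes(&weights, strideBytes: nil)
let weightsData = Data(bytes: weights, count: 2 * MemoryLayout<Float32>.size)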
