Basic neural network subroutines (BNNS) is a collection of functions that you use to implement and run neural networks, using previously obtained training data.
Creating a Neural Network for Inference
The Accelerate framework’s new basic neural network subroutines (BNNS) is a collection of functions that you can use to construct neural networks. It is supported in macOS, iOS, tvOS, and watchOS, and is optimized for all CPUs supported on those platforms.
BNNS supports implementation and operation of neural networks for inference, using input data previously derived from training. BNNS does not do training, however. Its purpose is to provide very high performance inference on already trained neural networks.
This document introduces BNNS in general terms. For further details, consult the header file
bnns in the Accelerate framework.
Structure of a Neural Network
A neural network is a sequence of layers, each layer performing a filter operation on its input and passing the result as input to the next layer. The output of the last layer is an inference drawn from the initial input: for example, the initial input might be an image and the inference might be that it’s an image of a dinosaur.
A layer consists of a filter plus data derived from training, and an activation function. The filter is designed to use the training data in transforming the input. For example, in a convolution layer the filter would use training data as weights in the convolution kernel.
The activation function is applied to the output data after the filter. You select the activation function from a small set of simple functions given by the
BNNSActivation enumeration type in
bnns. Some of the activation functions accept one or two
float parameters which you specify.
BNNS provides functions for creating, applying, and destroying three kinds of layers:
A convolution layer, for each pixel in an input image, takes that pixel and its neighboring pixels and combines their values with weights from the training data to compute the corresponding pixel in the output image.
A pooling layer produces a smaller output image from its input image by breaking the input image into smaller rectangular subimages; each pixel in the output is the maximum or average (your choice) of the pixels in the corresponding subimage. A pooling layer does not use training data.
A fully connected layer takes its input as a vector; this vector is multiplied by a matrix of weights from training data. The resulting vector is updated by the activation function.