A convolution kernel with binary weights and an input image using binary approximations.

SDKs

- iOS 11.0+
- macOS 10.13+
- Mac Catalyst 13.0+
- tvOS 11.0+

Framework

- Metal Performance Shaders

## Declaration

## Overview

The `MPSCNNBinary` optionally first binarizes the input image and then convolves the result with a set of binary-valued filters, each producing one feature map in the output image (which is a normal image).

The output is computed as follows:

$$
\mathrm{out}[i, x, y, c] = \Big( \sum_{dx, dy, f} \mathrm{in}[i, x + dx, y + dy, f] \otimes B[c, dx, dy, f] \Big) \cdot \mathrm{scale}[c] \cdot \mathrm{beta}[i, x, y] + \mathrm{bias}[c]
$$

where the *sum over* *dx,dy* is over the spatial filter kernel window defined by `kernel` and `kernel`, the *sum over* *f* is over the input feature channel indices within the group, *B* contains the binary weights, interpreted as `{-1, 1}` or `{0, 1}`, *scale[c]* is the `output` array, and *bias* is the `output` array. Here *i* is the image index in the batch, and the sum over input channels *f* runs through the group indices. The convolution operator ⊗ is defined by the `MPSCNNBinary` convolution type passed in at initialization time of the filter:

- `MPSCNNBinaryConvolutionType.binaryWeights`: The input image is not binarized at all, and the convolution is computed by interpreting the weights as `[0, 1] -> {-1, 1}` with the given scaling terms.
- `MPSCNNBinaryConvolutionType.XNOR`: The convolution is computed by first binarizing the input image using the sign function `bin(x) = x < 0 ? -1 : 1`. The convolution multiplication is then done with the XNOR operator, `!(x ^ y) = delta_xy = { (x == y) ? 1 : 0 }`, and the result is scaled according to the optional scaling operations.

  Note that the values of the bitwise convolutions are output in the set `{-1, 1}`, which means that the output of the XNOR operator is scaled implicitly as follows: `r = 2 * ( !(x ^ y) ) - 1 = { -1, 1 }`

  This means that for a dot product of two 32-bit words the result is:

  `r = 2 * popcount(!(x ^ y) ) - 32 = 32 - 2 * popcount( x ^ y ) = { -32, -30, ..., 30, 32 }`
- `MPSCNNBinaryConvolutionType.AND`: The convolution is computed by first binarizing the input image using the sign function `bin(x) = x < 0 ? -1 : 1`. The convolution multiplication is then done with the AND operator, `(x & y) = delta_xy * delta_x1 = { (x == y == 1) ? 1 : 0 }`, and the result is scaled according to the optional scaling operations.

  Note that the output of the AND operation is assumed to lie in the set `{0, 1}`, so no further implicit scaling takes place. This means that for a dot product of two 32-bit words the result is:

  `r = popcount(x & y) = { 0, ..., 31, 32 }`
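The two popcount identities above can be checked with a small sketch. It is plain Python, not Metal Performance Shaders code. The function names are hypothetical, and the bit encoding (a set bit stands for +1 in the XNOR case) is an assumption made for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def popcount(word):
    """Count the set bits in a non-negative integer."""
    return bin(word).count("1")


def xnor_dot(x, y):
    """Dot product of two 32-bit words whose bits encode {-1, 1}.

    Assumed encoding: bit 1 -> +1, bit 0 -> -1.
    """
    return 32 - 2 * popcount((x ^ y) & MASK32)


def and_dot(x, y):
    """Dot product of two 32-bit words whose bits encode {0, 1}."""
    return popcount(x & y & MASK32)


def signed_dot_reference(x, y):
    """Slow reference: expand each bit to -1 or +1 and multiply element-wise."""
    total = 0
    for bit in range(32):
        a = 1 if (x >> bit) & 1 else -1
        b = 1 if (y >> bit) & 1 else -1
        total += a * b
    return total
```

Comparing `xnor_dot` against `signed_dot_reference` on a few words shows why the XNOR result moves in steps of 2 between -32 and 32, while `and_dot` simply counts the positions where both bits are set.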

The input data can be pre-offset and scaled by providing the `input` and `input` parameters for the initialization functions; this can be used, for example, to accomplish batch normalization of the data. The scaling of input values happens before the optional beta-image computation.
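As a rough illustration of the batch normalization use case, the sketch below derives per-channel offset and scale values from channel statistics. The text does not state the order of operations, so the form `(x + bias) * scale` is an assumption, and the helper names are hypothetical. Only the normalization step is covered, not any learned affine parameters.

```python
import math


def batch_norm_as_input_terms(mean, variance, epsilon=1e-5):
    """Return per-channel (bias, scale) lists so that (x + bias) * scale normalizes x.

    Assumes the offset is applied before the scale; the algebra changes if the
    kernel applies them in the other order.
    """
    bias = [-m for m in mean]
    scale = [1.0 / math.sqrt(v + epsilon) for v in variance]
    return bias, scale


def apply_input_terms(pixel, bias, scale):
    """Apply the assumed pre-offset and scale to one pixel (a list of channel values)."""
    return [(value + b) * s for value, b, s in zip(pixel, bias, scale)]
```

If the kernel instead computes `x * scale + bias`, the same normalization is obtained with `bias = -mean * scale`.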

The parameter `beta` above is an optional image which is used to compute scaling factors for each spatial position and image index. For XNOR-Net based networks it is computed as follows:

$$
\mathrm{beta}[i, x, y] = \frac{1}{k_x k_y} \sum_{dx, dy} A[i, x + dx, y + dy]
$$

where *(dx,dy)* are summed over the convolution filter window, *k_x* and *k_y* are the width and height of that window, and

$$
A[i, x, y] = \frac{1}{N_c} \sum_{c} \left| \mathrm{in}[i, x, y, c] \right|
$$

where *in* is the original input image (in full precision) and *Nc* is the number of input channels in the input image. The parameter `beta` is not passed as input; to enable beta scaling, the user can provide `MPSCNNBinary` in the flags parameter of the initialization functions.
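A minimal sketch of the beta computation follows, assuming beta is the filter-window average of the per-pixel mean absolute channel value (the XNOR-Net style scaling the text refers to). The function name, the `image[y][x][channel]` layout, the centred window, and the zero treatment of pixels outside the image are all assumptions made for illustration.

```python
def beta_map(image, kernel_width, kernel_height):
    """Compute beta[y][x] for one image stored as image[y][x][channel].

    Pixels outside the image are treated as zero (an assumption).
    """
    height, width = len(image), len(image[0])
    channels = len(image[0][0])
    # a[y][x]: mean absolute value across the input channels.
    a = [[sum(abs(v) for v in image[y][x]) / channels for x in range(width)]
         for y in range(height)]
    beta = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            # Assumed window offsets: -(k // 2) .. (k - 1) // 2 around (x, y).
            for dy in range(-(kernel_height // 2), (kernel_height - 1) // 2 + 1):
                for dx in range(-(kernel_width // 2), (kernel_width - 1) // 2 + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        total += a[yy][xx]
            beta[y][x] = total / (kernel_width * kernel_height)
    return beta
```

Because beta depends only on the full-precision input and the window size, it is computed once per image and shared by every output channel.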

Finally, the normal activation neuron is applied and the result is written to the output image.
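The whole pipeline described in this overview (window sum, per-channel scale, optional beta factor, bias, then activation) can be sketched for the binary-weights case as below. This is an illustrative reference in plain Python, not the framework's implementation. The function name, the data layouts, the top-left window anchoring, and the "valid" output size are assumptions.

```python
def binary_weights_convolution(image, weights, scale, bias, beta=None, activation=None):
    """Evaluate the output computation for one image with weights in {-1, 1}.

    image[y][x][f] is the input, weights[c][dy][dx][f] the binary filters.
    Only positions where the whole window fits are produced ("valid" output),
    with the window anchored at its top-left corner; both are assumptions.
    """
    kernel_height, kernel_width = len(weights[0]), len(weights[0][0])
    out_height = len(image) - kernel_height + 1
    out_width = len(image[0]) - kernel_width + 1
    output = []
    for y in range(out_height):
        row = []
        for x in range(out_width):
            pixel = []
            for c, kernel in enumerate(weights):
                total = 0.0
                # Sum over the window (dx, dy) and the input channels f.
                for dy in range(kernel_height):
                    for dx in range(kernel_width):
                        for f, w in enumerate(kernel[dy][dx]):
                            total += image[y + dy][x + dx][f] * w
                factor = 1.0 if beta is None else beta[y][x]
                value = total * scale[c] * factor + bias[c]
                pixel.append(activation(value) if activation else value)
            row.append(pixel)
        output.append(row)
    return output
```

Passing `activation=lambda v: max(0.0, v)` applies a ReLU as the final step; any other neuron function can be substituted.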

Note

`MPSCNNBinary` does not currently support `groups` greater than 1.