A description of a gated recurrent unit block or layer.

SDKs

- iOS 11.0+
- macOS 10.13+
- Mac Catalyst 13.0+ Beta
- tvOS 11.0+

Framework

- Metal Performance Shaders

## Declaration

## Overview

The recurrent neural network (RNN) layer initialized with an `MPSGRUDescriptor` transforms the input data (image or matrix) and the previous output with a set of filters. Each filter produces one feature map in the output data according to the gated recurrent unit (GRU) formula detailed below.

You may provide the GRU unit with a single input or a sequence of inputs. The layer also supports p-norm gating.

### Description of Operation

Let `x_j` be the input data at time index `t` of the sequence, where the index `j` is a quadruplet of batch index, `x`, `y`, and feature index (`x = y = 0` for matrices).

Let `h0_j` be the recurrent input (previous output) data from the previous time step (at time index `t-1` of the sequence).

Let `h_i` be the proposed new output.

Let `h1_i` be the output data produced at this time step.

Let `Wz_ij`, `Uz_ij` be the input gate weights for input and recurrent input data, respectively.

Let `bi_i` be the bias for the input gate.

Let `Wr_ij`, `Ur_ij` be the recurrent gate weights for input and recurrent input data, respectively.

Let `br_i` be the bias for the recurrent gate.

Let `Wh_ij`, `Uh_ij`, `Vh_ij` be the output gate weights for input, recurrent gate, and input gate, respectively.

Let `bh_i` be the bias for the output gate.

Let `gz(x)`, `gr(x)`, `gh(x)` be the neuron activation functions for the input, recurrent, and output gates.

Let `p > 0` be a scalar variable (typically `p >= 1.0`) that defines the p-norm gating norm value.

The output of the GRU layer is computed as follows:
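A reconstruction of the computation from the symbols defined above, offered as a sketch rather than the authoritative statement. The intermediate names `z_i` (input gate activation), `r_i` (recurrent gate activation), and `c_i` (recurrent contribution to the proposed output) are assumptions introduced here for readability:

```latex
\begin{aligned}
z_i  &= g_z\!\left(\mathrm{Wz}_{ij} * x_j + \mathrm{Uz}_{ij} * \mathrm{h0}_j + \mathrm{bi}_i\right) \\
r_i  &= g_r\!\left(\mathrm{Wr}_{ij} * x_j + \mathrm{Ur}_{ij} * \mathrm{h0}_j + \mathrm{br}_i\right) \\
c_i  &= \mathrm{Uh}_{ij} * \left(r_j\,\mathrm{h0}_j\right) + \mathrm{Vh}_{ij} * \left(z_j\,\mathrm{h0}_j\right) \\
h_i  &= g_h\!\left(\mathrm{Wh}_{ij} * x_j + c_i + \mathrm{bh}_i\right) \\
\mathrm{h1}_i &= \left(1 - z_i^{\,p}\right)^{1/p}\,\mathrm{h0}_i + z_i\,h_i
\end{aligned}
```

With `p = 1` the last line reduces to the familiar blend `(1 - z_i) h0_i + z_i h_i`.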

The `*` stands for convolution (see `MPSRNNImage`) or matrix-vector/matrix multiplication (see `MPSRNNMatrix`).

Summation is over index `j` (except for the batch index), but there's no summation over the repeated index `i`, the output index.

Note that for validity, all intermediate images must be of the same size, and all `U` and `V` matrices must be square (that is, `output == input`). Also, the bias terms are scalars with regard to spatial dimensions. The conventional GRU block is achieved by setting `Vh = 0` (nil), and the Minimal Gated Unit is achieved with `Uh = 0`.
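To make the roles of `Uh`, `Vh`, and `p` concrete, here is a minimal CPU sketch of one GRU time step on plain vectors (the matrix case, with no batch or spatial dimensions). It does not use Metal Performance Shaders: `Gate`, `matVec`, `sigmoid`, and `gruStep` are hypothetical names introduced for illustration. It assumes sigmoid activations for the input and recurrent gates, `tanh` for the output gate, and the update `h1_i = (1 - z_i^p)^(1/p) h0_i + z_i h_i`, where `z_i` is the input gate activation.

```swift
import Foundation

// Illustrative reference only; none of these names belong to Metal Performance Shaders.
typealias Matrix = [[Double]]
typealias Vector = [Double]

/// Matrix-vector product: sums over the repeated index j.
func matVec(_ m: Matrix, _ v: Vector) -> Vector {
    return m.map { row in zip(row, v).reduce(0.0) { $0 + $1.0 * $1.1 } }
}

/// Logistic sigmoid, assumed here for the gate activations gz and gr.
func sigmoid(_ v: Double) -> Double {
    return 1.0 / (1.0 + exp(-v))
}

/// Weights and bias of one gate: `w` applies to the input, `u` to the previous output.
struct Gate {
    var w: Matrix
    var u: Matrix
    var b: Vector
}

/// One GRU step. Passing `uh: nil` or `vh: nil` treats that matrix as zero.
func gruStep(x: Vector, h0: Vector,
             inputGate: Gate, recurrentGate: Gate,
             wh: Matrix, uh: Matrix?, vh: Matrix?, bh: Vector,
             p: Double = 1.0) -> Vector {
    let n = h0.count
    let zero = Vector(repeating: 0.0, count: n)

    // z_i = gz(Wz_ij x_j + Uz_ij h0_j + bi_i)
    let zx = matVec(inputGate.w, x)
    let zh = matVec(inputGate.u, h0)
    let z = (0..<n).map { sigmoid(zx[$0] + zh[$0] + inputGate.b[$0]) }

    // r_i = gr(Wr_ij x_j + Ur_ij h0_j + br_i)
    let rx = matVec(recurrentGate.w, x)
    let rh = matVec(recurrentGate.u, h0)
    let r = (0..<n).map { sigmoid(rx[$0] + rh[$0] + recurrentGate.b[$0]) }

    // c_i = Uh_ij (r_j h0_j) + Vh_ij (z_j h0_j)
    let gatedByR = (0..<n).map { r[$0] * h0[$0] }
    let gatedByZ = (0..<n).map { z[$0] * h0[$0] }
    let cu = uh.map { matVec($0, gatedByR) } ?? zero
    let cv = vh.map { matVec($0, gatedByZ) } ?? zero

    // h_i = gh(Wh_ij x_j + c_i + bh_i), with gh assumed to be tanh
    let hx = matVec(wh, x)
    let h = (0..<n).map { tanh(hx[$0] + cu[$0] + cv[$0] + bh[$0]) }

    // h1_i = (1 - z_i^p)^(1/p) h0_i + z_i h_i
    return (0..<n).map { i -> Double in
        let keep = pow(1.0 - pow(z[i], p), 1.0 / p)
        return keep * h0[i] + z[i] * h[i]
    }
}

// Conventional GRU: Vh is nil. All weights are zero here to keep the arithmetic easy to follow.
let zeros2x2: Matrix = [[0, 0], [0, 0]]
let gate = Gate(w: zeros2x2, u: zeros2x2, b: [0, 0])
let h1 = gruStep(x: [1, 1], h0: [2, -4],
                 inputGate: gate, recurrentGate: gate,
                 wh: zeros2x2, uh: zeros2x2, vh: nil, bh: [0, 0])
print(h1)
```

With every weight and bias set to zero, both gates evaluate to 0.5 and the proposed output is 0, so with `p = 1` the step returns half of the previous output. Passing `uh: nil` with a non-nil `vh` instead gives the Minimal Gated Unit arrangement.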