---
description: 'The multi_head_attention layer for configurable components'
---

# multi_head_attention layer

Adds an attention layer with multiple heads. Attention weights are normalized with softmax, and the layer ends in a linear projection to the output size.
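
For intuition, here is a minimal PyTorch sketch of the standard scaled dot-product formulation this description suggests: per-head softmax attention followed by a final linear projection. The function name, parameter names, and shapes are illustrative assumptions, not this library's actual implementation.

```python
import torch
import torch.nn.functional as F

def multi_head_attention(x, w_q, w_k, w_v, w_out, heads=4):
    """x: any 2-D input (seq_len, features); returns (seq_len, size)."""
    seq_len, _ = x.shape
    size = w_out.shape[0]
    head_dim = size // heads  # assumes size divides evenly among heads

    # Project the input to queries, keys, and values, then split into
    # heads: (seq_len, size) -> (heads, seq_len, head_dim).
    def project(w):
        return (x @ w).view(seq_len, heads, head_dim).transpose(0, 1)

    q, k, v = project(w_q), project(w_k), project(w_v)

    # Scaled dot-product attention; softmax normalizes over the key axis.
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    attn = F.softmax(scores, dim=-1)
    heads_out = attn @ v  # (heads, seq_len, head_dim)

    # Concatenate the heads and apply the final linear projection.
    concat = heads_out.transpose(0, 1).reshape(seq_len, size)
    return concat @ w_out
```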

## required arguments

`size` - output size

## optional arguments

`heads` - number of attention heads to use. Defaults to 4.

## input size

Any 2-D tensor

## output size

The `size` argument (the first argument)

## syntax

```json
  "multi_head_attention SIZE heads=HEADS"
```

## examples

```json
  "multi_head_attention 1024 heads=4"
```
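
As a rough shape check, the example above can be run through the sketch from the description section. The input width (300) and sequence length (50) are arbitrary assumptions here, since any 2-D tensor is accepted; the output's last dimension is the `size` argument, 1024.

```python
import torch

# Hypothetical shapes for "multi_head_attention 1024 heads=4", reusing
# the multi_head_attention sketch defined earlier in this page.
x = torch.randn(50, 300)  # (seq_len, features): any 2-D input
w_q, w_k, w_v = (torch.randn(300, 1024) * 0.02 for _ in range(3))
w_out = torch.randn(1024, 1024) * 0.02

out = multi_head_attention(x, w_q, w_k, w_v, w_out, heads=4)
print(out.shape)  # torch.Size([50, 1024])
```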
