How do I apply a (B*T) padding mask when training a self-attention transformer decoder on a 3-D (B*T*C) array?
While attempting to train a neural network that uses a self-attention layer in its transformer block, I have been struggling to implement a padding mask.
-At first, all the fully zero time steps (positions where every channel is zero) were flagged in a logical (B*T*1) matrix, and I used the arrayDatastore function to import multiple variables into trainnet(). However, while training, this error appears:
Error during read from datastore.
Caused by:
Error using horzcat
Dimensions of arrays being concatenated are not consistent.
My training set is (B*T*C) with 88 channels, as is the target set, while the padding mask is (B*T*1). Would I have to expand the padding mask in some way to make it consistent with 88 channels, or is there another method for incorporating a padding mask?
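A sketch of both options, assuming the training array is `X` (B*T*C with C = 88), the target array is `T`, and a time step counts as padding when all of its channels are zero; the variable names here are placeholders, not taken from the question:

```matlab
% Assumption: X is the (B*T*C) training array, T the (B*T*C) target array.
C = size(X, 3);                        % number of channels (88)

% Logical (B*T*1) mask: true where the time step contains real data,
% false where every channel is zero (padding).
mask = ~all(X == 0, 3);

% Option 1: expand the mask along the channel dimension so it has the
% same (B*T*C) size as X and can be concatenated with it.
maskExpanded = repmat(mask, 1, 1, C);

% Option 2 (avoids the horzcat entirely): give each variable its own
% datastore and combine them, so the mask can stay (B*T*1).
dsX    = arrayDatastore(X,    IterationDimension=1);
dsT    = arrayDatastore(T,    IterationDimension=1);
dsMask = arrayDatastore(mask, IterationDimension=1);
dsTrain = combine(dsX, dsT, dsMask);   % each read returns {x, t, m}
```

With the combined datastore, the three variables never need matching channel counts, because `combine` keeps them as separate cells per read rather than concatenating them; the horzcat error comes from trying to stack a 1-channel array next to an 88-channel one inside a single datastore. The mask cell can then be consumed by a custom loss function (or a model function) that zeroes out the padded positions, which is usually preferable to replicating the mask 88 times.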