How do I apply a (B*T) padding mask when training a self-attention transformer decoder on a 3-D (B*T*C) array?
While attempting to train a neural network that uses a self-attention layer in its transformer block, I have been struggling to implement a padding mask.
-At first, all the fully zero time steps (positions where every channel is zero) were flagged in a logical (B*T*1) matrix, and I used the arrayDatastore function to import multiple variables into trainnet(). However, while training, this error appears:
Error during read from datastore.
Caused by:
Error using horzcat
Dimensions of arrays being concatenated are not consistent.
My training set is (B*T*C) with 88 channels, as is the target set, while the padding mask is (B*T*1). Would I have to expand the padding mask in some way to make it consistent with 88 channels, or is there another method for incorporating a padding mask?
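A sketch of both options, assuming the training array is `X` (B*T*C with C = 88), the target array is `T`, and a time step counts as padding when all of its channels are zero; the variable names here are placeholders, not taken from the question:

```matlab
% Assumption: X is the (B*T*C) training array, T the (B*T*C) target array.
C = size(X, 3);                        % number of channels (88)

% Logical (B*T*1) mask: true where the time step contains real data,
% false where every channel is zero (padding).
mask = ~all(X == 0, 3);

% Option 1: expand the mask along the channel dimension so it has the
% same (B*T*C) size as X and can be concatenated with it.
maskExpanded = repmat(mask, 1, 1, C);

% Option 2 (avoids the horzcat entirely): give each variable its own
% datastore and combine them, so the mask can stay (B*T*1).
dsX    = arrayDatastore(X,    IterationDimension=1);
dsT    = arrayDatastore(T,    IterationDimension=1);
dsMask = arrayDatastore(mask, IterationDimension=1);
dsTrain = combine(dsX, dsT, dsMask);   % each read returns {x, t, m}
```

With the combined datastore, the three variables never need matching channel counts, because `combine` keeps them as separate cells per read rather than concatenating them; the horzcat error comes from trying to stack a 1-channel array next to an 88-channel one inside a single datastore. The mask cell can then be consumed by a custom loss function (or a model function) that zeroes out the padded positions, which is usually preferable to replicating the mask 88 times.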