Hi Sharith,
As I understand it, you want to add a customizable self-attention layer to a CNN for detecting wafer defects.
You can define a CNN-based architecture and add a self-attention layer at the end using 'selfAttentionLayer'. The function takes two required arguments, 'NumHeads' and 'NumKeyChannels', which set the number of attention heads and the dimension of the key vectors.
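As a minimal sketch of how those two arguments are set on their own (the values 8 and 64 and the variable name attn are illustrative assumptions, not taken from your question; Deep Learning Toolbox is required):

```matlab
% Illustrative values only; pick them to suit your feature size.
numHeads = 8;          % number of attention heads
numKeyChannels = 64;   % key/query channels, chosen divisible by numHeads

% Create the layer with the two required positional arguments.
attn = selfAttentionLayer(numHeads, numKeyChannels, 'Name', 'self_attention');

% Inspect the stored configuration.
disp(attn.NumHeads)
disp(attn.NumKeyChannels)
```

Keeping the number of key channels a multiple of the number of heads lets the channels split evenly across heads.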
Below is reference code for the model architecture:
layers = [
    % 28x28 grayscale input images
    imageInputLayer([28 28 1], 'Name', 'input')
    % First convolutional block: 16 filters of size 3x3
    convolution2dLayer(3, 16, 'Padding', 'same', 'Name', 'conv1')
    batchNormalizationLayer('Name', 'bn1')
    reluLayer('Name', 'relu1')
    maxPooling2dLayer(2, 'Stride', 2, 'Name', 'maxpool1')
    % Second convolutional block: 32 filters of size 3x3
    convolution2dLayer(3, 32, 'Padding', 'same', 'Name', 'conv2')
    batchNormalizationLayer('Name', 'bn2')
    reluLayer('Name', 'relu2')
    flattenLayer('Name', 'flatten')
    % Self-attention with 4 heads and 32 key channels
    selfAttentionLayer(4, 32, 'Name', 'self_attention')
    % Classification head for 10 classes
    fullyConnectedLayer(10, 'Name', 'fc')
    softmaxLayer('Name', 'softmax')
    classificationLayer('Name', 'output')
];
The code above defines a CNN-based architecture that incorporates multi-head self-attention (MHSA) for ten-class classification.
Refer to the below MathWorks documentation for more information: