I am working on a personal project to learn about CNNs. I have been using transfer learning to train a few CNNs on a combination of the "Labeled Faces in the Wild" (LFW) and AT&T face databases, and I want to discuss the results.
I took 100 individuals from LFW and all 40 from the AT&T database, and used 75% of the images for training and the rest for validation.
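For reference, the 75/25 split can be sketched roughly as follows. This is a minimal sketch, not my exact script: the folder name `faces` (one subfolder per person) and the variable `imds` are placeholders, and it assumes the split is done per person. `imda` and `imda2` are the training and validation datastores that appear in my training code.

```matlab
% Assumed layout: faces/<person name>/<image files>, one subfolder per identity.
imds = imageDatastore('faces', ...
    'IncludeSubfolders',true, 'LabelSource','foldernames');

% Randomly keep 75% of each person's images for training, the rest for validation.
[imda, imda2] = splitEachLabel(imds, 0.75, 'randomized');

% Sanity check: list how many images each identity has in each set.
disp(countEachLabel(imda));
disp(countEachLabel(imda2));
```

Splitting per label rather than over the whole pool keeps every person represented in the training set, which matters when some identities have only a handful of images.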
I also lack a proper understanding of the relationship between a CNN's parameters and its layers, so could someone please clarify it? I think you will see where I am getting confused after I explain the data I have.
I first trained AlexNet on it and got this plot.
AlexNet has very few layers and is a small, light net (even though it has a lot of parameters), which is why I think it underfit the data?
I trained ResNet-50 on it and got a similar result, so I believe it also underfit the data? But this one fluctuates and sometimes reaches 100% training accuracy, so maybe it did not underfit?
I also trained Inception-ResNet-v2 on the data and got this result. I am not sure what is going on here.
I wanted to take a closer look, so I trained it again with a lower learning rate just to make sure that wasn't the cause. Could this be attributed to the mini-batch size?
I also trained EfficientNet on this data. It reached and pretty much stayed at 100% training accuracy, with a constant 70% validation accuracy. Was that overfitting, or is it acceptable?
The last two, which gave the best results, were Xception and DenseNet; both had 100% training accuracy and 80% validation accuracy. I think DenseNet overfit, but I am not sure. Perhaps Xception did too?
Can someone explain the data and suggest improvements, please?
I forgot to mention that the LFW database sometimes has two faces in a picture (only in very few pictures, though) and a good number of people who look similar. The validation accuracy is most likely stuck around 80% because of that. During my testing, I found that for a few images the network gave an output based on the face in the background. Sometimes it also couldn't distinguish between two different people who looked similar.
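pixelRange = [-10 10];   % defined but not used by the augmenter below
scaleRange = [0.5 1.5];
imageAugmenter = imageDataAugmenter( ...
    'RandXScale',scaleRange, 'RandYScale',scaleRange, 'RandRotation',[-45 45]);
inputSize = g.Layers(1).InputSize;   % g is the network (or layer graph) being trained
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imda, ...
    'DataAugmentation',imageAugmenter);   % augmentation applies to the training set only
augimdsValidation = augmentedImageDatastore(inputSize(1:2),imda2);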
miniBatchSize = 20;
valFrequency = 80;
options = trainingOptions( ... % the solver and learning-rate arguments are missing from this paste
    'Shuffle','every-epoch', 'ValidationData',augimdsValidation, 'ValidationFrequency',valFrequency, ...
    'MaxEpochs',200, 'MiniBatchSize',miniBatchSize, 'Plots','training-progress', 'CheckpointPath','./DCHK');
This was my code for training all the networks.
I only adjusted the learning rates and the batch sizes, and the batch sizes only so that training would fit on my GPU.
The learning rates above were for AlexNet.
For the deeper nets I increased the learning rate and drop period a bit; for example, for Xception the initial learning rate was 0.001 and the drop period was 10.
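As a rough sketch, the Xception settings would look something like this in `trainingOptions`. Only the initial rate of 0.001 and the drop period of 10 are the values I actually used; the `'sgdm'` solver and the drop factor of 0.1 are assumptions filled in for illustration, and the batch size is simply carried over from the code above.

```matlab
% Sketch for Xception. Assumed for illustration: 'sgdm' solver, drop factor 0.1.
miniBatchSize = 20;
options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.001, ...        % value quoted above for Xception
    'LearnRateSchedule','piecewise', ... % needed for the drop period to take effect
    'LearnRateDropPeriod',10, ...        % lower the rate every 10 epochs
    'LearnRateDropFactor',0.1, ...       % assumed; 0.1 is the trainingOptions default
    'MaxEpochs',200, ...
    'MiniBatchSize',miniBatchSize);
```

Under these assumed values the learning rate is 1e-3 for epochs 1 to 10, 1e-4 for epochs 11 to 20, and keeps shrinking by a factor of 10 every 10 epochs after that, so most of a 200-epoch run happens at a very small rate.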