How can I train a deep CNN network to classify a subset of the classes defined in a dataset?

SOLVED: removecats() does work. My mistake was applying it after creating my augmentedImageDatastore. Apologies to all for wasting your time. For anyone with the same issue, make sure you remove the unused categories from the imageDatastore labels using removecats() BEFORE deriving your augmentedImageDatastore from that imageDatastore. Thanks Mohammad.
Hi Everyone,
I have a dataset consisting of 1443 images and a csv file which contains the labels for each image. There are 27 classes or 'types of label'. I have successfully trained a network to classify all 27 classes.
Here is the problem: the dataset is seriously imbalanced, by which I mean there is a large difference between the number of images representing the top 10 classes and the number representing the rest. Therefore, I would like to train a model which classifies only the top 10 classes, not all 27.
So far, I have tried to use splitEachLabel along with the ...'include' , {'...top 10 labels...'} option. This successfully creates new image datastores (training and testing) which only include the images with the specified labels. I've gone on to create the augmentedImageDatastore version of these with no issue.
The problem is, the computer still thinks there are 27 classes in my imageDataStore.
Let me show you a few lines of code:
markingsTrain_10.Labels
Returns a 955x1 categorical array including ONLY the categories/labels/classes which I've specified.
To check, I can write this code:
unique(categorical(markingsTrain_10.Labels))
which returns the 10 classes I asked for:
ans = 10×1 categorical
35
40
bike
forward
leftturn
ped
rail
rightturn
stop
xing
BUT THEN... when I check to see the actual labels/classes the computer thinks are in this datastore:
categories(categorical(markingsTrain_10.Labels))
I get the whole 27 classes (shown in the screenshot).
My question is, how can this be if the only labels which appear in the markingsTrain_10.Labels are the 10 which I've specified?
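A minimal sketch of why this can happen, using a toy categorical array rather than the actual dataset (the label values here are hypothetical): a categorical array keeps its full list of defined categories even when no element uses some of them, so unique() and categories() can disagree.

```matlab
% Toy labels (hypothetical values, not the poster's data).
labels = categorical({'stop'; 'bike'; 'yield'; 'stop'});

% Drop every 'yield' element. The 'yield' category definition is still kept.
subset = labels(labels ~= 'yield');

valuesPresent = unique(subset);      % only the values that occur: bike, stop
catsBefore = categories(subset);     % all defined categories: bike, stop, yield

% With no second argument, removecats removes every unused category.
subset = removecats(subset);
catsAfter = categories(subset);      % now only bike, stop
```

Calling removecats without a list of names avoids having to type out every unused category by hand.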
And if this method is not possible, how can I train my network to classify only the 10 specified classes out of 27?
If you need more code/results etc please let me know and I'll provide it right away! Forgive me if what I've provided is far from what is needed to answer my question.
  3 Comments
Daniel Suarez-Mash on 22 Jul 2020
Edited: Daniel Suarez-Mash on 22 Jul 2020
Dear Mohammad,
Thanks very much for your response.
I have used the removecats function as shown below:
markingsTrain_10.Labels = removecats(markingsTrain_10.Labels, {'25', 'forward&right', 'yield', 'X-crossing', 'clear', 'forward&left', 'keep', 'hump', 'school', 'stripe', '30', 'slow', 'speed', 'car', 'diamond', 'lane', 'pool'});
Which has the desired effect (output):
ans = 10×1 categorical
35
40
bike
forward
leftturn
ped
rail
rightturn
stop
xing
This line of code below also returns the desired 10 classes:
categories(categorical(markingsTrain_10.Labels))
Output:
ans = 10×1 cell
'35'
'40'
'bike'
'forward'
'leftturn'
'ped'
'rail'
'rightturn'
'stop'
'xing'
So I thought this would fix the problem, since I should now have exactly 10 classes. Not quite: I still get the same error when training the network:
Error using trainNetwork (line 170)
Invalid training data. The output size (10) of the last layer does not match the number of classes (27).
Where is it getting 27 from? Do we have to make changes to the augmented datastore too, which is derived from the markingsTrain_10 imageDataStore?
If this method doesn't work, which is seeming more likely, then is there an official way to train my network on 10 classes out of the 27 provided in the dataset and annotations csv file?
Please let me know.
Also, let me know if you need more lines of code/outputs. I can run whatever code you need me to.
Thanks.
Daniel
Edit:
This is the trainNetwork line of code I'm running:
mySRMnetwork_10_classes = trainNetwork(augsTrain_10, myLayers_10, myOptions_10)
And augsTrain_10 is derived from markingsTrain_10 as shown below:
augsTrain_10 = augmentedImageDatastore([227 227], markingsTrain_10, 'colorPreprocessing', "gray2rgb")
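The SOLVED note at the top of this thread reports that the order of operations was the culprit: removecats had been applied after the augmented datastore was created. A rough sketch of the working order follows, assuming toy images written to a temporary folder and hypothetical label values; the names imds, imdsTop and augTop are illustrative stand-ins, not the poster's variables. It requires Deep Learning Toolbox for augmentedImageDatastore.

```matlab
% Hypothetical toy data: four small grayscale images in a temporary folder.
folder = tempname;
mkdir(folder);
names = {'a.png', 'b.png', 'c.png', 'd.png'};
for k = 1:numel(names)
    imwrite(uint8(randi(255, 32, 32)), fullfile(folder, names{k}));
end

imds = imageDatastore(folder);
imds.Labels = categorical({'stop'; 'bike'; 'yield'; 'stop'});

% Keep only the classes of interest (a stand-in for the splitEachLabel step).
keep = ismember(imds.Labels, {'stop', 'bike'});
imdsTop = imageDatastore(imds.Files(keep), 'Labels', imds.Labels(keep));

% Step 1: remove the unused categories on the imageDatastore first.
imdsTop.Labels = removecats(imdsTop.Labels);
numClasses = numel(categories(imdsTop.Labels));

% Step 2: only then derive the augmented datastore from it.
augTop = augmentedImageDatastore([227 227], imdsTop, ...
    'ColorPreprocessing', 'gray2rgb');
```

Checking numClasses before building the augmented datastore is a cheap way to confirm the class count that training will see.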
Mohammad Sami on 23 Jul 2020
If you want to reduce the number of classes, you will have to edit the last fullyConnectedLayer in your network.
The number of classes needs to match the number of outputs of the last fullyConnectedLayer:
fullyConnectedLayer(10)
softmaxLayer
classificationLayer
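One way to keep the layer size and the label set in step is to compute the output size from the labels instead of hard-coding 10. The sketch below is illustrative only: the labels and the names numClasses and finalLayers are hypothetical, and it requires Deep Learning Toolbox.

```matlab
% Toy labels (hypothetical); in practice these would be the datastore's
% Labels after unused categories have been removed.
labels = removecats(categorical({'stop'; 'bike'; 'stop'}));

% Size the final fully connected layer from the categories actually defined.
numClasses = numel(categories(labels));

finalLayers = [
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
```

If the categories still include unused classes, numClasses will reflect that, which makes the mismatch visible before trainNetwork is called.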

Release

R2020a
