I've been working through this MATLAB regression example, in which a network is trained to recognise the rotation of handwritten digits. I wanted to explore the results further, so I found the root files in the program directory: inside, there are 10 subfolders (0-9) containing a total of 10,000 images, plus 2 Excel sheets, "digitTest" and "digitTrain". Each Excel sheet has 5000 rows and 3 columns, holding the image file name, the digit and the rotation angle respectively. After running the example code myself and comparing the results, I can see that the response YTrain matches the Excel file "digitTrain" and the response YValidation matches the Excel file "digitTest". Later on, in the post-processing of the data, YPredicted and hence the prediction error are calculated as follows:
YPredicted = predict(net,XValidation);
predictionError = YValidation - YPredicted;
These three separate responses (YTrain, YValidation and YPredicted) have confused me and I'm looking for some clarification. From my understanding, validation data consists of the true values that the network's responses are compared against during training, in order to get a rough estimate of how accurate a given network is. This makes sense, as the prediction error is the difference between the true and predicted values. I am not sure what YTrain is, though: if it is meant to represent the training responses, then why is there already an Excel sheet with pre-defined responses in the program directory? What does YTrain represent, and if I were to train my own network, would I need to generate a similar YTrain alongside my YValidation?
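In case it helps to show what I mean by prediction error, here is a minimal sketch of that calculation as I understand it. The angles are made up for illustration and are not values from the example's data, and the YPredicted vector only stands in for the output of predict(net,XValidation). The RMSE line is just one common way to summarise the errors.

```matlab
% Made-up rotation angles in degrees; NOT values from the example data set.
YValidation = [10; -25; 40; 0];   % true angles for the held-out images
YPredicted  = [12; -20; 37; 1];   % stand-in for predict(net,XValidation)

% Same calculation as in the example: true value minus predicted value.
predictionError = YValidation - YPredicted;

% One common summary of the errors: root-mean-square error.
rmse = sqrt(mean(predictionError.^2));
```

Nothing in this calculation uses YTrain, which is part of why I'm unsure what role it plays.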