I'm doing tree species classification from hyperspectral data. I have 114 features, each a spectral band at a different wavelength, and 6596 observations, each a single pixel of a tree in the image. The end goal is to train a Random Forest to classify the species of each tree pixel. For now I'm only trying to distinguish broadleaf trees from conifers, and I'm stuck on feature selection. I'm also unsure what the best overall approach to Random Forest in MATLAB is, since I've only ever coded in Python before and this is my first step into machine learning.
Here is the basis of the code I am using:
For feature selection, I've tried several approaches that haven't worked out, mainly, I think, because of the nature of my data. The hyperspectral bands run from the visible wavelengths all the way up into the infrared. Two neighbouring features may both be very important for tree species classification, but because they are bands at similar wavelengths (similar colors), they are also very highly correlated. For example, feature 69 and feature 70, two of the "best" features according to OOBPredictorImportance, have a 99.79% correlation.
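To make the correlation point concrete, here is a minimal sketch of how neighbouring bands can be checked. It runs on synthetic stand-in data (a random walk across 8 made-up bands), not my real matrix, and the names X, R, adjacentR and the 0.9 threshold are placeholders I picked for illustration.

```matlab
% Sketch only: synthetic stand-in data, NOT the real 6596-by-114 matrix.
rng(0);                                  % reproducible random numbers
nObs = 500;                              % hypothetical number of pixels
nBands = 8;                              % hypothetical number of bands

% A random walk across bands, so that neighbouring columns are correlated
% the way adjacent wavelengths tend to be.
X = cumsum(randn(nObs, nBands), 2);

% Pearson correlation between every pair of bands (columns are variables).
R = corrcoef(X);

% First off-diagonal: correlation of band i with band i+1.
adjacentR = diag(R, 1);

% Flag neighbouring pairs above an arbitrary illustrative threshold.
threshold = 0.9;
highPairs = find(abs(adjacentR) > threshold);

for i = 1:numel(adjacentR)
    fprintf('bands %d and %d: r = %.4f\n', i, i + 1, adjacentR(i));
end
fprintf('%d neighbouring pair(s) above %.2f\n', numel(highPairs), threshold);
```

On the real data the same idea would apply with X replaced by the 6596-by-114 feature matrix.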
PCA, forward sequentialfs, and backward sequentialfs all gave me the same accuracy as running all 114 features, if not lower. Each time, I ran a 1:1:114 loop that adds the next best ranked feature from the algorithm and plotted the loss for each step, and the results are still unsatisfactory. If anybody could point me in the right direction for feature selection, especially anybody who has worked with hyperspectral data before, that would be great.
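For reference, the ranked-feature loop I'm describing looks roughly like the sketch below. This is an illustration rather than my exact script: the data are synthetic (10 made-up bands, two of them informative), the variable names are placeholders, and it assumes the Statistics and Machine Learning Toolbox for TreeBagger. Here the ranking comes from the out-of-bag permuted importance and the loss is the out-of-bag error.

```matlab
% Sketch only: synthetic stand-in data, NOT the real 6596-by-114 matrix.
rng(1);                                  % reproducible random numbers
nObs = 300;                              % hypothetical number of pixels
nBands = 10;                             % hypothetical number of bands
nTrees = 100;                            % arbitrary forest size

X = randn(nObs, nBands);
Y = double(X(:, 3) + X(:, 7) > 0);       % hypothetical 0/1 labels from two bands

% Rank bands by out-of-bag permuted importance, using all bands.
full = TreeBagger(nTrees, X, Y, 'Method', 'classification', ...
    'OOBPrediction', 'on', 'OOBPredictorImportance', 'on');
[~, ranked] = sort(full.OOBPermutedPredictorDeltaError, 'descend');

% Add the next best ranked band each time and record the OOB error.
oobLoss = zeros(1, nBands);
for k = 1:nBands
    mdl = TreeBagger(nTrees, X(:, ranked(1:k)), Y, ...
        'Method', 'classification', 'OOBPrediction', 'on');
    err = oobError(mdl);                 % cumulative error versus number of trees
    oobLoss(k) = err(end);               % error using all nTrees trees
end

plot(1:nBands, oobLoss, '-o');
xlabel('Number of top-ranked bands used');
ylabel('Out-of-bag classification error');
```

On my real data the loop runs over all 114 bands instead of 10.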
Also, if anybody could look at the raw version of my code above and tell me whether there's anything else I could change to increase the accuracy, that would be great. I've searched these forums for weeks, but I feel that my issues are specific to this type of data, which is why I'm starting my own thread.
Thank you in advance for any and all help/advice.