
Hi there,

I am using a neural network for making predictions. I have been using the default 'dividerand' with 70%, 15%, 15% of the data for training, validation and testing, respectively. Here is the simple code I use:

    net = newff(x,y,20);
    net.performFcn = 'mse';
    net.trainParam.goal = 1e-6;
    net.trainParam.min_grad = 1e-20;
    net.trainParam.epochs = 1000;
    net.trainParam.max_fail = 50;
    net = init(net);
    net = train(net,inputs,target);

I use geophysical data that spans low, moderate and high solar activity. When I simulated the network with new data, I found that it was biased towards low solar activity. I suspect the problem is that the validation and testing subsets do not cover the entire solar cycle.

I want to divide the data myself so that all three levels of solar activity are well represented in the training, validation and testing sets. How can I do this? I found online that I could use 'divideblock', but how does it work?

Thank you,
Racheal
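On the 'divideblock' part of the question, a minimal sketch may help. The sample count Q below is a made-up value for illustration only. divideblock hands out contiguous blocks of sample indices in order (training first, then validation, then testing), so if the samples are stored chronologically, each subset covers a single stretch of time rather than a mix of solar activity levels.

```matlab
% Illustration only: Q is a hypothetical sample count, not the real data size.
Q = 20;
[trainInd, valInd, testInd] = divideblock(Q, 0.70, 0.15, 0.15);

% Each output is a run of consecutive indices: training block first,
% then the validation block, then the test block.
disp(trainInd);
disp(valInd);
disp(testInd);

% To have train() use it, the division function is set on the network, e.g.:
%   net.divideFcn = 'divideblock';
%   net.divideParam.trainRatio = 0.70;
%   net.divideParam.valRatio   = 0.15;
%   net.divideParam.testRatio  = 0.15;
```

Because the blocks are contiguous, divideblock on its own probably does not guarantee that low, moderate and high solar activity all appear in every subset.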
  3 Comments
rkelly on 2 Jul 2015
Edited: rkelly on 2 Jul 2015
Hi Greg,

What, exactly, are you predicting?
I am predicting the critical frequency of the F2 layer (foF2) of the ionosphere.

What, exactly, are you predicting it with?
I have about 20 years of historical foF2 data (the target). The input vectors are parameters that influence foF2, e.g. sunspot number, hour of day, etc.

Is there any time delay between input and output? If so, you should be using a timeseries function.
I have used feedforward backprop because several journal authors have used it successfully to predict foF2.

How many input/target pairs, N?
targets: 1x12627

What and how many inputs?
inputs: 8x12627 (including sunspot number, day of the year, hour of day, etc.)

Why aren't the default values sufficient?
My problem is that foF2 depends on the activity of the sun (high, moderate and low solar activity). I have been presenting all the inputs and targets, and they have been divided randomly into training, validation and testing sets (using dividerand). When I used the network to simulate with new inputs, it performed well, with correlation above 0.7, for certain years (those with low solar activity), but the correlation was below 0.4 for the other years. So I thought that during the data division, some years were not well represented in either the training or the validation set.

My question is: is there a way, either manually or programmatically, to distribute the data without bias across all 3 sets?

I tried to answer all the questions.
Thanks
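One possible way to do the manual division asked about here is to group the samples by solar activity level, split each group 70/15/15, and pass the resulting indices to 'divideind'. The following is only a sketch under stated assumptions: the sunspot numbers are random stand-in values, the thresholds ss1 and ss2 are hypothetical, and feedforwardnet is used in place of newff purely for illustration.

```matlab
% Stand-in data: replace ssn with the real sunspot-number row of the inputs.
rng(0);                                  % reproducible shuffling
N   = 12627;                             % number of input/target pairs (from the post)
ssn = 200 * rand(1, N);                  % ASSUMED values, for illustration only

% Hypothetical thresholds separating low / moderate / high solar activity.
ss1 = 50;
ss2 = 100;
level = 1 + (ssn > ss1) + (ssn > ss2);   % 1 = low, 2 = moderate, 3 = high

trainInd = []; valInd = []; testInd = [];
for k = 1:3
    idx = find(level == k);
    idx = idx(randperm(numel(idx)));     % shuffle within this activity level
    nTrain = round(0.70 * numel(idx));
    nVal   = round(0.15 * numel(idx));
    trainInd = [trainInd, idx(1:nTrain)];                %#ok<AGROW>
    valInd   = [valInd,   idx(nTrain+1:nTrain+nVal)];    %#ok<AGROW>
    testInd  = [testInd,  idx(nTrain+nVal+1:end)];       %#ok<AGROW>
end

% Tell the network to use exactly these indices instead of a random split.
net = feedforwardnet(20);                % stand-in for newff(x,y,20)
net.divideFcn = 'divideind';
net.divideParam.trainInd = trainInd;
net.divideParam.valInd   = valInd;
net.divideParam.testInd  = testInd;
```

With this kind of split, every activity level contributes roughly the same proportion of samples to each subset. Where the real thresholds between low, moderate and high activity should sit is a domain decision that this sketch does not settle.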
Greg Heath on 4 Jul 2015
What are the other 5 inputs?
How many measurements per day?
12627/365.25/24 = 1.44 ?
Is there any time delay between input and output?
You may need to have several models.
If sunspotnum <= ss1 then ..
etc
If so, try no data division with dividetrain to determine the minimum number of hidden nodes.
Then plot error rate vs
a. each of the 8 inputs
b. each of the 8 principal components (PCA)
c. each of the 8 principal coordinates (PLS)
>> lookfor principal
Hope this helps
Greg
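The 'dividetrain' suggestion above could be sketched roughly as follows. The data here are synthetic stand-ins, the list of candidate hidden-layer sizes is arbitrary, and using the fraction of target variance explained (R2) as the yardstick is an assumption of this sketch, not something stated in the thread.

```matlab
% Synthetic stand-in data: 8 inputs, one target. Replace with the real data.
rng(0);
x = rand(8, 400);
t = sin(2*pi*x(1,:)) + 0.5*x(2,:).^2;

Hvec = 1:2:9;                            % candidate numbers of hidden nodes (arbitrary)
R2   = zeros(size(Hvec));
for i = 1:numel(Hvec)
    net = feedforwardnet(Hvec(i));
    net.divideFcn = 'dividetrain';       % no data division: every sample is used for training
    net.trainParam.showWindow = false;   % suppress the training window
    net = train(net, x, t);
    y = net(x);
    % Fraction of target variance explained on the training data (assumed criterion).
    R2(i) = 1 - mean((t - y).^2) / var(t, 1);
end
disp([Hvec; R2]);                        % first row: hidden nodes, second row: R2
```

The smallest hidden-layer size after which R2 stops improving noticeably would be a reasonable candidate for the minimum number of hidden nodes. The same loop could be run separately on each sunspot-number range if several models turn out to be needed.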


Answers (0)
