Asked by pliz
on 24 Mar 2013

This is probably easy for most people, but not me. I separately ran each of these "P" sections (P = 1 to P = 4, for offline learning with neural networks) in its own loop, and each quickly reached an error (E) of less than 0.001.

But the combined error of P = 1, 2, 3, and 4 together doesn't decrease below 0.004 as I would like; instead it gets stuck at a summed error of around 0.64. Even the error from combining only P = 1 and 2 (as in the copy of my code below, which is why the P = 3 and P = 4 sections are commented out and jumbled) gets stuck at around 0.32. That's the extent of my ability to debug this code.

Can anyone see what obvious mistake I'm making (or otherwise optimize my crude coding)? Because I can't.

Thanks in advance!

clc; clear all; close all;
eta = -1;
x1 = 0.1; x2 = 0.1; x3 = 1;
w1(1) = 1*(rand(1)-0.5);
w2(1) = -1*(rand(1)-0.5);
w3(1) = 2.3*(rand(1)-0.5);
w4(1) = 2.1*(rand(1)-0.5);
w5(1) = -2*(rand(1)-0.5);
w6(1) = -2*(rand(1)-0.5);
w7(1) = 1*(rand(1)-0.5);
w8(1) = 2*(rand(1)-0.5);
w9(1) = 2*(rand(1)-0.5);
i = 1;
for icount = 1:10000

%P = 1
x1 = 0.1; x2 = 0.1;
alpha1 = w1(i)*x1 + w3(i)*x2 + w5(i)*x3;
z1 = 1./(1 + exp(-alpha1));
alpha2 = w2(i)*x1 + w4(i)*x2 + w6(i)*x3;
z2 = 1./(1 + exp(-alpha2));
alpha3 = w7(i)*z1 + w8(i)*z2 + w9(i)*x3;
y1 = 1./(1 + exp(-alpha3));
%Hidden layer gate z1
changew11 = eta*x1*z1*(1-z1)*w7(i)*y1*(1-y1)*(y1-0.1);
changew31 = eta*x2*z1*(1-z1)*w8(i)*y1*(1-y1)*(y1-0.1);
changew51 = eta*x3*z1*(1-z1)*w9(i)*y1*(1-y1)*(y1-0.1);
%Hidden layer gate z2
changew21 = eta*x1*z2*(1-z2)*w7(i)*y1*(1-y1)*(y1-0.1);
changew41 = eta*x2*z2*(1-z2)*w8(i)*y1*(1-y1)*(y1-0.1);
changew61 = eta*x3*z2*(1-z2)*w9(i)*y1*(1-y1)*(y1-0.1);
%Output layer
changew71 = eta*z1*y1*(1-y1)*(y1-0.1);
changew81 = eta*z2*y1*(1-y1)*(y1-0.1);
changew91 = eta*x3*y1*(1-y1)*(y1-0.1);
E1(i) = (y1-0.1)^2;

%P = 2
x1 = 0.1; x2 = 0.9;
alpha1 = w1(i)*x1 + w3(i)*x2 + w5(i)*x3;
z1 = 1./(1 + exp(-alpha1));
alpha2 = w2(i)*x1 + w4(i)*x2 + w6(i)*x3;
z2 = 1./(1 + exp(-alpha2));
alpha3 = w7(i)*z1 + w8(i)*z2 + w9(i)*x3;
y2 = 1./(1 + exp(-alpha3));
%Hidden layer gate z1
changew12 = eta*x1*z1*(1-z1)*w7(i)*y2*(1-y2)*(y2-0.9);
changew32 = eta*x2*z1*(1-z1)*w8(i)*y2*(1-y2)*(y2-0.9);
changew52 = eta*x3*z1*(1-z1)*w9(i)*y2*(1-y2)*(y2-0.9);
%Hidden layer gate z2
changew22 = eta*x1*z2*(1-z2)*w7(i)*y2*(1-y2)*(y2-0.9);
changew42 = eta*x2*z2*(1-z2)*w8(i)*y2*(1-y2)*(y2-0.9);
changew62 = eta*x3*z2*(1-z2)*w9(i)*y2*(1-y2)*(y2-0.9);
%Output layer
changew72 = eta*z1*y2*(1-y2)*(y2-0.9);
changew82 = eta*z2*y2*(1-y2)*(y2-0.9);
changew92 = eta*x3*y2*(1-y2)*(y2-0.9);
E2(i) = (y2-0.9)^2;

% %P = 3
% x1 = 0.9;
% x2 = 0.1;
% alpha1 = w1(i)*x1 + w3(i)*x2 + w5(i)*x3;
% z1 = 1./(1 + exp(-alpha1));
% alpha2 = w2(i)*x1 + w4(i)*x2 + w6(i)*x3;
% z2 = 1./(1 + exp(-alpha2));
% alpha3 = w7(i)*z1 + w8(i)*z2 + w9(i)*x3;
% y3 = 1./(1 + exp(-alpha3));
% %Hidden layer gate z1
% changew13 = eta*x1*z1*(1-z1)*w7(i)*y3*(1-y3)*(y3-0.9);
% changew33 = eta*x2*z1*(1-z1)*w8(i)*y3*(1-y3)*(y3-0.9);
% changew53 = eta*x3*z1*(1-z1)*w9(i)*y3*(1-y3)*(y3-0.9);
% %Hidden layer gate z2
% changew23 = eta*x1*z2*(1-z2)*w7(i)*y3*(1-y3)*(y3-0.9);
% changew43 = eta*x2*z2*(1-z2)*w8(i)*y3*(1-y3)*(y3-0.9);
% changew63 = eta*x3*z2*(1-z2)*w9(i)*y3*(1-y3)*(y3-0.9);
% %Output layer
% changew73 = eta*z1*y3*(1-y3)*(y3-0.9);
% changew83 = eta*z2*y3*(1-y3)*(y3-0.9);
% changew93 = eta*x3*y3*(1-y3)*(y3-0.9);
% E3(i) = (y3-0.9)^2;

% %P = 4
% x1 = 0.9; x2 = 0.9; x3 = 1;
% alpha1 = w1(i)*x1 + w3(i)*x2 + w5(i)*x3;
% z1 = 1./(1 + exp(-alpha1));
% alpha2 = w2(i)*x1 + w4(i)*x2 + w6(i)*x3;
% z2 = 1./(1 + exp(-alpha2));
% alpha3 = w7(i)*z1 + w8(i)*z2 + w9(i)*x3;
% y4 = 1./(1 + exp(-alpha3));
% %Hidden layer gate z1
% changew14 = eta*x1*z1*(1-z1)*w7(i)*y4*(1-y4)*(y4-0.1);
% changew34 = eta*x2*z1*(1-z1)*w8(i)*y4*(1-y4)*(y4-0.1);
% changew54 = eta*x3*z1*(1-z1)*w9(i)*y4*(1-y4)*(y4-0.1);
% %Hidden layer gate z2
% changew24 = eta*x1*z2*(1-z2)*w7(i)*y4*(1-y4)*(y4-0.1);
% changew44 = eta*x2*z2*(1-z2)*w8(i)*y4*(1-y4)*(y4-0.1);
% changew64 = eta*x3*z2*(1-z2)*w9(i)*y4*(1-y4)*(y4-0.1);
% %Output layer
% changew74 = eta*z1*y4*(1-y4)*(y4-0.1);
% changew84 = eta*z2*y4*(1-y4)*(y4-0.1);
% changew94 = eta*x3*y4*(1-y4)*(y4-0.1);
% E4(i) = (y4-0.1)^2;

sumE(i) = E1(i) + E2(i); %+ E3(i) + E4(i);
if sumE(i) <= 0.004
    break
end
i = i+1;
w1(i) = w1(i-1) + changew11+changew12; %+changew13+changew14;
w2(i) = w2(i-1) + changew21+changew22; %+changew23+changew24;
w3(i) = w3(i-1) + changew31+changew32; %+changew33+changew34;
w4(i) = w4(i-1) + changew41+changew42; %+changew43+changew44;
w5(i) = w5(i-1) + changew51+changew52; %+changew53+changew54;
w6(i) = w6(i-1) + changew61+changew62; %+changew63+changew64;
w7(i) = w7(i-1) + changew71+changew72; %+changew73+changew74;
w8(i) = w8(i-1) + changew81+changew82; %+changew83+changew84;
w9(i) = w9(i-1) + changew91+changew92; %+changew93+changew94;
end
figure(1); grid on;
title('W values 1-9 Vs Iteration Number');
hold on;
plot(w1,'red'); plot(w2,'green'); plot(w3,'blue');
plot(w4,'cyan'); plot(w5,'magenta'); plot(w6,'yellow');
plot(w7,'black'); plot(w8,':red'); plot(w9,'-.green');
legend('w1','w2','w3','w4','w5','w6','w7','w8','w9','Location','Best');
figure(2); grid on;
title('Error Vs Iteration Number');
hold on;
plot(sumE);
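As an aside on tidying the code: the four near-identical P sections could be collapsed into a single loop over a pattern matrix. This is an untested sketch; the names `X`, `t`, and `dw` are introduced here purely for illustration. Note that in textbook backpropagation, all hidden-layer updates for z1 use z1's outgoing weight w7, and all updates for z2 use w8, which differs from some of the changew terms above.

```matlab
% Sketch: the four P sections collapsed into one loop over patterns.
% X holds the four (x1,x2) input pairs, t the four targets.
X = [0.1 0.1; 0.1 0.9; 0.9 0.1; 0.9 0.9];
t = [0.1; 0.9; 0.9; 0.1];
x3 = 1;                            % bias input
sumE(i) = 0;
dw = zeros(1,9);                   % accumulated batch updates for w1..w9
for p = 1:4
    x1 = X(p,1);  x2 = X(p,2);
    z1 = 1/(1 + exp(-(w1(i)*x1 + w3(i)*x2 + w5(i)*x3)));
    z2 = 1/(1 + exp(-(w2(i)*x1 + w4(i)*x2 + w6(i)*x3)));
    y  = 1/(1 + exp(-(w7(i)*z1 + w8(i)*z2 + w9(i)*x3)));
    dout = y*(1-y)*(y - t(p));     % output-layer delta
    d1   = z1*(1-z1)*w7(i)*dout;   % hidden delta for z1 (w7 throughout)
    d2   = z2*(1-z2)*w8(i)*dout;   % hidden delta for z2 (w8 throughout)
    % one entry per weight, in the w1..w9 order used above
    dw = dw + eta*[x1*d1, x1*d2, x2*d1, x2*d2, x3*d1, x3*d2, ...
                   z1*dout, z2*dout, x3*dout];
    sumE(i) = sumE(i) + (y - t(p))^2;
end
```

After the loop, a single line like `w1(i+1) = w1(i) + dw(1);` (and so on for w2..w9) replaces the nine hand-written update statements.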


Answer by Greg Heath
on 25 Mar 2013

Accepted answer

1. If the number of unknowns is greater than the number of equations, then a solution is not unique (How many solutions to x1 + x2 = 1?).

2. A net with more unknown weights than the number of training equations is said to be OVERFIT. The nonuniqueness of exact solutions is typically mitigated by various techniques mentioned below.

3. Suppose an overfit net is trained with data consisting of signal plus random contamination (noise, measurement error, roundoff and/or truncation error). The LMSE (least-mean-square-error) solution obtained for the signal with one particular set of contamination may yield a large MSE on the same signal with a different set of contamination.

4. A net that performs well on nontraining data that can be assumed to be drawn from the same source as the training data is said to have good generalization, i.e., it generalizes well to nontraining data.

5. If a net is overfit but the signal-to-contamination power ratio is sufficiently high, iterative solutions tend to pass through regions of good generalization on the way to minimizing the training MSE. Nets trained past those regions are said to be OVERTRAINED.

6. There are several methods to mitigate overtraining an overfit net. See the comp.ai.neural-nets FAQ and search for overfit, overfitting and/or generalization.

7. For a single hidden layer MLP with H hidden nodes and an I-H-O node topology trained by Ntrn pairs of I-dimensional inputs and O-dimensional outputs:

Ntrneq = Ntrn*O % No. of training equations

Nw = (I+1)*H+(H+1)*O % No. of unknown weights.

Typically, Ntrn, I, and O are given and a choice of H has to be made. To avoid overfitting, choose H to be less than the upper bound

Hub = -1 + ceil( (Ntrneq-O) / (I+O+1) )
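Plugging in the numbers for the 2-2-1 net in the question (I = 2 inputs, H = 2 hidden nodes, O = 1 output, bias connections included, Ntrn = 4 training patterns) gives the following; a quick sketch, assuming the formulas above:

```matlab
% Counts for the net in the question: I = 2, H = 2, O = 1, Ntrn = 4 (P = 1..4)
I = 2; H = 2; O = 1; Ntrn = 4;
Ntrneq = Ntrn*O                           % = 4 training equations
Nw     = (I+1)*H + (H+1)*O                % = 9 unknown weights
Hub    = -1 + ceil( (Ntrneq-O)/(I+O+1) )  % = 0
```

With 9 unknown weights and only 4 equations the net is overfit, and Hub = 0 says no hidden layer at all can be supported by 4 patterns without overfitting.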

Sometimes this can be achieved by reducing I, O, and/or H by pruning connections.

8. If avoiding overfitting does not yield an acceptable solution, then there are other mitigation techniques for not overtraining an overfit net (see the comp.ai.neural-nets FAQ):

a. Validation set stopping
b. Regularization of the minimization objective
   1. Weight decay
   2. Weight elimination
   3. Bayesian regularization
c. Jittering (training with added noise)

Bottom Line:

If you have 9 unknown weights, you might want at least 45 or 90 equations (5 or 10 times as many equations as weights), or else use a mitigation technique.

Hope this helps.

**Thank you for formally accepting my answer**

Greg


## 4 Comments

## per isakson

Make one more effort to mark-up the code

## Image Analyst

Your last edit, after per's comment, didn't work. Please review this: http://www.mathworks.com/matlabcentral/answers/13205-tutorial-how-to-format-your-question-with-markup

Please put one statement per line, and don't double-space lines. Your goal is that if you copy your Answers code and paste it back into MATLAB, it looks just like it did before you pasted it into the Answers forum. Then we will be able to run your code.

## Walter Roberson

Do you really have large blocks of commented-out code just before sumE(i) calculation?

## pliz

I tried to respond to the comments. Per and Image Analyst, I followed the tutorial on editing the code and could copy and paste the above revision into my file editor and the output was as I intended (meaning it shows the problems I'm having).

Walter - yes, I commented out the blocks to show the troubleshooting I've done. Again, each individual "P" reaches the required error. And if I only comment out the "Ps" sharing the same error calculation ((y-0.1) or (y-0.9)), the required error is reached. It's only when I run epochs with different error calculations (which includes running all 4 "P" values together) that no solution is reached.

Greg - thank you for the background. Unfortunately I'm not familiar with a lot of the subject matter. I'll try and upload some of the notes we were given for motivation on this problem.

Again, thank you everyone for your help. I'll respond more quickly this time, and I appreciate any further assistance.