Asked by pliz
on 24 Mar 2013

This is probably easy for most people, but not me. I separately ran each of these "P" sections (P = 1 to P = 4, for offline learning with neural networks) in its own loop, and each quickly reached an error (E) of less than 0.001.

But the combined error of P = 1, 2, 3, and 4 together doesn't decrease below 0.004 as I would like; instead it gets stuck at a summed error of around 0.64. Even the error from combining only P = 1 and 2 (as in the copy of my code below, which is why the P = 3 and P = 4 sections are commented out and jumbled) gets stuck at around 0.32. That's the extent of my ability to debug this code.

Can anyone see what obvious mistake I'm making (or otherwise optimize my crude coding)? Because I can't.

Thanks in advance!

clc; clear all; close all;
eta = -1;
x1 = 0.1; x2 = 0.1; x3 = 1;
w1(1) = 1*(rand(1)-0.5);
w2(1) = -1*(rand(1)-0.5);
w3(1) = 2.3*(rand(1)-0.5);
w4(1) = 2.1*(rand(1)-0.5);
w5(1) = -2*(rand(1)-0.5);
w6(1) = -2*(rand(1)-0.5);
w7(1) = 1*(rand(1)-0.5);
w8(1) = 2*(rand(1)-0.5);
w9(1) = 2*(rand(1)-0.5);
i = 1;
for icount = 1:10000

%P = 1
x1 = 0.1; x2 = 0.1;
alpha1 = w1(i)*x1 + w3(i)*x2 + w5(i)*x3;
z1 = 1./(1 + exp(-alpha1));
alpha2 = w2(i)*x1 + w4(i)*x2 + w6(i)*x3;
z2 = 1./(1 + exp(-alpha2));
alpha3 = w7(i)*z1 + w8(i)*z2 + w9(i)*x3;
y1 = 1./(1 + exp(-alpha3));
%Hidden layer gate z1
changew11 = eta*x1*z1*(1-z1)*w7(i)*y1*(1-y1)*(y1-0.1);
changew31 = eta*x2*z1*(1-z1)*w8(i)*y1*(1-y1)*(y1-0.1);
changew51 = eta*x3*z1*(1-z1)*w9(i)*y1*(1-y1)*(y1-0.1);
%Hidden layer gate z2
changew21 = eta*x1*z2*(1-z2)*w7(i)*y1*(1-y1)*(y1-0.1);
changew41 = eta*x2*z2*(1-z2)*w8(i)*y1*(1-y1)*(y1-0.1);
changew61 = eta*x3*z2*(1-z2)*w9(i)*y1*(1-y1)*(y1-0.1);
%Output layer
changew71 = eta*z1*y1*(1-y1)*(y1-0.1);
changew81 = eta*z2*y1*(1-y1)*(y1-0.1);
changew91 = eta*x3*y1*(1-y1)*(y1-0.1);
E1(i) = (y1-0.1)^2;

%P = 2
x1 = 0.1; x2 = 0.9;
alpha1 = w1(i)*x1 + w3(i)*x2 + w5(i)*x3;
z1 = 1./(1 + exp(-alpha1));
alpha2 = w2(i)*x1 + w4(i)*x2 + w6(i)*x3;
z2 = 1./(1 + exp(-alpha2));
alpha3 = w7(i)*z1 + w8(i)*z2 + w9(i)*x3;
y2 = 1./(1 + exp(-alpha3));
%Hidden layer gate z1
changew12 = eta*x1*z1*(1-z1)*w7(i)*y2*(1-y2)*(y2-0.9);
changew32 = eta*x2*z1*(1-z1)*w8(i)*y2*(1-y2)*(y2-0.9);
changew52 = eta*x3*z1*(1-z1)*w9(i)*y2*(1-y2)*(y2-0.9);
%Hidden layer gate z2
changew22 = eta*x1*z2*(1-z2)*w7(i)*y2*(1-y2)*(y2-0.9);
changew42 = eta*x2*z2*(1-z2)*w8(i)*y2*(1-y2)*(y2-0.9);
changew62 = eta*x3*z2*(1-z2)*w9(i)*y2*(1-y2)*(y2-0.9);
%Output layer
changew72 = eta*z1*y2*(1-y2)*(y2-0.9);
changew82 = eta*z2*y2*(1-y2)*(y2-0.9);
changew92 = eta*x3*y2*(1-y2)*(y2-0.9);
E2(i) = (y2-0.9)^2;

% %P = 3
% x1 = 0.9;
% x2 = 0.1;
% alpha1 = w1(i)*x1 + w3(i)*x2 + w5(i)*x3;
% z1 = 1./(1 + exp(-alpha1));
% alpha2 = w2(i)*x1 + w4(i)*x2 + w6(i)*x3;
% z2 = 1./(1 + exp(-alpha2));
% alpha3 = w7(i)*z1 + w8(i)*z2 + w9(i)*x3;
% y3 = 1./(1 + exp(-alpha3));
% %Hidden layer gate z1
% changew13 = eta*x1*z1*(1-z1)*w7(i)*y3*(1-y3)*(y3-0.9);
% changew33 = eta*x2*z1*(1-z1)*w8(i)*y3*(1-y3)*(y3-0.9);
% changew53 = eta*x3*z1*(1-z1)*w9(i)*y3*(1-y3)*(y3-0.9);
% %Hidden layer gate z2
% changew23 = eta*x1*z2*(1-z2)*w7(i)*y3*(1-y3)*(y3-0.9);
% changew43 = eta*x2*z2*(1-z2)*w8(i)*y3*(1-y3)*(y3-0.9);
% changew63 = eta*x3*z2*(1-z2)*w9(i)*y3*(1-y3)*(y3-0.9);
% %Output layer
% changew73 = eta*z1*y3*(1-y3)*(y3-0.9);
% changew83 = eta*z2*y3*(1-y3)*(y3-0.9);
% changew93 = eta*x3*y3*(1-y3)*(y3-0.9);
% E3(i) = (y3-0.9)^2;

% %P = 4
% x1 = 0.9; x2 = 0.9; x3 = 1;
% alpha1 = w1(i)*x1 + w3(i)*x2 + w5(i)*x3;
% z1 = 1./(1 + exp(-alpha1));
% alpha2 = w2(i)*x1 + w4(i)*x2 + w6(i)*x3;
% z2 = 1./(1 + exp(-alpha2));
% alpha3 = w7(i)*z1 + w8(i)*z2 + w9(i)*x3;
% y4 = 1./(1 + exp(-alpha3));
% %Hidden layer gate z1
% changew14 = eta*x1*z1*(1-z1)*w7(i)*y4*(1-y4)*(y4-0.1);
% changew34 = eta*x2*z1*(1-z1)*w8(i)*y4*(1-y4)*(y4-0.1);
% changew54 = eta*x3*z1*(1-z1)*w9(i)*y4*(1-y4)*(y4-0.1);
% %Hidden layer gate z2
% changew24 = eta*x1*z2*(1-z2)*w7(i)*y4*(1-y4)*(y4-0.1);
% changew44 = eta*x2*z2*(1-z2)*w8(i)*y4*(1-y4)*(y4-0.1);
% changew64 = eta*x3*z2*(1-z2)*w9(i)*y4*(1-y4)*(y4-0.1);
% %Output layer
% changew74 = eta*z1*y4*(1-y4)*(y4-0.1);
% changew84 = eta*z2*y4*(1-y4)*(y4-0.1);
% changew94 = eta*x3*y4*(1-y4)*(y4-0.1);
% E4(i) = (y4-0.1)^2;

sumE(i) = E1(i) + E2(i); %+ E3(i) + E4(i);
if sumE(i) <= 0.004
    break
end
i = i+1;
w1(i) = w1(i-1) + changew11+changew12; %+changew13+changew14;
w2(i) = w2(i-1) + changew21+changew22; %+changew23+changew24;
w3(i) = w3(i-1) + changew31+changew32; %+changew33+changew34;
w4(i) = w4(i-1) + changew41+changew42; %+changew43+changew44;
w5(i) = w5(i-1) + changew51+changew52; %+changew53+changew54;
w6(i) = w6(i-1) + changew61+changew62; %+changew63+changew64;
w7(i) = w7(i-1) + changew71+changew72; %+changew73+changew74;
w8(i) = w8(i-1) + changew81+changew82; %+changew83+changew84;
w9(i) = w9(i-1) + changew91+changew92; %+changew93+changew94;
end
figure(1); grid on;
title('W values 1-9 Vs Iteration Number');
hold on;
plot(w1,'red'); plot(w2,'green'); plot(w3,'blue');
plot(w4,'cyan'); plot(w5,'magenta'); plot(w6,'yellow');
plot(w7,'black'); plot(w8,':red'); plot(w9,'-.green');
legend('w1','w2','w3','w4','w5','w6','w7','w8','w9','Location','Best');
figure(2); grid on;
title('Error Vs Iteration Number');
hold on;
plot(sumE);
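As an aside on tidying the code: the four near-identical P sections could be collapsed into a single loop over a pattern matrix. This is an untested sketch; the names `X`, `t`, and `dw` are introduced here purely for illustration. Note that in textbook backpropagation, all hidden-layer updates for z1 use z1's outgoing weight w7, and all updates for z2 use w8, which differs from some of the changew terms above.

```matlab
% Sketch: the four P sections collapsed into one loop over patterns.
% X holds the four (x1,x2) input pairs, t the four targets.
X = [0.1 0.1; 0.1 0.9; 0.9 0.1; 0.9 0.9];
t = [0.1; 0.9; 0.9; 0.1];
x3 = 1;                            % bias input
sumE(i) = 0;
dw = zeros(1,9);                   % accumulated batch updates for w1..w9
for p = 1:4
    x1 = X(p,1);  x2 = X(p,2);
    z1 = 1/(1 + exp(-(w1(i)*x1 + w3(i)*x2 + w5(i)*x3)));
    z2 = 1/(1 + exp(-(w2(i)*x1 + w4(i)*x2 + w6(i)*x3)));
    y  = 1/(1 + exp(-(w7(i)*z1 + w8(i)*z2 + w9(i)*x3)));
    dout = y*(1-y)*(y - t(p));     % output-layer delta
    d1   = z1*(1-z1)*w7(i)*dout;   % hidden delta for z1 (w7 throughout)
    d2   = z2*(1-z2)*w8(i)*dout;   % hidden delta for z2 (w8 throughout)
    % one entry per weight, in the w1..w9 order used above
    dw = dw + eta*[x1*d1, x1*d2, x2*d1, x2*d2, x3*d1, x3*d2, ...
                   z1*dout, z2*dout, x3*dout];
    sumE(i) = sumE(i) + (y - t(p))^2;
end
```

After the loop, a single line like `w1(i+1) = w1(i) + dw(1);` (and so on for w2..w9) replaces the nine hand-written update statements.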


Answer by Greg Heath
on 25 Mar 2013

Accepted answer

1. If the number of unknowns is greater than the number of equations, then a solution is not unique (How many solutions to x1 + x2 = 1?).

2. A net with more unknown weights than the number of training equations is said to be OVERFIT. The nonuniqueness of exact solutions is typically mitigated by various techniques mentioned below.

3. Suppose an overfit net is trained with data consisting of signal plus random contamination (noise, measurement error, roundoff and/or truncation error). The LMSE (least-mean-square-error) solution obtained for the signal with one particular set of contamination may yield a large MSE on the same signal with a different set of contamination.

4. A net that performs well on nontraining data that can be assumed to be drawn from the same source as the training data is said to have good generalization, i.e., it generalizes well to nontraining data.

5. If a net is overfit but the signal-to-contamination power ratio is sufficiently high, iterative solutions tend to pass through regions of good generalization on the way to minimizing the training MSE. Nets trained past those regions are said to be OVERTRAINED.

6. There are several methods to mitigate overtraining an overfit net. See the comp.ai.neural-nets FAQ and search for overfit, overfitting and/or generalization.

7. For a single hidden layer MLP with H hidden nodes and an I-H-O node topology trained by Ntrn pairs of I-dimensional inputs and O-dimensional outputs:

Ntrneq = Ntrn*O % No. of training equations

Nw = (I+1)*H+(H+1)*O % No. of unknown weights.

Typically, Ntrn, I, and O are given and a choice of H has to be made. To avoid overfitting, choose H to be less than the upper bound

Hub = -1 + ceil( (Ntrneq-O) / (I+O+1) )
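Plugging in the numbers for the 2-2-1 net in the question (I = 2 inputs, H = 2 hidden nodes, O = 1 output, bias connections included, Ntrn = 4 training patterns) gives the following; a quick sketch, assuming the formulas above:

```matlab
% Counts for the net in the question: I = 2, H = 2, O = 1, Ntrn = 4 (P = 1..4)
I = 2; H = 2; O = 1; Ntrn = 4;
Ntrneq = Ntrn*O                           % = 4 training equations
Nw     = (I+1)*H + (H+1)*O                % = 9 unknown weights
Hub    = -1 + ceil( (Ntrneq-O)/(I+O+1) )  % = 0
```

With 9 unknown weights and only 4 equations the net is overfit, and Hub = 0 says no hidden layer at all can be supported by 4 patterns without overfitting.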

Sometimes this can be achieved by reducing I, O, and/or H by pruning connections.

8. If avoiding overfitting does not yield an acceptable solution, then there are other mitigation techniques for not overtraining an overfit net (see the comp.ai.neural-nets FAQ):

a. Validation set stopping
b. Regularization of the minimization objective
   1. Weight decay
   2. Weight elimination
   3. Bayesian regularization
c. Jittering (training with added noise)

Bottom Line:

If you have 9 unknown weights, you might want at least 45 or 90 equations (5 or 10 times as many equations as weights), or else use a mitigation technique.

Hope this helps.

**Thank you for formally accepting my answer**

Greg


## 4 Comments

## per isakson

Make one more effort to mark-up the code

## Image Analyst

Your last edit, after per's comment, didn't work. Please review this: http://www.mathworks.com/matlabcentral/answers/13205-tutorial-how-to-format-your-question-with-markup

Please put one statement per line, and don't double-space lines. Your goal is that if you copy your Answers code and paste it back into MATLAB, it looks just like it did before you pasted it into the Answers forum. Then we will be able to run your code.

## Walter Roberson

Do you really have large blocks of commented-out code just before sumE(i) calculation?

## pliz

I tried to respond to the comments. Per and Image Analyst, I followed the tutorial on editing the code and could copy and paste the above revision into my file editor and the output was as I intended (meaning it shows the problems I'm having).

Walter - yes, I commented out the blocks to show the troubleshooting I've done. Again, each individual "P" reaches the required error. And if I only comment out the "Ps" sharing the same error calculation ((y-0.1) or (y-0.9)), the required error is reached. It's only when I run epochs with different error calculations (which includes running all 4 "P" values together) that no solution is reached.

Greg - thank you for the background. Unfortunately I'm not familiar with a lot of the subject matter. I'll try and upload some of the notes we were given for motivation on this problem.

Again, thank you everyone for your help. I'll respond more quickly this time, and I appreciate any further assistance.