why doesnt my simple XOR gate using backpropagation work?

Question

Hossein HasanPour on 7 Sep 2015

0
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/240486-why-doesnt-my-simple-xor-gate-using-backpropagation-work

Commented: Ahmed Atef on 8 Jun 2021

Hello everyone, I have been trying to create a simple neural network for solving XOR problems, without any success for the couple of days now. I have tried different flavors, with biases, without biases, with biases as weights , not a single one worked! Here is my first try:

% XOR calculator using BackPropagation
clear all;clc;
%specify input samples (creating trainings set)
pattern = [1 1;
           0 0;
           1 0;
           0 1];
[patternRows,patternCols] = size(pattern);
%creating targets or desired answers for each set
target = [0,0,1,1];
%initializing weights with random values
w1 = [-1 1 ;    %neuron 1
      -1 2];    %neuron two
w2 = [-2,0.5];  %outputneuron
%bias
b12 = [1 -1]; % for neuron one and two respectively
b3 = -1.5;    % for neuron three (output neuron)
lr  = 0.01;     %learning rate
a12 = [2 1];    %layer 1 outputs (neuron one and twos output respectively)
a3  = 1;        %layer 2 output  (neuron 3 output)
delta = 0;
i=1;
while (i<1000 || error ~=0)
      for row_number = 1: patternRows
          for neuron = 1:2
              %calculate the output of each neuron
              a12(neuron) = logsig( pattern(row_number,:) * w1(neuron,:)' + b12(neuron) );
          end
          %calculate the output
          a3 = hardlim(w2(:)'*a12(:)+ b3 );
          %calculate the error
          error = target(row_number) - a3;
          %calculate the local gradient (sensivity) 
          %which is f'(n) * e(n) for output layer
          SM = a3*error;
          %calculating delta which is learningrate * localgradient*inputs
          delta = lr * SM * a12;
          %updating the weights of last layer
          w2 = w2 + delta;
          %updating the last layers bias           
          b3 = b3 + lr * SM;
          %calculating local gradients(sensivities) for hiddent layers
          %which is f(n)*SumOfNextLayers(localgradient*weights)
          Sm = [a12(1)*(1-a12(1)) 0;
                0  a12(2)*(1-a12(2)) ] * w2' * SM ;
          %calculating delta which is the same as before 
          delta = (lr .* Sm)* pattern(row_number,:);
          %updating the weights and biases of hidden layer
          w1 = w1 + delta;
          b12 = b12 + (lr * Sm)';
           sss = sprintf('%d ) a3= %d # e= %d # w1=%d,%d # w2 =%d,%d \n',i,a3 ,error,w1(1),w1(2),w2(1),w2(2));
           fprintf(' %s ',sss);
      end
     i= i+1;
  end

------------------- This is another implementation (without any biases)

%XOR without any biases
% training set
ts=[0 0;
    1 1;
    1 0;
    0 1];
d=[0 0 1 1];
% weights for hidden layers (each row represents wiegths for the corrosponding neuron
wh=[rand() rand();
    rand() rand()];
% wiegths from hidden layer to output layer
wo=[rand() rand()];
% output for neuron 1 and 2 and bias for neuron3
a=[1 1];
% output for neuron 3
a3=0;   
% learning rate
n=0.1;  
iteration=10;
i=0;
while (i<iteration || e~=0)
    for tindex=1:4
          for neuron=1:2
              a(neuron)= logsig(wh(neuron,:)*ts(neuron,:)');
          end
          a3 = hardlim(wo(1)*a(1)+wo(2)*a(2))
          e = d(tindex) - a3
          grad_out = a3*e;
          deltaW = n*grad_out*a;
          wo = wo+deltaW;
          grad_h = a(1)*(1-a(1)) * grad_out*wo(1)'
          deltaW = n*grad_h*wh(1,:);
          wh(1,:)= wh(1,:) +deltaW
          grad_h = a(2)*(1-a(2)) * grad_out*wo(2)'
          deltaW = n*grad_h*wh(2,:)
          wh(2,:) = wh(2,:) +deltaW
      end
      i=i+1;
  end

---------------------------- And this is the last implementation which treats biases like weights

%XOR calculator ( biases as weights )
% training set
% (the last number is bias coefficent which is always 1
ts=[0 0 1;1 1 1;
    1 0 1;0 1 1];
% targets or desired outputs
d=[0 0 1 1];
% weights for hidden layers
% bias ,weight1, weigth2
wh=[rand() rand() rand() ;
    rand() rand() rand()];
% wiegths from hidden layer to output layer
% bias, weight1, weigth2
wo=[rand() rand() rand()];
%output for neuron 1 and 2 and bias for neuron3
a=[1 1];
%output for neuron 3
a3=0;
%learning rate
n=0.1;
iteration=100;
i=0;
while (i<iteration || e~=0)
    for tindex=1:4
          for neuron=1:2
              a(neuron)= logsig(wh(neuron,:)*ts(neuron,:)');
          end
          output_neuronsInput = wo(1)*a(1)+wo(2)*a(2)+wo(3)*1;
          a3 = hardlim(output_neuronsInput);
          e = d(tindex) - a3
          % neuron 3(output)
          localgrad_out = a3*e;
          deltaW = n * localgrad_out * output_neuronsInput;
          wo = wo + deltaW;
          % neuron 1
          grad_h = a(1)*(1-a(1)) * localgrad_out * wo(1)'
          deltaW = n * grad_h * wh(1,:);
          wh(1,:) = wh(1,:) + deltaW
          % neuron 2
          grad_h = a(2)*(1-a(2)) * localgrad_out*wo(2)'
          deltaW = n * grad_h * wh(2,:)
          wh(2,:) = wh(2,:) + deltaW
      end
      i=i+1;
  end