How can I implement the NN algorithm?

I have a database: http://archive.ics.uci.edu/ml/machine-learning-databases/wine/
There are 3 classes in my database (1, 2, and 3) and 178 instances in total. I want to choose a single test vector and calculate the Euclidean distance (for example, sqrt((x2-x1)^2+(y2-y1)^2)) between this test vector and the mean vector of each class. Then I ought to compare the calculated distances and choose the smallest. The smallest distance points to the class to which the test vector belongs.
Is there anyone who can give me a piece of advice or a link to an NN usage example?

Accepted Answer

Mohammad Abouali on 23 Jan 2015
Edited: Mohammad Abouali on 23 Jan 2015
Some code like this would do the job:
%% Loading data
% load() on an ASCII file creates a variable named after the file, here "wine"
load('wine.data');
% The first column stores the wine class, according to the wine.names file
nClass = max(wine(:,1));

%% Getting the mean of each class for the 13 parameters
meanEachClass = arrayfun(@(x) mean(wine(wine(:,1)==x, 2:end), 1), 1:nClass, 'UniformOutput', false);

%% Now checking the Euclidean distance of a sample
% relative to the mean of each class
nSampleToTest = 10;
for i = 1:nSampleToTest
    % Randomly choosing a sample
    sampleNo = randi(size(wine,1));
    sample = wine(sampleNo, 2:end);
    % Calculate the Euclidean distance to each class mean
    distances = arrayfun(@(x) norm(sample - meanEachClass{x}), 1:nClass);
    % The index of the smallest distance is the predicted class
    % (min returns a single index even if two distances tie)
    [~, predictedClass] = min(distances);
    fprintf('Sample #%d\n', sampleNo);
    fprintf('Distance: \n Class 1: %f \n Class 2: %f \n Class 3: %f \n', distances(1), distances(2), distances(3));
    fprintf('Based on distance, Sample seems to belong to class %d\n', predictedClass);
    fprintf('According to the database, sample belongs to class %d\n', wine(sampleNo,1));
end
Sample output of the above code:
Sample #171
Distance:
Class 1: 605.816060
Class 2: 10.319480
Class 3: 119.987114
Based on distance, Sample seems to belong to class 2
According to the database, sample belongs to class 3
Sample #117
Distance:
Class 1: 621.072378
Class 2: 26.007955
Class 3: 135.695649
Based on distance, Sample seems to belong to class 2
According to the database, sample belongs to class 2
Sample #7
Distance:
Class 1: 174.614487
Class 2: 770.521566
Class 3: 660.160087
Based on distance, Sample seems to belong to class 1
According to the database, sample belongs to class 1
Sample #152
Distance:
Class 1: 635.785710
Class 2: 43.956064
Class 3: 150.475079
Based on distance, Sample seems to belong to class 2
According to the database, sample belongs to class 3
Sample #167
Distance:
Class 1: 420.824925
Class 2: 176.469688
Class 3: 66.248364
Based on distance, Sample seems to belong to class 3
According to the database, sample belongs to class 3
Sample #121
Distance:
Class 1: 490.840691
Class 2: 105.514335
Class 3: 8.175599
Based on distance, Sample seems to belong to class 3
According to the database, sample belongs to class 2
Sample #135
Distance:
Class 1: 466.213558
Class 2: 130.910111
Class 3: 25.163417
Based on distance, Sample seems to belong to class 3
According to the database, sample belongs to class 3
Sample #133
Distance:
Class 1: 555.828809
Class 2: 40.964218
Class 3: 69.989176
Based on distance, Sample seems to belong to class 2
According to the database, sample belongs to class 3
Sample #70
Distance:
Class 1: 400.230442
Class 2: 206.399030
Class 3: 102.402864
Based on distance, Sample seems to belong to class 3
According to the database, sample belongs to class 2
Sample #117
Distance:
Class 1: 621.072378
Class 2: 26.007955
Class 3: 135.695649
Based on distance, Sample seems to belong to class 2
According to the database, sample belongs to class 2
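The question asks about one chosen test vector rather than random ones. A minimal sketch of that variant follows. It assumes the same layout as wine.data (class label in column 1, features in the remaining columns) but uses a small made-up matrix named data as a stand-in so that it runs without the file; the names data, testRow and isTrain are illustrative and not part of the answer above. It also leaves the test vector out of the class means, which the code above does not do.

```matlab
% Hypothetical stand-in for the wine matrix: column 1 is the class label,
% the remaining columns are features. With the real file, use
% data = load('wine.data'); instead.
data = [1 0 0; 1 1 0; 1 0 1; ...
        2 10 10; 2 11 10; 2 10 11; ...
        3 0 10; 3 1 10; 3 0 11];

testRow = 4;                         % row index of the chosen test vector
labels  = data(:,1);
X       = data(:,2:end);
nClass  = max(labels);

% Exclude the test vector so it does not pull its own class mean toward itself
isTrain = true(size(X,1), 1);
isTrain(testRow) = false;

% Euclidean distance from the test vector to each class mean
distances = zeros(1, nClass);
for k = 1:nClass
    classMean    = mean(X(isTrain & labels==k, :), 1);
    distances(k) = norm(X(testRow,:) - classMean);
end

% The smallest distance gives the predicted class
[~, predictedClass] = min(distances);
fprintf('Predicted class %d, actual class %d\n', predictedClass, labels(testRow));
```

Changing testRow selects a different test vector; whether leaving it out of the means matters much depends on how many samples each class has.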
  5 Comments
Mohammad Abouali on 23 Jan 2015
Edited: Mohammad Abouali on 23 Jan 2015
Q1: Is there a way to display the points of each class and the chosen vector, to show more clearly that the chosen vector is nearer to one class than to another?
Remember that each sample has 13 properties, so you are working in 13-dimensional space. At most, I think you can take 3 of the 13 properties and plot them using the plot3 command, but not all the dimensions at the same time (unless you come up with a nice creative approach to show the 13-dimensional space in 3D).
2D scatter plots are also very commonly used. There are 78 possible pairs of properties in your case, though (13 × 12 / 2).
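As a rough illustration of the 2D scatter plot idea, the sketch below plots one pair of properties, the class means, and a chosen test vector. It assumes the wine.data layout (class label in column 1) but uses a small made-up matrix named data as a stand-in so that it runs without the file; the variable names are illustrative only.

```matlab
% Hypothetical stand-in for the wine matrix: column 1 is the class label,
% the remaining columns are features. With the real file, use
% data = load('wine.data'); instead.
data = [1 0 0; 1 1 0; 1 0 1; ...
        2 10 10; 2 11 10; 2 10 11; ...
        3 0 10; 3 1 10; 3 0 11];
labels = data(:,1);
X      = data(:,2:end);
nClass = max(labels);

% All distinct feature pairs; with 13 features this list has 78 rows
pairs = nchoosek(1:size(X,2), 2);
p     = pairs(1,:);                  % pick one pair to plot

% Class means restricted to the chosen pair of features
classMeans = zeros(nClass, 2);
for k = 1:nClass
    classMeans(k,:) = mean(X(labels==k, p), 1);
end

sample = X(4, p);                    % test vector to highlight (row 4 here)

figure; hold on;
scatter(X(:,p(1)), X(:,p(2)), 25, labels, 'filled');   % points colored by class
plot(classMeans(:,1), classMeans(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2);
plot(sample(1), sample(2), 'ro', 'MarkerSize', 12, 'LineWidth', 2);
for k = 1:nClass                     % dotted line from the sample to each mean
    plot([sample(1) classMeans(k,1)], [sample(2) classMeans(k,2)], 'k:');
end
xlabel(sprintf('Feature %d', p(1)));
ylabel(sprintf('Feature %d', p(2)));
legend('Samples', 'Class means', 'Test vector', 'Location', 'best');
hold off;
```

Keep in mind that distances seen in such a plot are distances in the two plotted properties only, so the class that looks nearest in one pair may differ from the nearest class in the full 13-dimensional space.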
Post this as a new question; let's see what other people respond. We can both learn from other people's responses.
Q2: How can I calculate the percentage of the algorithm's precision?
There are multiple approaches to determining the accuracy of a classification method. Perhaps in your case you can use the confusion matrix approach. Classify all your samples; you get a column containing 1, 2, or 3 for each sample, indicating how it was classified. Then compare this vector with the first column of your data, which also contains 1, 2, and 3, and count the false positives, false negatives, true positives, and true negatives. If you look at the link above you can find some more statistics that you can calculate from this. MATLAB also has a function to calculate the confusion matrix, called confusionmat().
Note that the above link is for binary classification, but the same concept can be used for more classes, so you can have a confusion matrix for more classes. For an example click here.
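A minimal sketch of the confusion matrix approach for the nearest-class-mean rule is given below. It assumes the wine.data layout (class label in column 1) but uses a small made-up matrix named data as a stand-in so that it runs without the file, and it builds the matrix with accumarray so that no toolbox is needed. The variable names are illustrative only.

```matlab
% Hypothetical stand-in for the wine matrix: column 1 is the class label,
% the remaining columns are features. With the real file, use
% data = load('wine.data'); instead.
data = [1 0 0; 1 1 0; 1 0 1; ...
        2 10 10; 2 11 10; 2 10 11; ...
        3 0 10; 3 1 10; 3 0 11];
trueClass = data(:,1);
X         = data(:,2:end);
nClass    = max(trueClass);
nSample   = size(X,1);

% Distance from every sample to every class mean (one column per class)
distances = zeros(nSample, nClass);
for k = 1:nClass
    classMean = mean(X(trueClass==k, :), 1);
    diffs     = bsxfun(@minus, X, classMean);   % subtract the mean from each row
    distances(:,k) = sqrt(sum(diffs.^2, 2));
end

% Predicted class = column index of the smallest distance in each row
[~, predictedClass] = min(distances, [], 2);

% Confusion matrix: rows = true class, columns = predicted class
confMat = accumarray([trueClass predictedClass], 1, [nClass nClass]);

% Overall accuracy and per-class precision and recall
accuracy  = trace(confMat) / sum(confMat(:));
precision = diag(confMat)' ./ sum(confMat, 1);
recall    = diag(confMat)' ./ sum(confMat, 2)';

fprintf('Overall accuracy: %.1f%%\n', 100*accuracy);
disp(confMat);
```

If the Statistics and Machine Learning Toolbox is available, confusionmat(trueClass, predictedClass) should produce a matrix with the same layout. Because the class means here are computed from the same samples that are then classified, the resulting accuracy is likely to be somewhat optimistic.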
Stefan Olaru on 23 Jan 2015
Thank you very much, Mohammad!
