Hi Honza,
if you are referring to the case where data is not linearly separable, then it is true that the computation of the boundary and support vectors may seem strange.
Typically, we would construct an SVM with some tolerance for misclassified data; for example, we could compute a 'soft margin' (see Cortes and Vapnik, 1995), which optimises a trade-off between maximising the margin and a penalty for misclassification. In my program, I have set the misclassification penalty to be very large, since a small penalty can shift the position of the decision boundary even when data is linearly separable. I do this so that you can see the optimal separating hyperplane.
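For reference, the soft-margin problem is usually written as below. This is the standard textbook form rather than anything taken from the demo's code, and the symbols are my own choice of notation: w and b define the hyperplane, the slack variables \xi_i measure how far each point violates the margin, and C is the misclassification penalty.

```latex
\min_{w,\,b,\,\xi} \quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
```

As C becomes very large, any non-zero slack is penalised so heavily that, for separable data, the solution approaches the hard-margin (optimal separating) hyperplane.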
Obviously this is not representative of how you would solve a real problem that is not linearly separable. If you suspect that data is noisy, it would be appropriate to tune the 'C' parameter in a soft-margin SVM. Alternatively, if you suspect that data is complex and non-linear, it may be better to use a non-linear kernel (e.g. RBF).
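To make the kernel suggestion concrete, here is a minimal sketch of an RBF (Gaussian) kernel in plain Python. It is only an illustration and not part of the demo; the function name `rbf_kernel` and the width parameter `gamma` are names I have assumed for the example.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - z||^2).

    `gamma` is an assumed name for the kernel width parameter;
    larger values make the kernel more local.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Identical points have similarity 1; similarity decays with distance.
print(rbf_kernel((0.0, 0.0), (0.0, 0.0)))
print(rbf_kernel((1.0, 0.0), (0.0, 0.0)))
print(rbf_kernel((3.0, 0.0), (0.0, 0.0)))
```

An SVM using this kernel replaces every inner product between data points with `rbf_kernel`, which is what allows the decision boundary to curve in the original input space.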
Regardless of whether data is linear and/or separable, you should find that some data points that are correctly classified will be support vectors. The support vectors correspond to the data points that actually define where the decision boundary is. They comprise any misclassified datapoints and any datapoints that fall inside the margin (both of which correspond to penalty terms in the soft-margin problem), and all the correctly classified datapoints which lie exactly on the margin, i.e. at the minimum distance from the decision boundary.
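The following sketch shows one way to check this geometrically for a linear boundary. It is not the demo's code: it assumes that w and b are already known and scaled so that the margin sits at y * (w . x + b) = 1, and the function names and the toy data are hypothetical. A real solver decides support vectors from the dual variables, so treat this only as a picture of which points can play that role.

```python
def functional_margin(w, b, x, y):
    """Return y * (w . x + b); values >= 1 lie on or outside the margin."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def describe_points(w, b, points, labels, tol=1e-9):
    """Label each point by its position relative to the margin."""
    results = []
    for x, y in zip(points, labels):
        m = functional_margin(w, b, x, y)
        if m < 0:
            role = "misclassified"
        elif m < 1 - tol:
            role = "inside margin"
        elif m <= 1 + tol:
            role = "on margin"
        else:
            role = "outside margin"
        # Only points strictly outside the margin have no influence
        # on where the boundary ends up.
        is_support_vector = role != "outside margin"
        results.append((x, role, is_support_vector))
    return results


# Hypothetical 2-D example: boundary x1 = 0, margins at x1 = -1 and x1 = +1.
w, b = (1.0, 0.0), 0.0
points = [(2.0, 0.0), (1.0, 0.0), (0.5, 0.0), (-1.0, 0.0), (1.0, 1.0)]
labels = [1, 1, 1, -1, -1]

for x, role, is_sv in describe_points(w, b, points, labels):
    print(x, role, "support vector" if is_sv else "not a support vector")
```

Note that the second and fourth points are correctly classified yet still count as support vectors, because they sit exactly on the margin.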
I hope this answers your question.
18 Nov 2011
SVM Demo
An interactive demo of how an SVM works, with comparison to a perceptron
Thanks for the demo. However, I'm not sure that the margin is estimated correctly in the non-linear case. Is it all right that correctly classified points (on the right side of the margin) act as support vectors? Thanks for the clarification.