I need to find the center points of clusters. I used dbscan for clustering, and now I need to find the core points of these clusters. I used corepts, but it is a logical array. How can I find the core points of those clusters, or at least one point contained in each cluster? Any help would be appreciated.
[idx, corepts] = dbscan(asc,epsilon,minpts);

7 Comments

Rik on 25 Feb 2020
Edited: Rik on 25 Feb 2020
(probably a follow-up to this previous question)
Have you read the documentation? I don't have the stat toolbox myself so I can't test it, but it looks like the logical array should be easy to use. Have you ever used logical indexing?
I read the documentation. corepts contains only zeros and ones; how can I get the core points in numerical form from that? I have never used logical indexing before.
Logical indexing works like this:
v=[9 6 3 8];
L=[true false true false];
v(L) %returns [9 3]
You could use find to convert the logical array into indices, but that step is not necessary:
find(L) %returns [1 3]
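The same indexing carries over to matrices, where selecting whole rows needs an extra colon. A small sketch with made-up values (A and L are illustrative only, not taken from any data set in this thread):

```matlab
A = [10 20; 30 40; 50 60];   % three points (rows), two features (columns)
L = [true; false; true];     % one logical flag per row, like corepts

firstColumnOnly = A(L);      % linear indexing: picks 10 and 50 from column 1
wholeRows = A(L, :);         % row indexing: picks rows 1 and 3 entirely
```

Without the colon, the logical vector is applied to the elements in column-major order, so only entries from the first column come back.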
Thank you, sir. But how can I get the values where L is true, like C=[9,3]?
I am getting some repeated numbers, like 0 0 0 1 1 1 2 2 2 2 2 3 3 3 4 4, up to 9. Do the true entries contain repeated values?
Sorry for the late question.
I used the glass data set, and the core points I get come only from the 1st column of the data set. Is a core point the center of a cluster, and if so, how? Please give me an answer.
data=xlsread('glass.xlsx');
minpts=6;
epsilon=4;
[idx, corepts] = dbscan(data,epsilon,minpts);
gscatter(data(:,1),data(:,2),idx);
core=data(corepts);
Thank you.


 Accepted Answer

As discussed here, https://stackoverflow.com/questions/52364959/how-to-find-center-points-of-dbscan-clusrering-in-sklearn and here https://www.quora.com/Is-there-anything-equivalent-to-a-centroid-in-DBSCAN, dbscan does not have a cluster center. However, it does identify core points. You can get the core points by modifying this line in your code:
core = data(corepts, :);
This gives you all rows of data that are core points. Similarly, you can get the cluster number of each of these core points:
corr_idx = idx(corepts, :);
As an example, try this:
data=xlsread('glass.xlsx');
minpts=6;
epsilon=4;
[idx, corepts] = dbscan(data,epsilon,minpts);
fig1 = figure();
gscatter(data(:,1),data(:,2),idx);
fig2 = figure();
core=data(corepts, :);
corr_idx = idx(corepts, :);
gscatter(core(:,1),core(:,2),corr_idx);
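If one representative point per cluster is enough, as the question asks, one option is to pick the first core point of each cluster. This is a minimal sketch on synthetic data, assuming the Statistics and Machine Learning Toolbox is available. X, the epsilon of 1 and the minpts of 5 are made-up values for illustration and are not tuned for glass.xlsx:

```matlab
rng(0);                                       % reproducible synthetic data
X = [0.2*randn(50,2); 0.2*randn(50,2) + 5];   % two well-separated blobs
[idx, corepts] = dbscan(X, 1, 5);

coreRows = find(corepts);                     % row numbers of the core points in X
coreLabels = idx(corepts);                    % cluster number of each core point

% unique also returns the position of the first occurrence of each label
[labels, firstPos] = unique(coreLabels);
representatives = X(coreRows(firstPos), :);   % one core point per cluster
```

Here find is only used to recover the row numbers in X; the logical array alone is enough to extract the rows themselves.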

4 Comments

Thank you sir.
Is there any method to find the center of the clusters?
I used the above code. After that, I tried to find the 5 nearest elements based on the smallest distance and group them. The core points contain 1804 elements, but the number of groups depends only on the threshold value we give (the number of nearest elements that we want to get).
clc;
clear;
data=xlsread('glass.xlsx');
minpts=6;
epsilon=4;
[idx, corepts] = dbscan(data,epsilon,minpts);
fig1 = figure();
gscatter(data(:,1),data(:,2),idx);
fig2 = figure();
core=data(corepts, :);
corr_idx = idx(corepts, :);
gscatter(core(:,1),core(:,2),corr_idx);
[i,id] = mink(abs(data(:)-core(:).'),25);
clusters = data(id);
link=linkage(clusters);
figure(3)
dendrogram(link)
I am not sure what the purpose of these lines is:
gscatter(core(:,1),core(:,2),corr_idx);
[i,id] = mink(abs(data(:)-core(:).'),25);
clusters = data(id);
link=linkage(clusters);
figure(3)
dendrogram(link)
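If you are trying to calculate the distance between the data points and core points, then the mink statement seems to be incorrect: data(:) and core(:) flatten the matrices into vectors, so it compares individual scalar entries rather than whole rows (points).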
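For reference, a row-wise version could look like the following sketch, which measures the Euclidean distance between whole points (rows). It assumes pdist2 from the Statistics and Machine Learning Toolbox; the tiny data and core matrices and the choice k = 2 are made up for illustration:

```matlab
data = [0 0; 1 0; 0 2; 10 10; 12 10];   % five points, one per row
core = [0 0; 10 10];                    % two hypothetical core points
k = 2;                                  % number of nearest neighbours wanted

D = pdist2(core, data);                 % D(i,j): distance from core i to data point j
[d, id] = mink(D, k, 2);                % k smallest distances along each row

nearestToFirst = data(id(1,:), :);      % the k data points closest to core(1,:)
```

Each row of id holds the row numbers in data of the k points nearest to the corresponding core point.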
As mentioned at this link https://stackoverflow.com/questions/52364959/how-to-find-center-points-of-dbscan-clusrering-in-sklearn, dbscan does not have a cluster center, but if you still want a center, you can calculate it yourself using some method. For example, I modified the code to find each cluster center by taking the mean of that cluster's core points.
clc;
clear;
data=xlsread('glass.xlsx');
minpts=6;
epsilon=4;
[idx, corepts] = dbscan(data,epsilon,minpts);
fig1 = figure();
gscatter(data(:,1),data(:,2),idx);
fig2 = figure();
ax = axes();
hold on;
core=data(corepts, :);
core_idx = idx(corepts, :);
gscatter(core(:,1),core(:,2),core_idx);
centers = splitapply(@(x) mean(x, 1), core, core_idx);
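numClusters = size(centers, 1); % one row of centers per cluster
gscatter(centers(:,1), centers(:,2), (1:numClusters)');
for i=1:numClusters % the most recently plotted lines come first in ax.Children
ax.Children(i).Marker = 'x';
ax.Children(i).MarkerSize = 30;
ax.Children(i).LineWidth = 10;
end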
The cluster centers are marked with crosses on the figure.
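The centers computed here use only the core points. If border points should count too, a possible variant averages every non-noise member of each cluster; dbscan labels noise points with -1, so those rows are filtered out first. A sketch on made-up data (X, the epsilon of 1 and the minpts of 5 are illustrative assumptions):

```matlab
rng(1);                                              % reproducible synthetic data
X = [0.2*randn(40,2); 0.2*randn(40,2) + 5; 20 20];   % two blobs plus one outlier
idx = dbscan(X, 1, 5);

isMember = idx > 0;                                  % drop noise points (idx == -1)
centersAll = splitapply(@(x) mean(x, 1), X(isMember, :), idx(isMember));
```

Each row of centersAll is the mean of one cluster, in order of cluster number.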
Thank you Sir, It worked.
If my approach (calculating the distance between the data points and the core points and finding some nearest elements) is incorrect, please tell me how I can do that correctly.
Is it possible to take 5 different elements from each cluster?
I think you misunderstood the meaning of core points. All the points shown in the image in my last comment are the core points of their clusters. A core point in dbscan is not the center of the cluster. If you want to find the five points closest to the center of each cluster (the center as I calculated it in the last comment, by averaging the cluster's core points), then you can try the following code:
clc;
clear;
data=xlsread('glass.xlsx');
minpts=6;
epsilon=4;
[idx, corepts] = dbscan(data,epsilon,minpts);
fig1 = figure();
gscatter(data(:,1),data(:,2),idx);
fig2 = figure();
ax = axes();
hold on;
core=data(corepts, :);
core_idx = idx(corepts, :);
gscatter(core(:,1),core(:,2),core_idx);
centers = splitapply(@(x) mean(x, 1), core, core_idx);
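numClusters = size(centers, 1); % one row of centers per cluster
gscatter(centers(:,1), centers(:,2), (1:numClusters)');
for i=1:numClusters % the most recently plotted lines come first in ax.Children
ax.Children(i).Marker = 'x';
ax.Children(i).MarkerSize = 30;
ax.Children(i).LineWidth = 10;
end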
clusters = splitapply(@(x) {x}, core, core_idx);
closest_points = cell(1, numel(clusters)); % one cell per cluster
closest_idx = cell(1, numel(clusters));
for i = 1:length(clusters)
[~, index] = mink(sum((clusters{i}-centers(i,:)).^2,2), 5, 1);
closest_points{i} = clusters{i}(index,:);
closest_idx{i} = i*ones(size(closest_points{i},1),1);
end
closest_points = cell2mat(closest_points');
closest_idx = cell2mat(closest_idx');
g = gscatter(closest_points(:,1), closest_points(:,2), closest_idx);
[g.MarkerSize] = deal(30);
[g.Color] = deal([0 0 0]);
In the resulting figure, the closest points are shown in black. Note that the distance is calculated in all 11 dimensions, so the points may not appear close to the center in the 2-D plot, even though they are the closest to it when all 11 dimensions are considered.
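A tiny made-up example of that effect, using 3 dimensions instead of 11 (center and P are illustrative values, not from the glass data):

```matlab
center = [0 0 0];
P = [0.1 0.1 5;      % looks close in the first two columns, far in the third
     1   1   0];     % looks farther in 2-D, but is closer overall

dFull = sqrt(sum((P - center).^2, 2));                  % distance in all dimensions
d2D = sqrt(sum((P(:,1:2) - center(1:2)).^2, 2));        % distance in the plotted plane

[~, nearestFull] = min(dFull);   % row of P nearest to the center in 3-D
[~, nearest2D] = min(d2D);       % row of P nearest to the center in the 2-D view
```

The two answers differ here, which is why a point that is nearest overall can look misplaced on a scatter plot of only the first two columns.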
