Here are the x, y coordinates of the blue dots. The original dataset is too big to be shared here.

# Mean shift clustering - issue with finding the center of my clusters

Hi all, as you can see from the attached image, I cannot detect the centers of my dots (in blue) using mean shift clustering. I include the code below. Note that I get the same result no matter what value I set the bandwidth to. Thanks a lot for your help.

My code:

%%

% Import the data

% Prompt the user to choose a file

[filename, filepath] = uigetfile('*.txt', 'Select a text file');

file_name = filename;

remove = '.txt';

file_name_clean = strrep(file_name, remove, '');

%%

% Plotting

plot_name = ['Intensity_' file_name_clean '.svg'];

% Import data from text file

opts = delimitedTextImportOptions("NumVariables", 28);

opts.DataLines = [2, Inf];

opts.Delimiter = "\t";

opts.VariableNames = ["channel_name", "x", "y", "x_c", "y_c"];

opts.SelectedVariableNames = ["x", "y"]; % Only select the x and y columns

opts.VariableTypes = ["string", "double", "double", "double", "double"];

opts.ExtraColumnsRule = "ignore";

opts.EmptyLineRule = "read";

% Construct the full file path

file_path = fullfile(filepath, file_name);

data = readmatrix(file_path, opts);

% Perform Mean Shift clustering

bandwidth = 50; % bandwidth parameter for Mean Shift

[cluster_centers, data2cluster, cluster2dataCell] = MeanShiftCluster(data, bandwidth);

% Plot the data points and the cluster centers

figure;

plot(data(:,2), data(:,1), '.', 'MarkerSize', 10, 'DisplayName', 'XY coordinates');

hold on;

% Set x-axis limit starting from 0

xlim([0, max(data(:,2))]);

% Set y-axis limit starting from 0

ylim([0, max(data(:,1))]);

% Plot cluster centers

hold on;

plot(cluster_centers(:,2), cluster_centers(:,1), 'kx', 'MarkerSize', 15, 'LineWidth', 3, 'DisplayName', 'Cluster Centers');

hold off;

xlabel('X');

ylabel('Y');

title('Mean Shift Clustering');

legend('XY coordinates', 'Cluster Centers');

### Answers (3)

Mathieu NOE
on 13 May 2024

Hello Marco,

it seems your issue is simply that the function expects one data point per column (a numDim-by-numPts array), whereas your data has one point per row.

See these lines in MeanShiftCluster.m:

%**** Initialize stuff ***

[numDim,numPts] = size(dataPts);

So, with the data file you provided, I needed to transpose the data array:

[cluster_centers, data2cluster, cluster2dataCell] = MeanShiftCluster(data', bandwidth); % NB : data' (transposed)
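
To see why the orientation matters, here is a minimal sketch using a small made-up array (the values are placeholders, not your data). It only mimics the size call quoted above and does not run MeanShiftCluster itself.

```matlab
% Hypothetical stand-in for the imported data: 6 points, one (x, y) pair per row.
data = [1 2; 1.1 2.1; 0.9 1.9; 10 20; 10.1 20.1; 9.9 19.9];

% MeanShiftCluster interprets the input size as [numDim, numPts].
[numDim,  numPts ] = size(data);    % untransposed: 6 "dimensions", 2 "points"
[numDimT, numPtsT] = size(data');   % transposed:   2 dimensions,   6 points

fprintf('as read:    numDim = %d, numPts = %d\n', numDim,  numPts);
fprintf('transposed: numDim = %d, numPts = %d\n', numDimT, numPtsT);
```

Without the transpose, an N-by-2 array is treated as just two points in an N-dimensional space, which would explain why changing the bandwidth made no visible difference.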

Full code:

%%

clc

clearvars

close all

% Import the data

% Prompt the user to choose a file

[filename, filepath] = uigetfile('*.txt', 'Select a text file');

file_name = filename;

remove = '.txt';

file_name_clean = strrep(file_name, remove, '');

%%

% Plotting

plot_name = ['Intensity_' file_name_clean '.svg'];

% Import data from text file

opts = delimitedTextImportOptions("NumVariables", 28);

opts.DataLines = [2, Inf];

opts.Delimiter = "\t";

opts.VariableNames = ["channel_name", "x", "y", "x_c", "y_c"];

opts.SelectedVariableNames = ["x", "y"]; % Only select the x and y columns

opts.VariableTypes = ["string", "double", "double", "double", "double"];

opts.ExtraColumnsRule = "ignore";

opts.EmptyLineRule = "read";

% Construct the full file path

file_path = fullfile(filepath, file_name);

% data = readmatrix(file_path, opts);

data = readmatrix(file_path); % <= works better in this case without opts

% Perform Mean Shift clustering

bandwidth = 50; % bandwidth parameter for Mean Shift

[cluster_centers, data2cluster, cluster2dataCell] = MeanShiftCluster(data', bandwidth); % NB : data' (transposed)

% Plot the data points and the cluster centers

figure;

plot(data(:,2), data(:,1), '.', 'MarkerSize', 15, 'DisplayName', 'XY coordinates');

hold on;

% Set x-axis limit starting from 0

xlim([0, max(data(:,2))]);

% Set y-axis limit starting from 0

ylim([0, max(data(:,1))]);

% Plot cluster centers

hold on;

% plot(cluster_centers(:,2), cluster_centers(:,1), 'kx', 'MarkerSize', 15, 'DisplayName', 'Cluster Centers');

plot(cluster_centers(2,:), cluster_centers(1,:), 'kx', 'MarkerSize', 15, 'DisplayName', 'Cluster Centers');

hold off;

xlabel('X');

ylabel('Y');

title('Mean Shift Clustering');

legend('XY coordinates', 'Cluster Centers');

##### 8 Comments

Mathieu NOE
on 16 May 2024

I have to say I'm not an expert in image processing (and I don't have the required toolbox either), but there are many answers on this forum about how to detect circles or blobs in images and find their centers, and probably dozens more examples if you search the File Exchange (FEX).

Mathieu NOE
on 16 May 2024

Now for what is probably my best contribution so far; I post it here hoping you will find it interesting enough to accept it! :)

I followed my idea of splitting the data into smaller chunks: split along the x axis only, repeat the clustering in each x window, then concatenate the resulting cluster centers.

One thing I noticed, though, is that you may get duplicate centers at the junction between two data batches. The trick here was to apply the same process once more to the concatenated cluster centers, which gives you the "unique" centers.
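
To illustrate the duplicate problem on its own, here is a minimal sketch that merges centers lying closer than a chosen tolerance. This is a simpler greedy merge, not the second mean shift pass described above, and the center coordinates and `tol` are made-up values for illustration. It relies on implicit expansion (R2016b or later).

```matlab
% Hypothetical concatenated centers from neighbouring x windows;
% entries 2 and 3 are the same cluster found twice at a window junction.
cx_all = [10.0  99.6 100.4 250.0];
cy_all = [20.0  50.2  49.8  80.0];
tol = 5;   % assumed merge radius, in the same units as the coordinates

kept = zeros(2, 0);                   % merged centers, one per column
for k = 1:numel(cx_all)
    c = [cx_all(k); cy_all(k)];
    d = sqrt(sum((kept - c).^2, 1));  % distance to every center kept so far
    if isempty(d) || all(d > tol)
        kept(:, end+1) = c;           % no close neighbour yet: keep this center
    end
end
disp(kept)
```

This keeps the first center of each close pair instead of averaging them, so rerunning the clustering on the concatenated centers remains the more faithful way to get a consolidated position.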

I also tried different split factors (x_inter in the code below) to see where the best trade-off lies between the clustering time and the time to concatenate the results; there is an optimum to find.

The results on your data file are:

x_inter = 10; Elapsed time is 5.850134 seconds.

x_inter = 50; Elapsed time is 2.427936 seconds.

x_inter = 100; Elapsed time is 2.262699 seconds.

x_inter = 200; Elapsed time is 2.621037 seconds.

x_inter = 500; Elapsed time is 5.565575 seconds.

Here is the code:

%%

clc

clearvars

close all

% Import the data

% Prompt the user to choose a file

% [filename, filepath] = uigetfile('*.txt', 'Select a text file');

filepath = pwd;

filename = 'selected_dataset.txt';

remove = '.txt';

file_name_clean = strrep(filename, remove, '');

%%

% Plotting

plot_name = ['Intensity_' file_name_clean '.svg'];

% Import data from text file

opts = delimitedTextImportOptions("NumVariables", 28);

opts.DataLines = [2, Inf];

opts.Delimiter = "\t";

opts.VariableNames = ["channel_name", "x", "y", "x_c", "y_c"];

opts.SelectedVariableNames = ["x", "y"]; % Only select the x and y columns

opts.VariableTypes = ["string", "double", "double", "double", "double"];

opts.ExtraColumnsRule = "ignore";

opts.EmptyLineRule = "read";

% Construct the full file path

file_path = fullfile(filepath, filename);

% data = readmatrix(file_path, opts);

data = readmatrix(file_path);

%% Split the big data set in smaller chunks

x_inter = 100; % split the data along x intervals

minx = min(data(:,2));

maxx = max(data(:,2));

dx = (maxx - minx)/x_inter;

cx_all = [];

cy_all = [];

% Perform Mean Shift clustering

bandwidth = 50; % bandwidth parameter for Mean Shift

tic

for ck = 1:x_inter

xmin = minx+(ck-1)*dx;

xmax = xmin+dx;

ind = (data(:,2)>=xmin) & ((data(:,2)<xmax) | (ck==x_inter)); % the last window also takes the point(s) at max x

data_batch = data(ind,:);

if ~isempty(data_batch) % if you split by too much, data_batch may be empty - so check it !

% Perform Mean Shift clustering

[cluster_centers, ~, ~] = MeanShiftCluster(data_batch', bandwidth); % NB : data_batch' (transposed) (row oriented array)

cx = cluster_centers(2,:);

cy = cluster_centers(1,:);

cx_all = [cx_all cx];

cy_all = [cy_all cy];

end

end

% as there may be some redundant cluster centers due to the data splitting

% process, we repeat the MeanShiftCluster process once more on the result

[cluster_centers, ~, ~] = MeanShiftCluster([cx_all;cy_all], bandwidth);

cx = cluster_centers(1,:);

cy = cluster_centers(2,:);

toc

% Plot the data points and the cluster centers

figure;

plot(data(:,2), data(:,1), '.', 'MarkerSize', 15, 'DisplayName', 'XY coordinates');

hold on;

% Set x-axis limit starting from 0

xlim([0, max(data(:,2))]);

% Set y-axis limit starting from 0

ylim([0, max(data(:,1))]);

% Plot cluster centers

hold on;

% plot(cluster_centers(2,:), cluster_centers(1,:), 'kx', 'MarkerSize', 15, 'DisplayName', 'Cluster Centers');

plot(cx, cy, 'kx', 'MarkerSize', 15, 'DisplayName', 'Cluster Centers');

hold off;

xlabel('X');

ylabel('Y');

title('Mean Shift Clustering');

legend('XY coordinates', 'Cluster Centers');
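
The window selection in the loop above can also be sketched with the built-in discretize function, which assigns each point a window index in one call. The x values and the number of windows here are placeholders chosen so the result is easy to check by hand; `bin` and `edges` are names introduced only for this illustration.

```matlab
% Hypothetical x coordinates standing in for data(:,2)
x = [0; 12.5; 49.9; 50; 75; 100];
x_inter = 4;                                     % number of x windows

edges = linspace(min(x), max(x), x_inter + 1);   % window edges: [0 25 50 75 100]
bin   = discretize(x, edges);                    % window index of each point

for ck = 1:x_inter
    ind = (bin == ck);   % logical mask, same role as ind in the loop above
    fprintf('window %d: %d point(s)\n', ck, nnz(ind));
end
```

One property worth noting is that discretize puts a value equal to the last edge into the last bin, so the point with the largest x is always assigned to a window.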

##### 7 Comments

Image Analyst
on 21 May 2024

How did you read in selected_dataset.rtf? readmatrix() does not like that extension.

I don't think dbscan should take a long time. I'm attaching a demo of it. It should work for random (x,y) locations but if you have data in a regular grid, such that the locations can be considered pixels on an image, then you can use image analysis to find things like centroids, areas, diameters, etc.
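
For reference, a minimal sketch of what a dbscan-based version might look like, assuming the Statistics and Machine Learning Toolbox is available. This is not the attached demo: the two synthetic blobs and the `epsilon` and `minpts` values are made up for illustration and would need tuning on real data.

```matlab
% Requires Statistics and Machine Learning Toolbox (dbscan).
rng(0);                                   % reproducible synthetic data
xy = [randn(50,2)*2 + [100 100];          % blob around (100, 100)
      randn(50,2)*2 + [300 200]];         % blob around (300, 200)

epsilon = 10;                             % assumed neighbourhood radius
minpts  = 5;                              % assumed minimum neighbours for a core point
idx = dbscan(xy, epsilon, minpts);        % cluster label per point, -1 marks noise

K = max(idx);                             % number of clusters found
centers = zeros(K, 2);
for k = 1:K
    centers(k,:) = mean(xy(idx == k, :), 1);   % centroid of cluster k
end
disp(centers)
```

Unlike mean shift, dbscan returns only labels, so the centers are computed afterwards as the mean of each labelled group.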
