Error using kmeans ---X must have more rows than the number of clusters.

Question

Luã Monteiro on 9 Mar 2023

https://www.mathworks.com/matlabcentral/answers/1925740-error-using-kmeans-x-must-have-more-rows-than-the-number-of-clusters

Commented: Luã Monteiro on 9 Mar 2023

I have this data:

% by layer
Sw_MF16_Tf=LeArquivo('Water Saturation Time 2020-05-31.txt',81,58,20,2);
Sw_MF16_Ti=LeArquivo('Water Saturation Time 2013-05-31.txt',81,58,20,2);
% layer variation
delta_MF16_l8 = Sw_MF16_Tf(:,:,8)-Sw_MF16_Ti(:,:,8);
delta_MF16_l9 = Sw_MF16_Tf(:,:,9)-Sw_MF16_Ti(:,:,9);
% mean L8 e L9
MF16_L8_L9_mean = (delta_MF16_l8 + delta_MF16_l9)/2;
% normalized mean dSw layer
M_MF16 = mean(MF16_L8_L9_mean,'omitnan');
S_MF16 = std(MF16_L8_L9_mean,'omitnan');
N_MF16 =(((MF16_L8_L9_mean-M_MF16)./S_MF16)./100);
% data read
data = N_MF16 ;
% Perform clustering
k = 9;
[idx, centroids] = kmeans(data, k);
% Plot clustered data
figure;
scatter(data(:,1), data(:,2), [], idx, 'filled');
title(sprintf('K-Means Clustering with k = %d', k));
xlabel('Feature 1');
ylabel('Feature 2');
colormap(parula(k));
colorbar;
% Plot centroids
hold on;
scatter(centroids(:,1), centroids(:,2), 100, 'k', 'filled');
legend('Cluster 1', 'Cluster 2', 'Cluster 3', 'Centroids');

How do I cluster this by region?


Answer 1

dpb on 9 Mar 2023

https://www.mathworks.com/matlabcentral/answers/1925740-error-using-kmeans-x-must-have-more-rows-than-the-number-of-clusters#answer_1189475

Moved: dpb on 9 Mar 2023


Wow! That's hard to read with all the obfuscated_with_underscores_and_suffixes variable names! Simplify, simplify!!

Anyway, stylistic points aside, in

...
% layer variation
delta_MF16_l8 = Sw_MF16_Tf(:,:,8)-Sw_MF16_Ti(:,:,8);
delta_MF16_l9 = Sw_MF16_Tf(:,:,9)-Sw_MF16_Ti(:,:,9);
MF16_L8_L9_mean = (delta_MF16_l8 + delta_MF16_l9)/2;

you've reduced down to a single plane of the mean of the differences of only two planes. Then, going on

% normalized mean dSw layer
M_MF16 = mean(MF16_L8_L9_mean,'omitnan');
S_MF16 = std(MF16_L8_L9_mean,'omitnan');
N_MF16 =(((MF16_L8_L9_mean-M_MF16)./S_MF16)./100);
% data read
data = N_MF16 ;

you've reduced further to a single vector of the column means, which is a row vector.

% Perform clustering
k = 9;
[idx, centroids] = kmeans(data, k);
...

you've reduced data down to a single row by using the mean everywhere. kmeans treats a vector input as a column vector whichever orientation is passed, so the conclusion must be that there are fewer than 9 columns in your dataset. Looking at the input file, that appears to be true in that there are only six (6) columns.
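A quick way to check this before calling kmeans is to look at the size of the array and count the rows that would actually be used. The sketch below uses a small made-up matrix in place of the real data, which isn't available here, so the numbers are purely illustrative. kmeans treats each row as one observation and drops any row that contains a NaN, so it is the count of NaN-free rows that has to be compared with k.

```matlab
% Made-up stand-in for the real data: 6 rows, 4 columns, some NaNs.
data = [0.1 0.2 0.3 0.4
        NaN 0.5 0.6 0.7
        0.2 0.1 NaN 0.3
        0.9 0.8 0.7 0.6
        0.4 0.4 0.4 0.4
        NaN NaN NaN NaN];
k = 9;

% Each row is one observation; kmeans drops rows that contain any NaN.
usable  = ~any(isnan(data), 2);
nUsable = sum(usable);

fprintf('size(data) = [%d %d], usable rows = %d, k = %d\n', ...
        size(data, 1), size(data, 2), nUsable, k);

% Following the wording of the error message: more rows than clusters.
canCluster = nUsable > k;
if ~canCluster
    fprintf('Too few usable rows for k = %d; reduce k or reshape the data.\n', k);
end
```

Running the same three lines (size, any/isnan, sum) on the real `data` just before the kmeans call should show which of the two is the problem: too few rows to begin with, or too many rows lost to missing values.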

What it might mean to take the means by row of the array instead of by column is impossible to know, since we have no idea what the data actually are or whether those would be meaningful effects/variables.
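For reference, the direction of the averaging is controlled by the dimension argument of mean. The small example below uses a made-up 2-by-3 matrix, not the poster's data, and shows the three usual choices; which one is meaningful depends entirely on what the rows and columns represent.

```matlab
% Made-up 2-by-3 example with one missing value.
A = [1 2   3
     4 NaN 6];

colMeans = mean(A, 1, 'omitnan');   % 1-by-3: one mean per column (the default direction)
rowMeans = mean(A, 2, 'omitnan');   % 2-by-1: one mean per row
allMean  = mean(A(:), 'omitnan');   % scalar: mean over every non-NaN element

disp(size(colMeans));
disp(size(rowMeans));
disp(allMean);
```

std accepts the dimension the same way, as its third argument: `std(A, 0, 2, 'omitnan')`.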

Since you didn't provide the function LeArquivo, nobody here could even try to poke around and see what they might make of the data -- the extreme presence of missing data would appear to be troubling.


dpb on 9 Mar 2023

As noted above, there aren't enough variables left to form that many groups, at least as you've structured the problem.

What's the variable that you think would segregate the data into such regions? There's certainly nothing in the other image that would indicate any reason to do so; there are a few isolated splotches of different heights (of whatever it is that is being plotted), but certainly no pattern that looks even remotely like your second image.

Luã Monteiro on 9 Mar 2023

MF16.xls

This is a water saturation map from an oil field that extends over some kilometers. The idea of clustering is to do better well placement, since it is impossible to explore a large oil field with few wells. Small patterns in geology and reservoir engineering make a huge difference in well placement.

Here is an example of the water saturation data in .xls; it would be great if I could cluster it by region.
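One possible way to get spatial regions out of kmeans, offered as an assumption about what is wanted rather than anything confirmed in this thread, is to treat every grid cell as one observation with three features: its row index, its column index, and the saturation change in that cell. That gives as many rows as there are non-missing cells, so k = 9 is no longer a problem. The sketch below is minimal and runs on a synthetic map. The variable `dSw` and its values are hypothetical, the 81-by-58 grid size is taken from the LeArquivo call in the question, and the equal weighting of position and value after standardizing is just one choice among many.

```matlab
rng(0);                                  % reproducible starting points for kmeans
nr = 81; nc = 58;                        % grid size taken from the question
[rowIdx, colIdx] = ndgrid(1:nr, 1:nc);

% Synthetic stand-in for the saturation-change map, with some missing cells.
dSw = sin(rowIdx / 15) .* cos(colIdx / 10);
dSw(1:5, :) = NaN;

% One observation per cell: [row, column, value]; drop cells with NaN.
X    = [rowIdx(:), colIdx(:), dSw(:)];
keep = ~any(isnan(X), 2);
Xk   = X(keep, :);

% Standardize each feature (implicit expansion, needs R2016b or later).
Xn = (Xk - mean(Xk, 1)) ./ std(Xk, 0, 1);

k   = 9;
idx = kmeans(Xn, k, 'Replicates', 5);

% Put the cluster labels back on the grid; missing cells stay NaN.
labels = nan(nr, nc);
labels(keep) = idx;
```

`imagesc(labels)` then shows the regions as a map. Scaling the two position columns up or down before clustering shifts the balance between spatially compact regions and regions that follow the saturation values, so it would need tuning against the real data.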
