Finding clusters in a 1D array that contain a certain number of entries with a specific value

Hello,
I am wondering what would be an efficient way to find clusters in a 1D array that contain a certain number of entries with a specific value. For instance, I have a binary array that looks like this:
data = [0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 0 1 0 0 0 1 0 1]'
I would like to find clusters that satisfy the following conditions:
1) length of the cluster is fixed as 'L' (let's say L=7)
2) cluster should contain at least 'p' number of ones (let's say p=3)
3) if some clusters overlap or are located right next to each other, they can merge to create one bigger cluster
After clustering, I should get the following answer (answers are marked with {})
data = [0 0 0 {0 1 0 1 0 0 1 0} 0 0 0 {0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 0 1 0} 0 0 1 0 1]'
Here, you can see some clusters merged to form a bigger cluster.
In the end, I need to know the start and end index of each cluster as well as the total number of clusters. This will give me the following answers.
1) number of clusters = 2
2) index of initial value in each cluster = [4 15]
3) index of last value in each cluster = [11 34]
The actual array I need to work with is quite large, so I want to see if there's an efficient way to compute this.
Thank you in advance!
  11 Comments
Image Analyst on 22 Apr 2020
Edited: Image Analyst on 22 Apr 2020
I'm afraid I still don't understand. Perhaps you can use cumsum() to count the 1's as you go along:
data = [0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 0 1 0 0 0 1 0 1]
c = cumsum(data)
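The running total can be turned into per-window counts by differencing it at a lag of L. The lines below are a minimal sketch of that idea, assuming L = 7 from the question; the name `counts` and the zero-padded prefix sum are illustrative choices, not something specified in the thread.

```matlab
data = [0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 0 1 0 0 0 1 0 1];
L = 7;                          % window length taken from the question

% Prepend a zero so that c(k+1) equals sum(data(1:k)).
c = [0 cumsum(data)];

% counts(s) is the number of ones in data(s : s+L-1)
% (hypothetical variable name, one entry per full-length window).
counts = c(L+1:end) - c(1:end-L);
```

Any window start `s` with `counts(s) >= p` is then a candidate cluster.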
So why isn't the final 1 0 1 in a cluster? Is it because clusters 2 and 3 somehow got merged to form one giant cluster of more than 7 elements, and there just weren't 7 elements remaining after that to form another cluster?
And why does cluster 1 have 8 elements instead of 7? And why is cluster #1 {0 1 0 1 0 0 1 0} instead of {0 0 1 0 1 0 0 1} or {1 0 1 0 0 1 0 0}?
And why do you need to do this quirky thing anyway? What is the real-world use case?
Tae Lim on 22 Apr 2020
Thank you all for your responses. Again, it was simply my mistake: I didn't see that the trailing numbers form yet another cluster. So the correct answer is
[0 0 0 {0 1 0 1 0 0 1 0} 0 0 0 {0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 0 1 0 0 0 1 0 1}]'
Image Analyst: This actually comes from a real-world application, rail tie replacement. You can either spot-replace failed ties individually or remove a whole section with a dedicated machine. Using the dedicated machine is cheaper and thus preferred, but it may require replacing adjacent ties that are still good. The window 'L' denotes the minimum length required to use the dedicated machine. A '0' in data indicates a good tie and a '1' indicates a bad tie.
Sindar and Mohammad Sami: 'movsum' seems to be a great function to start with! I will start working with it.
Thank you!
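For reference, here is a minimal sketch of how the `movsum` idea could be carried through to the merged clusters, assuming L = 7 and p = 3 from the question. The variable names (`counts`, `starts`, `mark`, `covered`, `firstIdx`, `lastIdx`, `nClusters`) are hypothetical, and the difference-array step for merging windows is one possible approach rather than anything proposed in the thread.

```matlab
data = [0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 0 1 0 0 0 1 0 1]';
L = 7;                          % window length
p = 3;                          % minimum number of ones per window
N = numel(data);

% counts(s) = number of ones in data(s : s+L-1); incomplete windows are dropped.
counts = movsum(data, [0 L-1], 'Endpoints', 'discard');

% Start index of every window that holds at least p ones.
starts = find(counts >= p);

% Difference array: +1 where a qualifying window begins, -1 just past its end.
mark = zeros(N + 1, 1);
mark(starts)     = mark(starts) + 1;
mark(starts + L) = mark(starts + L) - 1;

% An element is covered if at least one qualifying window contains it.
% Overlapping and directly adjacent windows merge automatically here.
covered = cumsum(mark(1:N)) > 0;

% Rising and falling edges of the covered mask give the cluster boundaries.
d = diff([0; covered; 0]);
firstIdx  = find(d == 1)';      % first index of each cluster
lastIdx   = find(d == -1)' - 1; % last index of each cluster
nClusters = numel(firstIdx);
```

On the example data this gives `nClusters = 2`, `firstIdx = [4 15]` and `lastIdx = [11 39]`, which agrees with the corrected answer above. Everything is vectorized (no loop over windows), so it should scale reasonably to large arrays.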


Answers (0)

Release

R2019b
