# Data filtering (give a constraint to the length of each index)

Jaehwi Bong on 19 Aug 2019
Edited: Jan on 19 Aug 2019
I have data of the following kind. The first and second columns hold an index and an x value, respectively.
data = [ 1 201; 1 202; 2 301; 2 313; 2 311; 3 401; 3 452; 3 433; 3 405; 4 504; 4 303; 4 604; 4 703; 5 600; 5 700; 5 606; 5 703; 5 905; 5 444;];
For example, if an index occurs 4 or more times, I want to delete its rows from the 4th one onward.
In this example data, the 3rd, 4th and 5th indices have 4, 4 and 6 rows, respectively. I'd like to keep only their 1st to 3rd rows.
After filtering, every index should have fewer than 4 rows.
data_filtered = [1 201; 1 202; 2 301; 2 313; 2 311; 3 401; 3 452; 3 433; 4 504; 4 303; 4 604; 5 600; 5 700; 5 606;];
If anyone can help, it would be greatly appreciated.
Thank you!

#### 1 Comment

Jan on 19 Aug 2019
In other words: You want to keep at most the first n-1 rows (here n = 4) for each value in the first column.

Jan on 19 Aug 2019
Edited: Jan on 19 Aug 2019
There are more efficient ways, but starting with a simple loop is a good approach:
data = [ 1 201; 1 202; 2 301; 2 313; 2 311; 3 401; 3 452; 3 433; ...
3 405; 4 504; 4 303; 4 604; 4 703; 5 600; 5 700; 5 606; 5 703; ...
5 905; 5 444];
nRow = size(data, 1);
keep = false(1, nRow);
index = -1;   % any value that does not occur as an index
n = 4;        % keep at most n-1 rows per index
for k = 1:nRow
    if data(k, 1) ~= index   % a new index starts: restart the counter
        index = data(k, 1);
        count = 0;
    end
    count = count + 1;       % position of row k within its index
    keep(k) = count < n;
end
data_filtered = data(keep, :)
A vectorized method:
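index = data(:, 1);
D = [false; diff(index(:)) ~= 0];   % true where a new run starts
Start = [1; find(D)];               % start row of each run
N = ones(size(D));
N(D) = 1 - diff(Start);             % reset the counter at each run start
N = cumsum(N);                      % position of each row within its run
data_filtered = data(N < 4, :)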
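To see why the cumsum trick yields a per-index counter, here is a small trace of the same steps. It is only an illustrative sketch: the short index vector below is made up for the example, and the values in the comments are what each step produces for it.

```matlab
% Hypothetical small example: three runs of lengths 2, 3 and 4
index = [1; 1; 2; 2; 2; 3; 3; 3; 3];

D = [false; diff(index) ~= 0];   % true where a new run begins (rows 3 and 6)
Start = [1; find(D)];            % first row of each run: [1; 3; 6]

N = ones(size(D));               % a step of +1 for every row ...
N(D) = 1 - diff(Start);          % ... except at run starts: 1 - length of previous run
N = cumsum(N);                   % position within the run: [1; 2; 1; 2; 3; 1; 2; 3; 4]

keep = N < 4;                    % only row 9 (4th row of the last run) is dropped
disp([index, N, keep])
```

Note that both the loop and the vectorized version count consecutive runs, so they assume that rows with the same index are adjacent, as in the data from the question.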