Asked by LifeSux SuperHard
on 17 Jun 2013

I'm trying to write a program that for a set of numbers with N groups of M numbers close (e.g. within 2) to each other will store each group separately and where each number M occurs in the original set. Groups **MAY** be adjacent in the set.

For example, the set

A = [300 301 300 350 350 350 400 399 401]

Would produce

R = [300] [1] [399] [8] [300] [3] [400] [7] [301] [2] [401] [9]

Or, preferably, the program would save into R only the start and end index of each group of numbers close to each other, e.g. for the above set

R = [300] [1] [399] [7] [300] [3] [400] [9] [301] [401]

Or, generally,

R = [number1]_ofgroup1 [Start]_ofgroup1 [number2]_ofgroup1 [End]_ofgroup1 ... [number_n]_ofgroup1 ... [number1]_ofgroupN [Start]_ofgroupN [number2]_ofgroupN [End]_ofgroupN ... [number_n]_ofgroupN

So far what I have been able to do is pick out the numbers close together and their location in the set. What I have NOT been able to do is group sets of numbers close to each other.

Now don't fixate on helping me figure out how to store this information into cells, I can do that pretty easily. Really what is most important to me is just how to separate the groups, like, with a for loop, or whatever.

% Test Set % Two sets of badness (307/308, 389/390), non-adjacent B = [ 267 267 267 267 267 267 267 267 308 308 308 308 308 308 350 350 350 349 349 350 389 390 391 391 391 389 389 390 390 ];

A = sort(B);

%Find unique numbers in A U = unique(A); %Unique in A

% Visual Check (Print U, A, respective indices) for i = 1:length(U) fprintf('%d %d\n', U(i), i); end fprintf('U\n\n'); for i = 1:length(A) fprintf('%d %d\n', A(i), i); end fprintf('A\n\n');

%Find badness l = 0; %Index: Number of indicies in A identified as corresponding to offending numbers. %Alternatively, how much bad, total? badL = []; %Array in which to store badness locations for i = 1:length(U) for j = 1:length(A) a = abs(U(i) - A(j)); if a <=2 && a ~= 0 %Within 2 of each other l = l + 1; badL(l) = j; %#ok<*SAGROW> end end end

% Above seems to pick up duplicate bad indices when any #Bad couple differ % by exactly two, not sure how to fix withiin the loop, but the problem does % seem to be solved by "unique-ing" badL. badness = sort(unique(badL)); %Really, really hard to work with unsorted bad indicies

%Test initialization of R (just to show how I want the data formatted) for i = 1:3 %Group (N) for k = 1:4 %Bad numbers R{1,i}{1,1}{k,1} = []; end for m = 1:2 %Start and end indices for group N R{1,i}{1,2}{m,1} = []; end end

*No products are associated with this question.*

Answer by Roger Stafford
on 17 Jun 2013

[B,p] = sort(A); f = find([true,diff(B)>2,true])'; R = [f(1:end-1),f(2end)-1]; % Start and end indices w.r. to B

This isn't quite what you asked for. The indices in R refer to the sorted version of A, namely B. The vector 'p' could be used to find your way back to A. However what was a consecutive interval in B will not necessarily be one in A. I leave you with that problem.

LifeSux SuperHard
on 17 Jun 2013

Yeah that's fine. I'll make that work for me.

I'm also just curious, how would you do that explicitly? Like if you couldn't use a bunch of matlab functions, but instead like only unique, round, and sort.

I'm sorry I'm just more interested in learning how to do this stuff explicitly than rely on some tool someone else, someone else who is a much, much better programmer than me, wrote for me.

But I'm certainly not going to ignore what you suggested.

Answer by Roger Stafford
on 17 Jun 2013

The idea with the 'find' operation is to recognize a change of more than 2 in successive elements of the sorted list B. The 'diff' function just takes successive differences in B. You could just as well write

B(2:end)-B(1:end-1)

instead of using 'diff' or an equivalent for-loop operation. You need the extra 'true' values at the beginning and the end to furnish the first start and the last end values which 'diff' wouldn't catch. The 'find' function itself could be reproduced with the appropriate for-loop, something like this:

t = [true,diff(B)>2,true]; f = []; % Start with f empty for k = 1:length(t) if t(k) f = [f,k]; % Append the index k to the f vector wherever t(k) is true end end

Then it is easy to see that f(1:end-1) is a list of the start indices and f(2:end)-1 a list of the end indices. The use of logical vectors for inputs to 'find' makes it a very useful function indeed. You should learn to be comfortable with it. The need for it pops up quite frequently in programming matlab.

Does that give you a feeling for the function calls involved?

Answer by Andrei Bobrov
on 18 Jun 2013

B = [ 267 267 391 391 391 267 267 267 308 308 308 308 308 308 350 350 350 349 349 350 389 390 391 391 391 389 389 390 390 ];

[a,c,c] = unique(B); t = [true;diff(a) > 2]; tt = cumsum(t); q = numel(B); out = accumarray([tt,ones(numel(a),1);tt(c),2*ones(q,1)],[a;(1:q)'],[],@(x){x});

Answer by Jan Simon
on 18 Jun 2013

Just to show how to pre-allocate efficiently, when the maximum number is known in advance:

badL = zeros(1, length(U)*length(A)); for i = 1:length(U) for j = 1:length(A) a = abs(U(i) - A(j)); if a <=2 && a ~= 0 %Within 2 of each other l = l + 1; badL(l) = j; end end end badL = badL(1:l);

This reduces the memory footprint massively, when the result contains more than a few elements. E.g. if the result contains 45 elements, letting it grow iteratively allocates `sum(1:45)=1035` elements and copies 990 elements. Then pre-allocating 1000 elements and cutting the array at the end should be more efficient already.

Opportunities for recent engineering grads.

## 0 Comments