Can i vectorize my loop

Question

0 votes

I have an array of numbers that i wish to reduce based on whether the previous kept value is within a defined tolerance. I don't believe that the diff() function does what i need and therefore I think i will need something more bespoke. Currently I perform this task using the following for loop but this is very slow when processing large amounts of data.

MinInterval=0.1
KeptData(1,1)=OriginalData(1,1);
n=1;
for p=1:length(OriginalData)-1
    difference=OriginalData(p+1,1)-KeptData(n,1);
    if difference<=MinInterval
       %do nothing
    else
       n=n+1;
       KeptData(n,1)=OriginalData(p+1,1);
    end
end

For example,

OriginalData=[1; 2; 3; 3.1; 3.2; 3.3; 3.4; 3.5; 3.6; 4; 5];

assuming a tolerance of 0.1 would become

KeptData=[1; 2; 3; 3.2; 3.4; 3.6; 4; 5]

7 Comments
Show 5 older comments Hide 5 older comments

Nigel on 29 Nov 2013

Sorry for the delay responding, I've been out of the office. The behavior I require is exactly that shown by the example. 3.1 is close enough to 3 to be ignored but 3.2 is has a difference of 0.2 to 3 and therefore i want to keep it. 3.3 is within 0.1 of 3.2 and is ignored but 3.4 has a difference of 0.2 and this rule needs to be applied throughout the dataset

Nigel on 29 Nov 2013

I have followed your suggestion Alfonso. Great result. Processing reduced from 850 secs to 1.5. Thank you all

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Jeremy Wurbs on 27 Nov 2013

Open in MATLAB Online

0 votes

Very interesting question.

First off, running your solution as is I get:

[ 1.0000 2.0000 3.0000 3.1000 3.2000 3.4000 3.5000 3.6000 4.0000 5.0000]

It seems there are precision issues with MinInterval being 0.1 and your data being spaced "0.1" apart (which is impossible to represent in binary; you can see this by trying the statement (3*0.1==0.3) in the command window).

Anyway, adding a small wiggle factor to your MinInterval, I get the code:

% Method 1
tic
MinInterval=0.1+eps; % We will have issues with precision without eps
KeptData(1,1)=OriginalData(1,1);
n=1;
for p=1:length(OriginalData)-1
  difference=OriginalData(p+1,1)-KeptData(n,1);
  if difference<=MinInterval
     %do nothing
  else
     n=n+1;
     KeptData(n,1)=OriginalData(p+1,1);
  end
end
t1 = toc;

A simple attempt to vectorize the above only using diff could look like:

% Method 2
tic
KeptData2 = OriginalData([true; diff(OriginalData)>MinInterval]);
t2 = toc;

But the above gives a different behavior around "stair-case" sections in your data (the [... 3.0; 3.1; 3.2; 3.3; ...] part). We can get more similar behavior to the original code with the following:

% Method 3
MinInterval = 0.1; % Shouldn't have any precision issues
tic
NewMinInterval = 2*MinInterval;
edges = min(OriginalData):NewMinInterval:max(OriginalData);
hc = histc(OriginalData,edges);
KeptData3 = edges(logical(hc));
t3 = toc;

Using the following:

KeptData'
KeptData2'
KeptData3
disp(['Using for loop: '  num2str(t1)])
disp(['Using diff alone: ' num2str(t2)])
disp(['Using histc + diff: '  num2str(t3)])

I get the results:

ans =
      1.0000    2.0000    3.0000    3.2000    3.4000    3.6000    4.0000    5.0000
ans =
       1     2     3     4     5
KeptData3 =
      1.0000    2.0000    3.0000    3.2000    3.4000    3.6000    4.0000    5.0000
Using for loop: 0.017651
Using diff alone: 0.0047814
Using histc + diff: 0.0071429

Hopefully that helps. Let me know if that doesn't give the behavior you're looking for.

Cheers

1 Comment
Show -1 older comments Hide -1 older comments

Nigel on 29 Nov 2013

Thank you for your reply. Your code works very well for the example, but when I apply it to my dataset i get a significantly different set of data returned (my code returns 469966 datapoints and yours 34441). If i get chance i will have a look at the returned data to understand what is going on, however, i have preallocated the KeptData array as suggested by Alfonso and this has made an huge difference to the processing time. My loop originally ran for 850 secs and with preallocation it takes 1.5 secs so i will continue with this solution. Your code incidently was much faster at 0.03. Thank you for your help

Sign in to comment.

Can i vectorize my loop

7 Comments
Show 5 older comments Hide 5 older comments

Answers (1)

1 Comment
Show -1 older comments Hide -1 older comments

Categories

Products

Tags

Community Treasure Hunt

Can i vectorize my loop

7 Comments Show 5 older comments Hide 5 older comments

Answers (1)

1 Comment Show -1 older comments Hide -1 older comments

Categories

Products

Tags

See Also

Community Treasure Hunt

7 Comments
Show 5 older comments Hide 5 older comments

1 Comment
Show -1 older comments Hide -1 older comments