Why is xcorr 'coef' used for correlation coefficients?

In reviewing questions and material on xcorr, it appears that for autocorrelation or cross-correlation coefficients, most responses suggest using the 'coef' option in xcorr. While this does give a value between -1 and 1, I am not sure why this option is calculated as xcorr(a,b)/(norm(a)*norm(b)), where a and b are column vectors. In most cases, wouldn't an unbiased correlation be more appropriate?
In my limited understanding, it seems the correlation values should first be estimated without bias and then normalized.
For example, for autocorrelation: xcorr(a,'unbiased')./var(a).
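To make the 'coef' normalization concrete, here is a minimal Python sketch (the thread itself is MATLAB). The helper names `xcorr_raw` and `xcorr_coef` are hypothetical; they assume real, equal-length inputs and xcorr's lag convention R_ab(m) = sum over n of a[n+m]*b[n]. The key point is that 'coef' divides every lag by the same constant, norm(a)*norm(b).

```python
import math

def xcorr_raw(a, b, m):
    """Unscaled cross-correlation at lag m: sum over n of a[n + m] * b[n].
    Assumes real-valued inputs of equal length."""
    n = len(a)
    if m >= 0:
        return sum(a[k + m] * b[k] for k in range(n - m))
    # For negative lags, substitute k = n + m so both indices stay in range.
    return sum(a[k] * b[k - m] for k in range(n + m))

def xcorr_coef(a, b, m):
    """'coef'-style scaling: one lag-independent denominator, norm(a)*norm(b)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return xcorr_raw(a, b, m) / (norm_a * norm_b)

a = [1.0, 2.0, 3.0]
print(xcorr_coef(a, a, 0))  # lag 0: the sum equals norm(a)^2
print(xcorr_coef(a, a, 1))  # lag 1: only two products (2*1 + 3*2) remain in the sum
```

Because the denominator does not change with the lag, the result is always within [-1, 1] (Cauchy-Schwarz), but it also shrinks as the number of overlapping terms shrinks.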
To illustrate my point: if I autocorrelate a sine function, I would expect the lagged correlation coefficient to swing between 1 and -1 every time the cycle realigns itself. But the 'coef' option consistently decreases the correlation coefficient with lag. I realize this is because of how it is calculated, but I don't understand why it is calculated this way. Shouldn't the unbiased approach be used?
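A short derivation connects the two normalizations. Assume a real sequence x of length N with no mean removal (as in xcorr), and let R(m) be the unscaled lag-m sum. 'coef' divides by R(0) = ‖x‖², while the unbiased estimate divides by N - |m|:

$$
R(m) = \sum_{n=0}^{N-1-|m|} x_{n+|m|}\,x_n, \qquad
\hat\rho_{\mathrm{coef}}(m) = \frac{R(m)}{\lVert x\rVert^2} = \frac{R(m)}{R(0)}, \qquad
\hat r_{\mathrm{unb}}(m) = \frac{R(m)}{N-|m|}
$$

$$
\hat\rho_{\mathrm{coef}}(m) = \frac{N-|m|}{N}\cdot\frac{\hat r_{\mathrm{unb}}(m)}{\hat r_{\mathrm{unb}}(0)}
$$

So the 'coef' curve is the normalized unbiased curve multiplied by the triangular taper (N - |m|)/N. For the 501-sample sine in this thread at lag 120, that factor is 381/501 ≈ 0.76, provided the unbiased ratio is close to 1. The expression xcorr(a,'unbiased')./var(a) differs slightly from the ratio above because var removes the mean and divides by N - 1.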
A simple example to illustrate this question:
t=0:500; n=length(t);
ts=5*sin(2*pi*(t./12));   % sine with a 12-sample period
lags=-250:250;
test1=xcov(ts,250,'coef');                % 'coef': scaled so that lag 0 equals 1
test2=xcov(ts,250,'unbiased')./var(ts);   % unbiased estimate divided by the variance
figure; plot(t,ts); xlabel('time'); ylabel('amplitude');
figure; plot(lags,test2,'r'); hold on; plot(lags,test1);
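A stdlib-only Python port of the example above, handy for checking the lag-120 values without MATLAB. It assumes xcov's mean removal and MATLAB var's N - 1 divisor; `xcorr_raw` is a hypothetical helper, not a library function.

```python
import math
import statistics

def xcorr_raw(x, m):
    """Unscaled autocorrelation at lag m: sum over n of x[n + |m|] * x[n]."""
    m = abs(m)
    return sum(x[k + m] * x[k] for k in range(len(x) - m))

# Same signal as the MATLAB example: t = 0:500, ts = 5*sin(2*pi*t/12).
ts = [5 * math.sin(2 * math.pi * k / 12) for k in range(501)]

# xcov subtracts the mean before correlating; mirror that here.
mu = statistics.fmean(ts)
xc = [v - mu for v in ts]
n = len(xc)

lag = 120  # ten full periods of the 12-sample sine
coef = xcorr_raw(xc, lag) / xcorr_raw(xc, 0)            # 'coef': divide by the lag-0 value
unbiased = xcorr_raw(xc, lag) / (n - lag)               # 'unbiased': divide by N - |m|
unbiased_over_var = unbiased / statistics.variance(ts)  # sample variance (N - 1), like MATLAB var

print(f"coef at lag {lag}: {coef:.3f}")
print(f"unbiased/var at lag {lag}: {unbiased_over_var:.3f}")
```

Running it should print a 'coef' value near 0.76 and an unbiased/var value near 1, the two numbers discussed in this thread.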

Accepted Answer

Brian on 5 Dec 2011
Hey Wayne,
Thanks again for your prompt response. I understand what you are saying about how MATLAB makes this calculation when the 'coef' option is used. What I am trying to better understand is whether it makes sense to say that the lagged sine function becomes less correlated as the lag increases, as the normalized output (the 'coef' option) suggests. As the example showed, xcorr(ts,'coef') has a value of 0.76 at lag 120, even though the lagged time series is actually perfectly correlated if we compare the two series in an unbiased way (i.e., lagging one of the two sine curves by 120 samples leaves the two shortened series lying exactly on top of each other). Shouldn't the normalization be done in an unbiased way? Does this clarify what I am trying to get at?
Thanks again!
Brian Dz
Wayne King on 5 Dec 2011
Hi Brian, yes, it makes perfect sense. The answer is what I explained in my previous response. And to be clear, the same thing happens to the autocorrelation sequence estimate when the 'biased' or 'none' options are used: the autocorrelation decays, as you would expect, since fewer and fewer terms enter the sum. The 'unbiased' option is the exception, because it divides the lag-m sum by N-|m| instead of N, which is why your test2 curve does not decay.
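A minimal Python sketch of the three xcorr scale options as MATLAB documents them ('none' = raw sum, 'biased' = divide by N, 'unbiased' = divide by N-|m|), applied to Brian's sine at lags that are whole multiples of its period. The helper `xcorr_scaled` is hypothetical and handles real autocorrelation only.

```python
import math

def xcorr_scaled(x, m, scale):
    """Autocorrelation at lag m with xcorr-style scaling (sketch):
    'none' -> raw sum, 'biased' -> divide by N, 'unbiased' -> divide by N - |m|."""
    n, m = len(x), abs(m)
    r = sum(x[k + m] * x[k] for k in range(n - m))
    if scale == "none":
        return r
    if scale == "biased":
        return r / n
    if scale == "unbiased":
        return r / (n - m)
    raise ValueError(f"unknown scale option: {scale}")

ts = [5 * math.sin(2 * math.pi * k / 12) for k in range(501)]
for lag in (0, 120, 240, 360):
    row = [round(xcorr_scaled(ts, lag, s), 2) for s in ("none", "biased", "unbiased")]
    print(lag, *row)
```

For this noise-free sine, the 'none' and 'biased' columns shrink with lag, while the 'unbiased' column stays near 12.5 (the mean of 25*sin^2).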


More Answers (4)

Brian on 5 Dec 2011
Hi Wayne,
Thanks for the quick response. I understand the purpose of normalizing the covariance; my question is why the lagged autocorrelation of a sine function does not keep reaching 1 and -1 as the lag increases, even though the signal overlaps itself perfectly at regular intervals.
This is illustrated by test1, which uses the 'coef' option:
t=0:500; n=length(t);
ts=5*sin(2*pi*(t./12));
lags=-250:250;
test1=xcorr(ts,250,'coef');
test2=xcorr(ts,250,'unbiased')./var(ts);
figure; plot(t,ts); xlabel('time'); ylabel('amplitude');
figure; plot(lags,test2,'r'); hold on; plot(lags,test1);
So at a lag of 120, the autocorrelation should be 1 again, but the 'coef' option gives 0.76. I understand how MATLAB calculates this, but shouldn't MATLAB use an unbiased calculation, so that the correlation is 1 rather than 0.76? Again, thank you for your quick reply!
Brian

Wayne King on 5 Dec 2011
Hi Brian, 'coeff' is helpful because it gives you a convenient scale to interpret the results. It's the same reason why correlation in statistics is often more useful than covariance.
If I tell you that the maximum cross-correlation between two vectors is 4500, for example, it's hard to interpret what that means. It might mean that the two vectors are nearly perfectly correlated at that lag, or it might mean that their correlation is quite small (near zero), because it all depends on the units of the input vectors. The 'coeff' option, however, makes the result easier to interpret: if I tell you that the maximum correlation is 0.9, then you know there is a very strong correlation at that lag.
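The units argument can be checked with a small Python sketch: rescaling the inputs (here, a hypothetical change from metres to millimetres) multiplies the raw correlation by the product of the scale factors but leaves the normalized coefficient unchanged. The data and helper names are illustrative only.

```python
import math

def xcorr_lag(a, b, m):
    """Unscaled cross-correlation sum over n of a[n + m] * b[n], for a lag m >= 0."""
    return sum(a[k + m] * b[k] for k in range(len(a) - m))

def xcorr_coef(a, b, m):
    """Same sum divided by norm(a)*norm(b), as with xcorr's 'coeff' option."""
    norm_a = math.sqrt(sum(v * v for v in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    return xcorr_lag(a, b, m) / (norm_a * norm_b)

# Hypothetical data: the same two signals in metres and in millimetres.
a_m = [0.1, 0.4, 0.2, -0.3, -0.1, 0.2]
b_m = [0.2, 0.1, 0.4, 0.2, -0.3, -0.1]
a_mm = [1000 * v for v in a_m]
b_mm = [1000 * v for v in b_m]

lag = 1
print(xcorr_lag(a_m, b_m, lag), xcorr_lag(a_mm, b_mm, lag))    # raw values differ by a factor of 1e6
print(xcorr_coef(a_m, b_m, lag), xcorr_coef(a_mm, b_mm, lag))  # coefficients agree up to rounding
```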
To keep it in the sine wave context, note:
x = cos(pi/4*(0:99));
y = 4*cos(pi/4*(0:99)-pi/2);
[c,lags] = xcorr(y,x,10);
stem(lags,c);
Note that the maximum correlation, at lag 2, is about 200. Again, it's very hard to know exactly what that means without knowing more about the signals.
But:
x = cos(pi/4*(0:99));
y = 4*cos(pi/4*(0:99)-pi/2);
[c,lags] = xcorr(y,x,10,'coeff');
stem(lags,c);
Now you see exactly what it means. The two signals are basically perfectly correlated.
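A Python port of the two MATLAB snippets above, using the same x and y, that computes the raw and the norm-scaled sequences for lags -10..10. The helper `xcorr_lags` is hypothetical and assumes real inputs of equal length.

```python
import math

def xcorr_lags(y, x, maxlag):
    """Unscaled xcorr(y, x, maxlag) sketch: R(m) = sum over n of y[n + m] * x[n]."""
    n = len(x)
    out = {}
    for m in range(-maxlag, maxlag + 1):
        # Keep only indices where both x[k] and y[k + m] exist.
        out[m] = sum(y[k + m] * x[k] for k in range(max(0, -m), min(n, n - m)))
    return out

x = [math.cos(math.pi / 4 * k) for k in range(100)]
y = [4 * math.cos(math.pi / 4 * k - math.pi / 2) for k in range(100)]

raw = xcorr_lags(y, x, 10)
scale = math.sqrt(sum(v * v for v in y)) * math.sqrt(sum(v * v for v in x))
coeff = {m: r / scale for m, r in raw.items()}  # 'coeff'-style: one constant divisor

best = max(raw, key=raw.get)
print(best, round(raw[best], 1), round(coeff[best], 3))
```

The peak lands at lag 2 with a raw value of 198 (the "about 200" above), and norm(y)*norm(x) is 200 for these signals, so the coefficient at that lag is 0.99.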

Wayne King on 5 Dec 2011
Hi Brian, that is because you have fewer and fewer terms that enter the autocorrelation sum as the lag increases. The normalization in the denominator is based on all the data in the sequence, as is the autocorrelation at zero lag. That is not the case as you increase the lag.
That's why with:
[c,lags] = xcorr(ts);
stem(lags,c);
You see the autocorrelation decay. xcorr does not use a different normalization term at each lag, which is what you would need in order to get 1s and -1s at every period, as you suggest.
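For contrast, here is a minimal Python sketch of the lag-dependent normalization described above, which xcorr does not use: each lag is divided by the norms of only the samples that overlap at that lag. The function name is hypothetical, and this amounts to a per-lag cosine similarity rather than any built-in xcorr option.

```python
import math

def overlap_normalized_acf(x, m):
    """Hypothetical alternative (NOT xcorr's 'coeff'): normalize the lag-m sum
    by the norms of only the samples that overlap at that lag."""
    m = abs(m)
    head = x[: len(x) - m]  # x[n] for the overlapping n
    tail = x[m:]            # x[n + m] for the same n
    num = sum(a * b for a, b in zip(tail, head))
    den = math.sqrt(sum(a * a for a in head)) * math.sqrt(sum(b * b for b in tail))
    return num / den

ts = [5 * math.sin(2 * math.pi * k / 12) for k in range(501)]
for lag in (6, 120, 240):
    print(lag, round(overlap_normalized_acf(ts, lag), 6))
```

With this sine the result is -1 at half a period and 1 at every whole period, as Brian expected. The trade-off is that at large lags each value rests on only a few samples, so with noisy data the estimate becomes erratic.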

Brian on 5 Dec 2011
Hi Wayne,
Thank you for your time and consideration! I appreciate your clarification on these questions!
Brian Dz
