Thread Subject:
Need to assess temporal correlation of multivariate time series

Subject: Need to assess temporal correlation of multivariate time series

From: Kirk

Date: 28 Jan, 2010 16:57:04

Message: 1 of 8

I have a data set of 4 climate variables (columns) recorded as monthly observations (rows). The variables are correlated within observations (across any given row), but I need to test whether those observations are independent over time (down any given column). In other words, I can show that tmax, tmin, solar radiation and precipitation are all correlated in any given month (row). However, does the fact that July had a higher than average tmax make it more likely that August will also have a higher than average tmax?

Subject: Need to assess temporal correlation of multivariate time series

From: Frank

Date: 28 Jan, 2010 19:33:22

Message: 2 of 8

Sounds like you want autocorrelation. See if the function xcorr gives
you what you want.
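
A minimal sketch of what that could look like for one monthly column. Synthetic data stands in for tmax, and the variable names, the 20-year record starting in January, and the step of subtracting each calendar month's mean (so that "higher than average" means higher than normal for that month) are assumptions for illustration, not details from the thread. xcorr requires the Signal Processing Toolbox.

```matlab
% Synthetic stand-in for one monthly column (e.g. tmax): a seasonal cycle
% plus noise. Replace tmax with a real column of the data set.
rng(0);                                   % fixed seed for repeatability
nYears = 20;                              % assumed record length
t      = (0:12*nYears-1)';
tmax   = 15 + 10*sin(2*pi*t/12) + 2*randn(size(t));

% Subtract each calendar month's mean to get monthly anomalies.
byMonth  = reshape(tmax, 12, nYears);     % rows = calendar month, cols = year
monthAvg = mean(byMonth, 2);
anom     = byMonth - repmat(monthAvg, 1, nYears);
anom     = anom(:);                       % back to a single time series

% Normalized autocorrelation of the anomalies at lags -maxLag..maxLag.
maxLag   = 24;
[C,lags] = xcorr(anom, maxLag, 'coeff');
acf      = C(lags >= 0);                  % keep lags 0..maxLag

% Approximate 95% bound for a white-noise series of this length.
bound = 1.96/sqrt(numel(anom));
fprintf('lag-1 autocorrelation = %.3f, bound = +/-%.3f\n', acf(2), bound);
```

A lag-1 value well outside the bound would suggest that one month's anomaly carries information about the next. xcorr does not subtract the mean itself, so removing it first matters.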

Subject: Need to assess temporal correlation of multivariate time series

From: Wayne King

Date: 29 Jan, 2010 00:41:22

Message: 3 of 8

Hi Kirk,

Frank has given you good advice to look at xcorr() to estimate the cross correlation between your time series. If you use the normalization option 'coeff', you can construct confidence intervals for determining whether the cross correlation is significantly different from zero. The large-sample distribution of the cross correlation coefficient under the null hypothesis that at least one of the time series is white noise is N(0,1/N) (variance is 1/N), where N is the length of the time series. So for an approximate 95% confidence interval you have +/- 1.96/sqrt(N). You can also use the magnitude squared coherence (see mscohere() in the Signal Processing Toolbox) for frequency domain correlation between two series. Here is an example of using xcorr() to estimate the cross correlation with approximate 95% confidence intervals.


n = 0:99;
x = cos(pi/10*n)+0.5*randn(size(n));
% y is a filtered version of x (and delayed due to the filtering)
y = filter(0.1*ones(10,1),1,x);
[C,lags] = xcorr(x,y,50,'coeff');
stem(lags(51:end),C(51:end),'markerfacecolor',[0 0 1])
line(lags(51:end),-1.96/sqrt(100)*ones(51,1),'color',[1 0 0])
line(lags(51:end),1.96/sqrt(100)*ones(51,1),'color',[1 0 0])

Hope that helps,
Wayne
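
For the frequency-domain view mentioned above, here is a rough sketch using mscohere() with its default settings. The longer record (1000 samples) and the extra independent noise added to y are assumptions made for illustration: without that noise y is an exact linear function of x, and the coherence would be close to 1 at almost every frequency.

```matlab
rng(0);                                   % fixed seed for repeatability
n = 0:999;                                % longer record than the example above (assumption)
x = cos(pi/10*n) + 0.5*randn(size(n));
% y is a filtered version of x plus independent noise (the noise is an assumption)
y = filter(0.1*ones(10,1), 1, x) + 0.2*randn(size(n));

% Magnitude-squared coherence; with two outputs and no sample rate given,
% w is the normalized frequency in rad/sample.
[Cxy, w] = mscohere(x, y);

plot(w/pi, Cxy)
xlabel('Normalized frequency (\times\pi rad/sample)')
ylabel('Magnitude-squared coherence')
```

Values near 1 mark frequencies where the two series are strongly linearly related; the shared oscillation sits at pi/10 rad/sample, that is, 0.1 on the plotted axis.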

Subject: Need to assess temporal correlation of multivariate time series

From: Kirk

Date: 29 Jan, 2010 17:15:08

Message: 4 of 8

This is very helpful, Wayne. Thank you. If you don't mind, I have a few more questions as I try to educate myself a bit. I plotted x and y and can see that both variables follow a general trend with a slight shift in phase. However, I'd like to better understand what each line in the stem plot represents, so that I can get a handle on what the 95% confidence interval is showing.

Subject: Need to assess temporal correlation of multivariate time series

From: Wayne King

Date: 29 Jan, 2010 17:48:08

Message: 5 of 8

Hi Kirk,
The stem plot is just emphasizing that you are estimating a cross-correlation sequence, as opposed to a function, so you only have something defined at the lags. If you compute the autocorrelation (cross-correlation with itself) of a white noise input, then the theoretical (normalized) autocorrelation is 1 at lag zero and zero at all other lags. Since you're estimating the theoretical autocorrelation based on a sample, you can't expect the estimates to be identically zero at nonzero lags. The confidence intervals allow you to make an inference about what is statistically different from zero, i.e. what exhibits correlation.

S = RandStream('mt19937ar');
% setGlobalStream replaces setDefaultStream, which newer releases no longer accept
RandStream.setGlobalStream(S);
x = randn(100,1);
% get autocorrelation of white noise
[C,lags] = xcorr(x,x,50,'coeff');
stem(lags(51:end),C(51:end),'markerfacecolor',[0 0 1])
line(lags(51:end),-1.96/sqrt(100)*ones(51,1),'color',[1 0 0])
line(lags(51:end),1.96/sqrt(100)*ones(51,1),'color',[1 0 0])

Note that even though the autocorrelation estimates at nonzero lags are not zero (as you would expect from an estimate), they are within the 95% confidence intervals (save for one lag), so you would NOT reject the null hypothesis that it's white noise.

Hope that helps,
Wayne
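
The same check can be done numerically instead of by eye. This is a small sketch that counts how many nonzero lags fall outside the approximate bound; the seed and variable names are arbitrary choices, so the count will not necessarily match the plot above.

```matlab
rng(1);                                   % arbitrary seed (assumption)
N      = 100;
maxLag = 50;
x      = randn(N,1);                      % white noise

[C,lags] = xcorr(x, maxLag, 'coeff');     % normalized autocorrelation
pos      = lags > 0;                      % skip lag 0, which is always 1
bound    = 1.96/sqrt(N);                  % approximate 95% bound

nOutside = sum(abs(C(pos)) > bound);
fprintf('%d of %d nonzero lags fall outside +/-%.3f\n', nOutside, maxLag, bound);
```

With a 95% interval, roughly 1 lag in 20 can land outside the bound by chance alone, so an occasional excursion is not by itself evidence against white noise.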

Subject: Need to assess temporal correlation of multivariate time series

From: Kirk

Date: 29 Jan, 2010 20:06:05

Message: 6 of 8

Whereas in your first example, most of the lags were outside the 95% confidence intervals, indicating that you SHOULD reject the null hypothesis that at least one of the time series is white noise?

Subject: Need to assess temporal correlation of multivariate time series

From: Wayne King

Date: 29 Jan, 2010 23:31:05

Message: 7 of 8

That's correct; if you look at the 1st example, you will see that the excursions outside the confidence intervals actually capture the common oscillation (periodicity) in the two series. That is also a case where the frequency-domain equivalent (coherence) would have worked well.
Wayne

Subject: Need to assess temporal correlation of multivariate time series

From: Kirk

Date: 1 Feb, 2010 14:59:04

Message: 8 of 8

> >
> > Whereas in your first example, most of the lags were outside the 95% confidence intervals, indicating that you SHOULD reject the null hypothesis that at least one of the time series is white noise?
>
> That's correct; if you look at the 1st example, you will see that the excursions outside the confidence intervals actually capture the common oscillation (periodicity) in the two series. That is also a case where the frequency-domain equivalent (coherence) would have worked well.
> Wayne

Thanks, Wayne. I can see that the monthly max and min temperature data I am working with have strong cyclic patterns (just like your 1st example) and are strongly autocorrelated.

My dilemma is that I am using mvnrnd to model the original data. mvnrnd uses the measured monthly means and the covariances among the 4 climate variables. I am concerned that the monthly mvnrnd predictions are independent of the preceding month. If I can show with xcorr that each month is clearly not independent of the last, what can I do about it? Is there a way to add a temporal correlation component to mvnrnd predictions?

Or how can I show that mvnrnd predictions follow the patterns of the original data set even though they are independent predictions? Does this make sense?

Thanks again.
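
The concern can be made concrete with a small check. In this sketch the means, the covariance matrix, and the number of draws are hypothetical values for two variables, not the measured climate statistics, and a single mean vector is used rather than one per month. mvnrnd requires the Statistics Toolbox.

```matlab
rng(2);                                   % arbitrary seed (assumption)
mu     = [25 12];                         % hypothetical means (tmax, tmin)
Sigma  = [9 6; 6 8];                      % hypothetical covariance matrix
nDraws = 500;
sim    = mvnrnd(mu, Sigma, nDraws);       % each row is an independent draw

% Correlation between the two variables within a draw (same row).
R = corrcoef(sim(:,1), sim(:,2));

% Correlation between consecutive draws of the first variable (lag 1).
L = corrcoef(sim(1:end-1,1), sim(2:end,1));

fprintf('cross-variable correlation = %.3f\n', R(1,2));
fprintf('lag-1 autocorrelation      = %.3f\n', L(1,2));
```

The draws keep the correlation between variables that Sigma specifies, but the lag-1 value stays near zero because each row is generated independently of the row before it.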
