Thread Subject:
Difference of 2 CDF functions

Subject: Difference of 2 CDF functions

From: Themis

Date: 12 Mar, 2009 17:38:01

Message: 1 of 23

I'm trying to calculate the difference of 2 CDF functions, coming from 2 normal populations, in order to calculate the error. I have 2 data samples of the same size.
I tried calculating the normcdf function for both data sets and then subtracting, but that doesn't work since I'm subtracting uncorrelated data.
Any ideas on what I'm missing here?

Subject: Difference of 2 CDF functions

From: Roger Stafford

Date: 12 Mar, 2009 18:04:01

Message: 2 of 23

"Themis " <thkountouris@hotmail.com> wrote in message <gpbh9p$k2k$1@fred.mathworks.com>...
> I'm trying to calculate the difference of 2 CDF functions, coming from 2 normal populations, in order to calculate the error. I have 2 sample data of the same size.
> I tried calculating the normcdf function for both data and then subtracting but that doesnt work since im subtracting uncorrelated data.
> Any ideas on what im missing here?

  I think you need to be using the 'mvncdf' function of the Statistics Toolbox. I assume your normal populations are also jointly normal.

Roger Stafford

Subject: Difference of 2 CDF functions

From: Peter Perkins

Date: 12 Mar, 2009 20:02:40

Message: 3 of 23

Themis wrote:
> I'm trying to calculate the difference of 2 CDF functions, coming from 2 normal populations, in order to calculate the error. I have 2 sample data of the same size.
> I tried calculating the normcdf function for both data and then subtracting but that doesnt work since im subtracting uncorrelated data.

Do you want to compute the difference of the two CDFs, or the CDF of the difference of the two variables?

Subject: Difference of 2 CDF functions

From: Themis

Date: 12 Mar, 2009 20:14:02

Message: 4 of 23

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gpbiqh$6ik$1@fred.mathworks.com>...
> "Themis " <thkountouris@hotmail.com> wrote in message <gpbh9p$k2k$1@fred.mathworks.com>...
> > I'm trying to calculate the difference of 2 CDF functions, coming from 2 normal populations, in order to calculate the error. I have 2 sample data of the same size.
> > I tried calculating the normcdf function for both data and then subtracting but that doesnt work since im subtracting uncorrelated data.
> > Any ideas on what im missing here?
>
> I think you need to be using the 'mvncdf' function of the Statistics Toolbox. I assume your normal populations are also jointly normal.
>
> Roger Stafford

I'm not sure that's what I need. Let me make what I am saying clearer.
Data from the first sample comes from a data set I calculated using Weyl sums.
Data from the second sample comes from data I calculated by generating a random normal population.
Therefore I have 2 row vectors.
The data in the first sample appear to be Gaussian, and if I plot the 2 CDF functions together they nearly match. I would like to calculate the difference between them and plot it in a graph.
In other words, I am expecting a graph with values close to 0.

Subject: Difference of 2 CDF functions

From: Roger Stafford

Date: 12 Mar, 2009 20:57:01

Message: 5 of 23

"Themis " <thkountouris@hotmail.com> wrote in message <gpbqea$fvs$1@fred.mathworks.com>...
> "Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gpbiqh$6ik$1@fred.mathworks.com>...
> > "Themis " <thkountouris@hotmail.com> wrote in message <gpbh9p$k2k$1@fred.mathworks.com>...
> > > I'm trying to calculate the difference of 2 CDF functions, coming from 2 normal populations, in order to calculate the error. I have 2 sample data of the same size.
> > > I tried calculating the normcdf function for both data and then subtracting but that doesnt work since im subtracting uncorrelated data.
> > > Any ideas on what im missing here?
> >
> > I think you need to be using the 'mvncdf' function of the Statistics Toolbox. I assume your normal populations are also jointly normal.
> >
> > Roger Stafford
>
> I'm not sure thats what I need. Let me make what I am saying more clear.
> Data from the first sample comes from a data set I calculated using Weyl Sums.
> Data from the second sample comes from data I calculated by generating random normal population.
> Therefore I have 2 row vectors.
> The data in the first sample appear to be Gaussian and if I plot the 2 CDF fucntions together they nearly match. I would like to calculate the difference between them and plot them in a graph.
> In other words I am expecting a graph with values close to 0.

  I am struggling to make sense out of your description, Themis. You state that you are trying to determine the difference between two cumulative distribution functions, the second of them being a normal distribution, and you have generated a random normal population to obtain this distribution. Why would you do that when the cumulative normal distribution is a well-known function that can be calculated? Doing it this way only makes your comparison less accurate.

  As for the Weyl sums, you don't make it clear whether this is some kind of stochastic process or you are determining their cumulative distribution on theoretical grounds. If it is theoretical, the comparisons should be exact I would think.

Roger Stafford

Subject: Difference of 2 CDF functions

From: Themis

Date: 12 Mar, 2009 22:11:01

Message: 6 of 23

> Why would you do that when the cumulative normal distribution is a well-known function that can be calculated? Doing it this way only makes your comparison less accurate.

Are you saying that I should calculate the cdf using the standard normal cdf equation over x instead of calculating it off a random sample?

> As for the Weyl sums, you don't make it clear whether this is some kind of stochastic process or you are determining their cumulative distribution on theoretical grounds. If it is theoretical, the comparisons should be exact I would think.

I am using MATLAB to calculate this sum:
http://img14.imageshack.us/my.php?image=equation.jpg
and then use the Distribution Fitting Tool to plot the CDF.

I hope this helps

Subject: Difference of 2 CDF functions

From: Roger Stafford

Date: 12 Mar, 2009 23:45:06

Message: 7 of 23

"Themis " <thkountouris@hotmail.com> wrote in message <gpc19l$2qc$1@fred.mathworks.com>...
> >Why would you do that when the cumulative normal distribution is a >well-known function that can be calculated? Doing it this way only makes >your comparison less accurate.
>
> Are you saying that I should calculate the cdf of using the standard normal cdf equation over x instead of calculating it off a random sample?
>
>
>
> > As for the Weyl sums, you don't make it clear whether this is some kind of >stochastic process or you are determining their cumulative distribution on >theoretical grounds. If it is theoretical, the comparisons should be exact I >would think.
>
> I am using MATLAB to calculate this sum:
> http://img14.imageshack.us/my.php?image=equation.jpg
> and then use the Distribution Fitting Tool to plot the CDF.
>
> I hope this helps

  What I see in the referenced website seems to have an error. It shows a sum from i = 1 to i = N, but the probable index variable within the summand is presumably n, there being no i present there. Also I am a little surprised not to see 2*pi inside the cosine.

  The important question here is: of the four parameters, N, M, m, and k, which is the one to be varied in arriving at some kind of discrete distribution? I doubt if it is either k or N and probably not M. Whatever it is, you will be comparing the continuous distribution of a normal variable with a presumably integer-valued and therefore discrete parameter, so any comparison must inevitably be approximate.

  In answer about whether to use a normally distributed random variable as opposed to the theoretical cdf for a normal variable, I am indeed saying that the theoretical value is superior because of greater accuracy, and it is readily available for computation. Of course you need to know what mean and variance to use, but that is also true of a randomly generated variable.

Roger Stafford

Subject: Difference of 2 CDF functions

From: Themis

Date: 13 Mar, 2009 00:31:01

Message: 8 of 23

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gpc6q2$f51$1@fred.mathworks.com>...

>
> What I see in the referenced website seems to have an error. It shows a sum from i = 1 to i = N, but the probable index variable within the summand is presumably n, there being no i present there. Also I am a little surprised not to see 2*pi inside the cosine.
>
> The important question here is: of the four parameters, N, M, m, and k, which is the one to be varied in arriving at some kind of discrete distribution? I doubt if it is either k or N and probably not M. Whatever it is, you will be comparing the continuous distribution of a normal variable with a presumably integer-valued and therefore discrete parameter, so any comparison must inevitably be approximate.
>
> In answer about whether to use a normally distributed random variable as opposed to the theoretical cdf for a normal variable, I am indeed saying that the theoretical value is superior because of greater accuracy, and it is readily available for computation. Of course you need to know what mean and variance to use, but that is also true of a randomly generated variable.
>
> Roger Stafford

Yes, you are right, there is an error in that equation. i should have been n.
Also, k is usually taken as k=3. M and N are the inputs here and the process is calculated for all values of m.
The end result is M-1 values, which appear to be normally distributed.

Subject: Difference of 2 CDF functions

From: Roger Stafford

Date: 13 Mar, 2009 02:56:01

Message: 9 of 23

"Themis " <thkountouris@hotmail.com> wrote in message <gpc9g5$crs$1@fred.mathworks.com>...
> Yes you are right, there is an error in that equation. i should have been n.
> Also k is ussualy taken as k=3. M and N are the inputs here and the process is calculated for all values of m.
> The end result is M-1 values, which appear to be normaly distributed.

  Aha! The fog begins to clear. You will have to proceed with some care. This presumed convergence to a normal distribution is probably analogous to that given by the famous central limit theorem for the sum of many successive independent values of a random variable. It is necessary to carefully rescale and translate the independent variable values - in this case the m values - to smaller and smaller values as M increases so as to force the mean and variance to approach that of the given normal distribution. Then if your surmise is correct, the respective cdf's will also converge.

  But again, it is useless to generate random normally distributed variables when the distribution is already thoroughly understood.

Roger Stafford

Subject: Difference of 2 CDF functions

From: Themis

Date: 13 Mar, 2009 07:14:03

Message: 10 of 23

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gpci01$ov8$1@fred.mathworks.com>...
> "Themis " <thkountouris@hotmail.com> wrote in message <gpc9g5$crs$1@fred.mathworks.com>...
> > Yes you are right, there is an error in that equation. i should have been n.
> > Also k is ussualy taken as k=3. M and N are the inputs here and the process is calculated for all values of m.
> > The end result is M-1 values, which appear to be normaly distributed.
>
> Aha! The fog begins to clear. You will have to proceed with some care. This presumed convergence to a normal distribution is probably analogous to that given by the famous central limit theorem for the sum of many successive independent values of a random variable. It is necessary to carefully rescale and translate the independent variable values - in this case the m values - to smaller and smaller values as M increases so as to force the mean and variance to approach that of the given normal distribution. Then if your surmise is correct, the respective cdf's will also converge.
>
> But again, it is useless to generate random normally distributed variables when the distribution is already thoroughly understood.
>
> Roger Stafford

I'm glad I made myself clear. That is exactly what I am trying to achieve, and for large values of M they do indeed converge. The 2 CDF functions are almost identical; that's why I want to calculate their difference, to find out exactly how small it is.
I understand what you are saying about the random variables and I've changed my code. However, I still can't manage to find a way to calculate this.

Subject: Difference of 2 CDF functions

From: Themis

Date: 14 Mar, 2009 11:29:02

Message: 11 of 23

I thought of attaching this pic to make my problem more clear.
If I plot the CDF of the data I retrieve and the CDF of a standard normal population with the same mean and variance I get the plot below:

http://img12.imageshack.us/my.php?image=datag.jpg

However, I can't seem to get the plot of their differences correct. The smallest I managed to figure out was between -0.3 and 0.3, but that is clearly wrong since the 2 plots almost match.

I have tried calculating the 2 CDF's with the normcdf function and then subtracting the data and I also tried calculating the CDF's using this code :

function Fx = CDFNormal(x,mu,var)
% Normal cdf at x for mean mu and variance var (not standard deviation)
arg = (x-mu)/sqrt(var*2);
Fx = 0.5*erf(arg)+0.5;

which returns a slightly different result than Matlab's normcdf.

Any ideas?

Thank you,
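
A minimal sketch of an erf-based normal cdf parameterized by variance, like the function in the message above. The handle name cdfNormal and the test values are illustrative assumptions, not from the thread. One thing worth checking when comparing against normcdf (Statistics Toolbox): the function here takes the variance, whereas normcdf takes the standard deviation, so normcdf would need sqrt(v) as its third argument.

```matlab
% Hypothetical helper: normal cdf via erf, parameterized by variance v
cdfNormal = @(x, mu, v) 0.5*erf((x - mu)./sqrt(2*v)) + 0.5;

x  = linspace(-3, 3, 7);   % illustrative evaluation points
mu = 0;
v  = 0.5;                  % variance, not standard deviation
Fx = cdfNormal(x, mu, v);

% normcdf expects the standard deviation, hence sqrt(v)
if exist('normcdf', 'file')
    disp(max(abs(Fx - normcdf(x, mu, sqrt(v)))))   % should be at rounding level
end
```

Passing the variance where a standard deviation is expected (or the reverse) is one plausible source of a small mismatch between the two results.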

Subject: Difference of 2 CDF functions

From: Themis

Date: 14 Mar, 2009 11:33:02

Message: 12 of 23

Another point I have to make is that the calculated data whose CDF I am trying to plot come out in random order, so I have to sort them from lowest to greatest before I run normcdf. I guess this might be wrong, but I can't find a way around it.

Subject: Difference of 2 CDF functions

From: Roger Stafford

Date: 14 Mar, 2009 17:43:01

Message: 13 of 23

"Themis " <thkountouris@hotmail.com> wrote in message <gpg4du$fht$1@fred.mathworks.com>...
> I thought of attaching this pic to make my problem more clear.
> If I plot the CDF of the data I retrieve and the CDF of a standard normal population with the same mean and variance I get the plot below:
>
> http://img12.imageshack.us/my.php?image=datag.jpg
>
> However, I can't seem to get the plot of their differences correct. The smallest I managed to figure out was betwwen -0.3 and 0.3 but that is clearly wrong since the 2 plots almost match.
> .......
> Another point I have to make is that the calulated data i am trying to plot its CDF function comes out in random order, so i have to sort them from lowest to greatest before I run normcdf, which I guess this might be wrong but I can't find a way around it.
---------------
  Now that I take a second look at your Weyl sums, Themis, it is not surprising that you are approaching a normal distribution. For large values of N the n^k factor would cause the cosine value to fluctuate wildly as m changes, and the successive values of the sum for different values of m would be of a pseudo-random nature with very little correlation, in a sense, of one sum to another. Accordingly, the central limit theorem would likely predict such a behavior. Your image shown at "image=datag.jpg" certainly seems to show a good match.

  There is no way for any of us to know why you are having difficulty measuring the difference between these two curves unless you show us the code you are using. I am not talking about the generation of the normal cdf, but about the calculation of the difference. You state "I managed to figure out was betwwen -0.3 and 0.3 but that is clearly wrong since the 2 plots almost match." You need to show the details of that calculation. In particular you need to show whether this 0.3 figure applies to the m value before it was rescaled or afterwards from matching the normal distribution. (You should be keeping the normal distribution fixed and do rescaling and translation on the m values as M and N change to get a meaningful comparison.)

  The preliminary sorting of your data is not wrong. It is absolutely necessary for obtaining a valid cdf evaluation. Remember the definition of cumulative probability. It is the probability that a random variable will be less than a certain quantity. For your Weyl sums it would be the count of the number of sums that fall below a certain value divided by the total number. There is no good way of giving such a count until you have sorted your values.

Roger Stafford
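
The definition of cumulative probability given in the message above (the count of values that fall at or below a threshold, divided by the total number) can be sketched as follows. The sample vector a is made up for illustration and is not Weyl-sum data.

```matlab
a = [0.3 -1.2 2.1 0.0 -0.4 1.5 -2.0 0.9];   % made-up, unsorted sample
n = numel(a);

s    = sort(a);        % sorting turns the count into a running index
Femp = (1:n)/n;        % Femp(i) = fraction of values <= s(i)

% The same count taken directly from the definition at a threshold t
t = 0.5;
Fdirect = sum(a <= t)/n;   % 5 of the 8 values are <= 0.5

disp([Fdirect, Femp(find(s <= t, 1, 'last'))])   % both entries agree
```

Without the sort, the index i would say nothing about how many values lie below a(i), which is why sorting is a required step rather than a workaround.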

Subject: Difference of 2 CDF functions

From: Themis

Date: 14 Mar, 2009 19:32:00

Message: 14 of 23

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gpgqb5$r6v$1@fred.mathworks.com>...

> There is no way for any of us to know why you are having difficulty measuring the difference between these two curves unless you show us the code you are using. I am not talking about the generation of the normal cdf, but about the calculation of the difference. You state "I managed to figure out was betwwen -0.3 and 0.3 but that is clearly wrong since the 2 plots almost match." You need to show the details of that calculation. In particular you need to show whether this 0.3 figure applies to the m value before it was rescaled or afterwards from matching the normal distribution. (You should be keeping the normal distribution fixed and do rescaling and translation on the m values as M and N change to get a meaningful comparison.)
>

> Roger Stafford


Thank you for your reply Roger. I shall describe the whole process I am going through below.

First I calculate the sum for a given M and N and I get a row vector with M-1 values. Let's say that the resulting vector (a) has values between -2.9 and 3.3. I then sort the data in vector a.
Then I create another another vector b = 2.9 : 3.3 .
Then I run normcdf for both vectors and get 2 other vectors, say c and d.
Finally I calculate the difference between the 2 vectors, by e=d-c;

The plot of e looks as the one in the picture:
http://img14.imageshack.us/my.php?image=21777861.jpg

Subject: Difference of 2 CDF functions

From: Roger Stafford

Date: 14 Mar, 2009 20:30:04

Message: 15 of 23

"Themis " <thkountouris@hotmail.com> wrote in message <gph0ng$ek4$1@fred.mathworks.com>...
> Thank you for your reply Roger. I shall describe the whole process I am going through below.
>
> First I calculate the sum for a value for M and N and I get a row vector with M-1 values. Lets say that the resulting vector (a) has values between -2.9 and 3.3. I then sort the data in vector a.
> Then I create another another vector b = 2.9 : 3.3 .
> Then I run normcdf for both vectors and get 2 other vectors, say c and d.
> Finally I calculate the difference between the 2 vectors, by e=d-c;
>
> The plot of e looks as the one in the picture:
> http://img14.imageshack.us/my.php?image=21777861.jpg

  You are still not coming through to me at all clearly, Themis. What do you mean by "Then I create another another vector b = 2.9 : 3.3 ."? By what means is it created? Surely not as it stands there; you would get only one value in b that way. If you use the same M and N in another Weyl sum, you should get the same results. What do you mean by "Then I run normcdf for both vectors and get 2 other vectors, say c and d."? What mean and variance value(s) do you give to 'normcdf', those that you have calculated from the Weyl sums? If so, you are allowing the Weyl sums to dictate the mean and variance. You should be rescaling and translating the Weyl sums to match a fixed normal distribution, say the standard normal. When you subtract e = d-c, do d and c have like numbers of points? If so, you must have used the same M value in the original Weyl sums, but perhaps different N's?
Anyway, what is the significance of comparing different normal distributions, c and d? You should be comparing a with c or b with d. I think you need to explain things in much, much greater detail if I am to understand what the problem is.

Roger Stafford
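
To illustrate the remark about b = 2.9 : 3.3 in the message above: with the default step of 1, the colon operator produces a single element for those endpoints. A short sketch of the difference, with linspace as one way to get a chosen number of evenly spaced points (the count 7 is an arbitrary assumption):

```matlab
b1 = 2.9:3.3;                  % default step 1: only 2.9 fits, one element
b2 = -2.9:3.3;                 % -2.9, -1.9, ..., 3.1: seven elements
b3 = linspace(-2.9, 3.3, 7);   % seven evenly spaced points, endpoints included
disp([numel(b1) numel(b2) numel(b3)])
```

Even when the lengths match, normcdf evaluated on an evenly spaced grid and normcdf evaluated at the sorted data are the same function sampled at different x values, so their element-wise difference does not measure how far the data are from normal.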

Subject: Difference of 2 CDF functions

From: Themis

Date: 14 Mar, 2009 21:06:02

Message: 16 of 23

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gph44c$5h3$1@fred.mathworks.com>...
> You are still not coming through to me at all clearly, Themis. What do you mean by "Then I create another another vector b = 2.9 : 3.3 ."? By what means is it created? Surely not as it stands there; you would get only one value in b that way. If you use the same M and N in another Weyl sum, you should get the same results. What do you mean by "Then I run normcdf for both vectors and get 2 other vectors, say c and d."? What mean and variance value(s) do you give to 'normcdf', those that you have calculated from the Weyl sums? If so, you are allowing the Weyl sums to dictate the mean and variance. You should be rescaling and translating the Weyl sums to match a fixed normal distribution, say the standard normal. When you subtract e = d-c, do d and c have like numbers of points? If so, you must have used the same M value in the original Weyl sums, but perhaps different N's?
> Anyway, what is the significance of comparing different normal distributions, c and d? You should be comparing a with c or b with d. I think you need to explain things in much, much greater detail if I am to understand what the problem is.
>
> Roger Stafford

Vector b has the same number of elements as a. That is not the way I create it though; I just didn't want to get into much detail about that.
I use b=-2.9 : x : 3.3 , where x is the step required in order for b and a to have the same number of elements. So a and b have the same number of elements. However, the step is not constant for vector a, which I think is why I get such a big value for the error.

I always take k=3 for now. If I take k=2, the distribution doesn't appear to be Normal. So this is all for the case where k=3. If I change M,N I get a different vector a.
I am trying to find the difference for a fixed value of M and N. I am not talking about taking a different Weyl sum. I am saying take a value for M,N , get a vector a. Then compare the CDF of vector a to the CDF of a standard normal distribution with the same mean and variance as the Weyl sum. The reason for this is that I can see that the 2 CDF's almost match and the PDF of vector a appears to be Normally distributed as well. The ultimate goal is to prove that the Weyl sum is Normally distributed. The way to do that is to compare the CDF of the Weyl sum to the CDF of the Normal distribution.
I hope this makes things more clear.

Subject: Difference of 2 CDF functions

From: Roger Stafford

Date: 15 Mar, 2009 02:05:04

Message: 17 of 23

"Themis " <thkountouris@hotmail.com> wrote in message <gph67p$9v6$1@fred.mathworks.com>...
> Vector b has the same number of elements as a. That is not the way i create it though, i just didnt want to get into much detail about that.
> I use b=-2.9 : x : 3.3 , where x is the step required in order for b and a to have the same number of elements. So a and b have the same number of elements. However, the step is not constant for vector a, therefore I think that is why I get such a big value for the error.
>
> I always take k=3 for now. If I take k=2, the distribution doesn;t appear to be Normal. So this is all for the case where k=3. If I change M,N I get a different vector a.
> I am trying to find the difference for a fixed value of M and N. I am not talking about taking a different Weyl sum. I am saying take a value for M,N , get a vector a. Then compare the CDF of vector a to the CDF of a standard normal distribution with the same mean and variance of the Weyl Sum. The reason for this is, since I can see that the 2 CDF's almost match and the PDF of vector a appears to be Normally distributed as well. The ultimate goal is to prove that the Weyl sum is Normally distributed. The way to do that is to compare the CDF of the Weyl sum to the CDF of Normal distribution.
> i hope this makes things more clear.

  Here is the way I would compute the difference between your two cdf's, Themis. You will note I have done the rescaling and translation that I recommended so as to compare with a fixed standard normal distribution. Actually the division by sqrt(N) in the Weyl sum formula does most of this adjustment so perhaps this isn't really necessary. You could adjust the normal distribution instead and it wouldn't make much difference.

  I don't know what values you are using for M and N, so I picked these out of the air (well actually so the match would look good.)

 M = 5000;
 N = 100;
 k = 3;
 w = zeros(1,M); % Allocate space for w
 for m = 1:M % Compute Weyl sums in w for M values of m
  w(m) = sum(cos((1:N).^k*(m-1)/M))/sqrt(N);
 end
 w = sort((w-mean(w))/std(w)); % Adjust to mean 0, std 1 and sort
 cw = (1/2:M-1/2)/M; % Average cdf values at the w points
 cn = 1/2+1/2*erf(w/sqrt(2)); % Standard normal cdf values at the w points
 plot(w,cw,'y.',w,cn,'r.') % Plot both cdfs: Weyl sums (yellow), normal (red)
 std(abs(cw-cn)) % = 0.0050
 max(abs(cw-cn)) % = 0.0204

  These look like a fairly good match. As I see it, this is a classic demonstration of the central limit theorem in action. You should be careful about stating that the Weyl sum _is_ normally distributed. It only (apparently) approaches a normal distribution as a limit as M approaches infinity. There is an important difference between the two claims.

  I can't tell from your description where your calculations differed from the above so as to produce the .3 value you mentioned. You are probably in a better position to discover that than I.

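  By the way, check your earlier formula for the normal distribution in terms of the 'erf' function: the argument passed to 'erf' must be (x-mu)/(sigma*sqrt(2)), where sigma is the standard deviation. Any slip there (a missing square root of 2, a variance passed where a standard deviation is expected, or 'my' typed in place of 'mu') would explain why your answer didn't agree with that of normcdf.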

  Note: The Weyl sum calculation can be vectorized by using 'ndgrid' so as to eliminate the for-loop but I doubt if there is much gain in speed in doing so since the summation takes up most of the time.

Roger Stafford
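
The note above mentions 'ndgrid' for vectorizing the Weyl sum; another way to drop the for-loop is an outer product, sketched below. This is an assumption about how one might vectorize it, not code from the thread, and the small M and N are chosen only to keep the check quick.

```matlab
M = 200; N = 20; k = 3;   % small illustrative sizes

% Loop version, as in the message above
wLoop = zeros(1,M);
for m = 1:M
    wLoop(m) = sum(cos((1:N).^k*(m-1)/M))/sqrt(N);
end

% Vectorized: row m of the M-by-N matrix holds n^k*(m-1)/M for n = 1..N
phase = ((0:M-1)'/M) * (1:N).^k;
wVec  = sum(cos(phase), 2)'/sqrt(N);   % sum over n, back to a row vector

disp(max(abs(wLoop - wVec)))   % expected to be at rounding level
```

The M-by-N intermediate matrix costs memory in exchange for the missing loop, which is consistent with the remark that the speed gain may be small.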

Subject: Difference of 2 CDF functions

From: Themis

Date: 15 Mar, 2009 02:29:01

Message: 18 of 23

Thank you for your reply Roger. I am going to try what you suggested first thing in the morning, as I'm getting ready for bed since it's 2:30am now in the UK, and I will let you know as soon as I get any results.
Once again, thank you for taking time to look into my issue.

Themis

Subject: Difference of 2 CDF functions

From: Themis

Date: 15 Mar, 2009 13:43:01

Message: 19 of 23

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gphnog$ntf$1@fred.mathworks.com>...

> Here is the way I would compute the difference between your two cdf's, Themis. You will note I have done the rescaling and translation that I recommended so as to compare with a fixed standard normal distribution. Actually the division by sqrt(N) in the Weyl sum formula does most of this adjustment so perhaps this isn't really necessary. You could adjust the normal distribution instead and it wouldn't make much difference.
>
> I don't know what values you are using for M and N, so I picked these out of the air (well actually so the match would look good.)
>
> M = 5000;
> N = 100;
> k = 3;
> w = zeros(1,M); % Allocate space for w
> for m = 1:M % Compute Weyl sums in w for M values of m
> w(m) = sum(cos((1:N).^k*(m-1)/M))/sqrt(N);
> end
> w = sort((w-mean(w))/std(w)); % Adjust to mean 0, std 1 and sort
> cw = (1/2:M-1/2)/M; % Average cdf values at the w points
> cn = 1/2+1/2*erf(w/sqrt(2)); % Standard normal cdf values at the w points
> plot(w,cw,'y.',w,cn,'r.') % Plot the cdf differences
> std(abs(cw-cn))
> = 0.0050
> max(abs(cw-cn))
> = 0.0204
>

This works perfectly with my values for M,N as well. I have a few questions though.
At the sorting step why is it necessary to adjust the mean?
Also is it compulsory to adjust this to N~(0,1) ? The sum behaves like N~(0,0.5).

I think the difference in our codes is that I calculate the CDF of a vector x that runs through the minimum and maximum values of the Weyl sum, whereas you instead use the method in cw to define the CDF of the Normal Distribution.

Subject: Difference of 2 CDF functions

From: Themis

Date: 15 Mar, 2009 14:33:02

Message: 20 of 23

Also, I have just observed that the CDF plot from your method differs from the one I get when using the Distribution Fitting Tool to plot the CDF. The 2 plots don't seem to match, except at the point (0,0.5).
The only plot that matches is the one I get if I sort the vector of the sum and plot it against cw.
The values of normcdf and cn don't seem to match either.

How does this behave for you?

Subject: Difference of 2 CDF functions

From: Themis

Date: 16 Mar, 2009 21:51:00

Message: 21 of 23

"Themis " <thkountouris@hotmail.com> wrote in message <gpj3iu$5q2$1@fred.mathworks.com>...
> Also I have just observed that the CDF plot of your method yields a different plot than when using the Distribution fitting tool, to plot the CDF. The 2 plots dont seem to match, except for the point (0,0.5)
> The only plot that matches is, if i sort the vector of the sum and plot it against cw.
> The values of normcdf to cn dont seem to match either.
>
> How does this behave for you?

Roger?

Subject: Difference of 2 CDF functions

From: Roger Stafford

Date: 17 Mar, 2009 00:36:02

Message: 22 of 23

"Themis " <thkountouris@hotmail.com> wrote in message <gpmhk4$ce1$1@fred.mathworks.com>...
> This works perfectly with my values for M,N as well. I have a few questions though.
> At the sorting step why is it necessary to adjust the mean?
> Also is it compulsory to adjust this to N~(0,1) ? The sum behaves like N~(0,0.5).

  As I have explained earlier, it is not an absolute necessity to do these things. It is simply my attempt to match your distribution to a fixed standard normal distribution so that changes in N and M wouldn't confuse things. However, it appears that division by sqrt(N) already performs that action to a large extent, so perhaps you would find it better to match the normal distribution to your Weyl distribution instead. The adjustments to the code I sent are easy to make.

> I think the difference in our codes is that I calculate the CDF of a vector x that runs through the minimum and maximum values of the weyl sum, where you instead use the method in cw to define the CDF of the Normal Distribution.

  I don't think that is the right way to do things at all! Unlike a strictly normal distribution, the maximum and minimum of your sums may be very skewed with respect to the position of the mean value. In the examples I ran, that definitely seemed to be the case. The maximum values were much farther away from the mean than the minimum values. It is better to be guided by the mean value in this respect.

> Also I have just observed that the CDF plot of your method yields a different plot than when using the Distribution fitting tool, to plot the CDF. The 2 plots dont seem to match, except for the point (0,0.5)
> The only plot that matches is, if i sort the vector of the sum and plot it against cw.

  I cannot speak for Matlab's Distribution Fitting Tool since I do not have it to experiment with, but plotting with a sorted w along the x-axis and cw on the y-axis is precisely what you want to do. The real-life cumulative distribution function of a Weyl sum after it is sorted is a discontinuous step function with the steps occurring at the 'w' points in the code I wrote. At the first w point, by definition, the cdf jumps from 0 to 1/M, at the second point from 1/M to 2/M, etc. To compare it with the continuous cdf of the normal distribution I consider it best to connect the midpoints of these vertical steps: 1/(2*M) at the jump from 0 to 1/M, 3/(2*M) at the jump from 1/M to 2/M, etc. With a line style such as 'y-' the 'plot' function draws the line segments connecting these; with the 'y.' marker used in my code it draws only the midpoints themselves, which for large M sit close enough together to read as a curve. That plot is what you should see in yellow in:

 plot(w,cw,'y.')

In plot(w,cn,'r.') you should see the precise values of the normal cdf evaluated at the same w points. If there is a perfect match, the normal curve would go precisely through the midpoints, 1/(2*M), 3/(2*M), etc., but of course that is not to be expected. In the example with M = 5000 I was typically getting a miss of .0050 which is some 25 times 1/M, but for N = 100, that doesn't seem excessive. Remember, you are only getting normality as a limiting behavior, courtesy of the central limit theorem, as M and N approach infinity.

> The values of normcdf to cn dont seem to match either.
> How does this behave for you?

  As to the 'normcdf' values there should be no imprecision at all. They are very well understood values of the integral:

 F(x) = 1/sqrt(2*pi)/sigma * integral, t=-inf to t=x of
          exp(-(t-mu)^2/(2*sigma^2)) dt

for a mean of mu and standard deviation of sigma. When I calculated this in cn, this was for the standard normal with mu = 0 and sigma = 1. There should be exact agreement (that is, exact within IEEE floating point accuracy) to the values as given by 'normcdf' with the same mean and standard deviation. As a test, for M = 5000, N = 100 and k = 3, I get

 format long
 w(2535) = 9.923408898431161e-05
 cn(2535) = 0.50003958867369

That should be very close to what 'normcdf' gives you for that value of w with zero mean and standard deviation one.

  In this connection you will recall I pointed out to you that you were using 'erf' improperly when you wrote:

 Fx = 0.5*erf(arg)+0.5;

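It should be Fx = 0.5*erf(arg/sqrt(2))+0.5; when arg is the standardized value (x-mu)/sigma, that is, when the division by sqrt(2) has not already been folded into arg.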

  Does this rather long-winded discussion help, Themis?

Roger Stafford
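
The integral form of the normal cdf quoted in the message above can be checked numerically against the 'erf' expression. A sketch only; mu, sigma and x are arbitrary illustrative values, and 'integral' is assumed to be available (it is in recent MATLAB releases).

```matlab
mu = 0.3; sigma = 0.7; x = 1.1;   % arbitrary values for the check

% Normal pdf with mean mu and standard deviation sigma
f = @(t) exp(-(t-mu).^2/(2*sigma^2))/(sqrt(2*pi)*sigma);

Fint = integral(f, -Inf, x);                      % numerical integration
Ferf = 0.5 + 0.5*erf((x-mu)/(sigma*sqrt(2)));     % closed form via erf

disp(abs(Fint - Ferf))   % small, limited by the quadrature tolerance
```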

Subject: Difference of 2 CDF functions

From: Themis

Date: 17 Mar, 2009 09:53:02

Message: 23 of 23

Yes, that makes sense to me Roger, thank you.
Just wanted to clarify though: if I want to adjust this to N~(0,0.5), I should sort my data like this: w = sort((w)/sqrt(0.5)); right?
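
On the question above: dividing by sqrt(0.5) and comparing with the standard normal cdf gives the same values as leaving the data alone and evaluating a normal cdf with variance 0.5. A sketch with made-up data; the vector w and the variance v are assumptions for illustration.

```matlab
w = [-1.1 -0.4 0.0 0.2 0.9];   % made-up sorted values, mean taken as 0
v = 0.5;                       % assumed variance of the sums

cnScaled = 0.5 + 0.5*erf((w/sqrt(v))/sqrt(2));   % rescale data, standard normal cdf
cnDirect = 0.5 + 0.5*erf(w/sqrt(2*v));           % N(0,v) cdf at the raw data

disp(max(abs(cnScaled - cnDirect)))   % rounding-level difference
```

Both routes assume the mean is already 0; otherwise it would be subtracted first, as in the code from message 17.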
