
Thread Subject:
correlation (different amount of rows)

From: rehuka

Date: 2 Feb, 2009 22:49:43

Message: 1 of 3

Fellow Matlabers...

I'm attempting to do some statistics, most importantly the correlation between two data sets. Here is the setup:

x=[time,valuex]
y=[time,valuey]

My problem is that x and y contain different numbers of rows. x generally looks like this (data for some time steps, plus some NaNs; even with the NaNs included, x and y still have different numbers of rows):

       NaN 7.4800
       NaN 7.5000
   98.0000 7.5300
   98.0000 7.5400
   98.0000 7.5500
  100.0000 7.4700
       NaN 7.7000
       NaN 7.9000

and y (data for every time step):

   96.0000 7.3391
   97.0000 7.3327
   98.0000 7.3283
   99.0000 7.3251
  100.0000 7.3225
  101.0000 7.3204
  102.0000 7.3192

Can anyone think of a clever way to do this? I've been twisting it around to the point where I have no idea what I'm doing anymore...

Many thanks!
reinert

From: Roger Stafford

Date: 3 Feb, 2009 01:10:04

Message: 2 of 3

rehuka <reinert.karlsen@gmail.com> wrote:
> .......
> I'm attempting to do some statistics, most importantly the correlation between two data sets. Here is the setup:
> .......

 In statistics there is no way to formulate a valid correlation between random variables unless you can use coincident sets of values, that is, values that occur together in some statistical sense (as with the numbers on a pair of dice tossed simultaneously). In your case the variables valuex and valuey need to occur together to be used this way, but it isn't clear from your listing which values are coincident. Three different values of valuex are listed with the same 'time' (98), so 'time' can't be the criterion for coincidence. What is? Which pairs of valuex and valuey are actually coincident? Those are the pairs you should use to compute the correlation between valuex and valuey, and to get a reliable measure there should be a large number of them, so that they give a good representation of the underlying statistics.
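For reference, the usual (Pearson) sample correlation over n coincident pairs (x_i, y_i) is shown below. Every term combines x_i and y_i from the same pair i, which is why rows without a partner cannot enter the calculation, and why x and y must have the same number of usable values.

```latex
$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
$$
```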

  If NaNs can occur, then before you discard them you need to verify that the occurrence of NaNs is not itself correlated in some way with the statistics of the finite values of these random variables. Otherwise your statistics could be skewed.
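One rough way to act on this advice is to compare the companion variable at time steps where the observation is NaN against time steps where it is finite; a large gap suggests the NaNs are not scattered at random. The Python sketch below is only an illustration under an assumed layout (one observation slot per model time step, NaN where missing); the function name `missingness_gap` is hypothetical, and the check is a screen, not a formal statistical test.

```python
import math


def missingness_gap(obs, model):
    """Mean model value where the observation is NaN vs. where it is finite.

    obs, model -- equal-length lists aligned on the same time steps
                  (assumed layout: one observation slot per model time step)
    Returns (mean_where_missing, mean_where_present); an empty group gives NaN.
    """
    if len(obs) != len(model):
        raise ValueError("obs and model must be aligned on the same time steps")

    def mean(values):
        # Average of a list, or NaN when the list is empty.
        return sum(values) / len(values) if values else math.nan

    missing = [m for o, m in zip(obs, model) if math.isnan(o)]
    present = [m for o, m in zip(obs, model) if not math.isnan(o)]
    return mean(missing), mean(present)


# Hypothetical example: observations are missing exactly where the model is high.
nan = float("nan")
obs = [nan, 1.0, nan, 2.0]
model = [10.0, 1.0, 12.0, 3.0]
print(missingness_gap(obs, model))  # (11.0, 2.0)
```

Here the model averages 11.0 where observations are missing and 2.0 where they exist, so dropping the NaN rows would leave a biased sample.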

Roger Stafford

From: rehuka

Date: 3 Feb, 2009 07:42:08

Message: 3 of 3

Thanks for the input, Roger; I see what you mean. I realised I did not explain the data well enough, so I will do that now. Sorry for the poor explanation.

The first column is indeed time, or the cell of a model if you wish; the cell is the same as the travel time to that cell.

The complete dataset has all the travel time steps (1:348) and a corresponding value for each; this is the model output. The model output is calibrated against observations, which form the "incomplete" set with NaNs. I created these NaNs myself because those entries are not relevant for the model: they should not be included in the calibration (nor in any later analysis) and should not be read by Matlab in the correlation. You can think of it as a groundwater problem, where the travel time is (sometimes) the same for multiple observations. I will average and weight these observations so that there is only one observation per time. The "incomplete" dataset will have travel times like 26, 28, 30, 60, 96, 98, ..., but the values of the two sets at the same time can be correlated.
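Averaging and weighting observations that share a travel time amounts to a group-by on the time column. The Python sketch below illustrates that step; it is not Reinert's code. The function name `average_per_time` and the `weights` argument are hypothetical, and since the thread does not say what the weights are, equal weights are the default. In MATLAB, the third output of `unique` combined with `accumarray` can do the same kind of grouping.

```python
import math
from collections import defaultdict


def average_per_time(rows, weights=None):
    """Collapse observations that share a time into one weighted mean.

    rows    -- list of (time, value) pairs; rows with a NaN time or value are skipped
    weights -- optional list with one weight per row (hypothetical; equal if None)
    Returns a list of (time, weighted_mean) pairs sorted by time.
    """
    if weights is None:
        weights = [1.0] * len(rows)
    weighted_sum = defaultdict(float)  # time -> sum of weight * value
    weight_total = defaultdict(float)  # time -> sum of weights
    for (t, v), w in zip(rows, weights):
        if math.isnan(t) or math.isnan(v):
            continue  # NaN rows are excluded from the analysis
        weighted_sum[t] += w * v
        weight_total[t] += w
    return [(t, weighted_sum[t] / weight_total[t])
            for t in sorted(weighted_sum) if weight_total[t] > 0]


# The x listing from the first post: three observations share time 98.
nan = float("nan")
x = [(nan, 7.48), (nan, 7.50), (98.0, 7.53), (98.0, 7.54),
     (98.0, 7.55), (100.0, 7.47), (nan, 7.70), (nan, 7.90)]
print([(t, round(v, 4)) for t, v in average_per_time(x)])
# [(98.0, 7.54), (100.0, 7.47)]
```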

So I would like Matlab to read this, find where there are values for the same time (first column), and correlate all these values (or do other statistics).
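The request above boils down to: drop NaN rows, keep only the times present in both sets, and correlate the paired values. In MATLAB itself, `[t, ia, ib] = intersect(tx, ty)` returns the common times with their indices and `corrcoef` computes the coefficient; the Python sketch below spells out the same matching so each step is visible. The names `pearson` and `correlate_on_time` are hypothetical, and the sketch assumes the observations have already been reduced to at most one value per time, as Reinert describes.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    if saa == 0 or sbb == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sab / math.sqrt(saa * sbb)


def correlate_on_time(obs, model):
    """Pair observation and model values that share a time, then correlate.

    obs   -- (time, value) pairs, at most one per time; NaN rows are ignored
    model -- (time, value) pairs, one per time step
    Returns (matched_times, r).
    """
    model_by_time = {t: v for t, v in model
                     if not (math.isnan(t) or math.isnan(v))}
    times, a, b = [], [], []
    for t, v in obs:
        if math.isnan(t) or math.isnan(v) or t not in model_by_time:
            continue  # no coincident model value: leave this row out
        times.append(t)
        a.append(v)
        b.append(model_by_time[t])
    return times, pearson(a, b)


nan = float("nan")
y = [(96.0, 7.3391), (97.0, 7.3327), (98.0, 7.3283), (99.0, 7.3251),
     (100.0, 7.3225), (101.0, 7.3204), (102.0, 7.3192)]
obs = [(nan, 7.48), (98.0, 7.54), (100.0, 7.47)]  # time 98 already averaged
times, r = correlate_on_time(obs, y)
print(times, round(r, 6))  # [98.0, 100.0] 1.0
```

With only two matched times, r is trivially ±1, which illustrates Roger's point that a meaningful correlation needs many coincident pairs.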

I hope this is clearer. Thanks again in advance.
Reinert
