How to compute the average path from scattered data (and its variance)?

Question

0 votes

I have a set of 2-dimensional vectors, each containing the positions of a robot during one experiment. The vectors have different sizes. I would like to compute a path that corresponds to the average of all the paths (and possibly its variance). The idea is to represent the results of all the experiments in a compact way.

I found a function, but I don't understand how it works. Is there a solution to my problem? I tried padding all the vectors so that they have the same size and then using the mean() command, but the results are poor.

Here is a plot of some of the paths I want to average:

Answer 1

William Rose on 27 Sep 2021

0 votes

@Alberto Bacchin,

This is a good question, which arises in slightly different forms in a wide array of problems. One example: how to average an ensemble of GPS position recordings, each of which corresponds to the same course. Another (from my work): find the average hip or knee angle (in 3D) over one stride, when you have a recording of a person walking for many strides on a treadmill, and the stride lengths and durations vary somewhat.

I assume the recordings do not all have the same number of points.

Resample each vector to have 1000 elements, using interp1(), then average them with mean().

This is probably the best approach in the absence of other information.
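
A minimal sketch of this resampling step, assuming each path is stored as an N-by-2 matrix of [x y] positions inside a cell array. The names `paths`, `nPts`, `resampled`, `meanPath` and `sdPath` are illustrative and the data are made up. Each path is interpolated against its own normalized sample index, so time stamps are not used:

```matlab
% Example data: three paths of different lengths (illustrative values only)
paths = {[linspace(0,10,50)'  sin(linspace(0,3,50))'], ...
         [linspace(0,10,80)'  sin(linspace(0,3,80))' + 0.1], ...
         [linspace(0,10,65)'  sin(linspace(0,3,65))' - 0.1]};

nPts = 1000;                                % common number of samples
resampled = zeros(nPts, 2, numel(paths));   % nPts-by-2-by-nPaths

for k = 1:numel(paths)
    p  = paths{k};
    s  = linspace(0, 1, size(p,1));         % normalized index of the original samples
    sq = linspace(0, 1, nPts);              % normalized index of the resampled points
    resampled(:,:,k) = interp1(s, p, sq);   % linear interpolation of the x and y columns
end

meanPath = mean(resampled, 3);              % average across paths, nPts-by-2
sdPath   = std(resampled, 0, 3);            % per-coordinate standard deviation
```

Because the index is normalized to [0, 1], point i of every resampled path is the same fraction of the way through its own recording, which is only a sensible pairing when the paths are reasonably similar.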

The most obvious "other information" would be a time stamp for each position. If your data is sampled at uniform intervals, then the array index is a time stamp. If you want the average position at each time, and if the recordings are of varying length, then pad at the beginning with the initial position, or pad at the end with the final position, to get vectors of uniform length, then average them. You said you did something like this and the results were bad. But I don't know how you padded. A different padding choice could help.
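
A sketch of the end-padding variant, again assuming N-by-2 paths in a cell array with uniform sampling (the variable names and the toy data are assumptions made for illustration):

```matlab
% Example paths of different lengths, each N-by-2 ([x y]); values are illustrative
paths = {[0 0; 1 0; 2 0], [0 0; 1 1; 2 2; 3 3; 4 4], [0 0; 2 1; 4 2; 6 3]};

nMax   = max(cellfun(@(p) size(p,1), paths));   % length of the longest recording
padded = zeros(nMax, 2, numel(paths));

for k = 1:numel(paths)
    p    = paths{k};
    nPad = nMax - size(p,1);                    % number of samples to append
    padded(:,:,k) = [p; repmat(p(end,:), nPad, 1)];   % hold the final position
end

meanPath = mean(padded, 3);   % average position at each sample index (time step)
```

With this choice, a run that finishes early contributes a stationary point at every later index, so the tail of the average is pulled toward the end points of the shorter runs.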

3 Comments

Adam Danz on 27 Sep 2021

It really depends on what questions are being asked by the experiment. If the average trajectory space is the important independent variable, spatial averaging (i.e., histcounts2) would be sufficient. If the position at time t is the important variable, averaging across all trajectories at fixed time intervals would be sufficient.

Either way, orientation and velocity data from the averaged trajectory will be virtually meaningless for some parts, such as around (14, -4).

Whatever method is used, I highly recommend computing binned bootstrapped confidence intervals to measure the certainty of segments of the averaged trajectory.
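
One way such intervals could be computed is sketched below, using base MATLAB only. It assumes the trials have already been brought to a common length, so the "bins" are simply the sample indices, and it uses a plain percentile bootstrap over whole trials. The synthetic data and all variable names are assumptions. The bootci function in the Statistics and Machine Learning Toolbox is an alternative.

```matlab
rng(0);                                   % reproducible example
nPts = 50; nTrials = 8; nBoot = 2000;
t = linspace(0, 1, nPts)';

% Synthetic trials (illustrative): a common curve plus per-trial noise
X = repmat(10*t, 1, nTrials)        + 0.3*randn(nPts, nTrials);
Y = repmat(sin(2*pi*t), 1, nTrials) + 0.3*randn(nPts, nTrials);

bootX = zeros(nPts, nBoot);
bootY = zeros(nPts, nBoot);
for b = 1:nBoot
    idx = randi(nTrials, 1, nTrials);     % resample whole trials with replacement
    bootX(:,b) = mean(X(:,idx), 2);       % bootstrap mean of x at every index
    bootY(:,b) = mean(Y(:,idx), 2);       % bootstrap mean of y at every index
end

% 95% percentile interval at each point along the path
bootX = sort(bootX, 2);
bootY = sort(bootY, 2);
lo = round(0.025*nBoot);
hi = round(0.975*nBoot);
ciX = [bootX(:,lo) bootX(:,hi)];          % [lower upper] for x
ciY = [bootY(:,lo) bootY(:,hi)];          % [lower upper] for y
```

With only a handful of trials the bootstrap distribution is coarse, so intervals obtained this way should be read as rough indications.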

Alberto Bacchin on 28 Sep 2021

@William Rose Yeah! The fact that the robot follows a trajectory that is not a mathematical function is really challenging. By the way, I do have time stamps, but I would like to use only spatial information because I don't care about time in my analysis.

I tried padding the end of each vector by repeating its last element, so that all vectors have the same size. But this leads to some wrong estimates, especially at the end. I think this is because the robot was sometimes faster and sometimes slower (see picture).

This is why I would rather not use time for synchronization. But it seems to be the only way...

In the comment to the answer below I described my second approach.

Answer 2

Adam Danz on 27 Sep 2021

Edited: Adam Danz on 27 Sep 2021

0 votes

It looks like the following assumptions can be made:

The robot starts at the same location
There is a uniform temporal sampling interval.

If this is true, why not just average the (x,y) coordinates across all trials?

Alternatively, you could compute the 2D density of (x,y) values using histcounts2 and then use those data to compute the path of highest density within a 2D grid. This approach would require lots of repetitions (more than what is shown in your sample image).

2 Comments

Alberto Bacchin on 28 Sep 2021

Edited: Alberto Bacchin on 28 Sep 2021

@Adam Danz I can confirm that the robot always starts in the same position. The position is sampled at a constant frequency, but the path execution times differ among the runs.

I achieved good results by down-sampling the vectors (see the picture). Basically, I take the same number of equally spaced samples from each of them. But this is not exactly what I was looking for. I would like to consider only spatial information (namely the trajectories), without caring about time.

As you can see, there are some strange behaviours, near the T crossroad for example. In blue is the 95% CI, computed with bootci and plotted with fill and errorbars.

About histcounts2, I think I do not have enough trials (8 in total).
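
The down-sampling described in this comment could look roughly like the sketch below. The original code was not posted, so the placeholder data and the names `paths`, `nKeep`, `sub` and `meanPath` are assumptions:

```matlab
rng(0);                                           % reproducible placeholder data
paths = {rand(137,2), rand(212,2), rand(95,2)};   % N-by-2 ([x y]) per run
nKeep = 50;        % samples kept per run; should not exceed the shortest run

sub = zeros(nKeep, 2, numel(paths));
for k = 1:numel(paths)
    n   = size(paths{k}, 1);
    idx = round(linspace(1, n, nKeep));           % equally spaced indices, first and last included
    sub(:,:,k) = paths{k}(idx, :);
end

meanPath = mean(sub, 3);                          % nKeep-by-2 average path
```

Since the samples are uniform in time, equally spaced indices are equally spaced in time, not in distance, so this still pairs points by the fraction of elapsed time.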

Adam Danz on 28 Sep 2021

Edited: Adam Danz on 28 Sep 2021

I have analyzed similar data created by monkeys steering through a virtual environment but I was working with hundreds of trials. Analyzing 8 trials will produce weak results given the noise within each trial no matter what method you use. I would still recommend using histcounts2 or histogram2 if you want to know the spatial density of trajectory data within a grid.

Answer 3

William Rose on 28 Sep 2021

0 votes

pathSimulateAndAverage.m

@Alberto Bacchin,

Since you said you do not care about time, it is OK to normalize all the paths to have the same number of steps. Here is a script that:

Generates 8 random paths with a different number of steps for each
Computes 8 paths with the same number of steps, by interpolation
Computes the mean+-SD path
Plots the mean path and plots a 1-SD ellipse at 20 points along the path

The script generates the plots below. Each run will have different semi-random walks. Good luck.
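
The attachment is not reproduced in this thread. The four steps above can be sketched independently as follows; this is not the attached script, and the random-walk model, step sizes and variable names are all assumptions:

```matlab
rng(1);                                    % reproducible example
nPaths = 8; nPts = 200;
P = zeros(nPts, 2, nPaths);
for k = 1:nPaths
    nSteps = randi([80 160]);              % different number of steps per path
    steps  = [0.1 + 0.05*randn(nSteps,1), 0.05*randn(nSteps,1)];
    raw    = [0 0; cumsum(steps)];         % semi-random walk starting at the origin
    s      = linspace(0, 1, size(raw,1));
    P(:,:,k) = interp1(s, raw, linspace(0, 1, nPts));   % same number of steps
end

mu = mean(P, 3);                           % mean path, nPts-by-2
sd = std(P, 0, 3);                         % SD of x and y at each step

figure; hold on; axis equal
plot(squeeze(P(:,1,:)), squeeze(P(:,2,:)), 'Color', [0.7 0.7 0.7]);
plot(mu(:,1), mu(:,2), 'k', 'LineWidth', 2);

th = linspace(0, 2*pi, 60);
for i = round(linspace(1, nPts, 20))       % 20 points along the mean path
    C = cov(squeeze(P(i,1,:)), squeeze(P(i,2,:)));   % 2-by-2 covariance across paths
    [V, D] = eig(C);
    e = V * sqrt(max(D, 0)) * [cos(th); sin(th)];    % 1-SD ellipse
    plot(mu(i,1) + e(1,:), mu(i,2) + e(2,:), 'b');
end
```

The ellipse axes come from the eigenvectors of the covariance across paths at each step, so they show correlated x and y spread instead of only the separate SDs.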

3 Comments

Adam Danz on 28 Sep 2021

Edited: Adam Danz on 28 Sep 2021

Good idea!

The only problem with this approach is that it assumes all trial durations are the same or at least very similar. That doesn't seem to be the case according to the OP's previous comment, "but the path execution times are different among the runs".

Consider a single trial that ended after 2 seconds and extends only 2 meters (assuming units are meters), while the other trials lasted 20 seconds and contain 20-meter-long trajectories. When the shorter trajectory is resampled in your solution, it will have the same number of samples as the longer trajectories and will pull the spatial average toward it, which will heavily bias the results.

It also assumes that the single-trial trajectories conform to a mean trajectory. Consider another example where all trials have the same duration and the same number of samples, but in one trial the robot was way off course and started looping around a circular path centered at (2,-4). The points in that trial will strongly bias the average trajectory, given the relatively few trials. This could be solved by taking the median instead of the mean, but the result is still not a spatial average.
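
A small sketch of the mean versus median comparison, using made-up trials that have already been resampled to a common length. The data and names are assumptions, and the median here is taken separately on x and y at each index:

```matlab
% Seven well-behaved trials plus one outlier, each with nPts points
nPts = 100;
t = linspace(0, 1, nPts)';
P = zeros(nPts, 2, 8);
for k = 1:7
    P(:,:,k) = [10*t, 0.1*k*ones(nPts,1)];        % nearly straight runs
end
P(:,:,8) = [2 + cos(6*pi*t), -4 + sin(6*pi*t)];   % off-course trial circling (2,-4)

meanPath   = mean(P, 3);      % pulled toward the off-course trial
medianPath = median(P, 3);    % coordinate-wise median, less sensitive to one bad trial
```

In this toy case the final x value of the median path stays at 10, while the mean is dragged back toward the looping trial.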

If the OP wants a spatial average (which is still not 100% clear to me), then averaging by trajectory indices will probably not suffice unless variability between trials is small, samples are at fixed temporal intervals, and all trials have the same duration.

Adam Danz on 28 Sep 2021

My idea to compute 2D density also doesn't seem to fit the OP's needs. There's just too much variation.

Continuing from your m-file using the original simulated paths prior to normalizing number of samples,

[N,xedges,yedges] = histcounts2(path(1,:), path(2,:), 15);
% Zero the count in the bin containing the starting point, since it
% would otherwise skew the color range. 'last' is used on both axes so
% that the index is the bin whose left edge is the last one at or below start.
start = [0,0];
N(find(xedges<=start(1),1,'last'), find(yedges<=start(2),1,'last')) = 0;
hold on
histogram2('XBinEdges',xedges,'YBinEdges',yedges,'BinCounts',N,'FaceColor','flat','FaceAlpha',.65)
set(gca,'ColorScale','log')
colorbar()

William Rose on 29 Sep 2021

@Alberto Bacchin, @Adam Danz,

I really like the visual impact of the colored 2D histogram overlaid on the actual paths. It is a very nice example of good graphical communication. The code to make it is very efficient. The 2D histogram approach handles short paths and paths that turn back on themselves better than my approach does. The mean-of-the-normalized-paths approach has some attractive features, but as Adam correctly notes, it doesn't work well if the paths are not reasonably similar. Alberto, you have a couple of potentially good options, or use both!

Answer 4

William Rose on 29 Sep 2021

0 votes

@Alberto Bacchin, here's another plot you can make, if you use the mean-of-the-normalized paths approach (pathSimulateAndAverage.m). The plot shows the raw paths, mean path, and 90% and 95% confidence regions.

The script pathSimulateAndAverage.m makes the paths and the plot above. It calls plotFilledEllipse.m (attached). Thanks to @Star Strider for providing the basis of plotFilledEllipse().

The script calls chi2inv() in the Statistics and Machine Learning Toolbox. If you do not have that toolbox, replace chi2inv(ci,2) with 4.605, 5.991, or 9.210 for a desired confidence level (ci) of 0.90, 0.95, or 0.99, respectively.
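
Those constants follow from the closed form of the chi-square quantile with 2 degrees of freedom, which is -2*log(1-ci), so the toolbox call can be avoided for any confidence level. A short sketch follows. The example covariance, mean and variable names are assumptions, and treating the result as a confidence region assumes roughly bivariate-normal scatter:

```matlab
ci = [0.90 0.95 0.99];
k2 = -2*log(1 - ci);          % same value as chi2inv(ci, 2)
% k2 is approximately [4.605 5.991 9.210]

% Scale a covariance ellipse to the chosen confidence level
C  = [0.5 0.2; 0.2 0.3];      % example 2-by-2 covariance (illustrative values)
mu = [1 2];                   % example mean position
th = linspace(0, 2*pi, 100);
[V, D] = eig(C);
e  = V * sqrt(k2(2) * D) * [cos(th); sin(th)];   % 95% ellipse, centered at the origin
xy = [mu(1) + e(1,:); mu(2) + e(2,:)];           % shifted to the mean position
```

Every point on the ellipse has squared Mahalanobis distance k2(2) from the mean, which is what makes it the 95% contour under the normality assumption.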
