Thread Subject: Optimization of sampling scheme

Subject: Optimization of sampling scheme

From: Wolfgang Schwanghart

Date: 6 May, 2008 08:37:04

Message: 1 of 12

Dear all,

I am not an expert in optimization and hence I'd really
appreciate your help with following problem.

Consider n random sampling locations in 2-D. E.g.

n = 100;
x = rand(n,2);

The pairwise euclidean distance between all sampling points
is (in case you have the Stats toolbox)

d = pdist(x);

You may want to plot a histogram of d

hist(d,20);

and realize, that the distances have a non-uniform
distribution. But a uniform distribution of the sampling
points is what I want. Hence, I'd like to shift the sampling
locations so that their pairwise distance approximates a
continuous uniform distribution.

I have no idea how to formulate such an optimization
problem. If you have an idea, hint, ... whatever, that'll be
great. If the question is too general to be answered here,
please let me know. This is no homework :-)

Best regards,
Wolfgang

Subject: Optimization of sampling scheme

From: Wolfgang Schwanghart

Date: 6 May, 2008 09:45:06

Message: 2 of 12

I forgot to mention... The optimization should follow the
constraint that the sampling locations remain within the
rectangular sampling domain set by the initial random
configuration.

x(:,1) >= min(x(:,1)) & <= max(x(:,1))
x(:,2) >= min(x(:,2)) & <= max(x(:,2))

Subject: Optimization of sampling scheme

From: John D'Errico

Date: 6 May, 2008 10:05:05

Message: 3 of 12

"Wolfgang Schwanghart" <schwanghart@googlemail.com> wrote in message
<fvp5bg$s0t$1@fred.mathworks.com>...
> Dear all,
>
> I am not an expert in optimization and hence I'd really
> appreciate your help with following problem.
>
> Consider n random sampling locations in 2-D. E.g.
>
> n = 100;
> x = rand(n,2);
>
> The pairwise euclidean distance between all sampling points
> is (in case you have the Stats toolbox)
>
> d = pdist(x);
>
> You may want to plot a histogram of d
>
> hist(d,20);
>
> and realize, that the distances have a non-uniform
> distribution. But a uniform distribution of the sampling
> points is what I want. Hence, I'd like to shift the sampling
> locations so that their pairwise distance approximates a
> continuous uniform distribution.
>
> I have no idea how to formulate such an optimization
> problem. If you have an idea, hint, ... whatever, that'll be
> great. If the question is too general to be answered here,
> please let me know. This is no homework :-)

Well, its obviously NOT a homework problem.
Why not? Because it has no solution, nor is it
a well formulated problem mathematically.

Consider any given point. If the set of interpoint
distances has as many points close to that point
as there are at any given distance, then in the
(x,y) plane, the density of the points must fall off
as the inverse of the distance from the chosen
location.

Conversely, if the points were perfectly uniformly
distributed over the entire plane, then if we were
to plot a histogram of the points as a function of
euclidean distance, then that histogram would be
linearly increasing. But, we cannot have a set of
points which falls off in density as you are asking
from any location in space. Perhaps you are wishing
for a perfectly uniform sampling of points over the
domain of interest.

Of course, it makes no sense to talk about a
uniform distribution over an infinite plane. When
you sampled randomly over the unit square, the
points near the edges of the square had fewer
near neighbors than did the points in the center
of the square. And of course, the set of distances
that are large enough to be near sqrt(2) will be
small when you sample in the unit square.

So, due to these edge effects, we cannot ever
have a uniform set of distances over the unit
square (or any closed region) come out of pdist.

As I said, this problem has no solution as you
have posed it. Perhaps you are really interested
in having a set that has a roughly CONSTANT
distance to the nearest neighbor for any point.
(Even this wording is probably poor on my part
as I think about it.)

HTH,
John

Subject: Optimization of sampling scheme

From: Wolfgang Schwanghart

Date: 6 May, 2008 10:36:02

Message: 4 of 12

Thanks very much John for your answer. One of the main
problems of non-mathematicians is probably to pose well
formulated problems. Hence, let me explain where I want to go.

I want to set up a sampling scheme inside a rectangle from
which I get the best idea of small to large scale spatial
variability of some geochemical parameter (e.g. N in soils).
In order to assess spatial variability I'll use the
empirical variogram.

http://en.wikipedia.org/wiki/Variogram

In the variogram all distances should be sufficiently
represented for a given number of sampling locations.

Well... colleagues calling for lunchtime ... I think I need
to rethink my problem.

Thanks again,
Wolfgang




"John D'Errico" <woodchips@rochester.rr.com> wrote in
message <fvpagh$75i$1@fred.mathworks.com>...
> "Wolfgang Schwanghart" <schwanghart@googlemail.com> wrote
in message
> <fvp5bg$s0t$1@fred.mathworks.com>...
> > Dear all,
> >
> > I am not an expert in optimization and hence I'd really
> > appreciate your help with following problem.
> >
> > Consider n random sampling locations in 2-D. E.g.
> >
> > n = 100;
> > x = rand(n,2);
> >
> > The pairwise euclidean distance between all sampling points
> > is (in case you have the Stats toolbox)
> >
> > d = pdist(x);
> >
> > You may want to plot a histogram of d
> >
> > hist(d,20);
> >
> > and realize, that the distances have a non-uniform
> > distribution. But a uniform distribution of the sampling
> > points is what I want. Hence, I'd like to shift the sampling
> > locations so that their pairwise distance approximates a
> > continuous uniform distribution.
> >
> > I have no idea how to formulate such an optimization
> > problem. If you have an idea, hint, ... whatever, that'll be
> > great. If the question is too general to be answered here,
> > please let me know. This is no homework :-)
>
> Well, its obviously NOT a homework problem.
> Why not? Because it has no solution, nor is it
> a well formulated problem mathematically.
>
> Consider any given point. If the set of interpoint
> distances has as many points close to that point
> as there are at any given distance, then in the
> (x,y) plane, the density of the points must fall off
> as the inverse of the distance from the chosen
> location.
>
> Conversely, if the points were perfectly uniformly
> distributed over the entire plane, then if we were
> to plot a histogram of the points as a function of
> euclidean distance, then that histogram would be
> linearly increasing. But, we cannot have a set of
> points which falls off in density as you are asking
> from any location in space. Perhaps you are wishing
> for a perfectly uniform sampling of points over the
> domain of interest.
>
> Of course, it makes no sense to talk about a
> uniform distribution over an infinite plane. When
> you sampled randomly over the unit square, the
> points near the edges of the square had fewer
> near neighbors than did the points in the center
> of the square. And of course, the set of distances
> that are large enough to be near sqrt(2) will be
> small when you sample in the unit square.
>
> So, due to these edge effects, we cannot ever
> have a uniform set of distances over the unit
> square (or any closed region) come out of pdist.
>
> As I said, this problem has no solution as you
> have posed it. Perhaps you are really interested
> in having a set that has a roughly CONSTANT
> distance to the nearest neighbor for any point.
> (Even this wording is probably poor on my part
> as I think about it.)
>
> HTH,
> John

Subject: Optimization of sampling scheme

From: John D'Errico

Date: 6 May, 2008 12:20:18

Message: 5 of 12

"Wolfgang Schwanghart" <schwanghart@googlemail.com> wrote in message
<fvpcai$mfm$1@fred.mathworks.com>...
> Thanks very much John for your answer. One of the main
> problems of non-mathematicians is probably to pose well
> formulated problems. Hence, let me explain where I want to go.
>
> I want to set up a sampling scheme inside a rectangle from
> which I get the best idea of small to large scale spatial
> variability of some geochemical parameter (e.g. N in soils).
> In order to assess spatial variability I'll use the
> empirical variogram.
>
> http://en.wikipedia.org/wiki/Variogram
>
> In the variogram all distances should be sufficiently
> represented for a given number of sampling locations.
>
> Well... colleagues calling for lunchtime ... I think I need
> to rethink my problem.
>
> Thanks again,
> Wolfgang

Ah. I was thinking towards a different goal when
I read your first post.

A problem is that the distance distribution of
any set of points inside a finite 2-d domain like
the unit square will be strongly influenced by the
edge effects. So that distribution must drop off
strongly as you begin to look at distances that
span the diameter of the region.

Next, any roughly uniform sampling in 2-d
will exhibit an increase in the number of
neighbors at a distance of d as d increases
from zero. We can see this by understanding
that the area of an annular ring of thickness
dr is pi*r*dr. So we should expect the distance
distribution to rise roughly linearly with radius
for small r, at least for any "locally uniform"
sampling. Again, there are edge effects that
will probably screw this up, making the increase
nonlinear even for small radius.

So I'm not at all sure that a goal of a maximally
uniform distance distribution is achievable.

John

Subject: Optimization of sampling scheme

From: Bruno Luong

Date: 6 May, 2008 12:39:02

Message: 6 of 12

Quite interesting problem. I'm not so sure as John that the
solution does not exist. If you suppose probability
distribution of R=(x1,x2) depends only on |R| (thus in your
case the points are distributed on a disk rather than on a
rectangle), the pdf f(r) of |R| must verifies some integral
equation that I believe there exists a solution. The
integral equation get quit messy with the border, and I'm
too lazy to write it out. If someone is willing to tackle,
feel free to do so.

Bruno

Subject: Optimization of sampling scheme

From: John D'Errico

Date: 6 May, 2008 13:34:03

Message: 7 of 12

"Bruno Luong" <b.luong@fogale.fr> wrote in message
<fvpjh6$sn5$1@fred.mathworks.com>...
> Quite interesting problem. I'm not so sure as John that the
> solution does not exist. If you suppose probability
> distribution of R=(x1,x2) depends only on |R| (thus in your
> case the points are distributed on a disk rather than on a
> rectangle), the pdf f(r) of |R| must verifies some integral
> equation that I believe there exists a solution. The
> integral equation get quit messy with the border, and I'm
> too lazy to write it out. If someone is willing to tackle,
> feel free to do so.
>
> Bruno

Oh, I'm quite comfortable with the idea that an
exact solution cannot exist on the domains we
have been discussing, especially if you allow the
points to lie strictly inside that domain. As you
approach the diameter of the domain, the distance
distribution must suffer some edge effects, even
for a circular domain.

Even for a circular domain, simplest is to force all
the points to lie on the outer edge of the circle.
This will result in a nearly uniform distance
distribution, since we don't have the radius
issue creeping in. But even that suffers from a
subtle problem. For example,

t = linspace(0,2*pi,2000)';
xy = [cos(t),sin(t)];
d = sqrt(bsxfun(@minus,xy(:,1),xy(:,1)').^2 + ...
     bsxfun(@minus,xy(:,2),xy(:,2)').^2);

hist(d(:),200)

Note that at any positive distance less than the
diameter, there are exactly two points at that
distance from any given location.

Even for points that lie uniformly spaced on a
straight line segment, the distance distribution
will be triangular, not uniform. The edge effects
still dominate.

x = linspace(0,1,2000)';
d = pdist(x);
hist(d(:),500)

John

Subject: Optimization of sampling scheme

From: Bruno Luong

Date: 6 May, 2008 16:40:21

Message: 8 of 12

John,

I just wonder in which limit one can rigorously extrapolate
a thinking from a discrete data to a continuous pdf of a
random variable.

Let's simplify the problem first. Do you think in 1D there
is still no solution? (your logical seems to hold in 1D)

Bruno

Subject: Optimization of sampling scheme

From: John D'Errico

Date: 6 May, 2008 16:42:03

Message: 9 of 12

Here is an interesting sampling scheme on the
unit circle. Its not exactly uniform in distance,
but it is far closer than many others. It is
surely not a good sampling of the complete
domain.

t = rand(1000,1)*1.01*pi;
xy = [cos(t),sin(t)];
d = pdist(xy);
hist(d(:),500)

John

Subject: Optimization of sampling scheme

From: John D'Errico

Date: 6 May, 2008 17:36:03

Message: 10 of 12

"John D'Errico" <woodchips@rochester.rr.com> wrote in message
<fvq1or$65s$1@fred.mathworks.com>...
> Here is an interesting sampling scheme on the
> unit circle. Its not exactly uniform in distance,
> but it is far closer than many others. It is
> surely not a good sampling of the complete
> domain.
>
> t = rand(1000,1)*1.01*pi;
> xy = [cos(t),sin(t)];
> d = pdist(xy);
> hist(d(:),500)
>
> John

Its is possible that a more fractal scheme might
produce a nearly uniform distance distribution,
as well as fill the domain of interest better.

t = rand(2000,1).^0.2;
xy = [t.*cos(t*2.5*pi),t.*sin(t*2.5*pi)];
subplot(1,2,1)
plot(xy(:,1),xy(:,2),'r.')
d = pdist(xy);
subplot(1,2,2)
hist(d(:),500)

Have fun,
John

Subject: Optimization of sampling scheme

From: Torsten Hennig

Date: 7 May, 2008 06:48:25

Message: 11 of 12

>Dear all,
>
>I am not an expert in optimization and hence I'd really
>appreciate your help with following problem.
>
>Consider n random sampling locations in 2-D. E.g.
>
>n = 100;
>x = rand(n,2);
>
>The pairwise euclidean distance between all sampling >points
>is (in case you have the Stats toolbox)
>
>d = pdist(x);
>
>You may want to plot a histogram of d
>
>hist(d,20);
>
>and realize, that the distances have a non-uniform
>distribution. But a uniform distribution of the sampling
>points is what I want. Hence, I'd like to shift the >sampling
>locations so that their pairwise distance approximates a
>continuous uniform distribution.
>
>I have no idea how to formulate such an optimization
>problem. If you have an idea, hint, ... whatever, >that'll be
>great. If the question is too general to be answered >here,
>please let me know. This is no homework :-)
>
>Best regards,
>Wolfgang


Hi Wolfgang,

maybe you should tackle the problem 'the other way round'.
Given that the points are to be distributed within the
rectangle x_min <= x <= x_max, y_min <= y <= y_max,
you could first generate distances d_ij (1<=i<=n,
i+1<=j<=n) uniformly distributed over
[0; sqrt((x_max-x_min)^2 + (y_max-y_min)^2)].
After this, you could calculate a series of
points (x_i,y_i) (1<=i<=n) by solving the following
optimization problem:
minimize : eps
-eps <= d_ij^2 - (x_i-x_j)^2 - (y_i-y_j)^2 <= eps
(1<=i<=n, i+1<=j<=n)
x_min <= x_i <= x_max (1<=i<=n)
y_min <= y_i <= y_max (1<=i<=n)

If a solution with a 'small' value of eps exists, you
can be sure that the pointwise distances are
distributed quite uniformly.

It's an interesting problem ; it's worth a try to
post it to sci.math as well.

Best wishes
Torsten.

Subject: Optimization of sampling scheme

From: Wolfgang Schwanghart

Date: 7 May, 2008 13:24:04

Message: 12 of 12

Thanks a lot to you all for your helpful comments. I wrote a
function for the problem as suggested by Torsten. So far
it's rarely an optimization of a 2-D geostatistical sampling
scheme, because nearly always I get a circle type of scheme.
And also it might be most economical to cover all ranges of
distance within a sampling domain, I doubt its acceptance
among soil scientists. :-)

Best regards,
Wolfgang




function [x,x0,dnow,d] = samplescheme(n,dom)

% optimization of 2-D geostatistical sampling scheme
%
% [x,x0,dnow,d] = samplescheme(n,dom)
% ___________________________________________________
%
% Generate a point sampling scheme where the
% distribution of the interpoint distances follows
% a theoretical distribution.
%
% Input:
% n: number of desired sampling locations
% (scalar)
% dom: boundaries of rectangular sampling
% domain ([[minx maxx miny maxy])
%
% Output:
% x: sampling locations (nx2 matrix)
% x0: initial random configuration of sampling
% locations (nx2 matrix)
% dnow: vector with pairwise distances
% d: the desired pairwise distances
%
% Requirements:
% Optimization Toolbox, Statistical Toolbox
%
% Example:
%
% [x,x0,dnow,d] = samplescheme(50,[0 1 0 1])
% scatter(x0(:,1),x0(:,2),'.')
% hold on
% scatter(x(:,1),x(:,2),'rs')
% axis image
% legend('x0','x')
% figure
% subplot(1,2,1);
% hist(d,20);
% title('theor. distribution of sample spacings');
% subplot(1,2,2);
% hist(dnow,20);
% title('emp. distribution of sample spacings');
%
% ___________________________________________________
%



% nr of pairwise distances
n_d = (n^2-n)/2;

% upper and lower limits of distances
dlu = [0.001; ...
       sqrt((dom(2)-dom(1))^2 + (dom(4)-dom(3))^2)];

% create desired distance distribution here
% d = rand(n_d,1);
% d = raylrnd(ones(n_d,1));
% d = log(linspace(1,10,n_d)');
d = hypot(rand(n_d,1),rand(n_d,1));
   
% linear transformation of d to fit upper and lower
% domain boundaries
d = (d-min(d))*(dlu(2)-dlu(1))/max(d-min(d))+dlu(1);

% indices for pairwise distances
IX1 = 1:n;
IX2 = (1:n)';

[IX1,IX2] = meshgrid(IX1,IX2);
[ignore1,ignore2,IX1] = find(spdiags(IX1, -n:-1));
[ignore1,ignore2,IX2] = find(spdiags(IX2, -n:-1));

clear ignore1 ignore2

ij = [IX1(:) IX2(:)];

clear IX1 IX2

% initial values
x0 = rand(n,2);

x0(:,1) = x0(:,1) * (dom(2)-dom(1)) + dom(1);
x0(:,2) = x0(:,2) * (dom(4)-dom(3)) + dom(3);

lb = [repmat(dom(1),n,1) repmat(dom(3),n,1)];
ub = [repmat(dom(2),n,1) repmat(dom(4),n,1)];

% function handle
fun = @(x) d.^2 - ...
           (x(ij(:,1),1)-x(ij(:,2),1)).^2 - ...
           (x(ij(:,1),2)-x(ij(:,2),2)).^2;

% use lsqnonlin to solve the problem
x = lsqnonlin(fun,x0,lb,ub);
dnow = pdist(x)';

Tags for this Thread

Everyone's Tags:

Add a New Tag:

Separated by commas
Ex.: root locus, bode

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Tag Activity for This Thread
Tag Applied By Date/Time
optimization Wolfgang Schwanghart 6 May, 2008 04:40:24
rssFeed for this Thread

Contact us at files@mathworks.com