Thread Subject: calculating distance using PDIST

Subject: calculating distance using PDIST

From: amit

Date: 21 May, 2008 12:39:01

Message: 1 of 4

I am using the PDIST function of the Statistics Toolbox to find
Euclidean distances, but the memory requirement exceeds my RAM
capacity when I use a matrix of size 50000x20.

I am trying the following:

a=randn(50000,20);
b=squareform(pdist(a));

How can the memory requirement be minimized?


Subject: calculating distance using PDIST

From: Steven Lord

Date: 21 May, 2008 13:13:39

Message: 2 of 4


"amit " <amit_tilwankar@yahoo.com> wrote in message
news:g11555$ihe$1@fred.mathworks.com...
>I am using the PDIST function of the Statistics Toolbox to find
> Euclidean distances, but the memory requirement exceeds my RAM
> capacity when I use a matrix of size 50000x20.
>
> I am trying the following:
>
> a=randn(50000,20);
> b=squareform(pdist(a));
>
> How can the memory requirement be minimized?

When you call PDIST on that matrix, the output (which will become the input
to SQUAREFORM) is a 1,249,975,000-element vector of doubles, which at 8 bytes
per element would require approximately 9.3 GB of memory. I'm guessing your
computer doesn't have that much memory. Then, when you call SQUAREFORM, it
creates a 50000-by-50000 matrix (2.5 billion elements) as output, which
requires _another_ 18.6 GB or so.
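The arithmetic behind these figures can be checked with a few lines of MATLAB. This is a quick sketch assuming double precision (8 bytes per element); the number of columns (20 here) does not affect the output size, only the number of rows does.

```matlab
% Rough memory estimate for squareform(pdist(a)) with a of size n-by-p.
% Assumes double precision: 8 bytes per element.
n = 50000;
bytesPerDouble = 8;

numPairs    = n*(n-1)/2;                 % length of the PDIST output vector
pdistBytes  = numPairs * bytesPerDouble; % memory for the PDIST output
squareBytes = n^2 * bytesPerDouble;      % memory for the full SQUAREFORM matrix

fprintf('pdist output:      %.0f elements, %.1f GiB\n', numPairs, pdistBytes/2^30);
fprintf('squareform output: %.0f elements, %.1f GiB\n', n^2, squareBytes/2^30);
```

Both arrays exist at the same time while SQUAREFORM runs, so the peak is roughly the sum of the two.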

You will need to break your problem into smaller pieces and process each
piece separately. Alternatively, if you post the problem you're trying to
solve to the newsgroup (please post it rather than emailing it to me directly;
I'll see it when you post it), there may be a way to solve it that doesn't
require creating such a large matrix.
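One way to "process each piece separately" is to compute the distances from a block of rows to all rows, reduce that block to whatever you actually need, and discard it. The sketch below is in base MATLAB (no toolbox), assuming the goal is a per-row summary; the nearest-neighbour reduction, the block size, and the smaller test matrix are all hypothetical choices for illustration.

```matlab
% Sketch: compute pairwise Euclidean distances one block of rows at a time,
% keeping only each row's nearest neighbour instead of the full n-by-n matrix.
a = randn(5000, 20);          % smaller stand-in for the 50000-by-20 data
n = size(a, 1);
blockSize = 500;              % hypothetical: each block is blockSize-by-n

sqNorms = sum(a.^2, 2);       % n-by-1 squared row norms, reused for every block
nnDist  = zeros(n, 1);        % distance from each row to its nearest other row
nnIndex = zeros(n, 1);        % index of that nearest row

for first = 1:blockSize:n
    rows = first:min(first + blockSize - 1, n);
    % Squared distances ||x_i||^2 + ||x_j||^2 - 2*x_i*x_j' for this block vs all rows
    D2 = bsxfun(@plus, sqNorms(rows), sqNorms') - 2 * (a(rows, :) * a');
    D2 = max(D2, 0);          % clamp small negative values caused by round-off
    % Exclude each row's zero distance to itself
    D2(sub2ind(size(D2), 1:numel(rows), rows)) = Inf;
    [d2min, idx] = min(D2, [], 2);
    nnDist(rows)  = sqrt(d2min);
    nnIndex(rows) = idx;
end
```

With the real 50000-by-20 data and blockSize = 500, each block is 500-by-50000 doubles (about 200 MB), so the block size can be tuned to the available RAM. The norm-expansion formula is fast but can lose accuracy for nearly coincident points; a difference-based computation avoids that at some cost in speed.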

--
Steve Lord
slord@mathworks.com


Subject: calculating distance using PDIST

From: Peter Perkins

Date: 21 May, 2008 16:02:39

Message: 3 of 4

amit wrote:
> I am using PDIST function of Statistics Toolbox to find
> Euclidean Distance.
> but the memory requirement goes beyond the RAM capacity when
> I use matrix size (50000,20).

Are you sure you have your data in the correct orientation, i.e., 50000
observations (rows), each in 20 dimensions (columns)? PDIST treats each
row as an observation.

As others have said, you are unlikely to be able to fit the upper triangle of
the distance matrix (50000*49999/2 elements) into memory, and even if you could,
what would you do with such a thing that wouldn't take even _more_ memory?
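To illustrate the orientation point: because PDIST treats each row as an observation, transposing the input changes which distances are computed. A minimal sketch on a hypothetical 3-by-2 matrix `x` (PDIST and SQUAREFORM require the Statistics Toolbox):

```matlab
% Three observations (rows) in two dimensions (columns)
x = [0 0; 3 4; 6 8];

dRows = pdist(x);            % distances between rows, ordered (1,2), (1,3), (2,3)
dCols = pdist(x');           % distance between the two COLUMNS: a single value
D     = squareform(dRows);   % 3-by-3 symmetric matrix with zeros on the diagonal

disp(dRows)   % 5 10 5
disp(dCols)   % 2.2361, i.e. sqrt(5)

% For the data in this thread, the two orientations differ enormously in size:
n = 50000; p = 20;
fprintf('rows as observations:    %.0f distances\n', n*(n-1)/2);
fprintf('columns as observations: %.0f distances\n', p*(p-1)/2);
```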

Subject: calculating distance using PDIST

From: amit

Date: 21 May, 2008 18:13:01

Message: 4 of 4

Thanks for the suggestions!

I just want to find the distances between all pairs of vectors, i.e.,
the rows of the 50000x20 matrix, using PDIST. How can the operation be
divided into pieces?
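A sketch of one way to divide the work: tile the upper triangle of the n-by-n distance matrix into square tiles (tile J at or right of tile I), so every pair is computed exactly once and no tile is larger than blockSize-by-blockSize. Here each tile is reduced to a count of pairs closer than a threshold `r`; the threshold, tile size, and counting step are hypothetical placeholders for whatever per-tile processing is actually needed (for example, saving each tile to disk). PDIST itself has no option for computing a subset of pairs, so the tile distances are computed directly in base MATLAB; later releases of the Statistics Toolbox added PDIST2, which computes distances between two sets of rows and could replace the inner loop.

```matlab
% Sketch: process the upper triangle of the distance matrix tile by tile.
a = randn(2000, 20);            % smaller stand-in for the 50000-by-20 data
n = size(a, 1);
blockSize = 250;                % hypothetical tile size
r = 4.5;                        % hypothetical distance threshold

starts = 1:blockSize:n;
closePairs = 0;
for bi = 1:numel(starts)
    I = starts(bi):min(starts(bi) + blockSize - 1, n);
    for bj = bi:numel(starts)   % bj >= bi: upper-triangle tiles only
        J = starts(bj):min(starts(bj) + blockSize - 1, n);
        % Difference-based squared distances for this tile (no cancellation)
        D2 = zeros(numel(I), numel(J));
        for k = 1:size(a, 2)
            D2 = D2 + bsxfun(@minus, a(I, k), a(J, k)').^2;
        end
        D = sqrt(D2);
        if bi == bj
            % Diagonal tile: keep only pairs i < j
            D(tril(true(size(D)))) = Inf;
        end
        closePairs = closePairs + nnz(D < r);   % per-tile processing goes here
    end
end
```

Only one tile is in memory at a time, so memory use is set by blockSize rather than by n; the price is that anything needing the whole matrix at once has to be reformulated as a per-tile computation.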
