Thread Subject:
Bootstrapping multivariate data

Subject: Bootstrapping multivariate data

From: CT

Date: 17 Aug, 2010 05:59:05

Message: 1 of 14

Hi all,

I am searching for a MATLAB function that can do non-parametric bootstrapping of multivariate data. For instance, I have a matrix of sample data (MxN), where M is the dimension of the random vector (multivariate data) and N is the number of observations. I want to generate (resample) bootstrap data from this initial multivariate data. Does anyone know of such a function?

Thank you very much for your help.

Best regards,
CT DO

Subject: Bootstrapping multivariate data

From: Rogelio

Date: 17 Aug, 2010 06:33:42

Message: 2 of 14

"CT " <cong-thanh.do@hotmail.fr> wrote in message <i4d8f9$n3a$1@fred.mathworks.com>...
> Hi all,
>
> I am searching for a Matlab function that can do the non-parametric bootstrapping of multivariate data. For instance, I have a matrix of sample data (MxN) where M is the dimension of the random vector (multivariate data), and N is the number of observation. I want to generate (resample) bootstrap data from this initial multivariate data. Does anyone knows this function?
>
> Thank you very much for your help.
>
> Best regards,
> CT DO

MATLAB does not have a pre-built function for multivariate data. However, you can find code on the File Exchange; the function is called 'bstrag'.

Subject: Bootstrapping multivariate data

From: Peter Perkins

Date: 17 Aug, 2010 12:34:34

Message: 3 of 14

On 8/17/2010 1:59 AM, CT wrote:
> I am searching for a Matlab function that can do the non-parametric
> bootstrapping of multivariate data. For instance, I have a matrix of
> sample data (MxN) where M is the dimension of the random vector
> (multivariate data), and N is the number of observation. I want to
> generate (resample) bootstrap data from this initial multivariate data.
> Does anyone knows this function?

If you have access to the Statistics Toolbox, the BOOTSTRP function does
what you are asking. It is documented here:

<http://www.mathworks.com/access/helpdesk/help/toolbox/stat /bootstrp.html>

Subject: Bootstrapping multivariate data

From: Simon Preston

Date: 17 Aug, 2010 14:18:05

Message: 4 of 14

Peter Perkins <Peter.Perkins@MathRemoveThisWorks.com> wrote in message <i4dvkq$8ns$2@fred.mathworks.com>...
> On 8/17/2010 1:59 AM, CT wrote:
> > I am searching for a Matlab function that can do the non-parametric
> > bootstrapping of multivariate data. For instance, I have a matrix of
> > sample data (MxN) where M is the dimension of the random vector
> > (multivariate data), and N is the number of observation. I want to
> > generate (resample) bootstrap data from this initial multivariate data.
> > Does anyone knows this function?
>
> If you have access to the Statistics Toolbox, the BOOTSTRP function does
> what you are asking. it is here:
>
> <http://www.mathworks.com/access/helpdesk/help/toolbox/stat /bootstrp.html>

Isn't this just:

X(:,ceil(rand(1,N)*N))

where X is the sample matrix?

Subject: Bootstrapping multivariate data

From: CT

Date: 17 Aug, 2010 16:31:11

Message: 5 of 14

Thank you all for your replies. I'll try your suggestions and let you know the results.

CT DO


Subject: Bootstrapping multivariate data

From: Peter Perkins

Date: 17 Aug, 2010 17:58:47

Message: 6 of 14

On 8/17/2010 10:18 AM, Simon Preston wrote:
>> <http://www.mathworks.com/access/helpdesk/help/toolbox/stat
>> /bootstrp.html>

Sorry, for some reason that link was missing an "s"
<http://www.mathworks.com/access/helpdesk/help/toolbox/stats/bootstrp.html>

> Isn't this just:
>
> X(:,ceil(rand(1,N)*N))
>
> where X is the sample matrix?

That's the basis of it, yes. But:

1) It's kind of tedious to write the same loop over and over, regardless
of how simple that loop is,
2) There is a good deal of flexibility in the arguments you can pass to
BOOTSTRP, so a single matrix isn't the only case it handles for you, and
3) (in recent MATLAB releases) There is support for parallelizing the
computations using PARFOR (if your installation supports that)

Just as an aside, since R2008b you might find it easier to use RANDI to
generate random integers.
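
As a minimal sketch of the RANDI variant of the one-liner quoted above, assuming X is M-by-N with one observation per column (the layout described in the original post). The random matrix is placeholder data, used only so the snippet runs on its own:

```matlab
% Assumption: X is M-by-N, one observation per column.
M = 3; N = 500;
X = randn(M, N);          % placeholder data for illustration only

idx = randi(N, 1, N);     % N column indices drawn uniformly, with replacement
Xboot = X(:, idx);        % every column of Xboot is an intact column of X
```

Because whole columns are selected, the M components of each observation stay together in the resample.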

Subject: Bootstrapping multivariate data

From: Rogelio

Date: 17 Aug, 2010 19:31:04

Message: 7 of 14

Peter Perkins <Peter.Perkins@MathRemoveThisWorks.com> wrote in message <i4eikn$ndn$1@fred.mathworks.com>...
> On 8/17/2010 10:18 AM, Simon Preston wrote:
> >> <http://www.mathworks.com/access/helpdesk/help/toolbox/stat
> >> /bootstrp.html>
>
> Sorry, for some reason that link was missing an "s"
> <http://www.mathworks.com/access/helpdesk/help/toolbox/stats/bootstrp.html>
>
> > Isn't this just:
> >
> > X(:,ceil(rand(1,N)*N))
> >
> > where X is the sample matrix?
>
> That's the basis of it, yes. But:
>
> 1) It's kind of tedious to write the same loop over and over, regardless
> of how simple that loop is,
> 1) There is a good deal of flexibility in the arguments you can pass to
> BOOTSTRP, so a single matrix isn't the only case it handles for you, and
> 2) (in recent MATLAB releases) There is support for parallelizing the
> computations using PARFOR (if your installation supports that)
>
> Just as an aside, since 2008b you might find it easier to use RANDI to
> generate random integers.

Just one thing to point out: you said that M is the dimension of the data. I thought that you meant different groups or different experiments in which the data were collected; after all, that is why your data are not of dimension N*M x 1, for instance. If the columns of the matrix represent different groups, then for one reason or another you cannot pool the series. As far as I know, 'bootstrp' does not distinguish among different groups. If this last statement is incorrect, can someone send me a link to read about it?
Thanks

Subject: Bootstrapping multivariate data

From: CT

Date: 18 Aug, 2010 06:17:24

Message: 8 of 14

I mean that I have N observations of the random vector x, and the vector x has M elements; these are the seed data. So each variable here is a vector (of M elements). Their probability density function (pdf) might be a multivariate distribution, e.g. a Gaussian mixture model (GMM). Since the bootstrap here is non-parametric, the N observations will be used instead of a concrete pdf.

I have tried to use BOOTSTRP to perform the bootstrapping, but it is not easy, perhaps even infeasible (tell me if I am wrong), since the BOOTSTRP documentation is not clear for this case (I think).

If the generated data are only X(:,ceil(rand(1,N)*N)), I don't see anything new that the bootstrap can bring. As I see it, this is only a reordering of the initial data, so we cannot expect anything different from the new data. Am I wrong?

Subject: Bootstrapping multivariate data

From: Rogelio

Date: 18 Aug, 2010 06:55:23

Message: 9 of 14

If you are saying, or have a feeling, that your data might come from a multivariate distribution, then as far as I know 'bootstrp' will pool your data together, assuming they come from the same pdf, which might be an erroneous assumption.
> I have tried to used BOOTSTRP to perform the bootstrapping, but it is not easy, even unfeasible (tell me if I am wrong), since the manual of BOOTSTRP in Matlab is not clear in this case (I think)<
Why? Can you tell us what the error is, or post the code?
>As I see, this is only a disorder of the initial data, we cannot expect anything different from the new data, I'm wrong?<
Roughly speaking, the bootstrap resamples with replacement. We create pseudo-random samples from your original data. Asymptotically, the empirical pdf converges to the true pdf.
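
To see why resampling with replacement is more than a reordering, here is a small sketch (an illustration only, with N = 500 chosen to match the example discussed in this thread). It counts how many distinct observations end up in one resample. A pure reordering would use every index exactly once. A bootstrap resample typically repeats some observations and omits others, with roughly 63% of the original observations appearing at least once when N is large:

```matlab
N = 500;                              % number of observations (illustrative)
idx = randi(N, 1, N);                 % indices drawn with replacement
permIdx = randperm(N);                % a pure reordering, for comparison

fracBoot = numel(unique(idx)) / N;    % fraction of distinct observations in the resample
fracPerm = numel(unique(permIdx)) / N;   % always 1 for a permutation
```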


Subject: Bootstrapping multivariate data

From: Rogelio

Date: 18 Aug, 2010 07:08:05

Message: 10 of 14

By the way, what is the statistic that you are bootstrapping? It would be nice if you posted the code.

Subject: Bootstrapping multivariate data

From: Peter Perkins

Date: 18 Aug, 2010 12:22:25

Message: 11 of 14

On 8/18/2010 2:55 AM, Rogelio wrote:
> If you are saying or have a feeling that your data might come from a
> multivariate distribution, then as far as I know 'bootstrp' will pool
> your data together, assuming they come from the same pdf which might be
> an erronous assumption.

Rogelio, your definition of "multivariate" seems to mean "grouped" or
"stratified" or "from a mixture distribution". The usual way to define
"multivariate" is simply that there are multiple variables. You are
correct that BOOTSTRP does not resample with stratification, but it's
not clear that that is what the OP was asking about.

Subject: Bootstrapping multivariate data

From: CT

Date: 18 Aug, 2010 15:55:28

Message: 12 of 14

For instance, I have a matrix X(M,N) = X(3,500) of initial data. There are thus N = 500 observations of a trivariate random vector x following the multivariate normal distribution. These data can be generated by the code:
mu = [1 -1 -2]; Sigma = [2 -1 1; -1 2 -1; 1 -1 2];
X = mvnrnd(mu, Sigma, 500);
I don't know if I can use 'bootstrp' to generate data of the same nature, i.e. data that follow (asymptotically) the multivariate normal distribution that I used to generate X:
[bootstat, bootsamp] = bootstrp(10, [], X);
(I don't care about the statistics of the data at the moment; I only want the resampled data.)

However, 'bootstrp' returns a matrix bootsamp of dimension 500x10, so has 'bootstrp' handled only a one-dimensional variable? And I don't know whether 'bootstrp' can return statistics for a multivariate distribution (here, the mean vector and the covariance matrix).
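
A hedged sketch of how the 500x10 output can be read, assuming the Statistics Toolbox is available. Note that mvnrnd(mu, Sigma, 500) returns X as 500-by-3 (one observation per row), and the second output of bootstrp holds row indices into X rather than resampled values, one column per bootstrap replicate. Indexing X with a column of indices gives a full trivariate resample. The anonymous function used for the covariance is an illustrative choice that flattens each 3-by-3 matrix into a row:

```matlab
% Requires the Statistics Toolbox (mvnrnd, bootstrp).
mu = [1 -1 -2];
Sigma = [2 -1 1; -1 2 -1; 1 -1 2];
X = mvnrnd(mu, Sigma, 500);               % 500-by-3, one observation per row

nboot = 10;
[~, bootsamp] = bootstrp(nboot, [], X);   % bootsamp: 500-by-10 matrix of row indices

% Resampled data set number b: whole rows are selected, so the 3 components stay together.
b = 1;
Xb = X(bootsamp(:, b), :);                % 500-by-3

% Statistics of the multivariate sample, one row of output per replicate.
bootMean = bootstrp(nboot, @mean, X);                         % nboot-by-3
bootCov  = bootstrp(nboot, @(x) reshape(cov(x), 1, []), X);   % nboot-by-9
```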

Subject: Bootstrapping multivariate data

From: Richard Willey

Date: 18 Aug, 2010 16:56:17

Message: 13 of 14

Here's some very basic code that might illustrate what's going on

%% Generate your original data set
mu = [1 -1 -2]; Sigma = [2 -1 1; -1 2 -1; 1 -1 2];
X = mvnrnd(mu, Sigma, 500);

%% Sampling with replacement to create a new data set
% Generate an index (one entry per row of X, drawn with replacement)
n = size(X, 1);
boot_index = randsample(1:n, n, true)';

% Use the index to create a new dataset
Boot_dataset = X(boot_index, :);

A bootstrap is simply repeating this same operation nboot times and then
calculating something interesting using this set of new data sets.
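
The repetition described above can be sketched as a plain loop. This is a minimal illustration using base MATLAB only. The data are simulated through a Cholesky factor instead of mvnrnd, and the choices nboot = 200 and the mean vector as the "something interesting" are assumptions made for the example:

```matlab
% Placeholder data: n draws with mean mu and covariance Sigma (base MATLAB only).
mu = [1 -1 -2];
Sigma = [2 -1 1; -1 2 -1; 1 -1 2];
n = 500;
X = randn(n, 3) * chol(Sigma) + repmat(mu, n, 1);   % n-by-3, one observation per row

nboot = 200;                          % number of bootstrap replicates (arbitrary)
bootMeans = zeros(nboot, 3);
for b = 1:nboot
    idx = randi(n, n, 1);             % row indices drawn with replacement
    bootMeans(b, :) = mean(X(idx, :), 1);   % statistic computed on this resample
end

seMean = std(bootMeans, 0, 1);        % bootstrap standard error of each component mean
```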

Jumping back to the whole "multivariate" discussion.

Each time you're drawing from X, you're extracting an entire row.
All of the elements of this row are related in that they are a single output
from your original multivariate normal distribution.

All of this assumes that you need to perform a nonparametric bootstrap.

If you have prior knowledge that your population is described by a
multivariate normal distribution with

mu = [1 -1 -2]

and

Sigma = [2 -1 1; -1 2 -1; 1 -1 2];

then it's often entirely appropriate to use a parametric bootstrap and
generate your new dataset using mvnrnd.
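
A rough sketch of the parametric variant, assuming the Statistics Toolbox and assuming the multivariate normal family really does describe the data. Here the parameters are estimated from the sample and new data sets are simulated from the fitted model. Estimating them (instead of plugging in known values) and the choice nboot = 100 are assumptions made for illustration:

```matlab
% Requires the Statistics Toolbox (mvnrnd).
mu = [1 -1 -2];
Sigma = [2 -1 1; -1 2 -1; 1 -1 2];
n = 500;
X = mvnrnd(mu, Sigma, n);                % stand-in for the observed data, n-by-3

% Assumption: a multivariate normal model is appropriate for X.
muHat = mean(X, 1);                      % estimated mean vector, 1-by-3
SigmaHat = cov(X);                       % estimated covariance matrix, 3-by-3

nboot = 100;                             % number of replicates (arbitrary)
bootMeans = zeros(nboot, 3);
for b = 1:nboot
    Xpar = mvnrnd(muHat, SigmaHat, n);   % new data set simulated from the fitted model
    bootMeans(b, :) = mean(Xpar, 1);
end
```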

Subject: Bootstrapping multivariate data

From: CT

Date: 19 Aug, 2010 16:13:58

Message: 14 of 14

Just a correction: the covariance matrix that I used is only an example to illustrate the generation of multivariate data. A matrix like that might not make sense.
Thank you for the discussion.
