Generate correlated samples with copulas: errors when using "copulafit"

Hello everybody,
I need to generate samples from real measured data to train an unsupervised machine learning algorithm. Inspired by the examples in the documentation, I would like to do this using copulafit.
In the following code, the variable thisData is an m-by-n matrix (m = 231 samples; n = 16 indicators, each with its own distribution). From these 231 measurement sets I would like to generate 2500 samples (variable "nSample") that preserve the dependence among the 16 distributions.
Error Message:
Running the code produces the following error in the Command Window:
Error in copulafit (line 125)
[lowerBnd,upperBnd] = bracket1D(profileFun,lowerBnd,5); % 'upper', search ascending from 5
Error in SimulatingDependentRandomVariablesUsingCopulas3 (line 13)
[Rho,nu] = copulafit('t',u,'Method','ApproximateML')
My Code:
%% Show the pairwise distributions of the dataset
plotmatrix(thisData)
%% Transform the data to the copula scale (unit hypercube) using a kernel estimator of the cumulative distribution function
u = zeros(size(thisData));    % preallocate the copula-scale data
for i = 1:size(thisData,2)
    u(:,i) = ksdensity(thisData(:,i),thisData(:,i),'function','cdf');
end
%plotmatrix(u)
%% Fit a t copula
[Rho,nu] = copulafit('t',u,'Method','ApproximateML')
%% Generate a random sample from the t copula
r = copularnd('t',Rho,nu,1000);
u1 = r(:,1);    % copula-scale values of the first indicator
v1 = r(:,2);    % copula-scale values of the second indicator
scatterhist(u1,v1,'Direction','out')
xlabel('u')
ylabel('v')
set(get(gca,'children'),'marker','.')
%% Transform the random sample back to the original scale of the data
x = thisData(:,1);    % measured values of the first indicator
y = thisData(:,2);    % measured values of the second indicator
x1 = ksdensity(x,u1,'function','icdf');
y1 = ksdensity(y,v1,'function','icdf');
scatterhist(x1,y1,'Direction','out')
set(get(gca,'children'),'marker','.')
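The code above only maps the first two indicators back to the data scale. A minimal sketch of the same workflow applied to every column, producing nSample synthetic rows, might look like the following. It assumes the Statistics and Machine Learning Toolbox and, since exampleData.mat is not available here, builds a hypothetical 231-by-4 stand-in for thisData from a t copula with mixed marginals; the names R, U0 and synthData are illustrative only. On the real data, copulafit still needs a full-rank input (see the answer below).

```matlab
% Hypothetical stand-in for thisData: 231 samples of 4 indicators whose
% dependence comes from a t copula (3 degrees of freedom) and whose
% marginals differ (normal, exponential, normal, uniform)
rng(0)                                         % reproducible example
m = 231;
R = [1 .6 .3 0; .6 1 .4 .1; .3 .4 1 .2; 0 .1 .2 1];
U0 = copularnd('t', R, 3, m);
thisData = [norminv(U0(:,1)), expinv(U0(:,2),2), norminv(U0(:,3),5,2), U0(:,4)];

nSample = 2500;                                % number of synthetic samples wanted
n = size(thisData,2);

% 1) Map every column to (0,1) with its kernel-smoothed CDF
u = zeros(m,n);
for i = 1:n
    u(:,i) = ksdensity(thisData(:,i), thisData(:,i), 'Function','cdf');
end

% 2) Fit one t copula to all n columns at once
[Rho,nu] = copulafit('t', u, 'Method','ApproximateML');

% 3) Draw nSample points from the fitted copula (nSample-by-n, values in (0,1))
r = copularnd('t', Rho, nu, nSample);

% 4) Map each column back through the inverse kernel CDF of its own marginal
synthData = zeros(nSample,n);
for i = 1:n
    synthData(:,i) = ksdensity(thisData(:,i), r(:,i), 'Function','icdf');
end
```

Looping over the columns for both the forward ('cdf') and inverse ('icdf') kernel transforms keeps each indicator's own marginal shape, while the fitted copula carries the dependence between them.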
I appreciate your help and support. Thank you very much.
Jonas
  2 Comments
Tom Lane on 24 Jul 2015
You give the location of the error but not the text of the error. Could you add that? I got your code to run after correcting the variable names x and y.
rowJoe on 27 Jul 2015
Edited: rowJoe on 27 Jul 2015
Hi Tom,
thank you very much for your comment. This is the error message:
Error using copulafit/approxProfileNLL_t (line 290)
The estimate of Rho has become rank-deficient. You may have too few data, or strong dependencies among variables.
Error in copulafit>bracket1D (line 489)
oldnll = nllFun(bound);
Error in copulafit (line 125)
[lowerBnd,upperBnd] = bracket1D(profileFun,lowerBnd,5); % 'upper', search ascending from 5
Error in SimulatingDependentRandomVariablesUsingCopulas3 (line 14)
[Rho,nu] = copulafit('t',u,'Method','ApproximateML')
I have also attached the file "exampleData.mat", which contains an example of the matrix "thisData".


Answers (1)

Shruti Sapre on 29 Jul 2015
Edited: Shruti Sapre on 29 Jul 2015
Hi Jonas,
I understand that you are receiving an error while using the “copulafit” function with your data.
This error is caused by collinearity in the input data, in this case by duplicate columns: the rank of the matrix (14) is less than its number of columns (16).
These singularities show up in the eigenvalues of the correlation matrix of "thisData", which you can compute with the following commands:
>> [V,D] = eig(corr(thisData));
>> Eigenvalues = diag(D)
>> Eigenvalues(Eigenvalues < 1e-16)
Two of these eigenvalues are smaller than 10^-16, which is effectively zero at machine precision for a sample covariance/correlation matrix. Numerically, therefore, some columns of "thisData" are linear combinations of other columns.
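A complementary check compares the numerical rank with the number of columns and lists which columns are exact duplicates. The sketch below uses hypothetical data (a 231-by-6 matrix whose last two columns copy columns 1 and 3) in place of the real thisData. It only flags exact copies; a column that is a linear combination of several others would lower the rank without appearing in the pair list.

```matlab
% Hypothetical data: 231-by-6 matrix whose columns 5 and 6 copy columns 1 and 3
rng(1)
A = randn(231,4);
thisData = [A, A(:,1), A(:,3)];
n = size(thisData,2);

% Numerical rank versus number of columns
fprintf('rank = %d, columns = %d\n', rank(thisData), n)   % prints: rank = 4, columns = 6

% Collect every pair of columns that are exact duplicates
dupPairs = zeros(0,2);
for j = 1:n-1
    for k = j+1:n
        if isequal(thisData(:,j), thisData(:,k))
            dupPairs(end+1,:) = [j k]; %#ok<AGROW>
        end
    end
end
disp(dupPairs)   % rows [1 5] and [3 6]
```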
Modifying the matrix so that its rank equals the number of columns (indicators/variables), for example by removing the duplicate columns, should help resolve your issue.
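One way to do this for exact duplicates is to keep only the first occurrence of each distinct column and confirm full column rank before calling copulafit. This is a minimal sketch on hypothetical data (three indicators with t-copula dependence plus a copy of the second one); the names base, keepIdx and reduced are illustrative. Dependencies that are not exact copies would need a different choice of which columns to drop.

```matlab
% Hypothetical data: 3 indicators with t-copula dependence plus a copy of column 2
rng(2)
U0 = copularnd('t', [1 .5 .2; .5 1 .3; .2 .3 1], 3, 231);
base = [norminv(U0(:,1)), expinv(U0(:,2),2), U0(:,3)];
thisData = [base, base(:,2)];            % column 4 duplicates column 2

% Keep the first occurrence of each distinct column (columns become rows after .')
[~, keepIdx] = unique(thisData.', 'rows', 'stable');
reduced = thisData(:, keepIdx);

% Refuse to fit if other linear dependencies remain
if rank(reduced) < size(reduced,2)
    error('Columns are still linearly dependent; remove or combine further columns.')
end

% Same kernel-CDF transform and t-copula fit as in the question, on the reduced data
u = zeros(size(reduced));
for i = 1:size(reduced,2)
    u(:,i) = ksdensity(reduced(:,i), reduced(:,i), 'Function','cdf');
end
[Rho,nu] = copulafit('t', u, 'Method','ApproximateML');
```

If the downstream algorithm expects all original indicators, a dropped duplicate can be restored after sampling by copying the column it duplicated.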
Hope this helps!
-Shruti
