I have a probability problem that requires matrix A=[1 1 1 2 2] going through perms(A) which gives me many identical rows. How do I eliminate all the identical rowa but one? Answer probably is the easy one, but not for a beginner like myself. Thanks much.
Vladimir T
vladimir <rus_lopes@comcast.net> wrote in message
<27823314.1203129850043.JavaMail.jakarta@nitrogen.mathforum.org>...
> I have a probability problem that requires matrix A=[1 1 1 2 2] going through
perms(A) which gives me many identical rows. How do I eliminate all the
identical rowa but one? Answer probably is the easy one, but not for a beginner
like myself. Thanks much.
> Vladimir T
I don't know if its easy, but the answer is
definitely a "unique" one.
vladimir <rus_lopes@comcast.net> wrote in message
<27823314.1203129850043.JavaMail.jakarta@nitrogen.mathforum.org>...
> I have a probability problem that requires matrix A=[1 1 1 2 2] going through
perms(A) which gives me many identical rows. How do I eliminate all the
identical rowa but one? Answer probably is the easy one, but not for a beginner
like myself. Thanks much.
> Vladimir T
I don't know if its easy, but the answer is
definitely a "unique" one.
vladimir <rus_lopes@comcast.net> wrote in message
<27823314.1203129850043.JavaMail.jakarta@nitrogen.mathforum.org>...
> I have a probability problem that requires matrix A=[1 1 1 2 2] going through
perms(A) which gives me many identical rows. How do I eliminate all the
identical rowa but one? Answer probably is the easy one, but not for a beginner
like myself. Thanks much.
> Vladimir T
I don't know if its easy, but the answer is
definitely a "unique" one.
vladimir <rus_lopes@comcast.net> wrote in message
<27823314.1203129850043.JavaMail.jakarta@nitrogen.mathforum.org>...
> I have a probability problem that requires matrix A=[1 1 1 2 2] going through
perms(A) which gives me many identical rows. How do I eliminate all the
identical rowa but one? Answer probably is the easy one, but not for a beginner
like myself. Thanks much.
> Vladimir T
I don't know if its easy, but the answer is
definitely a "unique" one.
vladimir <rus_lopes@comcast.net> wrote in message
<27823314.1203129850043.JavaMail.jakarta@nitrogen.mathforum.org>...
> I have a probability problem that requires matrix A=[1 1 1 2 2] going
through perms(A) which gives me many identical rows. How do I eliminate all
the identical rowa but one? Answer probably is the easy one, but not for a
beginner like myself. Thanks much.
> Vladimir T
---------
If you have p ones and q twos, the number of different rows would be
nCp = (p+q)!/(p!*q!)
where n = p+q. In your example, this would be 5C3 = 10 out of a total of 5!
= 120 produced by 'perms'. With such small numbers, using 'perms' followed
by 'unique' to eliminate repetitions, as was suggested by John, is a good
method.
However, for larger values of p and q, this method runs into problems. For
example, if p and q were both equal to 7, then 'perms' would generate an
array of some 14! = 87 billion rows which would then have to be narrowed
down to a much smaller 14C7 = 3432 different rows, a rather inefficient
process even if one had enough memory.
This would suggest that one might consider using matlab's 'nchoosek'
function which creates all possible combinations of p things out of a set of n
things. That is, it directly creates the magic number of nCp different
combinations with nothing that need be eliminated. One way to use
'nchoosek' for this purpose might be the following:
n = p+q;
C = nchoosek(1:n,p);
N = size(C,1); % (N = nCp)
B = repmat(2,N,n); % Start with all twos
for ix = 1:N
B(ix,C(ix,:)) = 1; % Insert p ones into each row
end
Array B would then have the desired result. (Actually this for-loop could be
eliminated using 'sub2ind' appropriately.)
A note of caution. Unfortunately I don't have 'nchoosek' to experiment with
on my system, and there is one aspect of it that is worrisome. In their
documentation MathWorks gives the same "practical" limit on the use of
'nchoosek' as it does for 'perms', namely that n ought not to exceed 15. One
would think that 'nchoosek' ought to be able to deal with considerably larger
values of n than this without encountering the difficulties of 'perms'. This
would appear to suggest that perhaps 'nchoosek' also does something crude
like a 'perms' operation followed by a call to 'unique', in which case there
would be no particular advantage to using 'nchoosek' as given above. A huge
array would be created in either case. Can anyone in this group tell me if that
is true?
On Feb 16, 10:55=A0am, "Roger Stafford"
<ellieandrogerxy...@mindspring.com.invalid> wrote:
> vladimir <rus_lo...@comcast.net> wrote in message
>
> <27823314.1203129850043.JavaMail.jaka...@nitrogen.mathforum.org>...> I hav=
e a probability problem that requires matrix A=3D[1 1 1 2 2] going
>
> through perms(A) which gives me many identical rows. How do I eliminate al=
l
> the identical rowa but one? Answer probably is the easy one, but not for a=
> beginner like myself. Thanks much.> Vladimir T
>
> ---------
> =A0 If you have p ones and q twos, the number of different rows would be
>
> =A0nCp =3D (p+q)!/(p!*q!)
>
> where n =3D p+q. =A0In your example, this would be 5C3 =3D 10 out of a tot=
al of 5!
> =3D 120 produced by 'perms'. =A0With such small numbers, using 'perms' fol=
lowed
> by 'unique' to eliminate repetitions, as was suggested by John, is a good
> method.
>
> =A0 However, for larger values of p and q, this method runs into problems.=
=A0For
> example, if p and q were both equal to 7, then 'perms' would generate an
> array of some 14! =3D 87 billion rows which would then have to be narrowed=
> down to a much smaller 14C7 =3D 3432 different rows, a rather inefficient
> process even if one had enough memory.
>
> =A0 This would suggest that one might consider using matlab's 'nchoosek'
> function which creates all possible combinations of p things out of a set =
of n
> things. =A0That is, it directly creates the magic number of nCp different
> combinations with nothing that need be eliminated. =A0One way to use
> 'nchoosek' for this purpose might be the following:
>
> =A0n =3D p+q;
> =A0C =3D nchoosek(1:n,p);
> =A0N =3D size(C,1); % (N =3D nCp)
> =A0B =3D repmat(2,N,n); % Start with all twos
> =A0for ix =3D 1:N
> =A0 B(ix,C(ix,:)) =3D 1; % Insert p ones into each row
> =A0end
>
> Array B would then have the desired result. =A0(Actually this for-loop cou=
ld be
> eliminated using 'sub2ind' appropriately.)
>
> =A0 A note of caution. =A0Unfortunately I don't have 'nchoosek' to experim=
ent with
> on my system, and there is one aspect of it that is worrisome. =A0In their=
> documentation MathWorks gives the same "practical" limit on the use of
> 'nchoosek' as it does for 'perms', namely that n ought not to exceed 15. =
=A0One
> would think that 'nchoosek' ought to be able to deal with considerably lar=
ger
> values of n than this without encountering the difficulties of 'perms'. =
=A0This
> would appear to suggest that perhaps 'nchoosek' also does something crude
> like a 'perms' operation followed by a call to 'unique', in which case the=
re
> would be no particular advantage to using 'nchoosek' as given above. =A0A =
huge
> array would be created in either case. =A0Can anyone in this group tell me=
if that
> is true?
>
> Roger Stafford
Neither PERMS or UNIQUE are used in NCHOOSEK. However,
NCHOOSEK calls internal function COMBS which is somehow
defined with a recursive call to itself.
"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in
message <fp710n$32a$1@fred.mathworks.com>...
> ..........
> A note of caution. Unfortunately I don't have 'nchoosek' to experiment
with
> on my system, and there is one aspect of it that is worrisome. In their
> documentation MathWorks gives the same "practical" limit on the use of
> 'nchoosek' as it does for 'perms', namely that n ought not to exceed 15.
One
> would think that 'nchoosek' ought to be able to deal with considerably
larger
> values of n than this without encountering the difficulties of 'perms'. This
> would appear to suggest that perhaps 'nchoosek' also does something
crude
> like a 'perms' operation followed by a call to 'unique', in which case there
> would be no particular advantage to using 'nchoosek' as given above. A
huge
> array would be created in either case. Can anyone in this group tell me if
that
> is true?
>
> Roger Stafford
---------
Vladimir, I've just been reading John D'Errico's description of his loop
version of 'nchoosek' at
which he calls 'loopchoose'. In it he mentions that the call nchoosek(1:20,10)
takes only 18.35 seconds on his computer. That puts to rest the worries I
mentioned earlier. If it were to first create a 'perms' type output, it would
generate 20! = 2.4e20 rows, which clearly could not have happened in 18.35
seconds nor in any computer known to man. So you can forget what I said in
that last paragraph. MathWorks needs to correct their documentation. You
should be able to use 'nchoosek' effectively for the larger values of p and q.
For more than two different numbers in your original array, let's say p of
one, q of another, r of another, etc., the total number of distinct rows would
be:
(p+q+r+...)/(p!*q!*r!*...) .
I think an iterative process could be devised that would solve your problem in
such a case with multiple calls on 'nchoosek'. However, I couldn't tell from
your question just how general a problem you are faced with.
one of the other many solutions is outlined below
(including the comparison with a more traditional one as
shown by others)...
timing on a wintel c2.2*2.4/2gb/winxp.sp2/r2008a
us
% the data
a=[3,3,3,6,6,3,3,7];
% the engine
tic;
as=sum(a);
au=unique(a);
na=numel(a);
nu=numel(au);
x=cell(na,1);
ap=repmat({au},1,na);
[x{1:na,1}]=ndgrid(ap{:});
x=reshape(cat(na+1,x{:}),[],na);
xs=whos('x');
x=x(sum(x,2)==as,:);
x=x(:,end:-1:1);
toc
% comparison with a more traditional solution
tic;
p=perms(a);
ps=whos('p');
pu=unique(p,'rows');
toc
% - max temp sizes and <equal>
disp([xs.bytes;ps.bytes;isequal(x,pu)]);
%{
Elapsed time is 0.051470 seconds. % sol #1
Elapsed time is 0.182210 seconds. % sol #2
419904 % <- x max size
2580480 % <- p max size
1 % <- res are equal
%}
"us " <us@neurol.unizh.ch> wrote in message <fp7991$656
$1@fred.mathworks.com>...
> vladimir:
> <SNIP combinatorial evergreen...
>
> one of the other many solutions is outlined below
> (including the comparison with a more traditional one as
> shown by others)...
> timing on a wintel c2.2*2.4/2gb/winxp.sp2/r2008a
>
> us
>
> % the data
> a=[3,3,3,6,6,3,3,7];
> % the engine
> tic;
> as=sum(a);
> au=unique(a);
> na=numel(a);
> nu=numel(au);
> x=cell(na,1);
> ap=repmat({au},1,na);
> [x{1:na,1}]=ndgrid(ap{:});
> x=reshape(cat(na+1,x{:}),[],na);
> xs=whos('x');
> x=x(sum(x,2)==as,:);
> x=x(:,end:-1:1);
> toc
> % comparison with a more traditional solution
> tic;
> p=perms(a);
> ps=whos('p');
> pu=unique(p,'rows');
> toc
> % - max temp sizes and <equal>
> disp([xs.bytes;ps.bytes;isequal(x,pu)]);
>
> %{
> Elapsed time is 0.051470 seconds. % sol #1
> Elapsed time is 0.182210 seconds. % sol #2
> 419904 % <- x max size
> 2580480 % <- p max size
> 1 % <- res are equal
> %}
----------
I have serious doubts about the above line
x=x(sum(x,2)==as,:);
in your solution #1, Urs. Suppose that array a = [1,2,3,4]. Presumably, at the
time you do the 'whos' command, 'x' will have 4^4 = 256 rows. Among its
various rows will appear [1,1,4,4], [1,3,3,3], and various rearrangements of
these, all of which have the as = 10 sum required by the above line. Won't
they slip through into your final x? I would predict that 'x' in this case will
have ten extra unwanted rows above the expected twenty four. I think you
need to make your rejection criterion more exacting than a simple sum of the
original elements.
YES... i KNEW one of the very CSSM seniors( :-) ) would
hook on to this (obvious) weakness the moment i clicked the
<post message> button... (seriously)
a more proper solution looks like the following snipplet -
and reveals (once more) that one solution is not
necessarily good for life...
us
put this into a file, eg, <foo.m>
function foo(a)
tic;
au=unique(a);
na=numel(a);
x=cell(na,1);
ap=repmat({au},1,na);
[x{1:na,1}]=ndgrid(ap{:});
x=reshape(cat(na+1,x{:}),[],na);
xs=whos('x');
x=x(:,end:-1:1);
x=x(cellfun(@(x) isequal(x,sort(a)),...
num2cell(sort(x,2),2),'uni',true),:);
toc
% comparison with a more traditional solution
tic;
p=perms(a);
ps=whos('p');
pu=unique(p,'rows');
toc
% - max temp sizes and <equal>
disp([xs.bytes;ps.bytes;isequal(x,pu)]);
end
% results of various runs on a wintel os
% c2.2*2.4/2gb/winvista.sp1/r2008a
foo([3,3,3,6,6,3,3,7]);
% Elapsed time is 0.081403 seconds. OK
% Elapsed time is 0.137693 seconds.
% 419904
% 2580480
% 1
foo(1:4)
% Elapsed time is 0.029856 seconds. OK
% Elapsed time is 0.058627 seconds.
% 8192
% 768
% 1
foo(1:7);
% Elapsed time is 10.774500 seconds. !!!!!
% Elapsed time is 0.009430 seconds.
% 46118408
% 282240
% 1
foo([1,1,1,1,1,1,1,1,1]);
% Elapsed time is 0.004127 seconds. !!!!!
% Elapsed time is 1.924781 seconds.
% 72 !!!!!
% 26127360
% 1
foo([1,1,1,1,1,2,1,1,1]);
% Elapsed time is 0.072507 seconds.
% Elapsed time is 2.280715 seconds. !!!!!
% 36864
% 26127360
% 1
"us " <us@neurol.unizh.ch> wrote in message <fp8c8e$ami
$1@fred.mathworks.com>...
\> YES... i KNEW one of the very CSSM seniors( :-) ) would ..........
There is another aspect of the above (corrected) solution #1 that bothers
me, Urs, though I admittedly don't have a better alternative ready to put
forth. With some of the larger numbers of elements in array 'a', one comes
up with some discouragingly large arrays for 'x' that have to be greatly
reduced in size. Suppose we have a = [1 2 3 4 5 6 7 7 7 7 7 7]. Then the
number of rows in x before it is reduced would be 7^12, which is a hefty
1.38e10 rows, whereas the number of valid combinations is only 12!/6!,
which is a very much smaller 6.65e5 rows. This is a very severe rejection
process with only one survivor in every twenty thousand. In this case, even
'perms' gives rise to a smaller 12!, which would be only 4.8e8 rows.
My intuition tells me that for a robust, general solution to this problem
suitable for all cases for which there is a reasonable number of rows in the
final solution, it would be best to use an algorithm which does not require
that any combinations be rejected later on, even if much despised for-loops
have to be used. I envision it as a generalization and adaptation of the
algorithm which is apparently used in the 'nchoosek' function. John D'Errico
has written 'loopchoose', which if I understand it correctly, creates only valid
combinations successively each time it takes a step, with none needing to be
rejected. The idea would be to generalize such methods to any number of
distinct values in the array rather than just the two which, in effect,
'nchoosek' and 'loopchoose' deal with.
Public Submission Policy
NOTICE: Any content you submit to MATLAB Central, including personal information, is not subject to the protections which may be afforded information collected under other sections of The MathWorks, Inc. Web site. You are entirely responsible for
all content that you upload, post, e-mail, transmit or otherwise make available via MATLAB Central. The MathWorks does not control the content posted by visitors to MATLAB Central and, does not guarantee the accuracy, integrity, or quality of such content.
Under no circumstances will The MathWorks be liable in any way for any content not authored by The MathWorks, or any loss or damage of any kind incurred as a result of the use of any content posted, e-mailed, transmitted or otherwise made available
via MATLAB Central. Read the complete Disclaimer prior to use.