Skip to Main Content Skip to Search
Login
File Exchange
MATLAB Newsgroup
Link Exchange
  Blogs  
 Contest 
MathWorks.com

Thread Subject: eliminate identical raws

Subject: eliminate identical raws

From: vladimir

Date: 16 Feb, 2008 02:43:39

Message: 1 of 12

I have a probability problem that requires matrix A=[1 1 1 2 2] going through perms(A) which gives me many identical rows. How do I eliminate all the identical rowa but one? Answer probably is the easy one, but not for a beginner like myself. Thanks much.
Vladimir T

Subject: Re: eliminate identical raws

From: John D'Errico

Date: 16 Feb, 2008 04:08:02

Message: 2 of 12

vladimir <rus_lopes@comcast.net> wrote in message
<27823314.1203129850043.JavaMail.jakarta@nitrogen.mathforum.org>...
> I have a probability problem that requires matrix A=[1 1 1 2 2] going through
perms(A) which gives me many identical rows. How do I eliminate all the
identical rowa but one? Answer probably is the easy one, but not for a beginner
like myself. Thanks much.
> Vladimir T

I don't know if its easy, but the answer is
definitely a "unique" one.

unique(B,'rows')

HTH,
John

Subject: Re: eliminate identical raws

From: John D'Errico

Date: 16 Feb, 2008 04:08:03

Message: 3 of 12

vladimir <rus_lopes@comcast.net> wrote in message
<27823314.1203129850043.JavaMail.jakarta@nitrogen.mathforum.org>...
> I have a probability problem that requires matrix A=[1 1 1 2 2] going through
perms(A) which gives me many identical rows. How do I eliminate all the
identical rowa but one? Answer probably is the easy one, but not for a beginner
like myself. Thanks much.
> Vladimir T

I don't know if its easy, but the answer is
definitely a "unique" one.

unique(B,'rows')

HTH,
John

Subject: Re: eliminate identical raws

From: John D'Errico

Date: 16 Feb, 2008 04:08:05

Message: 4 of 12

vladimir <rus_lopes@comcast.net> wrote in message
<27823314.1203129850043.JavaMail.jakarta@nitrogen.mathforum.org>...
> I have a probability problem that requires matrix A=[1 1 1 2 2] going through
perms(A) which gives me many identical rows. How do I eliminate all the
identical rowa but one? Answer probably is the easy one, but not for a beginner
like myself. Thanks much.
> Vladimir T

I don't know if its easy, but the answer is
definitely a "unique" one.

unique(B,'rows')

HTH,
John

Subject: Re: eliminate identical raws

From: John D'Errico

Date: 16 Feb, 2008 04:09:02

Message: 5 of 12

vladimir <rus_lopes@comcast.net> wrote in message
<27823314.1203129850043.JavaMail.jakarta@nitrogen.mathforum.org>...
> I have a probability problem that requires matrix A=[1 1 1 2 2] going through
perms(A) which gives me many identical rows. How do I eliminate all the
identical rowa but one? Answer probably is the easy one, but not for a beginner
like myself. Thanks much.
> Vladimir T

I don't know if its easy, but the answer is
definitely a "unique" one.

unique(B,'rows')

HTH,
John

Subject: Re: eliminate identical raws

From: Roger Stafford

Date: 16 Feb, 2008 15:55:03

Message: 6 of 12

vladimir <rus_lopes@comcast.net> wrote in message
<27823314.1203129850043.JavaMail.jakarta@nitrogen.mathforum.org>...
> I have a probability problem that requires matrix A=[1 1 1 2 2] going
through perms(A) which gives me many identical rows. How do I eliminate all
the identical rowa but one? Answer probably is the easy one, but not for a
beginner like myself. Thanks much.
> Vladimir T
---------
  If you have p ones and q twos, the number of different rows would be

 nCp = (p+q)!/(p!*q!)

where n = p+q. In your example, this would be 5C3 = 10 out of a total of 5!
= 120 produced by 'perms'. With such small numbers, using 'perms' followed
by 'unique' to eliminate repetitions, as was suggested by John, is a good
method.

  However, for larger values of p and q, this method runs into problems. For
example, if p and q were both equal to 7, then 'perms' would generate an
array of some 14! = 87 billion rows which would then have to be narrowed
down to a much smaller 14C7 = 3432 different rows, a rather inefficient
process even if one had enough memory.

  This would suggest that one might consider using matlab's 'nchoosek'
function which creates all possible combinations of p things out of a set of n
things. That is, it directly creates the magic number of nCp different
combinations with nothing that need be eliminated. One way to use
'nchoosek' for this purpose might be the following:

 n = p+q;
 C = nchoosek(1:n,p);
 N = size(C,1); % (N = nCp)
 B = repmat(2,N,n); % Start with all twos
 for ix = 1:N
  B(ix,C(ix,:)) = 1; % Insert p ones into each row
 end

Array B would then have the desired result. (Actually this for-loop could be
eliminated using 'sub2ind' appropriately.)

  A note of caution. Unfortunately I don't have 'nchoosek' to experiment with
on my system, and there is one aspect of it that is worrisome. In their
documentation MathWorks gives the same "practical" limit on the use of
'nchoosek' as it does for 'perms', namely that n ought not to exceed 15. One
would think that 'nchoosek' ought to be able to deal with considerably larger
values of n than this without encountering the difficulties of 'perms'. This
would appear to suggest that perhaps 'nchoosek' also does something crude
like a 'perms' operation followed by a call to 'unique', in which case there
would be no particular advantage to using 'nchoosek' as given above. A huge
array would be created in either case. Can anyone in this group tell me if that
is true?

Roger Stafford

Subject: Re: eliminate identical raws

From: Greg Heath

Date: 16 Feb, 2008 16:29:28

Message: 7 of 12

On Feb 16, 10:55=A0am, "Roger Stafford"
<ellieandrogerxy...@mindspring.com.invalid> wrote:
> vladimir <rus_lo...@comcast.net> wrote in message
>
> <27823314.1203129850043.JavaMail.jaka...@nitrogen.mathforum.org>...> I hav=
e a probability problem that requires matrix A=3D[1 1 1 2 2] going
>
> through perms(A) which gives me many identical rows. How do I eliminate al=
l
> the identical rowa but one? Answer probably is the easy one, but not for a=

> beginner like myself. Thanks much.> Vladimir T
>
> ---------
> =A0 If you have p ones and q twos, the number of different rows would be
>
> =A0nCp =3D (p+q)!/(p!*q!)
>
> where n =3D p+q. =A0In your example, this would be 5C3 =3D 10 out of a tot=
al of 5!
> =3D 120 produced by 'perms'. =A0With such small numbers, using 'perms' fol=
lowed
> by 'unique' to eliminate repetitions, as was suggested by John, is a good
> method.
>
> =A0 However, for larger values of p and q, this method runs into problems.=
 =A0For
> example, if p and q were both equal to 7, then 'perms' would generate an
> array of some 14! =3D 87 billion rows which would then have to be narrowed=

> down to a much smaller 14C7 =3D 3432 different rows, a rather inefficient
> process even if one had enough memory.
>
> =A0 This would suggest that one might consider using matlab's 'nchoosek'
> function which creates all possible combinations of p things out of a set =
of n
> things. =A0That is, it directly creates the magic number of nCp different
> combinations with nothing that need be eliminated. =A0One way to use
> 'nchoosek' for this purpose might be the following:
>
> =A0n =3D p+q;
> =A0C =3D nchoosek(1:n,p);
> =A0N =3D size(C,1); % (N =3D nCp)
> =A0B =3D repmat(2,N,n); % Start with all twos
> =A0for ix =3D 1:N
> =A0 B(ix,C(ix,:)) =3D 1; % Insert p ones into each row
> =A0end
>
> Array B would then have the desired result. =A0(Actually this for-loop cou=
ld be
> eliminated using 'sub2ind' appropriately.)
>
> =A0 A note of caution. =A0Unfortunately I don't have 'nchoosek' to experim=
ent with
> on my system, and there is one aspect of it that is worrisome. =A0In their=

> documentation MathWorks gives the same "practical" limit on the use of
> 'nchoosek' as it does for 'perms', namely that n ought not to exceed 15. =
=A0One
> would think that 'nchoosek' ought to be able to deal with considerably lar=
ger
> values of n than this without encountering the difficulties of 'perms'. =
=A0This
> would appear to suggest that perhaps 'nchoosek' also does something crude
> like a 'perms' operation followed by a call to 'unique', in which case the=
re
> would be no particular advantage to using 'nchoosek' as given above. =A0A =
huge
> array would be created in either case. =A0Can anyone in this group tell me=
 if that
> is true?
>
> Roger Stafford

Neither PERMS or UNIQUE are used in NCHOOSEK. However,
NCHOOSEK calls internal function COMBS which is somehow
defined with a recursive call to itself.

Hope this helps.

Greg

Subject: Re: eliminate identical raws

From: Roger Stafford

Date: 16 Feb, 2008 16:58:02

Message: 8 of 12

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in
message <fp710n$32a$1@fred.mathworks.com>...
> ..........
> A note of caution. Unfortunately I don't have 'nchoosek' to experiment
with
> on my system, and there is one aspect of it that is worrisome. In their
> documentation MathWorks gives the same "practical" limit on the use of
> 'nchoosek' as it does for 'perms', namely that n ought not to exceed 15.
One
> would think that 'nchoosek' ought to be able to deal with considerably
larger
> values of n than this without encountering the difficulties of 'perms'. This
> would appear to suggest that perhaps 'nchoosek' also does something
crude
> like a 'perms' operation followed by a call to 'unique', in which case there
> would be no particular advantage to using 'nchoosek' as given above. A
huge
> array would be created in either case. Can anyone in this group tell me if
that
> is true?
>
> Roger Stafford
---------
  Vladimir, I've just been reading John D'Errico's description of his loop
version of 'nchoosek' at

 <http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?
objectId=13276>

which he calls 'loopchoose'. In it he mentions that the call nchoosek(1:20,10)
takes only 18.35 seconds on his computer. That puts to rest the worries I
mentioned earlier. If it were to first create a 'perms' type output, it would
generate 20! = 2.4e20 rows, which clearly could not have happened in 18.35
seconds nor in any computer known to man. So you can forget what I said in
that last paragraph. MathWorks needs to correct their documentation. You
should be able to use 'nchoosek' effectively for the larger values of p and q.

  For more than two different numbers in your original array, let's say p of
one, q of another, r of another, etc., the total number of distinct rows would
be:

 (p+q+r+...)/(p!*q!*r!*...) .

I think an iterative process could be devised that would solve your problem in
such a case with multiple calls on 'nchoosek'. However, I couldn't tell from
your question just how general a problem you are faced with.

Roger Stafford


Subject: Re: eliminate identical raws

From: us

Date: 16 Feb, 2008 18:16:01

Message: 9 of 12

vladimir:
<SNIP combinatorial evergreen...

one of the other many solutions is outlined below
(including the comparison with a more traditional one as
shown by others)...
timing on a wintel c2.2*2.4/2gb/winxp.sp2/r2008a

us

% the data
     a=[3,3,3,6,6,3,3,7];
% the engine
tic;
     as=sum(a);
     au=unique(a);
     na=numel(a);
     nu=numel(au);
     x=cell(na,1);
     ap=repmat({au},1,na);
     [x{1:na,1}]=ndgrid(ap{:});
     x=reshape(cat(na+1,x{:}),[],na);
     xs=whos('x');
     x=x(sum(x,2)==as,:);
     x=x(:,end:-1:1);
toc
% comparison with a more traditional solution
tic;
     p=perms(a);
     ps=whos('p');
     pu=unique(p,'rows');
toc
% - max temp sizes and <equal>
     disp([xs.bytes;ps.bytes;isequal(x,pu)]);

%{
Elapsed time is 0.051470 seconds. % sol #1
Elapsed time is 0.182210 seconds. % sol #2
      419904 % <- x max size
     2580480 % <- p max size
           1 % <- res are equal
%}

Subject: Re: eliminate identical raws

From: Roger Stafford

Date: 17 Feb, 2008 03:08:01

Message: 10 of 12

"us " <us@neurol.unizh.ch> wrote in message <fp7991$656
$1@fred.mathworks.com>...
> vladimir:
> <SNIP combinatorial evergreen...
>
> one of the other many solutions is outlined below
> (including the comparison with a more traditional one as
> shown by others)...
> timing on a wintel c2.2*2.4/2gb/winxp.sp2/r2008a
>
> us
>
> % the data
> a=[3,3,3,6,6,3,3,7];
> % the engine
> tic;
> as=sum(a);
> au=unique(a);
> na=numel(a);
> nu=numel(au);
> x=cell(na,1);
> ap=repmat({au},1,na);
> [x{1:na,1}]=ndgrid(ap{:});
> x=reshape(cat(na+1,x{:}),[],na);
> xs=whos('x');
> x=x(sum(x,2)==as,:);
> x=x(:,end:-1:1);
> toc
> % comparison with a more traditional solution
> tic;
> p=perms(a);
> ps=whos('p');
> pu=unique(p,'rows');
> toc
> % - max temp sizes and <equal>
> disp([xs.bytes;ps.bytes;isequal(x,pu)]);
>
> %{
> Elapsed time is 0.051470 seconds. % sol #1
> Elapsed time is 0.182210 seconds. % sol #2
> 419904 % <- x max size
> 2580480 % <- p max size
> 1 % <- res are equal
> %}
----------
  I have serious doubts about the above line

 x=x(sum(x,2)==as,:);

in your solution #1, Urs. Suppose that array a = [1,2,3,4]. Presumably, at the
time you do the 'whos' command, 'x' will have 4^4 = 256 rows. Among its
various rows will appear [1,1,4,4], [1,3,3,3], and various rearrangements of
these, all of which have the as = 10 sum required by the above line. Won't
they slip through into your final x? I would predict that 'x' in this case will
have ten extra unwanted rows above the expected twenty four. I think you
need to make your rejection criterion more exacting than a simple sum of the
original elements.

Roger Stafford

Subject: Re: eliminate identical raws

From: us

Date: 17 Feb, 2008 04:13:02

Message: 11 of 12

"Roger Stafford":
<SNIP high noon...

YES... i KNEW one of the very CSSM seniors( :-) ) would
hook on to this (obvious) weakness the moment i clicked the
<post message> button... (seriously)

a more proper solution looks like the following snipplet -
and reveals (once more) that one solution is not
necessarily good for life...

us

put this into a file, eg, <foo.m>

function foo(a)
tic;
     au=unique(a);
     na=numel(a);
     x=cell(na,1);
     ap=repmat({au},1,na);
     [x{1:na,1}]=ndgrid(ap{:});
     x=reshape(cat(na+1,x{:}),[],na);
     xs=whos('x');
     x=x(:,end:-1:1);
     x=x(cellfun(@(x) isequal(x,sort(a)),...
         num2cell(sort(x,2),2),'uni',true),:);
toc
% comparison with a more traditional solution
tic;
     p=perms(a);
     ps=whos('p');
     pu=unique(p,'rows');
toc
% - max temp sizes and <equal>
     disp([xs.bytes;ps.bytes;isequal(x,pu)]);
end

% results of various runs on a wintel os
% c2.2*2.4/2gb/winvista.sp1/r2008a

     foo([3,3,3,6,6,3,3,7]);
% Elapsed time is 0.081403 seconds. OK
% Elapsed time is 0.137693 seconds.
% 419904
% 2580480
% 1
     foo(1:4)
% Elapsed time is 0.029856 seconds. OK
% Elapsed time is 0.058627 seconds.
% 8192
% 768
% 1
     foo(1:7);
% Elapsed time is 10.774500 seconds. !!!!!
% Elapsed time is 0.009430 seconds.
% 46118408
% 282240
% 1
     foo([1,1,1,1,1,1,1,1,1]);
% Elapsed time is 0.004127 seconds. !!!!!
% Elapsed time is 1.924781 seconds.
% 72 !!!!!
% 26127360
% 1
     foo([1,1,1,1,1,2,1,1,1]);
% Elapsed time is 0.072507 seconds.
% Elapsed time is 2.280715 seconds. !!!!!
% 36864
% 26127360
% 1

Subject: Re: eliminate identical raws

From: Roger Stafford

Date: 17 Feb, 2008 07:16:03

Message: 12 of 12

"us " <us@neurol.unizh.ch> wrote in message <fp8c8e$ami
$1@fred.mathworks.com>...
\> YES... i KNEW one of the very CSSM seniors( :-) ) would ..........

  There is another aspect of the above (corrected) solution #1 that bothers
me, Urs, though I admittedly don't have a better alternative ready to put
forth. With some of the larger numbers of elements in array 'a', one comes
up with some discouragingly large arrays for 'x' that have to be greatly
reduced in size. Suppose we have a = [1 2 3 4 5 6 7 7 7 7 7 7]. Then the
number of rows in x before it is reduced would be 7^12, which is a hefty
1.38e10 rows, whereas the number of valid combinations is only 12!/6!,
which is a very much smaller 6.65e5 rows. This is a very severe rejection
process with only one survivor in every twenty thousand. In this case, even
'perms' gives rise to a smaller 12!, which would be only 4.8e8 rows.

  My intuition tells me that for a robust, general solution to this problem
suitable for all cases for which there is a reasonable number of rows in the
final solution, it would be best to use an algorithm which does not require
that any combinations be rejected later on, even if much despised for-loops
have to be used. I envision it as a generalization and adaptation of the
algorithm which is apparently used in the 'nchoosek' function. John D'Errico
has written 'loopchoose', which if I understand it correctly, creates only valid
combinations successively each time it takes a step, with none needing to be
rejected. The idea would be to generalize such methods to any number of
distinct values in the array rather than just the two which, in effect,
'nchoosek' and 'loopchoose' deal with.

Roger Stafford

Tags for this Thread

Everyone's Tags:

Add a New Tag:

Separated by commas
Ex.: root locus, bode

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Tag Activity for This Thread
Tag Applied By Date/Time
num2cell us 16 Feb, 2008 23:15:03
cellfun us 16 Feb, 2008 23:15:03
sort us 16 Feb, 2008 23:15:03
perms us 16 Feb, 2008 13:20:11
ndgrid us 16 Feb, 2008 13:20:11
evergreen us 16 Feb, 2008 13:20:11
code us 16 Feb, 2008 13:20:11
rssFeed for this Thread

envelope graphic E-mail this page to a colleague

Public Submission Policy
NOTICE: Any content you submit to MATLAB Central, including personal information, is not subject to the protections which may be afforded information collected under other sections of The MathWorks, Inc. Web site. You are entirely responsible for all content that you upload, post, e-mail, transmit or otherwise make available via MATLAB Central. The MathWorks does not control the content posted by visitors to MATLAB Central and, does not guarantee the accuracy, integrity, or quality of such content. Under no circumstances will The MathWorks be liable in any way for any content not authored by The MathWorks, or any loss or damage of any kind incurred as a result of the use of any content posted, e-mailed, transmitted or otherwise made available via MATLAB Central. Read the complete Disclaimer prior to use.
Related Topics