Thread Subject:
speed up for long list matching

Subject: speed up for long list matching

From: Eric

Date: 6 Jan, 2012 03:36:08

Message: 1 of 7

Is there a better way to do this as the length of idtsall is more than 30k?

for n = 1:length(idtsall)
    [found, ind] = ismember(idts, idtsall{n});
    if found
        ranktsall(n) = rankts(ind);
    else
        ranktsall(n) = 0;
    end
end

Could we have something like this?

ranktsall = zeros(length(idtsall), 1);
[idx, val] = matching(idts, idtsall); % I have no idea about this part
ranktsall(idx) = val;

idts is a subset of idtsall.
Can somebody help? Thanks in advance & Happy New Year!

Subject: speed up for long list matching

From: Bruno Luong

Date: 6 Jan, 2012 10:48:08

Message: 2 of 7

It would help if you explained what "idts is a subset of idtsall" means. As I understand it, idtsall is a family of sets, not _one_ set. The variable "found" is an array, so what is the test "if found ..." supposed to do? The same question applies to "ind", which could also be an array. And what is "rankts"?

Bruno
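
To make the question about "if found ..." concrete, here is a minimal sketch using the sample ids given later in the thread as assumed data. ISMEMBER returns one result per element of its first argument, and an IF on an array takes the branch only when every element is nonzero.

```matlab
% Assumed sample data for illustration only.
idts    = {'001'; '004'; '010'};
idtsall = {'001'; '003'; '004'; '010'};

% Argument order as in message 1: one result per element of idts.
[found, ind] = ismember(idts, idtsall{1});

disp(found.')                  % one logical entry per element of idts
if found
    disp('branch taken');      % would require ALL entries of found to be true
else
    disp('branch skipped');    % reached here, since only one entry is true
end
```

With this argument order the IF branch is skipped whenever idts holds more than one distinct id, so ranktsall would stay at zero.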

Subject: speed up for long list matching

From: Eric

Date: 9 Jan, 2012 04:09:10

Message: 3 of 7

Thanks, Bruno.
"idtsall" has all the id's. For example,
001
003
004
010
...

and "idts" has part of the id's in "idtsall". For example,
001
004
010
...

I understand that "found" and "ind" could be arrays, but in my case there is only one match, so I treat them as scalars.
"rankts" holds double values, one for each entry of "idts".
My code is now as follows:
    ranktsall = zeros(length(idtsall), 1);
    for n = 1:length(idtsall)
        [found, ind] = ismember(idts, idtsall{n});
        if found
            ranktsall(n) = rankts(ind);
        end
    end

But I'm wondering whether we can speed it up as follows:
    ranktsall = zeros(length(idtsall), 1);
    [idx, val] = matching(idts, idtsall); % I have no idea about this part
    ranktsall(idx) = val;

Thanks again.

Subject: speed up for long list matching

From: Eric

Date: 9 Jan, 2012 04:25:09

Message: 4 of 7

Sorry. I made a mistake. The code now is as follows:
    ranktsall = zeros(length(idtsall), 1);
    for n = 1:length(idtsall)
        [found, ind] = ismember(idtsall{n}, idts);
        if found
            ranktsall(n) = rankts(ind);
        end
    end

Subject: speed up for long list matching

From: Bruno Luong

Date: 9 Jan, 2012 08:15:09

Message: 5 of 7

I'm confused: if idtsall and idts are cell arrays of strings, why bother with the for-loop instead of calling ISMEMBER only once?

idtsall = {'001';
'003';
'004';
'010'}

idts = {'001';
'004';
'010'}

rankts = 1:length(idts);

% Engine
[found, ind] = ismember(idtsall, idts); % found: logical mask; ind: position in idts (0 if absent)
ranktsall = zeros(length(idtsall), 1);
ranktsall(found) = rankts(ind(found))

% Bruno
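
As a quick sanity check, the sketch below compares the loop from message 4 with the single ISMEMBER call on the same small sample. The rank values in rankts are made up for illustration; only the variable names come from the thread.

```matlab
idtsall = {'001'; '003'; '004'; '010'};
idts    = {'001'; '004'; '010'};
rankts  = [10; 20; 30];        % hypothetical ranks, one per entry of idts

% Loop version, with the argument order from message 4.
rankLoop = zeros(length(idtsall), 1);
for n = 1:length(idtsall)
    [found, ind] = ismember(idtsall{n}, idts);
    if found
        rankLoop(n) = rankts(ind);
    end
end

% Single-call version.
[found, ind] = ismember(idtsall, idts);
rankVec = zeros(length(idtsall), 1);
rankVec(found) = rankts(ind(found));

% Both approaches should fill in the same ranks.
assert(isequal(rankLoop, rankVec));
```

The single call replaces one ISMEMBER invocation per id with one invocation in total, which is where the speed-up for a list of 30k ids would be expected to come from.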

Subject: speed up for long list matching

From: Eric

Date: 9 Jan, 2012 09:30:10

Message: 6 of 7

Dear Bruno, you are right. I'm the confused one. :D Thank you so much.

Subject: speed up for long list matching

From: Rune Allnor

Date: 9 Jan, 2012 10:24:33

Message: 7 of 7

Yes, it can be sped up considerably. This is a standard
student exercise in intro classes on programming.

If the rows aren't already sorted, do that first.
Then start with the first element in the reference
series and scan forward in the data series until
you find either the reference element or an element
that comes after it in the sorted sequence. Then select
the next element in the reference sequence and
repeat the search, now starting from the present
location in the data series.

This is the basic idea behind functions like INTERSECT
and SETDIFF.

Rune
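
The sorted scan described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the ids are assumed to be numeric strings, so they are converted with STR2DOUBLE to get a simple ordering, and the rank values are made up.

```matlab
idtsall = {'001'; '003'; '004'; '010'};
idts    = {'001'; '004'; '010'};
rankts  = [10; 20; 30];               % hypothetical ranks, one per entry of idts

% Assumption: ids are numeric strings, so compare them as numbers.
[allSorted, allPerm] = sort(str2double(idtsall));   % data series
[subSorted, subPerm] = sort(str2double(idts));      % reference series

ranktsall = zeros(numel(idtsall), 1);
k = 1;                                % current location in the data series
for r = 1:numel(subSorted)
    % Scan forward while the data element sorts before the reference id.
    while k <= numel(allSorted) && allSorted(k) < subSorted(r)
        k = k + 1;
    end
    % Either the reference id was found, or a larger element was reached.
    if k <= numel(allSorted) && allSorted(k) == subSorted(r)
        ranktsall(allPerm(k)) = rankts(subPerm(r));
    end
end
```

Because k never moves backward, each list is traversed once after sorting. In MATLAB itself, an interpreted loop like this will not necessarily beat the built-in ISMEMBER call from message 5, so it is shown for the idea rather than as a recommended replacement.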
