
Thread Subject:
Recoding a vector to string (a big one)

Subject: Recoding a vector to string (a big one)

From: Kirill

Date: 20 May, 2011 21:58:58

Message: 1 of 7

Dear all,
I need to recode a vector v into a string array, given a recoding
table lev and str. I have been doing something like this:
v = [1 1 2 2 3];

lev = [1 2 3];
str = {'a' 'b' 's'};

[~, idx] = ismember(v, lev);
v_recoded = str(idx)

Once the number of levels hits hundreds of thousands, this becomes too
slow to live with. I would really prefer to keep lev sorted in advance
so I could reduce the time spent searching for elements within this
array. I looked at the map object and the dataset objects in the
Statistics Toolbox, but I cannot find the desired functionality.
Do I need to write custom code, or am I missing something here?
Thank you in advance,
Kirill Andreev

Subject: Recoding a vector to string (a big one)

From: Steven_Lord

Date: 23 May, 2011 15:08:04

Message: 2 of 7



"Kirill" <kirillandreev@gmail.com> wrote in message
news:d080f2d7-2789-409e-9dde-b0a471ab89fd@l6g2000vbn.googlegroups.com...
> Dear all,
> I need to recode a vector v into a string array, given a recoding
> table lev and str. I have been doing something like this:
> v = [1 1 2 2 3];
>
> lev = [1 2 3];
> str = {'a' 'b' 's'};
>
> [~, idx] = ismember(v, lev);
> v_recoded = str(idx)
>
> Once the number of levels hits hundreds of thousands, this becomes too
> slow to live with. I would really prefer to keep lev sorted in advance
> so I could reduce the time spent searching for elements within this
> array. I looked at the map object and the dataset objects in the
> Statistics Toolbox, but I cannot find the desired functionality.
> Do I need to write custom code, or am I missing something here?

Are all the levels positive integer values? If so, just use indexing
directly.

str = {'a', 'b', 's'};
v = [1 1 2 2 3];
c = str(v)
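
If the levels are positive integers but do not run exactly from 1 to n, the direct-indexing idea might be extended with a lookup table indexed by level value. This is only a minimal sketch: the sample data and the variable name lut are made up for illustration, and it assumes max(lev) is small enough that a vector of that length fits comfortably in memory.

```matlab
% Hypothetical example data: levels are positive integers, but not 1..n.
lev = [5 2 9];
str = {'a' 'b' 's'};
v   = [5 5 2 2 9];

% Build the table once: lut(level) holds that level's position in lev.
lut = zeros(1, max(lev));
lut(lev) = 1:numel(lev);

% Each lookup is then plain indexing, with no searching.
% (A value larger than max(lev) would raise an indexing error here.)
idx = lut(v);
assert(all(idx > 0), 'v contains values that are not in lev');
v_recoded = str(idx);
```

The table costs max(lev) doubles of memory, so this only makes sense when the largest level is not enormous.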

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: Recoding a vector to string (a big one)

From: Kirill

Date: 31 May, 2011 21:52:24

Message: 3 of 7


Thank you, Steve. The idea is to attach a database-like index to the
lev variable to make lookups faster, so that the ismember() function
could take advantage of it. This would be useful for very large lev
vectors/matrices/datasets. Maybe it has already been implemented in the
Statistics Toolbox; I need to check.

Kirill

Subject: Recoding a vector to string (a big one)

From: Roger Stafford

Date: 31 May, 2011 23:41:04

Message: 4 of 7

Kirill <kirillandreev@gmail.com> wrote in message <b0478a76-c879-4594-a92f-c202ad701aca@v31g2000vbs.googlegroups.com>...
> Thank you, Steve. The idea is to attach a database-like index to the
> lev variable to make lookups faster, so that the ismember() function
> could take advantage of it. This would be useful for very large lev
> vectors/matrices/datasets. Maybe it has already been implemented in the
> Statistics Toolbox; I need to check.
>
> Kirill
- - - - - - - - - -
  If Steve's indexing suggestion cannot be achieved, perhaps the following might serve. Instead of using 'lev' as it is, prepare 'u' and 'm' in advance:

 [u,m] = unique(lev);

Then for each vector v you can do this:

 [~,p] = histc(v,u);
 v_recoded = str(m(p));

  This should (in theory at least) be faster than 'ismember', since 'histc' can count on 'u' being in ascending order, whereas in your use of 'ismember' it must assume that 'lev' is unsorted (even if it is actually sorted.)
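
One caveat with the histc approach above: histc assigns values to intervals, so a value that lies between two levels is silently placed in the lower bin, and a value outside the range gets bin 0. A minimal sketch of the same lookup with an exact-match check added is below; the sample data and the variable name ok are illustrative assumptions. Note also that newer MATLAB releases mark histc as not recommended, although it is still available.

```matlab
% Hypothetical example data: lev is deliberately unsorted.
lev = [30 10 20];
str = {'c' 'a' 'b'};
v   = [10 10 20 30 20];

% Prepare once: u is lev sorted ascending, and lev(m) equals u.
[u, m] = unique(lev);

% For each element of v, p is the index of its bin in u (0 if out of range).
[~, p] = histc(v, u);

% histc bins by interval, so confirm that every value matched a level exactly.
ok = p > 0;
ok(ok) = u(p(ok)) == v(ok);
assert(all(ok), 'v contains values that are not in lev');

% Map bin positions back to positions in the original lev, then to strings.
v_recoded = str(m(p));
```

Whether this actually beats ismember for a given data size is best checked by timing both on real data.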

Roger Stafford

Subject: Recoding a vector to string (a big one)

From: Kirill

Date: 1 Jun, 2011 01:39:45

Message: 5 of 7


Thanks, Roger. This may work -- I will try it.
Kirill

Subject: Recoding a vector to string (a big one)

From: Rune Allnor

Date: 1 Jun, 2011 04:44:13

Message: 6 of 7

On May 20, 11:58 pm, Kirill <kirillandr...@gmail.com> wrote:
> Dear all,
> I need to recode a vector v into a string array, given a recoding
> table lev and str. I have been doing something like this:
> v = [1 1 2 2 3];
>
> lev = [1 2 3];
> str = {'a' 'b' 's'};
>
> [~, idx] = ismember(v, lev);
> v_recoded = str(idx)
>
> Once the number of levels hits hundreds of thousands, this becomes too
> slow to live with. I would really prefer to keep lev sorted in advance
> so I could reduce the time spent searching for elements within this
> array.

That's the wrong solution.

The problem statement calls for some sort of search tree.
In C++ I'd use std::map; I have no idea if or how this
could be done in MATLAB.

So be prepared to either ditch MATLAB in favour of a
high-performance programming language, or accept the
poor run-time.

Rune

Subject: Recoding a vector to string (a big one)

From: Kirill

Date: 2 Jun, 2011 01:49:49

Message: 7 of 7


In MATLAB one can actually create map containers (containers.Map) that
provide a very fast way to retrieve data (according to the
documentation, of course). I used them recently to cache the results of
SQL queries and they worked as expected. Maybe this is a road worth
exploring.
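
For reference, a minimal sketch of how the recoding from the first post might look with a map container. It reuses the example data from the thread; the variable name map is an assumption, and no timing claim is made here, so it would need to be benchmarked against ismember on realistic sizes.

```matlab
% Example data from the original question.
lev = [1 2 3];
str = {'a' 'b' 's'};
v   = [1 1 2 2 3];

% Build the map once: numeric keys (lev) mapped to char values (str).
map = containers.Map(lev, str);

% Batch lookup: values() takes the keys as a cell array and
% raises an error if any key is missing from the map.
v_recoded = values(map, num2cell(v));

% A single key can also be looked up with parentheses.
one = map(3);
```

Here isKey(map, num2cell(v)) could be used first if v may contain values that are not in lev.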

Kirill
