From: arun <aragorn168b@gmail.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: manipulating strings
Date: Tue, 7 Jul 2009 10:30:14 -0700 (PDT)

On Jul 7, 2:28 pm, "nor ki" <kinor.remov...@gmx.de> wrote:
> arun <aragorn1...@gmail.com> wrote in message <87c67726-964b-48ce-80f0-a50d24b62...@26g2000yqk.googlegroups.com>...
> > Hi,
>
> > Suppose I have a string A whose size is 1*10^7. I would now like to
> > remove certain characters in the string. I tried strfind and regexprep
> > as follows:
>
> > A(strfind(A, ',')) = ''; % remove every comma (replace it with nothing)
> > and then I repeat this for each digit from 0 to 9 and for the space character.
>
> > An alternative, more efficient way, I hoped, would be:
> > A = regexprep(A, '[0-9, ]', '');
> > but the first approach takes forever because the vector is so long, and
> > the second one strangely gives me an "out of memory" error...
>
> > Is there any way to speed this up?
>
> > thank you very much,
> > arun.
>
> Hi Arun,
> As you are only looking for single characters, you could build a lookup table of type logical, indexed by character code, that contains true for each of the desired characters and false for each character that should be removed.
> Call it simply lut.
>
> Then you make a logical array marking the positions of your desired characters:
>
> idx = lut(A);
>
> and keep only those characters in A:
>
> A = A(idx);
>
> or in short:
>
> A = A(lut(A));
>
> hth
> kinor
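
A minimal MATLAB sketch of the lookup-table idea above, assuming the characters to drop are the digits, the comma and the space, and that A contains no NUL (code 0) characters, since code 0 cannot be used as an index. The table gets one logical entry per possible character code (65536, because MATLAB chars are 16-bit); the sample string is made up for illustration.

```matlab
% Lookup-table filtering: one logical entry per character code.
% Assumption: A contains no NUL (code 0) characters.
A = 'x1, y22,z 3';                     % made-up sample; the real A has ~1e7 chars

lut = true(1, 65536);                  % keep every character code by default
lut(double('0123456789, ')) = false;   % mark digits, comma and space for removal

idx = lut(double(A));                  % logical vector, same size as A: true = keep
A = A(idx);                            % keep only the desired characters
disp(A)
```

The table costs the same to build however long A is, and the filtering is a single indexing pass over A rather than one strfind pass per unwanted character. double(A) just makes the character-to-code conversion explicit; the shorter lut(A) form relies on MATLAB converting the characters to their codes itself.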

Hi Kinor,

Thank you for the suggestion. I just have some trouble understanding
how to construct this lut. Is it like a map? Because I have to know
that this character has a true and that character has a false...

Suppose A = '1,1600,A,G,G,G,A,A,A,G,A,A,G';

Here I don't need the commas or the numbers 1 and 1600; that is,
the desired string is A = 'AGGGAAAGAAG'.
If I don't have a map, then my lookup table should contain values
for all entries, right? I don't think that is what you suggested... I
mean,

lut = [0,0,0,0,0,0,0,1,0,1,0,1...] and then use A = lut(A)...
is this what you suggested?
thank you very much,
arun.
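
On the question above: the table is not a map, it is a plain logical array with an entry for every possible character code, and the character's own code is its position. A sketch for this sample string, assuming the wanted characters are only the letters A, C, G and T (an assumption; the sample itself only shows A and G). Note that the filtering step is A = A(lut(...)), not A = lut(A), because lut only returns the true/false flags.

```matlab
A = '1,1600,A,G,G,G,A,A,A,G,A,A,G';

lut = false(1, 65536);       % one entry per possible char code, default: drop
lut(double('ACGT')) = true;  % assumption: keep only the letters A, C, G, T

% No map needed: double('A') is 65, so lut(65) answers "keep 'A'?".
A = A(lut(double(A)));       % index A with the flags, keeping the true positions
disp(A)
```

Listing the characters to keep (rather than the ones to drop) means any unexpected character in the file is removed as well.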