Thread Subject: REGEXP: no AND in sight?

Subject: REGEXP: no AND in sight?

From: Dimitri Shvorob

Date: 11 Jul, 2007 07:40:53

Message: 1 of 6

The 'Logical operations' page of the REGEXP documentation lists OR, but not AND! Is it true that one cannot combine patterns in AND (hence AND NOT) fashion in MATLAB? If that is not true, can anybody suggest how I could code a (silly but simple) pattern like 'a or b or c, but not c'? [abc]..?

Thank you.

Subject: Re: REGEXP: no AND in sight?

From: Yair Altman

Date: 11 Jul, 2007 04:05:14

Message: 2 of 6

Dimitri Shvorob wrote:
>
>
> The 'Logical operations' page of the REGEXP documentation lists OR, but
> not AND! Is it true that one cannot combine patterns in AND (hence
> AND NOT) fashion in MATLAB? If not, can anybody suggest how I
> could code a (silly but simple) pattern like 'a or b or c, but not
> c'? [abc]..?
>
> Thank you.
  

Regular expressions are not a MATLAB invention but a well-known
standard for string processing that has existed in the programming
world for several decades. Being string-based, the standard has no
dedicated AND operator; it defines one only for OR (namely, '|').
For AND, simply concatenate the sub-patterns with wildcards between
them (note that this fixes the order in which they must appear).
Here's an example:

str = 'your name is Dimitri Shvorob';
index = regexp(str,'Dimitri|Dima'); % => index=14, OR
index = regexp(str,'name.*Dimitri'); % => index=6, AND
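
The concatenated form only matches when 'name' comes before 'Dimitri'. A minimal sketch of an order-independent AND, assuming lookahead operators are available in your MATLAB release, stacks one lookahead per required sub-pattern. The variable names and the reordered test string are made up for illustration; the trailing '.*' is there because regexp ignores zero-length matches by default.

```matlab
% Hypothetical test string with the two words in the "wrong" order
str = 'Dimitri is your name';

% Concatenation with a wildcard requires 'name' to precede 'Dimitri'
idxSeq = regexp(str, 'name.*Dimitri', 'once');               % => []

% One lookahead per required sub-pattern; order no longer matters.
% The final '.*' consumes the string so the match is not zero-length.
idxAnd = regexp(str, '^(?=.*name)(?=.*Dimitri).*', 'once');  % => 1
```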

Yair Altman
 <http://www.ymasoftware.com>

Subject: Re: REGEXP: no AND in sight?

From: Jason Breslau

Date: 11 Jul, 2007 12:24:20

Message: 3 of 6

You can simulate AND in regular expressions using lookaround operators.

For example, to match consonant letters, you could list them
individually in a set:

>> regexp('this is text', '[bcdfghjklmnpqrstvwxyz]', 'match')

ans =

    't' 'h' 's' 's' 't' 'x' 't'



That is clunky. It is easier to match non-vowels, since there are
fewer vowels to list:

>> regexp('this is text', '[^aeiou]', 'match')

ans =

    't' 'h' 's' ' ' 's' ' ' 't' 'x' 't'

But that doesn't work, since it includes characters that are not
letters. So use lookahead to ensure that the next character is a
letter, before checking to see that it isn't a vowel:

>> regexp('this is text', '(?=[a-z])[^aeiou]', 'match')

ans =

    't' 'h' 's' 's' 't' 'x' 't'
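
The same idea, with a negative lookahead, gives one possible reading of the 'a or b or c, but not c' pattern from the original question. This is only a sketch; the test string is made up for illustration.

```matlab
% Hypothetical test string
str = 'abcabd';

% (?!c) rejects the position if the next character is c;
% [abc] then consumes one of a, b or c
m = regexp(str, '(?!c)[abc]', 'match');   % => {'a','b','a','b'}
```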

Hope that helps,

-=>J

Subject: REGEXP: no AND in sight?

From: Dimitri Shvorob

Date: 12 Jul, 2007 08:03:08

Message: 4 of 6

Yair, thank you for your help. I must admit I am not sure in what sense the second example
index = regexp(str,'name.*Dimitri');
implements AND.
I can check whether a string matches each of n > 1 patterns with something like
if ~isempty(regexp(string,pattern1,'once')) && ~isempty(regexp(string,pattern2,'once')) ...
The difficulty is with *selecting* text, as in
regexp(string,pattern,'match')
The task that I am working on is selecting all e-mail addresses, except, say, bill.gates@microsoft.com, from a text file. The command immediately above, supplied with a half-decent pattern, gives me a cell array of strings, containing the matches. I need to loop through it, and eliminate any occurrences of bill.gates@microsoft.com. No big deal, but it would be much nicer to skip them in the first place.
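
One way to skip the unwanted address during matching, rather than afterwards, might be a negative lookahead placed in front of the address pattern. The sketch below is illustrative only: the sample text, the variable names and the deliberately simplistic address pattern are all assumptions, not a robust e-mail matcher. The lookbehind stops a match from starting in the middle of an address, which would otherwise let a tail such as 'ill.gates@microsoft.com' slip through. Because '\>' only checks for the end of a word, a longer address that begins with the skipped one (for instance with an extra domain suffix) would be skipped as well.

```matlab
% Hypothetical sample text
txt = 'Write to alice@example.org, bill.gates@microsoft.com or bob.smith@example.com today';

skip = 'bill\.gates@microsoft\.com';   % address to leave out (dots escaped)
addr = '[\w.]+@\w+(\.\w+)+';           % simplistic address pattern (assumption)

% (?<![\w.])      : a match may only start at the beginning of a token
% (?!skip\>)      : ... and not where the skipped address begins
pat = ['(?<![\w.])(?!' skip '\>)' addr];

found = regexp(txt, pat, 'match');
% => {'alice@example.org','bob.smith@example.com'}
```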

Subject: REGEXP: no AND in sight?

From: Yair Altman

Date: 12 Jul, 2007 09:39:13

Message: 5 of 6

No need to loop. Here's a simple vectorized one-line filter (assume your positive matches are stored in cell array c; the dots are escaped so that they match only literal dots):

c(~cellfun('isempty',regexpi(c,'bill\.gates@microsoft\.com')))=[];
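
Since each cell of c holds one complete address, a plain case-insensitive string comparison may be enough and avoids regular-expression metacharacters altogether. A small sketch, with made-up contents for c:

```matlab
% Hypothetical cell array of matched addresses
c = {'alice@example.org', 'Bill.Gates@Microsoft.com', 'bob.smith@example.com'};

% strcmpi compares every cell with the given string, ignoring case,
% and returns a logical mask; the flagged cells are then deleted
c(strcmpi(c, 'bill.gates@microsoft.com')) = [];
% => {'alice@example.org','bob.smith@example.com'}
```

Unlike the regexpi filter, this removes only cells that equal the address exactly (apart from case), not cells that merely contain it.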

Yair Altman
http://www.ymasoftware.com

Subject: REGEXP: no AND in sight?

From: Dimitri Shvorob

Date: 12 Jul, 2007 10:14:00

Message: 6 of 6

:) Thank you very much.
