Lookahead Assertions in Regular Expressions

Lookahead Assertions

There are two types of lookaround assertions for regular expressions: lookahead and lookbehind. In both cases, the assertion is a condition that must be satisfied to return a match to the expression.

A lookahead assertion has the form (?=test) and can appear anywhere in a regular expression. MATLAB® looks ahead of the current location in the text for the test condition. If MATLAB matches the test condition, it continues processing the rest of the expression to find a match.
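As a minimal illustration of the (?=test) form, the following sketch matches a run of digits only when the characters USD immediately follow it. The input text and variable names are made up for this example. The lookahead text must be present for a match, but it is not part of the returned match.

```matlab
% Illustrative input (hypothetical data).
str = 'price: 100USD 200EUR';

% \d+ is the match expression; (?=USD) is the lookahead test.
% 'USD' must follow the digits, but it is not included in the match.
amountUSD = regexp(str, '\d+(?=USD)', 'match')   % returns {'100'}
```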

For example, look ahead in a character vector specifying a path to find the name of the folder that contains a program file (in this case, fileread.m).

chr = which('fileread')
chr =
   matlabroot\toolbox\matlab\iofun\fileread.m
regexp(chr,'\w+(?=\\\w+\.[mp])','match')
ans = 
    'iofun'

The match expression, \w+, searches for one or more alphanumeric or underscore characters. Each time regexp finds a term that matches this condition, it looks ahead for a backslash (specified with two backslashes, \\), followed by a file name (\w+) with an .m or .p extension (\.[mp]). The regexp function returns the match that satisfies the lookahead condition, which is the folder name iofun.
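Because the output of which depends on where MATLAB is installed, the following sketch hard-codes a hypothetical Windows-style path instead. It compares the lookahead pattern with the same pattern written without (?=...), showing that the lookahead portion is tested but not returned. The last call uses a second hypothetical path with '/' separators and replaces \\ with the character class [\\/] so that either separator is accepted.

```matlab
% Hypothetical install location, hard-coded so the result is reproducible.
chr = 'C:\Program Files\MATLAB\toolbox\matlab\iofun\fileread.m';

% With lookahead: "\fileread.m" must follow the folder name but is not returned.
withLookahead = regexp(chr, '\w+(?=\\\w+\.[mp])', 'match')     % {'iofun'}

% Without lookahead: the same characters are consumed and returned.
withoutLookahead = regexp(chr, '\w+\\\w+\.[mp]', 'match')      % {'iofun\fileread.m'}

% Accept either '\' or '/' as the separator (hypothetical '/'-style path).
unixPath = '/usr/local/MATLAB/toolbox/matlab/iofun/fileread.m';
portable = regexp(unixPath, '\w+(?=[\\/]\w+\.[mp])', 'match')  % {'iofun'}
```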

Overlapping Matches

Lookahead assertions do not consume any characters in the text. As a result, you can use them to find overlapping character sequences.

For example, use lookahead to find every sequence of six nonwhitespace characters in a character vector by matching each nonwhitespace character that is followed by five more nonwhitespace characters:

chr = 'Locate several 6-char. phrases';
startIndex = regexpi(chr,'\S(?=\S{5})')
startIndex =
     1     8     9    16    17    24    25

The starting indices correspond to these phrases:

Locate   severa   everal   6-char   -char.   phrase   hrases
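Each match returned by this pattern is only one character long, because the five characters that follow it belong to the lookahead and are not consumed. One way to recover the six-character phrases themselves, sketched here with arrayfun, is to slice chr starting at each index:

```matlab
chr = 'Locate several 6-char. phrases';
startIndex = regexpi(chr, '\S(?=\S{5})');

% Take six characters starting at each match index.
phrases = arrayfun(@(k) chr(k:k+5), startIndex, 'UniformOutput', false)
% phrases = {'Locate','severa','everal','6-char','-char.','phrase','hrases'}
```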

Without the lookahead operator, MATLAB parses a character vector from left to right, consuming the vector as it goes. When it finds a match, regexp records the starting index and resumes parsing at the character immediately after the end of that match. As a result, no two matches share any characters.

chr = 'Locate several 6-char. phrases';
startIndex = regexpi(chr,'\S{6}')
startIndex =
     1     8    16    24

The starting indices correspond to these phrases:

Locate   severa   6-char   phrase
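The same difference is easy to see on a short, made-up input. In this sketch the plain pattern consumes both characters of each match, while the lookahead version consumes only one, so its matches overlap:

```matlab
str = 'aaaa';

% Plain match: each 'aa' consumes two characters, so matches cannot overlap.
nonOverlapping = regexp(str, 'aa')       % [1 3]

% Lookahead: only the first 'a' is consumed; the second is only checked,
% so the next search begins one character later.
overlapping = regexp(str, 'a(?=a)')      % [1 2 3]
```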

Logical AND Conditions

Another way to use a lookahead operation is to perform a logical AND between two conditions. This example initially attempts to locate all lowercase consonants in a character vector consisting of the first 50 characters of the help text for the normest function:

helptext = help('normest');
chr = helptext(1:50)
chr =
 NORMEST Estimate the matrix 2-norm.
    NORMEST(S

Merely searching for characters that are not lowercase vowels ([^aeiou]) does not return the expected answer, because the output also includes capital letters, space characters, and punctuation:

c = regexp(chr,'[^aeiou]','match')
c = 
  Columns 1 through 14

    ' '    'N'    'O'    'R'    'M'    'E'    'S'    'T'    ' '    'E'    's'    't'    'm'    't'

  ...

Try this again, using a lookahead operator to create the following AND condition:

(lowercase letter) AND (not a vowel)

This time, the result is correct:

c = regexp(chr,'(?=[a-z])[^aeiou]','match')
c = 
  's'  't'  'm'  't'  't'  'h'  'm'  't'  'r'  'x'  'n'  'r'  'm'

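Note that when using a lookahead operator to perform an AND, you must place the match expression expr after the test expression test. Use the positive lookahead (?=test) when test must match, or the negative lookahead (?!test) when test must not match: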

(?=test)expr or (?!test)expr
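The sketch below uses a shortened, hard-coded excerpt of the help text so that it does not depend on the installed help. It writes the consonant test both ways: with a positive lookahead, and with a negative lookahead that rules out vowels before matching any lowercase letter. It also uses the made-up input 'Ab' to show why test must come before expr.

```matlab
% Shortened, hard-coded excerpt of the help text used above.
chr = 'Estimate the matrix 2-norm.';

% Positive lookahead: (lowercase letter) AND (not a vowel).
c1 = regexp(chr, '(?=[a-z])[^aeiou]', 'match');

% Negative lookahead: (not a vowel) AND (lowercase letter).
c2 = regexp(chr, '(?![aeiou])[a-z]', 'match');

sameResult = isequal(c1, c2)   % true

% Order matters: with the test placed after the expression, the lookahead
% checks the character that FOLLOWS the match, not the match itself.
testFirst = regexp('Ab', '(?=[a-z])[^aeiou]', 'match')   % {'b'}
testAfter = regexp('Ab', '[^aeiou](?=[a-z])', 'match')   % {'A'}
```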
