regexp

Match regular expression (case sensitive)

Syntax

startIndex = regexp(str,expression)

[startIndex,endIndex] = regexp(str,expression)

out = regexp(str,expression,outkey)

[out1,...,outN] = regexp(str,expression,outkey1,...,outkeyN)

___ = regexp(___,option1,...,optionM)

___ = regexp(___,'forceCellOutput')

Description

startIndex = regexp(str,expression) returns the starting index of each substring of str that matches the character patterns specified by the regular expression. If there are no matches, startIndex is an empty array. If there are substrings that match overlapping pieces of text, only the index of the first match will be returned.

[startIndex,endIndex] = regexp(str,expression) returns the starting and ending indices of all matches.

out = regexp(str,expression,outkey) returns the output specified by outkey. For example, if outkey is 'match', then regexp returns the substrings that match the expression rather than their starting indices.

[out1,...,outN] = regexp(str,expression,outkey1,...,outkeyN) returns the outputs specified by multiple output keywords, in the specified order. For example, if you specify 'match','tokens', then regexp returns substrings that match the entire expression and tokens that match parts of the expression.

___ = regexp(___,option1,...,optionM) modifies the search using the specified option flags. For example, specify 'ignorecase' to perform a case-insensitive match. You can include any of the inputs and request any of the outputs from previous syntaxes.

___ = regexp(___,'forceCellOutput') returns each output argument as a scalar cell. The cells contain the numeric arrays or substrings that are described as the outputs of the previous syntaxes. You can include any of the inputs and request any of the outputs from previous syntaxes.

Examples

Find Patterns in Text

Find words that start with c, end with t, and contain one or more vowels between them.

str = 'bat cat can car coat court CUT ct CAT-scan';
expression = 'c[aeiou]+t';
startIndex = regexp(str,expression)

startIndex = 1×2

     5    17

The regular expression 'c[aeiou]+t' specifies this pattern:

c must be the first character.
c must be followed by one of the characters inside the brackets, [aeiou].
The bracketed pattern must occur one or more times, as indicated by the + operator.
t must be the last character, with no characters between the bracketed pattern and the t.

Values in startIndex indicate the index of the first character of each word that matches the regular expression. The matching word cat starts at index 5, and coat starts at index 17. The words CUT and CAT do not match because they are uppercase.
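As a small follow-on sketch (not part of the original example), requesting a second output returns the ending index of each match, so each matched word can be sliced directly out of str. The names words and k are illustrative only.

```matlab
str = 'bat cat can car coat court CUT ct CAT-scan';
expression = 'c[aeiou]+t';

% Request both outputs to get paired start and end indices.
[startIndex,endIndex] = regexp(str,expression);

% Slice each match out of the original text using its index pair.
words = cell(1,numel(startIndex));
for k = 1:numel(startIndex)
    words{k} = str(startIndex(k):endIndex(k));
end
disp(words)
```

Here endIndex is [7 20], so the slices are 'cat' and 'coat'. In practice the 'match' keyword returns the same text without the loop.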

Find Patterns in Multiple Pieces of Text

Find the location of capital letters and spaces within character vectors in a cell array.

str = {'Madrid, Spain','Romeo and Juliet','MATLAB is great'};
capExpr = '[A-Z]';
spaceExpr = '\s';

capStartIndex = regexp(str,capExpr);
spaceStartIndex = regexp(str,spaceExpr);

capStartIndex and spaceStartIndex are cell arrays because the input str is a cell array.

View the indices for the capital letters.

celldisp(capStartIndex)

 
capStartIndex{1} =
 
     1     9

 
 
capStartIndex{2} =
 
     1    11

 
 
capStartIndex{3} =
 
     1     2     3     4     5     6

View the indices for the spaces.

celldisp(spaceStartIndex)

 
spaceStartIndex{1} =
 
     8

 
 
spaceStartIndex{2} =
 
     6    10

 
 
spaceStartIndex{3} =
 
     7    10

Return Substrings Using match Keyword

Capture words within a character vector that contain the letter x.

str = 'EXTRA! The regexp function helps you relax.';
expression = '\w*x\w*';
matchStr = regexp(str,expression,'match')

matchStr = 1×2 cell
    {'regexp'}    {'relax'}

The regular expression '\w*x\w*' specifies that the character vector:

Begins with any number of alphanumeric or underscore characters, \w*.
Contains the lowercase letter x.
Ends with any number of alphanumeric or underscore characters after the x, including none, as indicated by \w*.

Split Text at Delimiter Using split Keyword

Split a character vector into several substrings, where each substring is delimited by a ^ character.

str = ['Split ^this text into ^several pieces'];
expression = '\^';
splitStr = regexp(str,expression,'split')

splitStr = 1×3 cell
    {'Split '}    {'this text into '}    {'several pieces'}

Because the caret symbol has special meaning in regular expressions, precede it with the escape character, a backslash (\). To split a character vector at other delimiters, such as a semicolon, you do not need to include the backslash.

Return Both Matching and Nonmatching Substrings

Capture parts of a character vector that match a regular expression using the 'match' keyword, and the remaining parts that do not match using the 'split' keyword.

str = 'She sells sea shells by the seashore.';
expression = '[Ss]h.';
[match,noMatch] = regexp(str,expression,'match','split')

match = 1×3 cell
    {'She'}    {'she'}    {'sho'}

noMatch = 1×4 cell
    {0×0 char}    {' sells sea '}    {'lls by the sea'}    {'re.'}

The regular expression '[Ss]h.' specifies that:

S or s is the first character.
h is the second character.
The third character can be anything, including a space, as indicated by the dot (.).

When the first (or last) character in a character vector matches a regular expression, the first (or last) return value from the 'split' keyword is an empty character vector.

Optionally, reassemble the original character vector from the substrings.

combinedStr = strjoin(noMatch,match)

combinedStr = 
'She sells sea shells by the seashore.'

Capture Substrings of Matches Using Ordinal Tokens

Find the names of HTML tags by defining a token within a regular expression. Tokens are indicated with parentheses, ().

str = '<title>My Title</title><p>Here is some text.</p>';
expression = '<(\w+).*>.*</\1>';
[tokens,matches] = regexp(str,expression,'tokens','match');

The regular expression <(\w+).*>.*</\1> specifies this pattern:

<(\w+) finds an opening angle bracket followed by one or more alphanumeric or underscore characters. Enclosing \w+ in parentheses captures the name of the HTML tag in a token.
.*> finds any number of additional characters, such as HTML attributes, and a closing angle bracket.
</\1> finds the end tag corresponding to the first token (indicated by \1). The end tag has the form </tagname>.

View the tokens and matching substrings.

celldisp(tokens)

 
tokens{1}{1} =
 
title
 
 
tokens{2}{1} =
 
p

celldisp(matches)

 
matches{1} =
 
<title>My Title</title>
 
 
matches{2} =
 
<p>Here is some text.</p>

Capture Substrings of Matches Using Named Tokens

Parse dates that can appear with either the day or the month first, in these forms: mm/dd/yyyy or dd-mm-yyyy. Use named tokens to identify each part of the date.

str = '01/11/2000  20-02-2020  03/30/2000  16-04-2020';
expression = ['(?<month>\d+)/(?<day>\d+)/(?<year>\d+)|'...
              '(?<day>\d+)-(?<month>\d+)-(?<year>\d+)'];
tokenNames = regexp(str,expression,'names');

The regular expression specifies this pattern:

(?<name>\d+) finds one or more numeric digits and assigns the result to the token indicated by name.
| is the logical or operator, which indicates that there are two possible patterns for dates. In the first pattern, slashes (/) separate the tokens. In the second pattern, hyphens (-) separate the tokens.

View the named tokens.

for k = 1:length(tokenNames)
   disp(tokenNames(k))
end

    month: '01'
      day: '11'
     year: '2000'

    month: '02'
      day: '20'
     year: '2020'

    month: '03'
      day: '30'
     year: '2000'

    month: '04'
      day: '16'
     year: '2020'

Perform Case-Insensitive Matches

Find both uppercase and lowercase instances of a word.

By default, regexp performs case-sensitive matching.

str = 'A character vector with UPPERCASE and lowercase text.';
expression = '\w*case';
matchStr = regexp(str,expression,'match')

matchStr = 1×1 cell array
    {'lowercase'}

The regular expression specifies that the character vector:

Begins with any number of alphanumeric or underscore characters, \w*.
Ends with the literal text case.

The regexpi function uses the same syntax as regexp, but performs case-insensitive matching.

matchWithRegexpi = regexpi(str,expression,'match')

matchWithRegexpi = 1×2 cell
    {'UPPERCASE'}    {'lowercase'}

Alternatively, disable case-sensitive matching for regexp using the 'ignorecase' option.

matchWithIgnorecase = regexp(str,expression,'match','ignorecase')

matchWithIgnorecase = 1×2 cell
    {'UPPERCASE'}    {'lowercase'}

For multiple expressions, disable case-sensitive matching for selected expressions using the (?i) search flag.

expression = {'(?-i)\w*case';...
              '(?i)\w*case'};
matchStr = regexp(str,expression,'match');
celldisp(matchStr)

 
matchStr{1}{1} =
 
lowercase
 
 
matchStr{2}{1} =
 
UPPERCASE
 
 
matchStr{2}{2} =
 
lowercase

Parse Text with Newline Characters

Create a character vector that contains a newline, \n, and parse it using a regular expression. Since regexp returns matchStr as a cell array containing text that has multiple lines, you can take the text out of the cell array to display all lines.

str = sprintf('abc\n de');
expression = '.*';
matchStr = regexp(str,expression,'match');
matchStr{:}

ans = 
    'abc
      de'

By default, the dot (.) matches every character, including the newline, and returns a single match that is equivalent to the original character vector.

Exclude newline characters from the match using the 'dotexceptnewline' option. This returns separate matches for each line of text.

matchStrNoNewline = regexp(str,expression,'match','dotexceptnewline')

matchStrNoNewline = 1×2 cell
    {'abc'}    {' de'}

Find the first or last character of each line using the ^ or $ metacharacters and the 'lineanchors' option.

expression = '.$';
lastInLine = regexp(str,expression,'match','lineanchors')

lastInLine = 1×2 cell
    {'c'}    {'e'}

Return Matches in Cell

Find matches within a piece of text and return the output in a scalar cell.

Find words that start with c, end with t, and contain one or more vowels between them. Return the starting indices in a scalar cell.

str = 'bat cat can car coat court CUT ct CAT-scan';
expression = 'c[aeiou]+t';
startIndex = regexp(str,expression,'forceCellOutput')

startIndex = 1×1 cell array
    {[5 17]}

To access the starting indices as a numeric array, index into the cell.

startIndex{1}

ans = 1×2

     5    17

Return the matching and nonmatching substrings. Each output is in its own scalar cell.

[match,noMatch] = regexp(str,expression,'match','split','forceCellOutput')

match = 1×1 cell array
    {1×2 cell}

noMatch = 1×1 cell array
    {1×3 cell}

To access the array of matches, index into match.

match{1}

ans = 1×2 cell
    {'cat'}    {'coat'}

To access the substrings that do not match, index into noMatch.

noMatch{1}

ans = 1×3 cell
    {'bat '}    {' can car '}    {' court CUT ct CAT-scan'}

Input Arguments

`str` — Input text
character vector | cell array of character vectors | string array

Input text, specified as a character vector, a cell array of character vectors, or a string array. Each character vector in a cell array, or each string in a string array, can be of any length and contain any characters.

If str and expression are string arrays or cell arrays, they must have the same dimensions.
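The same-dimensions rule can be illustrated with a short sketch. The inputs below are hypothetical, and the sketch assumes that each expression is applied to the element of str at the same position.

```matlab
% Hypothetical inputs: two pieces of text and two patterns, both 1-by-2.
str = {'abc 1','def 2'};
expression = {'[a-z]+','\d'};

% Each expression is matched against the corresponding element of str.
matchStr = regexp(str,expression,'match');

% matchStr is 1-by-2; each cell holds the matches for one pair.
letters = matchStr{1};   % matches of '[a-z]+' in 'abc 1'
digits  = matchStr{2};   % matches of '\d' in 'def 2'
```

If only one of the two inputs is a cell array, the other is applied to every element instead.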

Data Types: string | char | cell

`expression` — Regular expression
character vector | cell array of character vectors | string array

Regular expression, specified as a character vector, a cell array of character vectors, or a string array. Each expression can contain characters, metacharacters, operators, tokens, and flags that specify patterns to match in str.

The following tables describe the elements of regular expressions.

Metacharacters

Metacharacters represent letters, letter ranges, digits, and space characters. Use them to construct a generalized pattern of characters.

Metacharacter	Description	Example
`.`	Any single character, including white space	`'..ain'` matches sequences of five consecutive characters that end with `'ain'`.
`[c₁c₂c₃]`	Any character contained within the square brackets. The following characters are treated literally: `$ | . * + ?` and `-` when not used to indicate a range.	`'[rp.]ain'` matches `'rain'` or `'pain'` or `'.ain'`.
`[^c₁c₂c₃]`	Any character not contained within the square brackets. The following characters are treated literally: `$ | . * + ?` and `-` when not used to indicate a range.	`'[^*rp]ain'` matches all four-letter sequences that end in `'ain'`, except `'rain'` and `'pain'` and `'*ain'`. For example, it matches `'gain'`, `'lain'`, or `'vain'`.
`[c₁-c₂]`	Any character in the range of `c₁` through `c₂`	`'[A-G]'` matches a single character in the range of `A` through `G`.
`\w`	Any alphabetic, numeric, or underscore character. For English character sets, `\w` is equivalent to `[a-zA-Z_0-9]`	`'\w*'` identifies a word comprised of any grouping of alphabetic, numeric, or underscore characters.
`\W`	Any character that is not alphabetic, numeric, or underscore. For English character sets, `\W` is equivalent to `[^a-zA-Z_0-9]`	`'\W*'` identifies a term that is not a word comprised of any grouping of alphabetic, numeric, or underscore characters.
`\s`	Any white-space character; equivalent to `[ \f\n\r\t\v]`	`'\w*n\s'` matches words that end with the letter `n`, followed by a white-space character.
`\S`	Any non-white-space character; equivalent to `[^ \f\n\r\t\v]`	`'\d\S'` matches a numeric digit followed by any non-white-space character.
`\d`	Any numeric digit; equivalent to `[0-9]`	`'\d*'` matches any number of consecutive digits.
`\D`	Any nondigit character; equivalent to `[^0-9]`	`'\w*\D\>'` matches words that do not end with a numeric digit.
`\oN` or `\o{N}`	Character of octal value `N`	`'\o{40}'` matches the space character, defined by octal `40`.
`\xN` or `\x{N}`	Character of hexadecimal value `N`	`'\x2C'` matches the comma character, defined by hex `2C`.
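
A few of the metacharacters in the table can be tried out with a short sketch like the following. The input text and variable names are made up for illustration.

```matlab
str = 'rain pain gain vain';

% Character set: r or p, followed by the literal text ain.
inSet = regexp(str,'[rp]ain','match');        % {'rain','pain'}

% Negated set: any character except r or p, followed by ain.
notInSet = regexp(str,'[^rp]ain','match');    % {'gain','vain'}

% \d matches a digit; \x2C matches a comma by its hexadecimal value.
numbers = regexp('Room 12, floor 3','\d+','match');   % {'12','3'}
commaIndex = regexp('a,b,c','\x2C');                  % [2 4]
```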

Character Representation

Operator	Description
`\a`	Alarm (beep)
`\b`	Backspace
`\f`	Form feed
`\n`	New line
`\r`	Carriage return
`\t`	Horizontal tab
`\v`	Vertical tab
`\char`	Any character with special meaning in regular expressions that you want to match literally (for example, use `\\` to match a single backslash)

Quantifiers

Quantifiers specify the number of times a pattern must occur in the matching text.

Quantifier	Number of Times Expression Occurs	Example
`expr*`	0 or more times consecutively.	`'\w*'` matches a word of any length.
`expr?`	0 times or 1 time.	`'\w*(\.m)?'` matches words that optionally end with the extension `.m`.
`expr+`	1 or more times consecutively.	`'<img src="\w+\.gif">'` matches an `<img>` HTML tag when the file name contains one or more characters.
`expr{m,n}`	At least `m` times, but no more than `n` times consecutively. `{0,1}` is equivalent to `?`.	`'\S{4,8}'` matches between four and eight non-white-space characters.
`expr{m,}`	At least `m` times consecutively. `{0,}` and `{1,}` are equivalent to `*` and `+`, respectively.	`'<a href="\w{1,}\.html">'` matches an `<a>` HTML tag when the file name contains one or more characters.
`expr{n}`	Exactly `n` times consecutively. Equivalent to `{n,n}`.	`'\d{4}'` matches four consecutive digits.

Quantifiers can appear in three modes, described in the following table. q represents any of the quantifiers in the previous table.

Mode	Description	Example
`expr`q	Greedy expression: match as many characters as possible.	Given the text `'<tr><td><p>text</p></td>'`, the expression `'</?t.*>'` matches all characters between `<tr` and `/td>`: `'<tr><td><p>text</p></td>'`
`expr`q`?`	Lazy expression: match as few characters as necessary.	Given the text `'<tr><td><p>text</p></td>'`, the expression `'</?t.*?>'` ends each match at the first occurrence of the closing angle bracket (`>`): `'<tr>' '<td>' '</td>'`
`expr`q`+`	Possessive expression: match as much as possible, but do not rescan any portions of the text.	Given the text `'<tr><td><p>text</p></td>'`, the expression `'</?t.*+>'` does not return any matches, because the closing angle bracket is captured using `.*`, and is not rescanned.
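
The three quantifier modes can be compared side by side on the text used in the table. This is a minimal sketch; the variable names are illustrative.

```matlab
str = '<tr><td><p>text</p></td>';

% Greedy: .* runs to the end, then backs up to the last closing bracket.
greedy = regexp(str,'</?t.*>','match');

% Lazy: .*? stops at the first closing bracket after each tag start.
lazy = regexp(str,'</?t.*?>','match');

% Possessive: .*+ consumes the rest of the text and never gives any back,
% so the final > in the pattern has nothing left to match.
possessive = regexp(str,'</?t.*+>','match');

disp(greedy)
disp(lazy)
disp(isempty(possessive))
```

The greedy call returns the whole text as one match, the lazy call returns '<tr>', '<td>', and '</td>', and the possessive call returns an empty cell array.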

Grouping Operators

Grouping operators allow you to capture tokens, apply one operator to multiple elements, or disable backtracking in a specific group.

Grouping Operator	Description	Example
`(expr)`	Group elements of the expression and capture tokens.	`'Joh?n\s(\w*)'` captures a token that contains the last name of any person with the first name `John` or `Jon`.
`(?:expr)`	Group, but do not capture tokens.	`'(?:[aeiou][^aeiou]){2}'` matches two consecutive patterns of a vowel followed by a nonvowel, such as `'anon'`. Without grouping, `'[aeiou][^aeiou]{2}'` matches a vowel followed by two nonvowels.
`(?>expr)`	Group atomically. Do not backtrack within the group to complete the match, and do not capture tokens.	`'A(?>.*)Z'` does not match `'AtoZ'`, although `'A(?:.*)Z'` does. Using the atomic group, `Z` is captured using `.*` and is not rescanned.
`(expr1|expr2)`	Match expression `expr1` or expression `expr2`. If there is a match with `expr1`, then `expr2` is ignored. You can include `?:` or `?>` after the opening parenthesis to suppress tokens or group atomically.	`'(let|tel)\w+'` matches words that contain, but do not end with, `let` or `tel`.
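
The grouping operators can be contrasted in a short sketch. The input text 'John Smith, Jon Doe' and the variable names are invented for illustration.

```matlab
% Capturing group: each token holds the word that follows John or Jon.
names = regexp('John Smith, Jon Doe','Joh?n\s(\w*)','tokens');
lastNames = cellfun(@(t) t{1},names,'UniformOutput',false);   % {'Smith','Doe'}

% Noncapturing group: {2} applies to the whole vowel-nonvowel pair.
grouped = regexp('anon','(?:[aeiou][^aeiou]){2}','match');    % {'anon'}

% Without grouping, {2} applies only to [^aeiou], so 'anon' has no match.
ungrouped = regexp('anon','[aeiou][^aeiou]{2}','match');      % empty

% Atomic group: .* consumes the Z and is not allowed to give it back.
atomic = regexp('AtoZ','A(?>.*)Z','match');                   % empty
regular = regexp('AtoZ','A(?:.*)Z','match');                  % {'AtoZ'}
```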

Anchors

Anchors in the expression match the beginning or end of the input text or word.

Anchor	Matches the...	Example
`^expr`	Beginning of the input text.	`'^M\w*'` matches a word starting with `M` at the beginning of the text.
`expr$`	End of the input text.	`'\w*m$'` matches words ending with `m` at the end of the text.
`\<expr`	Beginning of a word.	`'\<n\w*'` matches any words starting with `n`.
`expr\>`	End of a word.	`'\w*e\>'` matches any words ending with `e`.
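
The four anchors can be compared on one piece of text. The sentence below is a made-up input, and matching is case sensitive by default.

```matlab
str = 'Many moons make mayhem';

% ^ anchors the match to the beginning of the text.
atStart = regexp(str,'^M\w*','match');       % {'Many'}

% $ anchors the match to the end of the text.
atEnd = regexp(str,'\w*m$','match');         % {'mayhem'}

% \< anchors to the beginning of any word (lowercase m only).
wordStart = regexp(str,'\<m\w*','match');    % {'moons','make','mayhem'}

% \> anchors to the end of any word.
wordEnd = regexp(str,'\w*e\>','match');      % {'make'}
```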

Lookaround Assertions

Lookaround assertions look for patterns that immediately precede or follow the intended match, but are not part of the match.

The pointer remains at the current location, and characters that correspond to the test expression are not captured or discarded. Therefore, lookahead assertions can match overlapping character groups.

Lookaround Assertion	Description	Example
`expr(?=test)`	Look ahead for characters that match `test`.	`'\w*(?=ing)'` matches terms that are followed by `ing`, such as `'Fly'` and `'fall'` in the input text `'Flying, not falling.'`
`expr(?!test)`	Look ahead for characters that do not match `test`.	`'i(?!ng)'` matches instances of the letter `i` that are not followed by `ng`.
`(?<=test)expr`	Look behind for characters that match `test`.	`'(?<=re)\w*'` matches terms that follow `'re'`, such as `'new'`, `'use'`, and `'cycle'` in the input text `'renew, reuse, recycle'`
`(?<!test)expr`	Look behind for characters that do not match `test`.	`'(?<!\d)(\d)(?!\d)'` matches single-digit numbers (digits that do not precede or follow other digits).

If you specify a lookahead assertion before an expression, the operation is equivalent to a logical AND.

Operation	Description	Example
`(?=test)expr`	Match both `test` and `expr`.	`'(?=[a-z])[^aeiou]'` matches consonants.
`(?!test)expr`	Match `expr` and do not match `test`.	`'(?![aeiou])[a-z]'` matches consonants.
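
The lookaround examples from the tables above can be run as a quick sketch. The inputs 'a1 b22 c3' and 'regexp' are invented for illustration; the other inputs come from the tables.

```matlab
% Lookahead: text that is followed by ing (ing itself is not returned).
beforeIng = regexp('Flying, not falling.','\w*(?=ing)','match');

% Lookbehind: text that is preceded by re (re itself is not returned).
afterRe = regexp('renew, reuse, recycle','(?<=re)\w*','match');

% Negative lookbehind and lookahead together: digits with no digit neighbors.
singleDigits = regexp('a1 b22 c3','(?<!\d)(\d)(?!\d)','match');

% Lookahead before the expression acts as a logical AND:
% a lowercase letter that is not a vowel.
consonants = regexp('regexp','(?![aeiou])[a-z]','match');
```

With the default 'noemptymatch' behavior, zero-length matches are dropped, so the results are {'Fly','fall'}, {'new','use','cycle'}, {'1','3'}, and {'r','g','x','p'}.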

Logical and Conditional Operators

Logical and conditional operators allow you to test the state of a given condition, and then use the outcome to determine which pattern, if any, to match next. These operators support logical OR, and if or if/else conditions.

Conditions can be tokens, lookaround operators, or dynamic expressions of the form (?@cmd). Dynamic expressions must return a logical or numeric value.

Conditional Operator	Description	Example
`expr1|expr2`	Match expression `expr1` or expression `expr2`. If there is a match with `expr1`, then `expr2` is ignored.	`'(let|tel)\w+'` matches `let` or `tel` followed by one or more word characters, as in `letter` or `telephone`.
`(?(cond)expr)`	If condition `cond` is `true`, then match `expr`.	`'(?(?@ispc)[A-Z]:\\)'` matches a drive name, such as `C:\`, when run on a Windows® system.
`(?(cond)expr1|expr2)`	If condition `cond` is `true`, then match `expr1`. Otherwise, match `expr2`.	`'Mr(s?)\..*?(?(1)her|his) \w*'` matches text that includes `her` when the text begins with `Mrs`, or that includes `his` when the text begins with `Mr`.
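
The rule that expr2 is ignored once expr1 matches means that the order of alternatives can change the result. A minimal sketch, using made-up input text:

```matlab
% Either alternative can start a match.
words = regexp('telephone letter','(let|tel)\w+','match');   % {'telephone','letter'}

% Alternatives are tried left to right at each position, so the shorter
% alternative wins when it is listed first.
shortFirst = regexp('catalog','cat|catalog','match');        % {'cat'}
longFirst = regexp('catalog','catalog|cat','match');         % {'catalog'}
```

Listing the longer alternative first is one way to avoid an unintentionally short match.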

Token Operators

Tokens are portions of the matched text that you define by enclosing part of the regular expression in parentheses. You can refer to a token by its sequence in the text (an ordinal token), or assign names to tokens for easier code maintenance and readable output.

Ordinal Token Operator	Description	Example
`(expr)`	Capture in a token the characters that match the enclosed expression.	`'Joh?n\s(\w*)'` captures a token that contains the last name of any person with the first name `John` or `Jon`.
`\N`	Match the `N`th token.	`'<(\w+).*>.*</\1>'` captures tokens for HTML tags, such as `'title'` from the text `'<title>Some text</title>'`.
`(?(N)expr1|expr2)`	If the `N`th token is found, then match `expr1`. Otherwise, match `expr2`.	`'Mr(s?)\..*?(?(1)her|his) \w*'` matches text that includes `her` when the text begins with `Mrs`, or that includes `his` when the text begins with `Mr`.

Named Token Operator	Description	Example
`(?<name>expr)`	Capture in a named token the characters that match the enclosed expression.	`'(?<month>\d+)-(?<day>\d+)-(?<yr>\d+)'` creates named tokens for the month, day, and year in an input date of the form `mm-dd-yy`.
`\k<name>`	Match the token referred to by `name`.	`'<(?<tag>\w+).*>.*</\k<tag>>'` captures tokens for HTML tags, such as `'title'` from the text `'<title>Some text</title>'`.
`(?(name)expr1|expr2)`	If the named token is found, then match `expr1`. Otherwise, match `expr2`.	`'Mr(?<sex>s?)\..*?(?(sex)her|his) \w*'` matches text that includes `her` when the text begins with `Mrs`, or that includes `his` when the text begins with `Mr`.
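
The ordinal and named backreferences from the two tables can be compared on the same text. This is a small sketch; tagName is an illustrative variable name.

```matlab
str = '<title>Some text</title>';

% Ordinal token: \1 must repeat whatever (\w+) captured.
ordinal = regexp(str,'<(\w+).*>.*</\1>','tokens');
tagName = ordinal{1}{1};    % 'title'

% Named token: \k<tag> refers to the same capture by name, and the
% 'names' keyword returns it as a field of a structure.
named = regexp(str,'<(?<tag>\w+).*>.*</\k<tag>>','names');
disp(named.tag)
```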

Note

If an expression has nested parentheses, MATLAB® captures tokens that correspond to the outermost set of parentheses. For example, given the search pattern '(and(y|rew))', MATLAB creates a token for 'andrew' but not for 'y' or 'rew'.

Dynamic Regular Expressions

Dynamic expressions allow you to execute a MATLAB command or a regular expression to determine the text to match.

The parentheses that enclose dynamic expressions do not create a capturing group.

Operator	Description	Example
`(??expr)`	Parse `expr` and include the resulting term in the match expression. When parsed, `expr` must correspond to a complete, valid regular expression. Dynamic expressions that use the backslash escape character (`\`) require two backslashes: one for the initial parsing of `expr`, and one for the complete match.	`'^(\d+)((??\\w{$1}))'` determines how many characters to match by reading a digit at the beginning of the match. The dynamic expression is enclosed in a second set of parentheses so that the resulting match is captured in a token. For instance, matching `'5XXXXX'` captures tokens for `'5'` and `'XXXXX'`.
`(??@cmd)`	Execute the MATLAB command represented by `cmd`, and include the output returned by the command in the match expression.	`'(.{2,}).?(??@fliplr($1))'` finds palindromes that are at least four characters long, such as `'abba'`.
`(?@cmd)`	Execute the MATLAB command represented by `cmd`, but discard any output the command returns. (Helpful for diagnosing regular expressions.)	`'\w*?(\w)(?@disp($1))\1\w*'` matches words that include double letters (such as `pp`), and displays intermediate results.
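
The first two dynamic operators can be tried with the inputs named in the table. This is a sketch that follows the table's own examples; the results noted in the comments are the ones the table describes.

```matlab
% (??expr): the leading digit sets how many word characters must follow.
% The doubled backslash survives the first parse as \w.
tokens = regexp('5XXXXX','^(\d+)((??\\w{$1}))','tokens');
% Per the table, tokens{1} holds '5' and 'XXXXX'.

% (??@cmd): the second half of the pattern is token 1 reversed by fliplr.
palindrome = regexp('abba','(.{2,}).?(??@fliplr($1))','match');
% Per the table, 'abba' is matched as a palindrome.
```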

Within dynamic expressions, use the following operators to define replacement text.

Replacement Operator	Description
`$&` or `$0`	Portion of the input text that is currently a match
``$` ``	Portion of the input text that precedes the current match
`$'`	Portion of the input text that follows the current match (use `$''` to represent `$'`)
`$N`	`N`th token
`$<name>`	Named token
`${cmd}`	Output returned when MATLAB executes the command, `cmd`

Comments

Characters	Description	Example
`(?#comment)`	Insert a comment in the regular expression. The comment text is ignored when matching the input.	`'(?# Initial digit)\<\d\w+'` includes a comment, and matches words that begin with a number.

Search Flags

Search flags modify the behavior for matching expressions. An alternative to using a search flag within an expression is to pass an option input argument.

Flag	Description
`(?-i)`	Match letter case (default for `regexp` and `regexprep`).
`(?i)`	Do not match letter case (default for `regexpi`).
`(?s)`	Match dot (`.`) in the pattern with any character (default).
`(?-s)`	Match dot in the pattern with any character that is not a newline character.
`(?-m)`	Match the `^` and `$` metacharacters at the beginning and end of text (default).
`(?m)`	Match the `^` and `$` metacharacters at the beginning and end of a line.
`(?-x)`	Include space characters and comments when matching (default).
`(?x)`	Ignore space characters and comments when matching. Use `'\ '` and `'\#'` to match space and `#` characters.

The expression that the flag modifies can appear either after the parentheses, such as

(?i)\w*

or inside the parentheses and separated from the flag with a colon (:), such as

(?i:\w*)

The latter syntax allows you to change the behavior for part of a larger expression.
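
The difference between the two flag forms can be seen in a short sketch. The input text is made up for illustration.

```matlab
str = 'MATLAB matlab Matlab';

% Flag before the expression: the whole pattern ignores case.
wholePattern = regexp(str,'(?i)matlab','match');     % all three words

% Flag with a colon: only the first letter ignores case; the rest of the
% pattern (atlab) is still case sensitive.
firstLetter = regexp(str,'(?i:m)atlab','match');     % {'matlab','Matlab'}
```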

Data Types: char | cell | string

`outkey` — Keyword that indicates which outputs to return
`'start'` (default) | `'end'` | `'tokenExtents'` | `'match'` | `'tokens'` | `'names'` | `'split'`

Keyword that indicates which outputs to return, specified as one of the following character vectors.

Output Keyword	Returns
`'start'` (default)	Starting indices of all matches, `startIndex`
`'end'`	Ending indices of all matches, `endIndex`
`'tokenExtents'`	Starting and ending indices of all tokens
`'match'`	Text of each substring that matches the pattern in `expression`
`'tokens'`	Text of each captured token in `str`
`'names'`	Name and text of each named token
`'split'`	Text of nonmatching substrings of `str`
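
The 'tokenExtents' and 'end' keywords do not appear in the examples above, so here is a minimal sketch of both. The input 'key=value' and the variable names are invented for illustration.

```matlab
str = 'key=value';
expression = '(\w+)=(\w+)';

% Outputs are returned in the order the keywords are listed.
[tok,ext,lastIdx] = regexp(str,expression,'tokens','tokenExtents','end');

% ext{1} has one row per token: [start end] indices into str.
disp(ext{1})

% The extents can be used to slice a token out of the original text.
second = str(ext{1}(2,1):ext{1}(2,2));
```

Here ext{1} is [1 3; 5 9], lastIdx is 9, and second is 'value', the same text as tok{1}{2}.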

Data Types: char | string

`option` — Search option
`'once'` | `'warnings'` | `'ignorecase'` | `'emptymatch'` | `'dotexceptnewline'` | `'lineanchors'` | ...

Search option, specified as a character vector. Options come in pairs: one option that corresponds to the default behavior, and one option that allows you to override the default. Specify only one option from a pair. Options can appear in any order.

Default	Override	Description
`'all'`	`'once'`	Match the expression as many times as possible (default), or only once.
`'nowarnings'`	`'warnings'`	Suppress warnings (default), or display them.
`'matchcase'`	`'ignorecase'`	Match letter case (default), or ignore case.
`'noemptymatch'`	`'emptymatch'`	Ignore zero length matches (default), or include them.
`'dotall'`	`'dotexceptnewline'`	Match dot with any character (default), or all except newline (`\n`).
`'stringanchors'`	`'lineanchors'`	Apply `^` and `$` metacharacters to the beginning and end of a character vector (default), or to the beginning and end of a line. The newline character (`\n`) specifies the end of a line. The beginning of a line is specified as the first character, or any character that immediately follows a newline character.
`'literalspacing'`	`'freespacing'`	Include space characters and comments when matching (default), or ignore them. With `freespacing`, use `'\ '` and `'\#'` to match space and `#` characters.
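
Two of the options that are not demonstrated elsewhere on this page, 'once' and 'freespacing', can be sketched as follows. The inputs are made up for illustration.

```matlab
str = 'bat cat can';

% Default ('all'): every match, returned in a cell array.
allWords = regexp(str,'\w+','match');             % {'bat','cat','can'}

% 'once': only the first match, returned as a character vector.
firstWord = regexp(str,'\w+','match','once');     % 'bat'

% 'freespacing': spaces in the pattern are ignored, so the expression
% can be spread out for readability.
parts = regexp('2024-01-15','(\d{4}) - (\d{2}) - (\d{2})', ...
    'tokens','once','freespacing');
disp(parts)
```

With 'once', the 'tokens' output is a single cell array of character vectors rather than a cell array of cell arrays.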

Data Types: char | string

Output Arguments

`startIndex` — Starting index of each match
row vector | cell array of row vectors

Starting indices of each match, returned as a row vector or cell array, as follows:

If str and expression are both character vectors or string scalars, the output is a row vector (or, if there are no matches, an empty array).
If either str or expression is a cell array of character vectors or a string array, and the other is a character vector or a string scalar, the output is a cell array of row vectors. The output cell array has the same dimensions as the input array.
If str and expression are both cell arrays or string arrays, they must have the same dimensions. The output is a cell array with the same dimensions.

`endIndex` — Ending index of each match
row vector | cell array of row vectors

Ending index of each match, returned as a row vector or cell array, as follows:

If str and expression are both character vectors or string scalars, the output is a row vector (or, if there are no matches, an empty array).
If either str or expression is a cell array of character vectors or a string array, and the other is a character vector or a string scalar, the output is a cell array of row vectors. The output cell array has the same dimensions as the input array.
If str and expression are both cell arrays or string arrays, they must have the same dimensions. The output is a cell array with the same dimensions.

`out` — Information about matches
numeric array | cell array | string array | structure array

Information about matches, returned as a numeric, cell, string, or structure array. The information in the output depends upon the value you specify for outkey, as follows.

Output Keyword	Output Description	Output Type and Dimensions
`'start'`	Starting indices of matches	For both `'start'` and `'end'`: If `str` and `expression` are both character vectors or string scalars, the output is a row vector (or, if there are no matches, an empty array). If either `str` or `expression` is a cell array of character vectors or a string array, and the other is a character vector or a string scalar, the output is a cell array of row vectors. The output cell array has the same dimensions as the input array. If `str` and `expression` are both cell arrays or string arrays, they must have the same dimensions. The output is a cell array with the same dimensions.
`'end'`	Ending indices of matches
`'tokenExtents'`	Starting and ending indices of all tokens	By default, when returning all matches: If `str` and `expression` are both character vectors or string scalars, the output is a 1-by-`n` cell array, where `n` is the number of matches. Each cell contains an `m`-by-2 numeric array of indices, where `m` is the number of tokens in the match. If either `str` or `expression` is a cell array of character vectors or a string array, the output is a cell array with the same dimensions as the input array. Each cell contains a 1-by-`n` cell array, where each inner cell contains an `m`-by-2 numeric array. If `str` and `expression` are both cell arrays or string arrays, they must have the same dimensions. The output is a cell array with the same dimensions. When you specify the `'once'` option to return only one match, the output is either an `m`-by-2 numeric array or a cell array with the same dimensions as `str` and/or `expression`. If a token is expected at a particular index `N`, but is not found, then MATLAB returns extents for that token of `[N,N-1]`.
`'match'`	Text of each substring that matches the pattern in `expression`	By default, when returning all matches: If `str` and `expression` are both character vectors or string scalars, the output is a 1-by-`n` array, where `n` is the number of matches. If `str` is a character vector, then the output is a cell array of character vectors. If `str` is a string scalar, then the output is a string array. If either `str` or `expression` is a cell array of character vectors or a string array, and the other is a character vector or a string scalar, then the output is a cell array with the same dimensions as the argument that is an array. If `str` is a character vector or a cell array of character vectors, then the output is a cell array of character vectors. If `str` is a string array, then the output is a cell array in which each cell contains a string array. If `str` and `expression` are both cell arrays or string arrays, they must have the same dimensions. The output is a cell array with the same dimensions. If `str` is a cell array of character vectors, then so is the output. If `str` is a string array, then the output is a cell array in which each cell contains a string array. When you specify the `'once'` option to return only one match, the output is either a character vector, a string array, or a cell array with the same dimensions as `str` and `expression`.
`'tokens'`	Text of each captured token in `str`	By default, when returning all matches: If `str` and `expression` are both character vectors or string scalars, the output is a 1-by-`n` cell array, where `n` is the number of matches. Each cell contains a 1-by-`m` cell array of matches, where `m` is the number of tokens in the match. If `str` is a character vector, then the output is a cell array of character vectors. If `str` is a string array, then the output is a cell array in which each cell contains a string array. If either `str` or `expression` is a cell array of character vectors or a string array, and the other is a character vector or a string scalar, then the output is a cell array with the same dimensions as the argument that is an array. Each cell contains a 1-by-`n` cell array, where each inner cell contains a 1-by-`m` array. If `str` is a character vector or a cell array of character vectors, then each inner cell contains a 1-by-`m` cell array. If `str` is a string array, then each inner cell contains a 1-by-`m` string array. If `str` and `expression` are both cell arrays or string arrays, they must have the same dimensions. The output is a cell array with the same dimensions. If `str` is a cell array of character vectors, then so is the output. If `str` is a string array, then the output is a cell array in which the innermost cells contain string arrays. When you specify the `'once'` option to return only one match, the output is a 1-by-`m` string array, cell array of character vectors, or a cell array that has the same dimensions as `str` and/or `expression`. If a token is expected at a particular index, but is not found, then MATLAB returns an empty value for the token, `''` for character vectors, or `""` for strings.
`'names'`	Name and text of each named token	For all matches: If `str` and `expression` are both character vectors or string scalars, the output is a 1-by-`n` structure array, where `n` is the number of matches. The structure field names correspond to the token names. If either `str` or `expression` is a cell array of character vectors or a string array, and the other is a character vector or a string scalar, then the output is a cell array with the same dimensions as the argument that is an array. Each cell contains a 1-by-`n` structure array. If `str` and `expression` are both cell arrays or string arrays, they must have the same dimensions. The output is a cell array with the same dimensions.
`'split'`	Text of nonmatching substrings of `str`	For all matches: If `str` and `expression` are both character vectors or string scalars, the output is a 1-by-`n` array, where `n` is the number of nonmatches. If `str` is a character vector, then the output is a cell array of character vectors. If `str` is a string scalar, then the output is a string array. If either `str` or `expression` is a cell array of character vectors or a string array, and the other is a character vector or a string scalar, then the output is a cell array with the same dimensions as the input array. Each cell contains a 1-by-`n` cell array of character vectors. If `str` is a character vector or a cell array of character vectors, then the output is a cell array of character vectors. If `str` is a string array, then the output is a cell array in which each cell contains a string array. If `str` and `expression` are both cell arrays, they must have the same dimensions. The output is a cell array with the same dimensions. If `str` is a cell array of character vectors, then so is the output. If `str` is a string array, then the output is a cell array in which each cell contains a string array.
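
The table above is dense, so a short sketch of a few rows may help. The inputs are invented for illustration, and only the scalar-input cases are shown.

```matlab
str = 'a1b22c';

% 'match' and 'split' on a character vector return cell arrays of
% character vectors.
[m,s] = regexp(str,'\d+','match','split');

% The same request on a string scalar returns a string array.
mStr = regexp("a1b22c",'\d+','match');

% With 'once', the first match is returned as text, not inside a cell.
first = regexp(str,'\d+','match','once');

% 'tokenExtents' returns one m-by-2 array of indices per match,
% with one row per token.
ext = regexp('01-Apr-2020','(\d+)-(\w+)-(\d+)','tokenExtents');
```

Note that `'split'` returns one more piece than there are matches, because it includes the text before the first match and after the last one.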

More About

Tokens

Tokens are portions of the matched text that correspond to portions of the regular expression. To create tokens, enclose part of the regular expression in parentheses.

For example, this expression finds a date of the form dd-mmm-yyyy, including tokens for the day, month, and year.

str = 'Here is a date: 01-Apr-2020';
expression = '(\d+)-(\w+)-(\d+)';

mydate = regexp(str,expression,'tokens');
mydate{:}

ans =

  1×3 cell array

    {'01'}    {'Apr'}    {'2020'}

You can associate names with tokens so that they are more easily identifiable:

str = 'Here is a date: 01-Apr-2020';
expression = '(?<day>\d+)-(?<month>\w+)-(?<year>\d+)';

mydate = regexp(str,expression,'names')

mydate = 

  struct with fields:

      day: '01'
    month: 'Apr'
     year: '2020'

For more information, see Tokens in Regular Expressions.
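
As a small follow-up sketch, the captured tokens can be pulled out of the nested outputs like this. The variable names (`monthText`, `yearText`, `dayNumber`) are chosen for this example only.

```matlab
str = 'Here is a date: 01-Apr-2020';

% Without 'once', tokens are nested: one cell per match, and inside it
% one cell per token.
tok = regexp(str,'(\d+)-(\w+)-(\d+)','tokens');
monthText = tok{1}{2};

% With 'once', the outer level is dropped and tokens are indexed directly.
tokOnce = regexp(str,'(\d+)-(\w+)-(\d+)','tokens','once');
yearText = tokOnce{3};

% Named tokens are fields of a structure; the values are still text,
% so convert them when numbers are needed.
d = regexp(str,'(?<day>\d+)-(?<month>\w+)-(?<year>\d+)','names');
dayNumber = str2double(d.day);
```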

Tips

Use contains or strfind to find an exact character match within text. Use regexp to look for a pattern of characters.
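
To make the distinction concrete, here is a sketch that searches for a period, which is ordinary text to `strfind` and `contains` but a metacharacter to `regexp`. The sample text is arbitrary.

```matlab
str = 'abc.abc';

% Exact text searches treat '.' literally.
literalDot = strfind(str,'.');
hasText = contains(str,'c.a');

% In a regular expression, an unescaped '.' matches any character.
anyChar = regexp(str,'.');

% Escape the period, by hand or with regexptranslate, to match it literally.
escapedDot = regexp(str,'\.');
safeDot = regexp(str,regexptranslate('escape','.'));
```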

Algorithms

MATLAB parses each input character vector or string from left to right, attempting to match the text in the character vector or string with the first element of the regular expression. During this process, MATLAB skips over any text that does not match.

When MATLAB finds the first match, it continues parsing to match the second piece of the expression, and so on.
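
One visible consequence of this left-to-right scan is that matches do not overlap: after a match, the search resumes after the matched text. The sketch below shows this, along with one possible workaround that uses a lookahead so that each match consumes only a single character.

```matlab
% The second 'aa' starts at index 3, not index 2, because the first
% match already consumed indices 1 and 2.
nonOverlapping = regexp('aaaa','aa');

% A lookahead tests the next character without consuming it, so every
% 'a' that is followed by another 'a' is reported.
overlapping = regexp('aaaa','a(?=a)');
```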

Extended Capabilities

Thread-Based Environment
Run code in the background using MATLAB® `backgroundPool` or accelerate code with Parallel Computing Toolbox™ `ThreadPool`.

The regexp function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.
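
As a minimal sketch, and assuming a release in which `backgroundPool` is available, a `regexp` call could be sent to the background with `parfeval`. The input text is illustrative.

```matlab
% Request one output (the starting indices) from regexp, evaluated on
% the background pool.
f = parfeval(backgroundPool,@regexp,1,'cat concat','cat');

% Block until the result is ready, then retrieve it.
startIndex = fetchOutputs(f);
```

For a single short search like this one, the scheduling overhead likely outweighs any benefit; the pattern is more useful when many large texts are processed while the session stays responsive.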

Version History

Introduced before R2006a
