Search and Replace Text
Processing text data often involves finding and replacing substrings. There are several functions that find text and return different information: some functions confirm that the text exists, while others count occurrences, find starting indices, or extract substrings. These functions work on character vectors and string scalars, such as "yes", as well as character and string arrays, such as [ "xyz"]. In addition, you can use patterns to define rules for searching, such as one or more letter or digit characters.
Search for Text
To determine if text is present, use a function that returns logical values, such as contains or endsWith. A logical value of 1 corresponds to true, and 0 corresponds to false.
txt = "she sells seashells by the seashore";
TF = contains(txt,"sea")
TF = logical
   1
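As a minimal sketch of the other logical-valued search functions, the following checks the beginning and end of the same sentence. The variable names startsWithShe, endsWithShore, and endsWithSea are illustrative only.

```matlab
txt = "she sells seashells by the seashore";

% startsWith and endsWith test only the beginning or the end of the text
startsWithShe = startsWith(txt,"she");    % true: the text begins with "she"
endsWithShore = endsWith(txt,"shore");    % true: the text ends with "shore"
endsWithSea   = endsWith(txt,"sea");      % false: "sea" appears, but not at the end
```

Unlike contains, these functions report a match only at one specific position in the text.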
Calculate how many times the text occurs using the count function.
n = count(txt,"sea")
n = 2
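Searches are case sensitive by default. The sketch below uses a hypothetical variant of the sentence, txt2, to show how the IgnoreCase option of count changes the result.

```matlab
% txt2 is a made-up variant of the example sentence with a capitalized "Sea"
txt2 = "Sea shells by the seashore";

nExact  = count(txt2,"sea");                     % 1: only the lowercase "sea" matches
nAnyCap = count(txt2,"sea","IgnoreCase",true);   % 2: "Sea" and "sea" both match
```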
To locate where the text occurs, use the strfind function, which returns starting indices.
idx = strfind(txt,"sea")
idx = 1×2
    11    28
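The starting indices can be passed on to functions that accept numeric positions. As a sketch, the following uses them to split the sentence around the first and last matches. The names before and after are illustrative.

```matlab
txt = "she sells seashells by the seashore";
idx = strfind(txt,"sea");              % starting indices of each "sea": 11 and 28

% Text before the first match (characters 1 through idx(1)-1)
before = extractBefore(txt,idx(1));    % "she sells "

% Text after the last match; "sea" is 3 characters long, so it ends at idx(end)+2
after = extractAfter(txt,idx(end)+2);  % "shore"
```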
Find and extract text using extraction functions, such as extractBetween.
mid = extractBetween(txt,"sea","shore")
mid = "shells by the sea"
Optionally, include the boundary text.
mid = extractBetween(txt,"sea","shore","Boundaries","inclusive")
mid = "seashells by the seashore"
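When only one boundary matters, extractBefore and extractAfter may be a better fit. A brief sketch, using the first occurrence of the boundary text in each case (firstWord and lastWord are illustrative names):

```matlab
txt = "she sells seashells by the seashore";

% Everything before the first space
firstWord = extractBefore(txt," ");    % "she"

% Everything after the first occurrence of "the "
lastWord = extractAfter(txt,"the ");   % "seashore"
```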
Find Text in Arrays
The search and replacement functions can also find text in multi-element arrays. For example, look for color names in several song titles.
songs = ["Yellow Submarine"; "Penny Lane"; "Blackbird"];
colors = ["Red","Yellow","Blue","Black","White"];
TF = contains(songs,colors)
TF = 3x1 logical array
   1
   0
   1
To list the songs that contain color names, use the logical TF array as indices into the original songs array. This technique is called logical indexing.
colorful = songs(TF)
colorful = 2x1 string
    "Yellow Submarine"
    "Blackbird"
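The same logical array can be negated or converted to row numbers. The following sketch, with the illustrative names plain and rows, lists the songs without a color name and the positions of the songs with one.

```matlab
songs = ["Yellow Submarine"; "Penny Lane"; "Blackbird"];
colors = ["Red","Yellow","Blue","Black","White"];
TF = contains(songs,colors);

% Negate the logical array to select the songs that contain no color name
plain = songs(~TF);    % "Penny Lane"

% Convert the logical array to row numbers
rows = find(TF);       % [1; 3]
```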
Use the replace function to replace text in songs that matches elements of colors with the string "Orange".
replace(songs,colors,"Orange")
ans = 3x1 string
    "Orange Submarine"
    "Penny Lane"
    "Orangebird"
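The replace function also accepts an array of replacements that is the same size as the array of search text, so that each match gets its own substitute. A minimal sketch, with made-up replacement colors:

```matlab
songs = ["Yellow Submarine"; "Penny Lane"; "Blackbird"];

% Pair each search string with its own replacement (same-size arrays)
oldText = ["Yellow","Black"];
newText = ["Green","Blue"];      % hypothetical replacements, for illustration
recolored = replace(songs,oldText,newText);
% recolored contains "Green Submarine", "Penny Lane", and "Bluebird"
```

Songs with no match, such as "Penny Lane", are returned unchanged.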
In addition to searching for literal text, like “sea” or “yellow”, you can search for text that matches a pattern. There are many predefined patterns, such as digitsPattern to find numeric digits.
address = "123a Sesame Street, New York, NY 10128";
nums = extract(address,digitsPattern)
nums = 2x1 string
    "123"
    "10128"
For additional precision in searches, you can combine patterns. For example, locate words that start with the character “S”. Use a string to specify the “S” character, and lettersPattern to find additional letters after that character.
pat = "S" + lettersPattern;
StartWithS = extract(address,pat)
StartWithS = 2x1 string
    "Sesame"
    "Street"
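Pattern functions also accept a count, which can narrow a search further. As a sketch, and assuming that a five-digit run in this address is the ZIP code, the following extracts it and checks that it follows the state abbreviation. The names zipCode and hasStateZip are illustrative.

```matlab
address = "123a Sesame Street, New York, NY 10128";

% digitsPattern(5) matches exactly five digits, so "123" is skipped
zipCode = extract(address,digitsPattern(5));            % "10128"

% Combine literal text with a pattern and test for a match
hasStateZip = contains(address,"NY " + digitsPattern(5));   % true
```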
For more information, see Build Pattern Expressions.