editDistanceSearcher

Edit distance nearest neighbor searcher

Description

An edit distance searcher performs a nearest neighborhood search in a list of known strings, using edit distance.

Creation

Description

example

eds = editDistanceSearcher(vocabulary,maxDist) creates an edit distance searcher and sets the Vocabulary and MaximumDistance properties. The returned object searches the words in vocabulary and with maximum edit distance maxDist.

example

eds = editDistanceSearcher(vocabulary,maxDist,Name,Value) specifies additional options using one or more name-value pair arguments.

Properties

expand all

Words to compare against, specified as a string vector, character vector, or a cell array of character vectors.

Data Types: char | string | cell

Maximum edit distance, specified as a positive scalar.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Cost to insert grapheme, specified as a nonnegative scalar or a function handle.

If InsertCost is a function handle, then the function must accept a single input and return the cost of inserting the input to the source. The cost function must have the form cost = func(grapheme), where the function returns the cost of inserting grapheme into the source string.

If you specify a custom cost function, then the searcher perform exhaustive searching. For large vocabularies, the functions knnsearch and rangesearch can take a long time to find matches.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | function_handle

Cost to delete grapheme, specified as a nonnegative scalar or a function handle.

If DeleteCost is a function handle, then the function must accept a single input and return the cost of deleting the input from the source. The cost function must have the form cost = func(grapheme), where the function returns the cost of deleting grapheme from the source string.

If you specify a custom cost function, then the searcher perform exhaustive searching. For large vocabularies, the functions knnsearch and rangesearch can take a long time to find matches.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | function_handle

Cost to substitute grapheme, specified as a nonnegative scalar or a function handle.

If SubstituteCost is a function handle, then the function must accept exactly two inputs and return the cost of substituting the first input to the second in the source. The cost function must have the form cost = func(grapheme1,grapheme2), where the function returns the cost of substituting grapheme1 with grapheme2 in the source.

If you specify a custom cost function, then the searcher perform exhaustive searching. For large vocabularies, the functions knnsearch and rangesearch can take a long time to find matches.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | function_handle

Cost to swap adjacent graphemes, specified as a nonnegative scalar or a function handle.

If SwapCost is a function handle, then the function must accept exactly two inputs and return the cost of swapping the first input with the second in the source. The cost function must have the form cost = func(grapheme1,grapheme2), where the function returns the cost of swapping the adjacent graphemes grapheme1 and grapheme2 in the source.

If you specify a custom cost function, then the searcher perform exhaustive searching. For large vocabularies, the functions knnsearch and rangesearch can take a long time to find matches.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | function_handle

Object Functions

rangesearchFind nearest neighbors by edit distance range
knnsearchFind nearest neighbors by edit distance

Examples

collapse all

Create an edit distance searcher with a maximum edit distance 3 from the words "MathWorks", "MATLAB", and "Analytics".

vocabulary = ["MathWorks" "MATLAB" "Analytics"];
eds = editDistanceSearcher(vocabulary,3)
eds = 
  editDistanceSearcher with properties:

         Vocabulary: ["MathWorks"    "MATLAB"    "Analytics"]
    MaximumDistance: 3
         InsertCost: 1
         DeleteCost: 1
     SubstituteCost: 1
           SwapCost: Inf

Create an edit distance searcher using the Damerau-Levenshtein edit distance. The Damerau-Levenshtein edit distance is the lowest number of insertions, deletions, substitutions, and swaps.

Create the edit distance searcher from the words "MathWorks", "MATLAB", and "Analytics" and specify a maximum distance of 3. To specify the Damerau-Levenshtein edit distance, set 'SwapCost' to 1.

vocabulary = ["MathWorks" "MATLAB" "Analytics"];
eds = editDistanceSearcher(vocabulary,3,'SwapCost',1)
eds = 
  editDistanceSearcher with properties:

         Vocabulary: ["MathWorks"    "MATLAB"    "Analytics"]
    MaximumDistance: 3
         InsertCost: 1
         DeleteCost: 1
     SubstituteCost: 1
           SwapCost: 1

Create an edit distance searcher.

vocabulary = ["MathWorks" "MATLAB" "Simulink"];
eds = editDistanceSearcher(vocabulary,2);

Find the nearest words to "MALTAB" and "MatWorks".

words = ["MALTAB" "MatWorks"];
idx = knnsearch(eds,words)
idx = 2×1

     2
     1

Get the words from the vocabulary using the returned indices.

nearestWords = eds.Vocabulary(idx)
nearestWords = 1x2 string array
    "MATLAB"    "MathWorks"

Create an edit distance searcher and specify a maximum edit distance of 3.

vocabulary = ["MathWorks" "MATLAB" "Simulink" "text" "analytics" "analysis"];
maxDist = 3;
eds = editDistanceSearcher(vocabulary,maxDist);

Find the nearest words to "MALTAB" and "MatWorks" with edit distance less than or equal to 1.

words = ["MALTAB" "MatWorks" "analytcs"];
maxDist = 1;
idx = rangesearch(eds,words,maxDist)
idx=3×1 cell
    {1x0 double}
    {[       1]}
    {[       5]}

For "MALTAB", there are no words in the searcher within the specified range. For "MatWorks" and "analytics", there is one result. View the corresponding word for "MatWorks" using the returned index.

nearestWords = eds.Vocabulary(idx{2})
nearestWords = 
"MathWorks"

Find the nearest words to "MALTAB", "MatWorks", and "analytcs" with edit distance less than or equal to 3 and their corresponding edit distances.

words = ["MALTAB" "MatWorks" "analytcs"];
maxDist = 3;
[idx,d] = rangesearch(eds,words,maxDist)
idx=3×1 cell
    {[       2]}
    {[       1]}
    {1x2 double}

d=3×1 cell
    {[       2]}
    {[       1]}
    {1x2 double}

For both "MALTAB" and "MatWorks", there is one word in the searcher within the specified range. For "analytcs", there are two results. View the corresponding words for "analytcs" using the returned indices and their edit distances.

nearestWords = eds.Vocabulary(idx{3})
nearestWords = 1x2 string array
    "analytics"    "analysis"

d{3}
ans = 1×2

     1     2

Algorithms

expand all

Introduced in R2019a