WORDS2NUM Examples

The function WORDS2NUM converts a string containing a number written in English words into a numeric value, e.g. 'one thousand and twenty-four' -> 1024. Optional arguments control many string formatting and dialect options. This document explains the options and gives examples of each.

The string format is based on http://www.blackwasp.co.uk/NumberToWords.aspx

Basic Usage

For many integer and decimal values WORDS2NUM can be called without any options. WORDS2NUM will match integers, decimal digits following the word 'point', and multipliers ('million', 'billion', etc.) in sequence:

words2num('zero')
words2num('infinity')
words2num('negative one thousand and twenty-four')
words2num('one point two three')
words2num('nine point eight million'), format longg
words2num('five billion, six million, seven thousand and eight'), format shortg
ans =
     0
ans =
   Inf
ans =
       -1024
ans =
         1.23
ans =
     9800000
ans =
                5006007008

Numeric Class

The class of the numeric output can be selected using the class option. All relevant internal numeric operations are performed in this class, but the string detection (based on REGEXP) does not change. This means information may be lost during conversion from string to numeric value: in the examples below, values too large for uint8 are returned as 255, the largest value of that class:

words2num('one centillion', 'class','double') % default
words2num('one centillion', 'class','uint8')
words2num('infinity', 'class','uint8')
ans =
       1e+303
ans =
  255
ans =
  255
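
The 255 results above are consistent with how MATLAB's built-in conversion to integer classes saturates at the class limits. The following is a minimal sketch using only built-in functions; it does not call WORDS2NUM, and whether WORDS2NUM reaches its result in exactly this way internally is an assumption:

```matlab
% Built-in integer conversion saturates instead of wrapping or erroring.
big = 1e303;            % 'one centillion' as a double (short scale)
a = uint8(big);         % too large for uint8: saturates at intmax('uint8')
b = uint8(Inf);         % Inf saturates in the same way
c = int8(-big);         % large negative values saturate at intmin('int8')
disp([a, b])            % both elements equal intmax('uint8')
disp(c)                 % equals intmin('int8')
```

Choosing a class whose range covers the expected values (for example 'double' or 'uint64') avoids this kind of loss.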

Outputs

Because the string detection is based on REGEXP, number strings can be detected inside longer strings. WORDS2NUM returns a vector of the converted numbers, and a cell array containing the parts of the input string that remain after splitting it at the detected number strings:

[num,spl] = words2num('HelloOneThousandAndTwenty-FourWorld!')
[num,spl] = words2num('before one hundred middle two hundred after')
num =
        1024
spl = 
    'Hello'    'World!'
num =
   100   200
spl = 
    'before '    ' middle '    ' after'
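
One possible use of the two outputs is to rebuild the string with numerals in place of the number words. The sketch below hard-codes the num and spl values from the second example above so that it runs without WORDS2NUM; the variable names txt, tmp and out are only illustrative:

```matlab
% Values copied from the second example above. With WORDS2NUM available they
% would come from:
% [num,spl] = words2num('before one hundred middle two hundred after')
num = [100, 200];
spl = {'before ', ' middle ', ' after'};
% Convert each number to text; pad with '' so txt has as many cells as spl.
txt = [arrayfun(@num2str, num, 'UniformOutput', false), {''}];
% Stack as two rows, then read out column-wise to interleave the pieces.
tmp = [spl; txt];
out = [tmp{:}];
disp(out)   % before 100 middle 200 after
```

This relies on spl having exactly one more element than num, as it does in both examples above.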

Character Case

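The case option restricts matching to number strings written in a particular character case. In the examples below only the parts of the string written in the requested case are matched ('One Thousand' for title, 'TWENTY' for upper, 'four' for lower):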

words2num('One Thousand and TWENTY-four', 'case','ignore') % default
words2num('One Thousand and TWENTY-four', 'case','title')
words2num('One Thousand and TWENTY-four', 'case','upper')
words2num('One Thousand and TWENTY-four', 'case','lower')
ans =
        1024
ans =
        1000
ans =
    20
ans =
     4

Sign Prefix

By default the words 'positive' and 'negative' are detected automatically when present. The sign option can instead require a sign word before every number, or ignore sign words altogether:

words2num('positive one, two, negative three','sign',[]) % default
words2num('positive one, two, negative three','sign',true) % require
words2num('positive one, two, negative three','sign',false) % ignore
ans =
     1     2    -3
ans =
     1    -3
ans =
     1     2     3

Number Formatting

Other string formatting features can be required in, or excluded from, the number strings. These features are optional by default, but may be required or excluded by setting the corresponding option to true or false:

words2num('nine million, eight thousand', 'comma',true) % require
words2num('nine million, eight thousand', 'comma',false) % exclude
words2num('one thousand and twenty-four', 'hyphen',false) % exclude
words2num('one thousand and twenty-four', 'space',false) % exclude
words2num('one thousand and twenty-four', 'and',false) % exclude
ans =
     9008000
ans =
     9000000        8000
ans =
        1020           4
ans =
     1    24
ans =
        1000          24

Whitespace Characters

The characters that are treated as whitespace within a number string may also be specified, as one or more characters:

words2num('one_thousand_and_twenty_four', 'white','_')
words2num('one+thousand and twenty-four', 'white',' +')
ans =
        1024
ans =
        1024

Prefix and Suffix

Because matching uses REGEXP, a number string can be matched only when a requested prefix and/or suffix is also present. Note that the prefix and suffix are interpreted as regular expressions, not as literal text, which means that it is possible to specify lookarounds that must be matched:

[num,spl] = words2num('two cats three hats')
[num,spl] = words2num('two cats three hats','prefix','^') % only match start of string
[num,spl] = words2num('two cats three hats','suffix','?= h') % lookaround: ' h'
num =
     2     3
spl = 
    ''    ' cats '    ' hats'
num =
     2
spl = 
    ''    ' cats three hats'
num =
     3
spl = 
    'two cats '    ' hats'
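
The effect of a lookahead can be illustrated with plain REGEXP on the same string. This sketch does not call WORDS2NUM, and how WORDS2NUM combines the suffix with its own pattern is not shown here; the pattern '[a-z]+' simply stands in for the number-word pattern:

```matlab
str = 'two cats three hats';
% (?= h) is a lookahead: the word must be followed by ' h',
% but ' h' itself is not consumed by the match.
[tok, rest] = regexp(str, '[a-z]+(?= h)', 'match', 'split');
disp(tok)    % the single matched word: 'three'
disp(rest)   % the text around the match: 'two cats ' and ' hats'
% Anchoring with ^ restricts the match to the start of the string.
first = regexp(str, '^[a-z]+', 'match');
disp(first)  % 'two'
```

Because the lookahead text is not part of the match, ' hats' stays intact in the split output, just as in the spl output above.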

Number Scale

Several common and not-so-common number scales are supported:

  • short and long scales are explained in many locations on the internet. Most contemporary English dialects use the short scale, which is the WORDS2NUM default.
  • peletier scale is used in many non-English-speaking European countries.
  • rowlett scale was designed to avoid the ambiguity of the short and long scales.
  • knuth scale (aka -yllion) uses a logarithmic naming system, so that very few names cover a very wide range of values.

words2num('one billion', 'scale','short')
words2num('one thousand million', 'scale','long')
words2num('one milliard', 'scale','peletier')
words2num('one gillion', 'scale','rowlett')
words2num('ten myllion', 'scale','knuth')
ans =
       1e+009
ans =
       1e+009
ans =
       1e+009
ans =
       1e+009
ans =
       1e+009
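
The arithmetic behind these five results can be written out as powers of ten. This is a sketch of the scale definitions only and does not call WORDS2NUM; the variable names are illustrative:

```matlab
% Each phrase from the examples above, written as powers of ten.
short_billion         = 10^9;         % short scale: billion = 10^9
long_thousand_million = 10^3 * 10^6;  % long scale: thousand x million
peletier_milliard     = 10^9;         % Peletier scale: milliard = 10^9
rowlett_gillion       = 10^9;         % Rowlett scale: gillion = 10^9
knuth_ten_myllion     = 10 * 10^8;    % Knuth scale: myllion = 10^8
vals = [short_billion, long_thousand_million, peletier_milliard, ...
        rowlett_gillion, knuth_ten_myllion];
disp(all(vals == 1e9))                % every phrase names the same value
% In the long scale, 'billion' is a million million, which is why the
% long-scale example above uses 'one thousand million' instead.
long_billion = 10^6 * 10^6;
disp(long_billion)
```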

Compound Multipliers

The mult option allows compound multipliers (several multiplier words in sequence) to be parsed. This feature is still somewhat experimental:

words2num('one million', 'mult','simple') % default
words2num('one thousand thousand', 'mult','compound')
words2num('two point three trillion trillion trillion', 'mult','compound')
ans =
     1000000
ans =
     1000000
ans =
     2.3e+036
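
The outputs above suggest that consecutive multipliers are multiplied together. A short arithmetic check under that assumption, using short-scale values and no call to WORDS2NUM:

```matlab
thousand = 1e3;                    % short-scale multiplier values
trillion = 1e12;
a = 1 * thousand * thousand;       % 'one thousand thousand'
b = 2.3 * trillion^3;              % 'two point three trillion trillion trillion'
disp(a)
disp(b)
% Compare b with a relative tolerance, since these are floating-point values.
disp(abs(b - 2.3e36) <= 1e-12 * 2.3e36)
```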

Reverse Conversion: NUM2WORDS

The function NUM2WORDS converts a numeric scalar into a string with the number value given in English words.