NATSORT Examples

The function NATSORT sorts a cell array of strings, taking into account any number values within the strings. This is known as a "natural order sort" or an "alphanumeric sort". Note that MATLAB's built-in SORT function sorts by character order only (as does sort in most programming languages).

For sorting filenames or filepaths use NATSORTFILES.

For sorting the rows of a cell array of strings use NATSORTROWS.

Basic Usage: Integer Numbers

By default NATSORT interprets consecutive digits as a single integer, and each number is treated as being as wide as one letter:

A = {'a2', 'a10', 'a1'};
sort(A)
natsort(A)
B = {'v10.6', 'v9.10', 'v9.5', 'v10.10', 'v9.10.20', 'v9.10.8'};
sort(B)
natsort(B)
ans = 
    'a1'    'a10'    'a2'
ans = 
    'a1'    'a2'    'a10'
ans = 
    'v10.10'    'v10.6'    'v9.10'    'v9.10.20'    'v9.10.8'    'v9.5'
ans = 
    'v9.5'    'v9.10'    'v9.10.8'    'v9.10.20'    'v10.6'    'v10.10'
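
The idea behind a natural order sort can be sketched using only built-in functions. This is a minimal illustration and not the NATSORT implementation: it assumes that every string has the same text prefix and contains exactly one integer.

```matlab
A = {'a2', 'a10', 'a1'};
% Extract the first run of digits from each string.
tok = regexp(A, '\d+', 'match', 'once');   % {'2', '10', '1'}
% Convert the digit substrings to numeric values.
num = str2double(tok);                     % [2 10 1]
% Sort by numeric value rather than by character code.
[~, idx] = sort(num);
A(idx)                                     % {'a1', 'a2', 'a10'}
```

Strings with several numbers, or with differing text parts, need the full comparison that NATSORT performs.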

Output 2: Sort Index

The second output argument is a numeric array of the sort indices ndx, such that Y = X(ndx) where Y = natsort(X):

[~,ndx] = natsort(A)
ndx =
     3     1     2
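
The sort index is typically used to reorder other data into the same order as the sorted strings. A small sketch, in which the index is copied from the output above and the array sizes is a hypothetical set of companion values:

```matlab
A = {'a2', 'a10', 'a1'};
ndx = [3, 1, 2];          % sort index copied from the output above
sizes = [20, 100, 10];    % hypothetical values, one per element of A
A(ndx)                    % {'a1', 'a2', 'a10'}
sizes(ndx)                % [10 20 100]
```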

Output 3: Debugging Array

The third output is a cell array containing the individual characters and numbers (after conversion to numeric). It is useful for confirming that the numbers are being correctly identified and parsed into numeric values. Note that the rows of this array correspond to the linear indices of the input cell array.

[~,~,dbg] = natsort(B)
dbg = 
    'v'    [10]    '.'    [ 6]     []      []
    'v'    [ 9]    '.'    [10]     []      []
    'v'    [ 9]    '.'    [ 5]     []      []
    'v'    [10]    '.'    [10]     []      []
    'v'    [ 9]    '.'    [10]    '.'    [20]
    'v'    [ 9]    '.'    [10]    '.'    [ 8]
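
A single row of such an array can be approximated with built-in functions. The following sketch is an assumption about how the splitting might be done, not the NATSORT source: it separates one string into number substrings and the text between them, then converts the numbers.

```matlab
str = 'v9.10.20';
% 'match' returns the number substrings, 'split' returns the text around them.
[numStr, txt] = regexp(str, '\d+', 'match', 'split');
% numStr is {'9', '10', '20'}; txt is {'v', '.', '.', ''} (last piece is empty)
% Convert all of the number substrings with one SSCANF call.
numVal = sscanf(sprintf('%s ', numStr{:}), '%f').'   % [9 10 20]
```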

Regular Expression: Decimal Numbers, E-notation, +/- Sign

The NATSORT algorithm uses REGEXP to detect numbers in the strings, which provides a convenient way to specify the number format (decimal fraction, +/- sign, E-notation, etc.): supply an appropriate regular expression as the second input argument:

C = {'test+Inf', 'test11.5', 'test-1.4', 'test', 'test-Inf', 'test+0.3'};
sort(C)
natsort(C, '(-|+)?(Inf|\d+(\.\d+)?)')
D = {'0.56e007', '', '4.3E-2', '10000', '9.8'};
sort(D)
natsort(D, '\d+(\.\d+)?(E(+|-)?\d+)?')
ans = 
    'test'    'test+0.3'    'test+Inf'    'test-1.4'    'test-Inf'    'test11.5'
ans = 
    'test'    'test-Inf'    'test-1.4'    'test+0.3'    'test11.5'    'test+Inf'
ans = 
    ''    '0.56e007'    '10000'    '4.3E-2'    '9.8'
ans = 
    ''    '4.3E-2'    '9.8'    '10000'    '0.56e007'
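
Before sorting, it can help to check which substrings a regular expression actually detects. The sketch below uses only REGEXP and STR2DOUBLE. The optional sign is written here as the character class [-+], and STR2DOUBLE stands in for the numeric conversion, so this is an approximation of what NATSORT does rather than its actual code.

```matlab
C = {'test+Inf', 'test11.5', 'test-1.4', 'test', 'test-Inf', 'test+0.3'};
rgx = '[-+]?(Inf|\d+(\.\d+)?)';
% First match in each string; strings without a number give an empty match.
tok = regexp(C, rgx, 'match', 'once');
% Empty matches convert to NaN.
val = str2double(tok)    % [Inf 11.5 -1.4 NaN -Inf 0.3]
```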

Regular Expression: Hexadecimal, Octal, and Binary Numbers

Numbers encoded in hexadecimal, octal, or binary may also be parsed and sorted correctly. This requires both an appropriate regular expression that can detect the numbers correctly, and a suitable SSCANF format string (see the section "SSCANF Format String"):

E = {'a0X7C4z', 'a0X5z', 'a0X18z', 'aFz'};
sort(E)
natsort(E, '(?<=a)(0X)?[0-9A-F]+', '%x')
F = {'a11111000100z', 'a0B101z', 'a0B000000000011000z', 'a1111z'};
sort(F)
natsort(F, '(0B)?[01]+', '%b')
ans = 
    'a0X18z'    'a0X5z'    'a0X7C4z'    'aFz'
ans = 
    'a0X5z'    'aFz'    'a0X18z'    'a0X7C4z'
ans = 
    'a0B000000000011000z'    'a0B101z'    'a11111000100z'    'a1111z'
ans = 
    'a0B101z'    'a1111z'    'a0B000000000011000z'    'a11111000100z'
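
The sorted orders above can be checked by converting the digit substrings by hand. This sketch uses the built-in functions SSCANF and BIN2DEC on the digits only (the 0X and 0B prefixes are left out here), and shows that both examples encode the same four values:

```matlab
% Hexadecimal digits from E, listed in the sorted order shown above.
hexVal = [sscanf('5', '%x'), sscanf('F', '%x'), sscanf('18', '%x'), sscanf('7C4', '%x')]
% Binary digits from F, listed in the sorted order shown above.
binVal = [bin2dec('101'), bin2dec('1111'), bin2dec('11000'), bin2dec('11111000100')]
% Both are [5 15 24 1988], i.e. ascending numeric order.
```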

SSCANF Format String: Hexadecimal, Octal, and 64-Bit Numbers

The default format string %f correctly parses many common number substrings, including decimal integers, decimal fractions, NaN, Inf, and numbers written in E-notation. For hexadecimal, octal, and large integers the format string must be specified as an input argument. The supported SSCANF formats are shown in this table:

Format String      Number Types
%e, %f, %g         floating point numbers
%d                 signed decimal
%i                 signed decimal, octal, or hexadecimal
%ld, %li           signed 64-bit decimal, octal, or hexadecimal
%u                 unsigned decimal
%o                 unsigned octal
%x                 unsigned hexadecimal
%lu, %lo, %lx      unsigned 64-bit decimal, octal, or hexadecimal

For example, large integers can be converted to 64-bit integers with their full precision:

natsort({'a18446744073709551615z', 'a18446744073709551614z'}, [], '%lu')
ans = 
    'a18446744073709551614z'    'a18446744073709551615z'
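
The reason for the 64-bit format can be seen by parsing the two numbers directly. In this sketch STR2DOUBLE stands in for a double-precision conversion, which rounds both values to the same number, whereas SSCANF with %lu keeps them distinct:

```matlab
s1 = '18446744073709551615';
s2 = '18446744073709551614';
% As doubles both values round to 2^64, so they compare as equal.
sameAsDouble = str2double(s1) == str2double(s2)   % true
% As uint64 the two values remain distinct.
u1 = sscanf(s1, '%lu');
u2 = sscanf(s2, '%lu');
class(u1)                                         % 'uint64'
distinctAsUint64 = u1 > u2                        % true
```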

Sort Options: Case Sensitivity

By default NATSORT provides a case-insensitive sort of the input strings. An optional argument controls the case sensitivity (the option ignorecase sorts all letters as if they were upper-case):

G = {'a2', 'A20', 'A1', 'a10','A2', 'a1'};
natsort(G, [], 'ignorecase') % default
natsort(G, [], 'matchcase')
ans = 
    'A1'    'a1'    'a2'    'A2'    'a10'    'A20'
ans = 
    'A1'    'A2'    'A20'    'a1'    'a2'    'a10'
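
The difference between the two options can be pictured with the built-in SORT on letters only. This is a sketch under the assumption of plain character-code comparison, not the NATSORT implementation; the cell array W is a hypothetical example.

```matlab
W = {'b', 'A', 'C'};
% Case-sensitive: upper-case letters have lower character codes.
sort(W)                       % {'A', 'C', 'b'}
% Case-insensitive: compare upper-case copies, then index the original.
[~, idx] = sort(upper(W));
W(idx)                        % {'A', 'b', 'C'}
```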

Sort Options: Sort Direction

By default NATSORT provides an ascending sort of the input strings. An optional argument controls the sort direction (characters and numbers are either both ascending or both descending):

H = {'2', 'a', '3', 'B', '1'};
natsort(H, [], 'ascend') % default
natsort(H, [], 'descend')
ans = 
    '1'    '2'    '3'    'a'    'B'
ans = 
    'B'    'a'    '3'    '2'    '1'

Sort Options: Order of Numbers Relative to Characters

By default NATSORT sorts the detected numbers as if they were digit characters. An optional argument allows the numbers to be sorted before or after all characters:

X = num2cell(char(32+randperm(63)));
cell2mat(natsort(X, [], 'asdigit')) % default
cell2mat(natsort(X, [], 'beforechar'))
cell2mat(natsort(X, [], 'afterchar'))
ans =
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
ans =
0123456789!"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
ans =
!"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_0123456789