## Customizable Natural-Order Sort

version 1.11.0.0 (12.5 KB) by Stephen Cobeldick

Natural-order sort of a cell array of strings, with customizable numeric format.

Updated 25 Mar 2018

Editor's Note: This file was selected as MATLAB Central Pick of the Week

### Summary ###

Alphanumeric sort of a cell array of strings (1xN char). The strings are sorted taking into account the values of any numeric substrings that occur within them. For example, compare:

X = {'a2', 'a10', 'a1'};
sort(X)
ans = 'a1' 'a10' 'a2'
natsort(X)
ans = 'a1' 'a2' 'a10'
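
The difference comes from comparing the digits as a number rather than character by character. The idea can be sketched in base MATLAB. This is not NATSORT's implementation: it assumes every string is some text followed by a single run of digits, and the variable names are illustrative only.

```matlab
% Sketch only, not NATSORT itself. Assumes each string is
% non-digit text followed by exactly one run of digits, e.g. 'a10'.
X = {'a2', 'a10', 'a1'};

% Split every string into its text part and its digit part.
tok = regexp(X, '^(\D*)(\d+)$', 'tokens', 'once');
txt = cellfun(@(t) t{1}, tok, 'UniformOutput', false);   % text parts
num = cellfun(@(t) str2double(t{2}), tok);               % numeric values

% Rank the text parts, then sort rows by (text rank, numeric value).
[~, ~, txtRank] = unique(txt);
[~, idx] = sortrows([txtRank(:), num(:)]);
Y = X(idx);   % 'a1' 'a2' 'a10'
```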

By default, NATSORT treats every run of consecutive digits as an integer value. It also provides optional user control over how numeric substrings are recognized and parsed, via a regular expression, which allows the numeric substrings to have:
* a +/- sign
* a decimal point and decimal fraction
* E-notation exponent
* decimal, octal, hexadecimal or binary notation
* prefixes/suffixes/literals which can be ignored
* Inf or NaN value
* any feature supported by regular expressions, including look-arounds, quantifiers, etc.
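
To see how a regular expression changes what counts as a "number", the following base-MATLAB sketch applies two patterns to the same string with `regexp` and converts the match with `sscanf`. The patterns are examples chosen for illustration, not necessarily the ones NATSORT uses internally.

```matlab
% Illustration with base MATLAB only; patterns are examples.
str = 'test-1.4';

% Optional sign, then either Inf or a decimal number.
pat = '[-+]?(Inf|\d+\.?\d*)';
sub = regexp(str, pat, 'match', 'once');   % '-1.4'
val = sscanf(sub, '%f');                   % the double -1.4

% A digits-only pattern sees two separate integers instead.
ints = regexp(str, '\d+', 'match');        % {'1', '4'}
```

With the signed-decimal pattern the substring is one value, -1.4; with the digits-only pattern the same string contains the integers 1 and 4, and the sign and decimal point are treated as ordinary characters.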

The numeric class can be chosen to suit the substrings' numeric data:
* DOUBLE
* INT*
* UINT*

The sorting itself can also be controlled:
* ascending/descending sort direction
* character case sensitivity/insensitivity
* relative order of numeric substrings vs. characters

### Examples ###

By default, numeric substrings are parsed as integers, as in the example in the summary.

%% Multiple integer substrings (e.g. release version numbers):
B = {'v10.6', 'v9.10', 'v9.5', 'v10.10', 'v9.10.20', 'v9.10.8'};
sort(B)
ans = 'v10.10' 'v10.6' 'v9.10' 'v9.10.20' 'v9.10.8' 'v9.5'
natsort(B)
ans = 'v9.5' 'v9.10' 'v9.10.8' 'v9.10.20' 'v10.6' 'v10.10'
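
The ordering for strings with several integer substrings can be reproduced by hand in base MATLAB. The sketch below is not how NATSORT works internally; it assumes the strings differ only in their digit runs and uses -1 as a hypothetical placeholder for a missing component, so that shorter version numbers sort first.

```matlab
% Sketch only: compare version strings component by component.
B = {'v10.6', 'v9.10', 'v9.5', 'v10.10', 'v9.10.20', 'v9.10.8'};

% Collect the digit runs of every string.
parts = regexp(B, '\d+', 'match');
n = max(cellfun(@numel, parts));

% One row per string; -1 marks "no component here".
M = -ones(numel(B), n);
for k = 1:numel(B)
    M(k, 1:numel(parts{k})) = str2double(parts{k});
end

% Sort the rows column by column, i.e. major, minor, patch.
[~, idx] = sortrows(M);
Y = B(idx);   % 'v9.5' 'v9.10' 'v9.10.8' 'v9.10.20' 'v10.6' 'v10.10'
```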

%% Integer, decimal or Inf number substrings, possibly with +/- signs:
C = {'test+Inf', 'test11.5', 'test-1.4', 'test', 'test-Inf', 'test+0.3'};
sort(C)
ans = 'test' 'test+0.3' 'test+Inf' 'test-1.4' 'test-Inf' 'test11.5'
natsort(C, '(-|+)?(Inf|\d+\.?\d*)')
ans = 'test' 'test-Inf' 'test-1.4' 'test+0.3' 'test11.5' 'test+Inf'

%% Integer or decimal number substrings, possibly with an exponent:
D = {'0.56e007', '', '4.3E-2', '10000', '9.8'};
sort(D)
ans = '' '0.56e007' '10000' '4.3E-2' '9.8'
natsort(D, '\d+\.?\d*(E(+|-)?\d+)?')
ans = '' '4.3E-2' '9.8' '10000' '0.56e007'

%% Hexadecimal number substrings (possibly with '0X' prefix):
E = {'a0X7C4z', 'a0X5z', 'a0X18z', 'aFz'};
sort(E)
ans = 'a0X18z' 'a0X5z' 'a0X7C4z' 'aFz'
natsort(E, '(?<=a)(0X)?[0-9A-F]+', '%x')
ans = 'a0X5z' 'aFz' 'a0X18z' 'a0X7C4z'

%% Binary number substrings (possibly with '0B' prefix):
F = {'a11111000100z', 'a0B101z', 'a0B000000000011000z', 'a1111z'};
sort(F)
ans = 'a0B000000000011000z' 'a0B101z' 'a11111000100z' 'a1111z'
natsort(F, '(0B)?[01]+', '%b')
ans = 'a0B101z' 'a1111z' 'a0B000000000011000z' 'a11111000100z'
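
The hexadecimal and binary orderings above can be sanity-checked by converting the digit strings with base MATLAB's `hex2dec` and `bin2dec`, with the optional `0X`/`0B` prefixes stripped by hand. This only checks the numeric values behind the sorted output; it does not show how NATSORT parses them.

```matlab
% Digit strings in the order that natsort returned them above.
hexDigits = {'5', 'F', '18', '7C4'};
binDigits = {'101', '1111', '000000000011000', '11111000100'};

hexVals = hex2dec(hexDigits);   % 5, 15, 24, 1988
binVals = bin2dec(binDigits);   % 5, 15, 24, 1988

% Both sequences are increasing, matching the ascending sort.
hexAscending = all(diff(hexVals(:)) > 0);   % true
binAscending = all(diff(binVals(:)) > 0);   % true
```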

%% UINT64 number substrings (with full precision!):
natsort({'a18446744073709551615z', 'a18446744073709551614z'}, [], '%lu')
ans = 'a18446744073709551614z' 'a18446744073709551615z'

%% Case sensitivity:
G = {'a2', 'A20', 'A1', 'a10', 'A2', 'a1'};
natsort(G, [], 'ignorecase') % default
ans = 'A1' 'a1' 'a2' 'A2' 'a10' 'A20'
natsort(G, [], 'matchcase')
ans = 'A1' 'A2' 'A20' 'a1' 'a2' 'a10'

%% Sort direction:
H = {'2', 'a', '3', 'B', '1'};
natsort(H, [], 'ascend') % default
ans = '1' '2' '3' 'a' 'B'
natsort(H, [], 'descend')
ans = 'B' 'a' '3' '2' '1'

%% Relative sort-order of number substrings compared to characters:
V = num2cell(char(32+randperm(63)));
cell2mat(natsort(V, [], 'asdigit')) % default
ans = '!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_'
cell2mat(natsort(V, [], 'beforechar'))
ans = '0123456789!"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_'
cell2mat(natsort(V, [], 'afterchar'))
ans = '!"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_0123456789'

