Code covered by the BSD License


File Size: 5.08 KB | File ID: #34464 | Version: 1.8

# Customizable Natural-Order Sort

### Stephen Cobeldick

05 Jan 2012

Natural-order sort of a cell array of strings, with customizable numeric format.

### Editor's Notes:

This file was selected as a MATLAB Central Pick of the Week.

### Description

To sort filenames or filepaths use "natsortfiles":
http://de.mathworks.com/matlabcentral/fileexchange/47434-natural-order-filename-sort
To sort the rows of a cell array of strings use "natsortrows":
http://de.mathworks.com/matlabcentral/fileexchange/47433-natural-order-row-sort

### Summary ###

A natural-order sort is a string-sort that takes into account the values of any numeric substrings occurring within those strings. Compare for example:

```matlab
A = {'a2', 'a', 'a10', 'a1'};
sort(A)
ans = {'a', 'a1', 'a10', 'a2'}
natsort(A)
ans = {'a', 'a1', 'a2', 'a10'}
```
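One common way to imitate this behaviour with only built-in functions is to zero-pad every run of digits to a fixed width, so that an ordinary character-code sort agrees with numeric order. The sketch below illustrates that idea only. It is not how `natsort` itself is implemented, and the padding width `W` and all variable names are assumptions chosen for the example.

```matlab
% Illustrative sketch (not the natsort implementation): zero-pad digit runs,
% then sort the padded keys with the built-in SORT.
A = {'a2', 'a', 'a10', 'a1'};
W = 10;                                   % assumed maximum number of digits per run
keys = cell(size(A));
for k = 1:numel(A)
    % Split each string into digit runs (num) and the text around them (txt).
    [num, txt] = regexp(A{k}, '\d+', 'match', 'split');
    for n = 1:numel(num)
        % Left-pad each digit run with zeros up to W characters.
        num{n} = [repmat('0', 1, W - numel(num{n})), num{n}];
    end
    % Interleave text and padded numbers: txt1, num1, txt2, num2, ...
    parts = [txt; [num, {''}]];
    keys{k} = [parts{:}];
end
[~, idx] = sort(keys);                    % plain character-code sort of the keys
sortedA = A(idx);                         % {'a', 'a1', 'a2', 'a10'}
```

The padding trick breaks down for digit runs longer than `W` and for signed or fractional numbers, which is where a regular-expression driven parser becomes useful.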

By default this function treats each run of consecutive digits as an integer value. It also provides optional user control over numeric substring recognition and parsing via a regular expression, which allows the numeric substrings to have:
* a +/- sign
* a decimal point and decimal fraction
* an E-notation exponent
* decimal, octal, hexadecimal or binary notation
* prefixes/suffixes/literals which can be ignored
* an Inf or NaN value
* any feature supported by regexp, including look-arounds, quantifiers, etc.
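To make the recognition step concrete, the following sketch uses only the built-in `regexp` and `str2double` to pull one signed decimal or Inf substring out of each string and convert it to a number. The pattern is an assumed variant of the kind of expression described above (it writes the sign as the character class `[-+]`), and the variable names are illustrative. How `natsort` applies the expression internally is not shown here.

```matlab
% Illustrative sketch: recognise and parse numeric substrings with built-ins.
C = {'test102', 'test11.5', 'test-1.4', 'test', 'test-Inf', 'test+0.3'};
pat = '[-+]?(Inf|\d+(\.\d+)?)';           % optional sign, then Inf or a decimal number
tok = regexp(C, pat, 'match', 'once');    % first match per string, '' where none
val = str2double(tok);                    % [102 11.5 -1.4 NaN -Inf 0.3]
```

The string with no numeric substring yields NaN here; a natural-order sort has to decide separately how such strings are ordered.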

The numeric class can be chosen to suit the substrings' numeric data:
* double
* int
* uint
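The choice of class matters for large integers: a double has a 53-bit significand, so neighbouring 20-digit values collapse to the same number. The sketch below uses the built-in `sscanf` and assumes, consistent with the `'%lu'` example in the Examples section, that a 64-bit unsigned format returns a `uint64` result. It only illustrates the precision issue and is not taken from the `natsort` source.

```matlab
% Illustrative sketch: the same digits parsed as double and as 64-bit unsigned.
s1 = '18446744073709551615';              % intmax('uint64')
s2 = '18446744073709551614';
d1 = sscanf(s1, '%f');                    % parsed as double
d2 = sscanf(s2, '%f');
u1 = sscanf(s1, '%lu');                   % parsed with a 64-bit unsigned format
u2 = sscanf(s2, '%lu');
sameAsDouble = (d1 == d2);                % true: both values round to 2^64
sameAsUint64 = (u1 == u2);                % expected false if sscanf returns uint64
```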

The sorting itself can also be controlled:
* ascending/descending sort direction
* character case sensitivity/insensitivity
* relative order of numeric substrings vs. characters

### Examples ###

By default, numeric substrings are parsed as integers, as in the example in the Summary.

```matlab
% Multiple numeric substrings (e.g. version numbers):
B = {'v10.6', 'v9.10', 'v9.5', 'v10.10', 'v9.10.20', 'v9.10.8'};
sort(B)
ans = {'v10.10', 'v10.6', 'v9.10', 'v9.10.20', 'v9.10.8', 'v9.5'}
natsort(B)
ans = {'v9.5', 'v9.10', 'v9.10.8', 'v9.10.20', 'v10.6', 'v10.10'}
```
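For version-like strings with a fixed prefix, the same ordering can be reproduced with built-in functions by parsing each string into a row of integers and sorting the rows. This is a sketch under the assumptions that every string starts with 'v', has at most three fields, and that a missing field should sort before any present one (encoded here as -1). It is not the `natsort` algorithm.

```matlab
% Illustrative sketch: sort version strings via an integer matrix and SORTROWS.
B = {'v10.6', 'v9.10', 'v9.5', 'v10.10', 'v9.10.20', 'v9.10.8'};
M = -ones(numel(B), 3);                   % -1 marks a missing field
for k = 1:numel(B)
    v = sscanf(B{k}, 'v%d.%d.%d');        % e.g. 'v9.10.8' gives [9; 10; 8]
    M(k, 1:numel(v)) = v;
end
[~, idx] = sortrows(M);                   % compare field by field, left to right
sortedB = B(idx);
% sortedB = {'v9.5', 'v9.10', 'v9.10.8', 'v9.10.20', 'v10.6', 'v10.10'}
```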

```matlab
% Integer, decimal or Inf numeric substrings, possibly with +/- signs:
C = {'test102', 'test11.5', 'test-1.4', 'test', 'test-Inf', 'test+0.3'};
sort(C)
ans = {'test', 'test+0.3', 'test-1.4', 'test-Inf', 'test102', 'test11.5'}
natsort(C, '(-|+)?(Inf|\d+(\.\d+)?)')
ans = {'test', 'test-Inf', 'test-1.4', 'test+0.3', 'test11.5', 'test102'}
```

```matlab
% Integer or decimal numeric substrings, possibly with an exponent:
D = {'0.56e007', '', '4.3E-2', '10000', '9.8'};
sort(D)
ans = {'', '0.56e007', '10000', '4.3E-2', '9.8'}
natsort(D, '\d+(\.\d+)?(e(+|-)?\d+)?')
ans = {'', '4.3E-2', '9.8', '10000', '0.56e007'}
```

```matlab
% Hexadecimal numeric substrings (possibly with '0X' prefix):
E = {'a0X7C4z', 'a0X5z', 'a0X18z', 'aFz'};
sort(E)
ans = {'a0X18z', 'a0X5z', 'a0X7C4z', 'aFz'}
natsort(E, '(?<=a)(0X)?[0-9A-F]+', '%x')
ans = {'a0X5z', 'aFz', 'a0X18z', 'a0X7C4z'}
```
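The hexadecimal result can be cross-checked with the built-in `hex2dec`. The sketch assumes the literal 'a' and 'z' wrappers and the optional '0X' prefix of this particular example and strips them with `regexprep`; the variable names are illustrative.

```matlab
% Illustrative sketch: verify the hexadecimal ordering with HEX2DEC.
E = {'a0X7C4z', 'a0X5z', 'a0X18z', 'aFz'};
hexstr = regexprep(E, '^a(0X)?|z$', '');  % {'7C4', '5', '18', 'F'}
vals = cellfun(@hex2dec, hexstr);         % [1988 5 24 15]
[~, idx] = sort(vals);
sortedE = E(idx);                         % {'a0X5z', 'aFz', 'a0X18z', 'a0X7C4z'}
```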

```matlab
% Binary numeric substrings (possibly with '0B' prefix):
F = {'a11111000100z', 'a0B101z', 'a0B000000000000011000z', 'a1111z'};
sort(F)
ans = {'a0B000000000000011000z', 'a0B101z', 'a11111000100z', 'a1111z'}
natsort(F, '(0B)?[01]+', '%b')
ans = {'a0B101z', 'a1111z', 'a0B000000000000011000z', 'a11111000100z'}
```
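The binary strings in F encode the same four values as the hexadecimal strings in E (1988, 5, 24 and 15), which explains why both examples sort into the same pattern. A short check with the built-in `bin2dec`, again assuming the 'a' and 'z' wrappers and optional '0B' prefix of this example:

```matlab
% Illustrative sketch: decode the binary substrings with BIN2DEC.
F = {'a11111000100z', 'a0B101z', 'a0B000000000000011000z', 'a1111z'};
binstr = regexprep(F, '^a(0B)?|z$', '');  % strip wrappers and optional prefix
vals = cellfun(@bin2dec, binstr);         % [1988 5 24 15]
[~, idx] = sort(vals);
sortedF = F(idx);
% sortedF = {'a0B101z', 'a1111z', 'a0B000000000000011000z', 'a11111000100z'}
```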

```matlab
% uint64 numeric substrings (with full precision!):
natsort({'a18446744073709551615z', 'a18446744073709551614z'}, '\d+', '%lu')
ans = {'a18446744073709551614z', 'a18446744073709551615z'}
```

```matlab
% Case sensitivity:
G = {'a2', 'A20', 'A1', 'a10', 'A2', 'a1'};
natsort(G, '\d+', 'ignorecase') % default
ans = {'A1', 'a1', 'a2', 'A2', 'a10', 'A20'}
natsort(G, '\d+', 'matchcase')
ans = {'A1', 'A2', 'A20', 'a1', 'a2', 'a10'}
```

```matlab
% Sort direction:
H = {'2', 'a', '3', 'B', '1'};
natsort(H, '\d+', 'ascend') % default
ans = {'1', '2', '3', 'a', 'B'}
natsort(H, '\d+', 'descend')
ans = {'B', 'a', '3', '2', '1'}
```
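For strings without multi-digit numbers, such as H, the effect of the case and direction options can be imitated with the built-in `sort`. The sketch below is only an approximation under that assumption: it folds case with `lower` and reverses the ascending result, which is not guaranteed to match `natsort` in general (for example when several strings compare equal).

```matlab
% Illustrative sketch: case folding and direction with the built-in SORT.
H = {'2', 'a', '3', 'B', '1'};
plainSort = sort(H);                      % {'1','2','3','B','a'}: 'B' (66) precedes 'a' (97)
[~, idx] = sort(lower(H));                % fold case before comparing
asc = H(idx);                             % {'1','2','3','a','B'}
desc = asc(end:-1:1);                     % {'B','a','3','2','1'}
```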

```matlab
% Relative sort-order of numeric substrings compared to characters:
X = num2cell(char(32+randperm(63)));
cell2mat(natsort(X, '\d+', 'asdigit')) % default
ans = '!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_'
cell2mat(natsort(X, '\d+', 'beforechar'))
ans = '0123456789!"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_'
cell2mat(natsort(X, '\d+', 'afterchar'))
ans = '!"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_0123456789'
```
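The three orderings follow from character codes: the digits '0' to '9' occupy codes 48 to 57, between '/' (47) and ':' (58). The following sketch rebuilds the three expected strings directly from that fact, without calling `natsort`; the variable names are illustrative.

```matlab
% Illustrative sketch: build the three orderings from character codes.
c = char(33:95);                          % '!' through '_', in character-code order
isDigit = (c >= '0') & (c <= '9');
asDigit    = c;                           % digits stay at their code position
beforeChar = [c(isDigit), c(~isDigit)];   % digits moved to the front
afterChar  = [c(~isDigit), c(isDigit)];   % digits moved to the end
```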

* Required Products: MATLAB
* MATLAB release: MATLAB 7.11 (R2010b)
* MATLAB Search Path: `/`

14 Feb 2012 1.1

- Add examples showing different numeric tokens.
- Case-insensitive sort is now default.

24 Aug 2012 1.3

- Implement more compact sort algorithm.
- "sscanf" numeric format can be controlled by an optional input argument.
- Provide use examples.
- Output debugging arrays now char+numeric.

28 Apr 2014 1.4

- Now parses hexadecimal and octal substrings.
- int64 and uint64 parsed at full precision.
- Allow <options> in any order.
- For debugging: return indices of character and numeric arrays.

02 Jul 2014 1.5

- Correct output summary.

05 Aug 2014 1.6

- Improve input checking.
- Replace multiple debugging output arrays with one cell array.
- Allow lookarounds in regular expression.

20 Dec 2014 1.7

- Update documentation only, improve examples.

25 Feb 2015 1.8

- Improved binary substring parsing.
- Better examples.