Code covered by the BSD License  

Customizable Natural-Order Sort

39 Downloads (last 30 days) File Size: 4.75 KB File ID: #34464

by

 

05 Jan 2012 (Updated )

Natural-order sort of a cell array of strings, with customizable numeric format.

Editor's Notes:

This file was selected as MATLAB Central Pick of the Week

File Information
Description

To sort filenames or filepaths use "natsortfiles":
http://www.mathworks.com/matlabcentral/fileexchange/47434
To sort the rows of a cell array of strings use "natsortrows":
http://www.mathworks.com/matlabcentral/fileexchange/47433

### Summary ###

A natural-order sort is a string-sort that takes into account the values of any numeric substrings occurring within those strings. Compare for example:

A = {'a2', 'a', 'a10', 'a1'};
sort(A)
 ans = {'a', 'a1', 'a10', 'a2'}
natsort(A)
 ans = {'a', 'a1', 'a2', 'a10'}
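
The difference between the two orders can be reproduced with built-in functions only. The following is a minimal sketch, not the implementation used by natsort: it assumes non-negative integer substrings of at most 10 digits, and builds a sort key by zero-padding every digit run to a fixed width so that a plain character sort orders the numbers by value. The variable names (keys, txt, num, parts) are chosen for illustration.

```matlab
A = {'a2', 'a', 'a10', 'a1'};
keys = A;
for k = 1:numel(A)
    % Split the string into non-digit pieces (txt) and digit runs (num).
    [txt, num] = regexp(A{k}, '\d+', 'split', 'match');
    % Zero-pad each digit run to 10 characters (assumed maximum width).
    num = cellfun(@(s) sprintf('%010d', sscanf(s, '%d')), num, ...
        'UniformOutput', false);
    num{end+1} = '';        % txt has one more element than num
    parts = [txt; num];     % column-major order: txt1 num1 txt2 num2 ...
    keys{k} = [parts{:}];
end
[~, idx] = sort(keys);      % plain character sort of the padded keys
B = A(idx);                 % {'a', 'a1', 'a2', 'a10'}
```

Padding is only one way to obtain this order; it ignores signs, decimals and leading zeros, which is where a configurable regular expression becomes useful.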

By default this function treats each run of consecutive digits as an integer value. Optionally, a regular expression gives the user control over how numeric substrings are recognized and parsed, allowing the numeric substrings to have:
* a +/- sign
* a decimal point
* exponent notation
* decimal, octal, hexadecimal or binary notation
* prefixes/suffixes/literals which can be ignored
* Inf or NaN values
* min/max limits to the number of digits

The numeric class can be chosen to suit the substrings' numeric data:
* double
* int
* uint

And of course the sorting itself can also be controlled:
* ascending/descending sort direction
* character case sensitivity/insensitivity
* relative order of numeric substrings vs. characters

### Examples ###

By default numeric substrings are parsed as integers, as shown in the example in the Summary.

% Multiple numeric substrings (e.g. version numbers):
B = {'v10.6', 'v10.10', 'v2.10', 'v2.6', 'v2.10.20', 'v2.10.8'};
sort(B)
 ans = {'v10.10', 'v10.6', 'v2.10', 'v2.10.20', 'v2.10.8', 'v2.6'}
natsort(B)
 ans = {'v2.6', 'v2.10', 'v2.10.8', 'v2.10.20', 'v10.6', 'v10.10'}
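
To see why the version strings end up in this order, each string can be read as a row of integer fields and the rows compared field by field. This is a rough sketch using only sscanf and sortrows; it assumes the fixed 'v%d.%d.%d' layout with at most three fields, and uses -1 as a placeholder (an assumption of this sketch) so that a missing field sorts before any real value.

```matlab
B = {'v10.6', 'v10.10', 'v2.10', 'v2.6', 'v2.10.20', 'v2.10.8'};
M = -ones(numel(B), 3);           % -1 marks a missing field
for k = 1:numel(B)
    v = sscanf(B{k}, 'v%d.%d.%d'); % column vector of the fields found
    M(k, 1:numel(v)) = v.';
end
[~, idx] = sortrows(M);           % compare field 1, then field 2, then field 3
sortedB = B(idx);
% {'v2.6', 'v2.10', 'v2.10.8', 'v2.10.20', 'v10.6', 'v10.10'}
```

Unlike this sketch, natsort does not need to know in advance how many numeric substrings each string contains.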

% Integer, decimal or Inf numeric substrings, possibly with +/- signs:
C = {'test102', 'test11.5', 'test-1.4', 'test', 'test-Inf', 'test+0.3'};
sort(C)
 ans = {'test', 'test+0.3', 'test-1.4', 'test-Inf', 'test102', 'test11.5'}
natsort(C, '(-|+)?(Inf|\d+(\.\d+)?)')
 ans = {'test', 'test-Inf', 'test-1.4', 'test+0.3', 'test11.5', 'test102'}

% Integer or decimal numeric substrings, possibly with an exponent:
D = {'0.56e007', '', '4.3E-2', '10000', '9.8'};
sort(D)
 ans = {'', '0.56e007', '10000', '4.3E-2', '9.8'}
natsort(D, '\d+(\.\d+)?(e(+|-)?\d+)?')
 ans = {'', '4.3E-2', '9.8', '10000', '0.56e007'}

% Hexadecimal numeric substrings (possibly with '0X' prefix):
E = {'a0X7C4z', 'a0X5z', 'a0X18z', 'aFz'};
sort(E)
 ans = {'a0X18z', 'a0X5z', 'a0X7C4z', 'aFz'}
natsort(E, '(?<=a)(0X)?[0-9A-F]+', '%x')
 ans = {'a0X5z', 'aFz', 'a0X18z', 'a0X7C4z'}

% Binary numeric substrings (possibly with '0B' prefix):
F = {'a0B011111000100z', 'a0B101z', 'a0B000000010010z', 'a1111z'};
sort(F)
 ans = {'a0B000000010010z', 'a0B011111000100z', 'a0B101z', 'a1111z'}
natsort(F, '(0B)?[01]+', '%b')
 ans = {'a0B101z', 'a1111z', 'a0B000000010010z', 'a0B011111000100z'}
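
The hexadecimal and binary orders above follow from the numeric values of the substrings. As a quick check, the values can be extracted with regexp and converted with hex2dec and bin2dec. These two functions are stand-ins chosen for this sketch; it makes no claim about how natsort parses the substrings internally.

```matlab
E = {'a0X7C4z', 'a0X5z', 'a0X18z', 'aFz'};
F = {'a0B011111000100z', 'a0B101z', 'a0B000000010010z', 'a1111z'};

% Capture the digits only, skipping an optional 0X or 0B prefix.
hexTok = regexp(E, '(?<=a)(?:0X)?([0-9A-F]+)', 'tokens', 'once');
binTok = regexp(F, '(?:0B)?([01]+)', 'tokens', 'once');

hexVal = cellfun(@(t) hex2dec(t{1}), hexTok);   % [1988 5 24 15]
binVal = cellfun(@(t) bin2dec(t{1}), binTok);   % [1988 5 18 15]

[~, iE] = sort(hexVal);
[~, iF] = sort(binVal);
sortedE = E(iE);   % {'a0X5z', 'aFz', 'a0X18z', 'a0X7C4z'}
sortedF = F(iF);   % {'a0B101z', 'a1111z', 'a0B000000010010z', 'a0B011111000100z'}
```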

% uint64 numeric substrings (with full precision!):
natsort({'a18446744073709551615z', 'a18446744073709551614z'}, '\d+', '%lu')
 ans = {'a18446744073709551614z', 'a18446744073709551615z'}
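
Full precision matters here because a double cannot tell these two values apart. The short check below uses only str2double and sscanf; the variable names are illustrative. Parsed as doubles both substrings round to the same value, whereas parsing with the '%lu' format yields uint64 values that still differ.

```matlab
s1 = '18446744073709551615';
s2 = '18446744073709551614';

d1 = str2double(s1);
d2 = str2double(s2);
sameAsDouble = (d1 == d2);   % true: both round to 2^64

u1 = sscanf(s1, '%lu');      % class uint64
u2 = sscanf(s2, '%lu');
stillDistinct = (u1 > u2);   % true: the last digit is preserved
```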

% Case sensitivity:
G = {'a2', 'A20', 'A1', 'a10', 'A2', 'a1'};
natsort(G, '\d+', 'matchcase')
 ans = {'A1', 'A2', 'A20', 'a1', 'a2', 'a10'}
natsort(G, '\d+', 'ignorecase')
 ans = {'A1', 'a1', 'a2', 'A2', 'a10', 'A20'}

% Sort direction:
H = {'2', 'a', '3', 'B', '1'};
natsort(H, '\d+', 'ascend')
 ans = {'1', '2', '3', 'a', 'B'}
natsort(H, '\d+', 'descend')
 ans = {'B', 'a', '3', '2', '1'}

% Relative sort-order of numeric substrings compared to characters:
X = cellstr(char(32+randperm(63)).');
Y = natsort(X, '\d+', 'beforechar'); [Y{:}]
 ans = '0123456789!"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_'
Y = natsort(X, '\d+', 'afterchar'); [Y{:}]
 ans = '!"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_0123456789'
Y = natsort(X, '\d+', 'asdigit'); [Y{:}]
 ans = '!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_'

Acknowledgements

Asort: A Pedestrian Alphanumeric String Sorter, Sort Nat: Natural Order Sort, Natural Order Row Sort, and Natural Order Filename Sort inspired this file.

This file inspired Numeric To English Words, Natural Order Row Sort, and Natural Order Filename Sort.

Required Products: MATLAB
MATLAB release: MATLAB 7.11 (R2010b)
MATLAB Search Path
/
Updates
14 Feb 2012

- Add examples showing different numeric tokens.
- Case-insensitive sort is now default.

24 Aug 2012

- Implement more compact sort algorithm.
- "sscanf" numeric format can be controlled by an optional input argument.
- Provide usage examples.
- Output debugging arrays now char+numeric.

28 Apr 2014

- Now parses hexadecimal and octal substrings.
- int64 and uint64 parsed at full precision.
- Allow <options> in any order.
- For debugging: return indices of character and numeric arrays.

02 Jul 2014

- Simplify hexadecimal example.
- Correct output summary.

05 Aug 2014

- Add binary numeric parsing.
- Improve input checking.
- Replace multiple debugging output arrays with one cell array.
- Allow lookarounds in regular expression.

20 Dec 2014

- Update documentation only, improve examples.
