Code covered by the BSD License  


File Size: 5.08 KB, File ID: #34464, Version: 1.8

Customizable Natural-Order Sort

05 Jan 2012 (Updated )

Natural-order sort of a cell array of strings, with customizable numeric format.

Editor's Notes:

This file was selected as MATLAB Central Pick of the Week

File Information
Description

To sort filenames or filepaths use "natsortfiles":
http://de.mathworks.com/matlabcentral/fileexchange/47434-natural-order-filename-sort
To sort the rows of a cell array of strings use "natsortrows":
http://de.mathworks.com/matlabcentral/fileexchange/47433-natural-order-row-sort

### Summary ###

A natural-order sort is a string-sort that takes into account the values of any numeric substrings occurring within those strings. Compare for example:

A = {'a2', 'a', 'a10', 'a1'};
sort(A)
 ans = {'a', 'a1', 'a10', 'a2'}
natsort(A)
 ans = {'a', 'a1', 'a2', 'a10'}
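
As a rough illustration of the idea (this is not the natsort implementation), the comparison above can be reproduced with built-in functions for the restricted case of a non-digit prefix followed by at most one run of digits. The variable names and the restriction to that string shape are assumptions made for this sketch.

```matlab
A = {'a2', 'a', 'a10', 'a1'};

% Split each string into a non-digit prefix and an optional run of digits.
% Assumes every string has exactly this shape, otherwise the tokens are empty.
tok    = regexp(A, '^(\D*)(\d*)$', 'tokens', 'once');
prefix = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
value  = cellfun(@(t) str2double(t{2}), tok);   % no digits gives NaN
value(isnan(value)) = -Inf;                     % strings without digits sort first

% Replace each prefix by its rank in alphabetical order, then sort the rows
% by prefix rank first and by numeric value second.
[~, ~, rank] = unique(prefix);
[~, idx]     = sortrows([rank(:), value(:)]);

B = A(idx);   % B is {'a', 'a1', 'a2', 'a10'}
```

The point of the sketch is that the digits are compared as numbers (2 < 10) rather than character by character ('1' < '2').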

By default the function treats each run of consecutive digits as an integer value. It also gives the user optional control over how numeric substrings are recognized and parsed, via a regular expression, which allows the numeric substrings to have:
* a +/- sign
* a decimal point and decimal fraction
* an E-notation exponent
* decimal, octal, hexadecimal or binary notation
* prefixes/suffixes/literals which can be ignored
* an Inf or NaN value
* any feature supported by regexp, including look-arounds, quantifiers, etc.
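
To see which substrings a given regular expression would recognize as numeric, the pattern can be tried directly with the built-in regexp. The sample string below is hypothetical and the snippet does not call natsort; it is only a way to preview the matches.

```matlab
str = 'test-1.4_x+0.3_y102_zInf';   % hypothetical sample string

% Consecutive digits only (the default behavior described above).
ints = regexp(str, '\d+', 'match');
% ints is {'1', '4', '0', '3', '102'}

% Optional sign, then either Inf or an integer/decimal number.
nums = regexp(str, '[-+]?(Inf|\d+(\.\d+)?)', 'match');
% nums is {'-1.4', '+0.3', '102', 'Inf'}

% Convert the matched substrings to double values.
vals = str2double(nums);
% vals is [-1.4, 0.3, 102, Inf]
```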

The numeric class can be chosen to suit the substrings' numeric data:
* double
* int (signed integer, e.g. int64)
* uint (unsigned integer, e.g. uint64)
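
The choice of class matters for very large integers. The sketch below uses only the built-in str2double and sscanf, and assumes a MATLAB release in which sscanf returns uint64 for the '%lu' format. It shows two 20-digit values that collapse to the same double but stay distinct as uint64.

```matlab
s1 = '18446744073709551614';
s2 = '18446744073709551615';

% Parsed as double, both values round to 2^64 and compare equal.
sameAsDouble = str2double(s1) == str2double(s2);   % true

% Parsed with the '%lu' format, the values keep full 64-bit precision.
u1 = sscanf(s1, '%lu');
u2 = sscanf(s2, '%lu');
sameAsUint64 = (u1 == u2);                          % false
```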

And of course the sorting itself can also be controlled:
* ascending/descending sort direction
* character case sensitivity/insensitivity
* relative order of numeric substrings vs. characters

### Examples ###

By default, numeric substrings are parsed as integers, as shown in the example in the Summary.

% Multiple numeric substrings (e.g. version numbers):
B = {'v10.6', 'v9.10', 'v9.5', 'v10.10', 'v9.10.20', 'v9.10.8'};
sort(B)
 ans = {'v10.10', 'v10.6', 'v9.10', 'v9.10.20', 'v9.10.8', 'v9.5'}
natsort(B)
 ans = {'v9.5', 'v9.10', 'v9.10.8', 'v9.10.20', 'v10.6', 'v10.10'}

% Integer, decimal or Inf numeric substrings, possibly with +/- signs:
C = {'test102', 'test11.5', 'test-1.4', 'test', 'test-Inf', 'test+0.3'};
sort(C)
 ans = {'test', 'test+0.3', 'test-1.4', 'test-Inf', 'test102', 'test11.5'}
natsort(C, '[-+]?(Inf|\d+(\.\d+)?)')
 ans = {'test', 'test-Inf', 'test-1.4', 'test+0.3', 'test11.5', 'test102'}

% Integer or decimal numeric substrings, possibly with an exponent:
D = {'0.56e007', '', '4.3E-2', '10000', '9.8'};
sort(D)
 ans = {'', '0.56e007', '10000', '4.3E-2', '9.8'}
natsort(D, '\d+(\.\d+)?(e[-+]?\d+)?')
 ans = {'', '4.3E-2', '9.8', '10000', '0.56e007'}

% Hexadecimal numeric substrings (possibly with '0X' prefix):
E = {'a0X7C4z', 'a0X5z', 'a0X18z', 'aFz'};
sort(E)
 ans = {'a0X18z', 'a0X5z', 'a0X7C4z', 'aFz'}
natsort(E, '(?<=a)(0X)?[0-9A-F]+', '%x')
 ans = {'a0X5z', 'aFz', 'a0X18z', 'a0X7C4z'}

% Binary numeric substrings (possibly with '0B' prefix):
F = {'a11111000100z', 'a0B101z', 'a0B000000000000011000z', 'a1111z'};
sort(F)
 ans = {'a0B000000000000011000z', 'a0B101z', 'a11111000100z', 'a1111z'}
natsort(F, '(0B)?[01]+', '%b')
 ans = {'a0B101z', 'a1111z', 'a0B000000000000011000z', 'a11111000100z'}
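
The ordering in the hexadecimal and binary examples can be checked by converting the digit strings by hand. This sketch uses the built-in sscanf and bin2dec purely as a cross-check; it makes no claim about how natsort itself parses '%x' or '%b'.

```matlab
% Hexadecimal digits from example E, in the sorted order shown above.
hexValues = [sscanf('5', '%x'), sscanf('F', '%x'), ...
             sscanf('18', '%x'), sscanf('7C4', '%x')];
% hexValues is [5, 15, 24, 1988]

% Binary digits from example F, in the sorted order shown above.
binValues = [bin2dec('101'), bin2dec('1111'), ...
             bin2dec('000000000000011000'), bin2dec('11111000100')];
% binValues is [5, 15, 24, 1988]
```

Both examples encode the same four values, which is why they sort into the same relative order.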

% uint64 numeric substrings (with full precision!):
natsort({'a18446744073709551615z', 'a18446744073709551614z'}, '\d+', '%lu')
 ans = {'a18446744073709551614z', 'a18446744073709551615z'}

% Case sensitivity:
G = {'a2', 'A20', 'A1', 'a10', 'A2', 'a1'};
natsort(G, '\d+', 'ignorecase') % default
 ans = {'A1', 'a1', 'a2', 'A2', 'a10', 'A20'}
natsort(G, '\d+', 'matchcase')
 ans = {'A1', 'A2', 'A20', 'a1', 'a2', 'a10'}

% Sort direction:
H = {'2', 'a', '3', 'B', '1'};
natsort(H, '\d+', 'ascend') % default
 ans = {'1', '2', '3', 'a', 'B'}
natsort(H, '\d+', 'descend')
 ans = {'B', 'a', '3', '2', '1'}

% Relative sort-order of numeric substrings compared to characters:
X = num2cell(char(32+randperm(63)));
cell2mat(natsort(X, '\d+', 'asdigit')) % default
 ans = '!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_'
cell2mat(natsort(X, '\d+', 'beforechar'))
 ans = '0123456789!"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_'
cell2mat(natsort(X, '\d+', 'afterchar'))
 ans = '!"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_0123456789'

Acknowledgements

Asort: A Pedestrian Alphanumeric String Sorter, Sort Nat: Natural Order Sort, Prefixed String Conversion (SI Or Binary), Date Vector/Number To ISO 8601 Date String, ISO 8601 Date String To Serial Date Number, Natural Order Row Sort, and Natural Order Filename Sort inspired this file.

This file inspired ISO 8601 Date String To Serial Date Number, Date Vector/Number To ISO 8601 Date String, Regular Expression Helper, Natural Order Filename Sort, Natural Order Row Sort, Numeric To English Words, and Number To Myriad.

Required Products: MATLAB
MATLAB release: MATLAB 7.11 (R2010b)
Updates
14 Feb 2012 1.1

- Add examples showing different numeric tokens.
- Case-insensitive sort is now default.

24 Aug 2012 1.3

- Implement more compact sort algorithm.
- "sscanf" numeric format can be controlled by an optional input argument.
- Provide use examples.
- Output debugging arrays now char+numeric.

28 Apr 2014 1.4

- Now parses hexadecimal and octal substrings.
- int64 and uint64 parsed at full precision.
- Allow <options> in any order.
- For debugging: return indices of character and numeric arrays.

02 Jul 2014 1.5

- Simplify hexadecimal example.
- Correct output summary.

05 Aug 2014 1.6

- Add binary numeric parsing.
- Improve input checking.
- Replace multiple debugging output arrays with one cell array.
- Allow lookarounds in regular expression.

20 Dec 2014 1.7

- Update documentation only, improve examples.

25 Feb 2015 1.8

- Improved binary substring parsing.
- Better examples.
