Customizable Natural-Order Sort

version 1.10 (11.7 KB)

Natural-order sort of a cell array of strings, with customizable numeric format.

Editor's Note: This file was selected as a MATLAB Central Pick of the Week.

To sort filenames or filepaths use NATSORTFILES:
http://www.mathworks.com/matlabcentral/fileexchange/47434-natural-order-filename-sort
To sort the rows of a cell array of strings use NATSORTROWS:
http://www.mathworks.com/matlabcentral/fileexchange/47433-natural-order-row-sort

### Summary ###

NATSORT sorts a cell array of strings alphanumerically, taking into account the values of any numeric substrings that occur within those strings. Compare, for example:

A = {'a2', 'a10', 'a1'};
sort(A)
ans = 'a1' 'a10' 'a2'
natsort(A)
ans = 'a1' 'a2' 'a10'
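
The difference between the two orderings can be pictured with a minimal sketch that uses only built-in functions. This is not NATSORT's implementation: it assumes every string is a text prefix followed by exactly one run of digits, it compares prefixes by character code (case-sensitive), and the variable names are purely illustrative.

```matlab
A = {'a2', 'a10', 'a1'};

% Split each string into its digit run and the text around it.
[num, txt] = regexp(A, '\d+', 'match', 'split');

% Assumption of this sketch: one text prefix and one digit run per string.
prefix = cellfun(@(t) t{1}, txt, 'UniformOutput', false);
value  = cellfun(@(n) sscanf(n{1}, '%f'), num);

% Rank the prefixes alphabetically, then sort by (prefix rank, numeric value).
[~, ~, prefixRank] = unique(prefix);
[~, order] = sortrows([prefixRank(:), value(:)]);

sorted = A(order)   % 'a1' 'a2' 'a10'
```

The key point is that the digit runs are compared as numbers (1 < 2 < 10) rather than character by character ('1' < '10' < '2').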

By default NATSORT treats each run of consecutive digits as an integer value. It also gives optional user control over how numeric substrings are recognized and parsed, via a regular expression, which allows the numeric substrings to have:
* a +/- sign
* a decimal point and decimal fraction
* an E-notation exponent
* decimal, octal, hexadecimal or binary notation
* prefixes/suffixes/literals which can be ignored
* an Inf or NaN value
* any feature supported by regexp, including look-arounds, quantifiers, etc.
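
To see why the regular expression matters, the following sketch applies two patterns to the same string with the built-in regexp and sscanf functions. The patterns and variable names are illustrative choices, not NATSORT defaults taken from its source code.

```matlab
s = 'test-1.4';

% Digits only: the sign and decimal point split the number into two integers.
intParts = regexp(s, '\d+', 'match');
intVals  = cellfun(@(t) sscanf(t, '%f'), intParts);    % [1 4]

% Optional sign plus optional decimal fraction: one signed decimal number.
decParts = regexp(s, '[-+]?\d+(\.\d+)?', 'match');
decVals  = cellfun(@(t) sscanf(t, '%f'), decParts);    % -1.4
```

With the first pattern the string would be ordered as "test", 1, ".", 4; with the second it is ordered as "test" followed by the single value -1.4.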

The numeric class can be chosen to suit the substrings' numeric data:
* double
* int64
* uint64

The sorting itself can also be controlled:
* ascending/descending sort direction
* character case sensitivity/insensitivity
* relative order of numeric substrings vs. characters

### Examples ###

By default only integer substrings are recognized as numbers, as shown in the example in the Summary.

% Multiple number substrings (e.g. release version numbers):
B = {'v10.6', 'v9.10', 'v9.5', 'v10.10', 'v9.10.20', 'v9.10.8'};
sort(B)
ans = 'v10.10' 'v10.6' 'v9.10' 'v9.10.20' 'v9.10.8' 'v9.5'
natsort(B)
ans = 'v9.5' 'v9.10' 'v9.10.8' 'v9.10.20' 'v10.6' 'v10.10'
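
The version-number ordering above can be reproduced with a short sketch built on regexp, str2double and sortrows. It is only an illustration, not NATSORT's algorithm: it assumes all strings share the same non-numeric text ('v' and '.'), so only the numeric fields need comparing, and it pads missing fields with -Inf so that shorter versions sort first.

```matlab
B = {'v10.6', 'v9.10', 'v9.5', 'v10.10', 'v9.10.20', 'v9.10.8'};

% Extract every run of digits from each string.
fields = regexp(B, '\d+', 'match');
width  = max(cellfun(@numel, fields));

% One row of numeric keys per string; -Inf marks a missing field.
keys = -Inf(numel(B), width);
for k = 1:numel(B)
    keys(k, 1:numel(fields{k})) = str2double(fields{k});
end

% sortrows compares column 1 first, then column 2, and so on.
[~, order] = sortrows(keys);
sorted = B(order)   % 'v9.5' 'v9.10' 'v9.10.8' 'v9.10.20' 'v10.6' 'v10.10'
```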

% Integer, decimal or Inf number substrings, possibly with +/- signs:
C = {'test+Inf', 'test11.5', 'test-1.4', 'test', 'test-Inf', 'test+0.3'};
sort(C)
ans = 'test' 'test+0.3' 'test+Inf' 'test-1.4' 'test-Inf' 'test11.5'
natsort(C, '(-|+)?(Inf|\d+(\.\d+)?)')
ans = 'test' 'test-Inf' 'test-1.4' 'test+0.3' 'test11.5' 'test+Inf'

% Integer or decimal number substrings, possibly with an exponent:
D = {'0.56e007', '', '4.3E-2', '10000', '9.8'};
sort(D)
ans = '' '0.56e007' '10000' '4.3E-2' '9.8'
natsort(D, '\d+(\.\d+)?(E(+|-)?\d+)?')
ans = '' '4.3E-2' '9.8' '10000' '0.56e007'

% Hexadecimal number substrings (possibly with '0X' prefix):
E = {'a0X7C4z', 'a0X5z', 'a0X18z', 'aFz'};
sort(E)
ans = 'a0X18z' 'a0X5z' 'a0X7C4z' 'aFz'
natsort(E, '(?<=a)(0X)?[0-9A-F]+', '%x')
ans = 'a0X5z' 'aFz' 'a0X18z' 'a0X7C4z'

% Binary number substrings (possibly with '0B' prefix):
F = {'a11111000100z', 'a0B101z', 'a0B000000000011000z', 'a1111z'};
sort(F)
ans = 'a0B000000000011000z' 'a0B101z' 'a11111000100z' 'a1111z'
natsort(F, '(0B)?[01]+', '%b')
ans = 'a0B101z' 'a1111z' 'a0B000000000011000z' 'a11111000100z'
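
As a cross-check of the hexadecimal and binary results above, the numeric values can be computed with the built-in hex2dec and bin2dec functions. This sketch hard-codes the 'a' prefix and the optional '0X'/'0B' markers of these particular examples; it does not show how NATSORT itself parses the '%x' or '%b' formats.

```matlab
E = {'a0X7C4z', 'a0X5z', 'a0X18z', 'aFz'};
F = {'a11111000100z', 'a0B101z', 'a0B000000000011000z', 'a1111z'};

% Capture only the digits, skipping the leading 'a' and any 0X / 0B marker.
hexTok = regexp(E, '^a(?:0X)?([0-9A-F]+)', 'tokens', 'once');
binTok = regexp(F, '^a(?:0B)?([01]+)',     'tokens', 'once');

hexVal = cellfun(@(t) hex2dec(t{1}), hexTok);   % [1988 5 24 15]
binVal = cellfun(@(t) bin2dec(t{1}), binTok);   % [1988 5 24 15]

[~, hexOrder] = sort(hexVal);
[~, binOrder] = sort(binVal);

sortedE = E(hexOrder)   % 'a0X5z' 'aFz' 'a0X18z' 'a0X7C4z'
sortedF = F(binOrder)   % 'a0B101z' 'a1111z' 'a0B000000000011000z' 'a11111000100z'
```

Both lists encode the same four values (5, 15, 24 and 1988), which is why the two examples sort into corresponding orders.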

% uint64 number substrings (with full precision!):
natsort({'a18446744073709551615z', 'a18446744073709551614z'}, [], '%lu')
ans = 'a18446744073709551614z' 'a18446744073709551615z'
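
The reason the numeric class matters here is that a double cannot represent every integer of this size. The sketch below parses the same two digit strings with sscanf, once as double ('%f') and once with the '%lu' conversion, which is assumed here to return uint64 as described in the sscanf documentation.

```matlab
s1 = '18446744073709551614';
s2 = '18446744073709551615';

% Parsed as double: both values round to the same number (2^64).
d1 = sscanf(s1, '%f');
d2 = sscanf(s2, '%f');
equalAsDouble = (d1 == d2)

% Parsed as unsigned 64-bit integers: the two values stay distinct.
u1 = sscanf(s1, '%lu');
u2 = sscanf(s2, '%lu');
equalAsUint64 = (u1 == u2)
```

When the values compare equal as doubles, a sort has no way to order the two strings by their numeric content, so an integer class is needed.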

% Case sensitivity:
G = {'a2', 'A20', 'A1', 'a10', 'A2', 'a1'};
natsort(G, [], 'ignorecase') % default
ans = 'A1' 'a1' 'a2' 'A2' 'a10' 'A20'
natsort(G, [], 'matchcase')
ans = 'A1' 'A2' 'A20' 'a1' 'a2' 'a10'
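
The two case options can be imitated with built-in functions by ranking the text prefixes either as-is or after conversion to lowercase. This is a sketch under the same simplifying assumption as before (one text prefix followed by one run of digits per string), not a description of NATSORT's internals.

```matlab
G = {'a2', 'A20', 'A1', 'a10', 'A2', 'a1'};

[num, txt] = regexp(G, '\d+', 'match', 'split');
prefix = cellfun(@(t) t{1}, txt, 'UniformOutput', false);
value  = cellfun(@(n) str2double(n{1}), num);

% Case-sensitive ranking: 'A' sorts before 'a' by character code.
[~, ~, rankMatch]  = unique(prefix);
% Case-insensitive ranking: 'A' and 'a' receive the same rank.
[~, ~, rankIgnore] = unique(lower(prefix));

% Rows that tie on every key keep their original relative order.
[~, iMatch]  = sortrows([rankMatch(:),  value(:)]);
[~, iIgnore] = sortrows([rankIgnore(:), value(:)]);

ignoreCase = G(iIgnore)   % 'A1' 'a1' 'a2' 'A2' 'a10' 'A20'
matchCase  = G(iMatch)    % 'A1' 'A2' 'A20' 'a1' 'a2' 'a10'
```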

% Sort direction:
H = {'2', 'a', '3', 'B', '1'};
natsort(H, [], 'ascend') % default
ans = '1' '2' '3' 'a' 'B'
natsort(H, [], 'descend')
ans = 'B' 'a' '3' '2' '1'

% Relative sort-order of number substrings compared to characters:
X = num2cell(char(32+randperm(63)));
cell2mat(natsort(X, [], 'asdigit')) % default
ans = '!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_'
cell2mat(natsort(X, [], 'beforechar'))
ans = '0123456789!"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_'
cell2mat(natsort(X, [], 'afterchar'))
ans = '!"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_0123456789'

Updates

1.10

* Minor help edit.

1.10

* Add HTML documentation.

1.10

* Improve input checking.

1.9

* Improve binary numeric handling.
* Improve handling of skipped fields.
* Add an example of skipped field usage.

1.8

* Improved binary substring parsing.
* Better examples.

1.7

* Update documentation only, improve examples.

1.6

* Add binary numeric parsing.
* Improve input checking.
* Replace multiple debugging output arrays with one cell array.
* Allow lookarounds in regular expression.

1.5

* Simplify hexadecimal example.
* Correct output summary.

1.4

* Now parses hexadecimal and octal substrings.
* int64 and uint64 parsed at full precision.
* Allow <options> in any order.
* For debugging: return indices of character and numeric arrays.

1.3

* Implement more compact sort algorithm.
* "sscanf" numeric format can be controlled by an optional input argument.
* Provide use examples.
* Output debugging arrays now char+numeric.

1.1

* Add examples showing different numeric tokens.
* Case-insensitive sort is now default.

MATLAB Release
MATLAB 7.11 (R2010b)
