Natural-Order Filename Sort

Version 1.5 (13.2 KB)

Natural-order sort of filenames or filepaths, with customizable numeric format.

203 Downloads

Editor's Note: This file was selected as MATLAB Central Pick of the Week

To sort all of the strings in a cell array use NATSORT:
http://www.mathworks.com/matlabcentral/fileexchange/34464-customizable-natural-order-sort
To sort the rows of a cell array of strings use NATSORTROWS:
http://www.mathworks.com/matlabcentral/fileexchange/47433-natural-order-row-sort

### Summary ###

Alphanumeric sort of a cell array of strings, where the strings are filenames or filepaths. Sorts the strings taking into account the values of any numeric substrings occurring within those strings. Compare for example:

A = {'a2.txt', 'a10.txt', 'a1.txt'};
sort(A)
ans = 'a1.txt' 'a10.txt' 'a2.txt'
natsortfiles(A)
ans = 'a1.txt' 'a2.txt' 'a10.txt'

By default NATSORTFILES treats each run of consecutive digits as an integer value. Number recognition can instead be controlled by a regular expression, which allows decimal digits, a +/- sign, an exponent, binary, octal, or hexadecimal notation, and more. There are also options for controlling the sort direction and case sensitivity. See NATSORT for details.
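
As a rough illustration of how the choice of regular expression changes which substrings count as numbers, the sketch below uses only the built-in `regexp` and `str2double`. The test string and both patterns are assumptions chosen for this example. They are not taken from NATSORT, whose own defaults and options are described in its help.

```matlab
% Illustrative only: how a regular expression decides what counts as a number.
str = 'x-1.5y10';                                 % hypothetical test string

% Runs of consecutive digits, each read as an integer:
intTok = regexp(str, '\d+', 'match');             % {'1','5','10'}
intVal = str2double(intTok);                      % [1 5 10]

% Optional sign and decimal fraction:
decTok = regexp(str, '[-+]?\d+\.?\d*', 'match');  % {'-1.5','10'}
decVal = str2double(decTok);                      % [-1.5 10]
```

With the first pattern the string contains three numbers, and with the second it contains two, so the same filename can sort differently depending on the pattern.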

NATSORTFILES does not perform a naive natural-order sort: it sorts the filenames and file extensions separately, so that the extension separator (the period character) does not influence the sort output. This ensures a dictionary sort, where a shorter filename always sorts before any longer filename that starts with the same characters. Likewise, filepaths are split at each file-separator character and each level of the file hierarchy is sorted separately, so that the file-separator character does not affect the sort results.

### Example with DIR and Cell Array ###

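D = 'C:\Test'; % directory to search
S = dir(fullfile(D,'*.txt')); % structure array, one element per matching file
N = natsortfiles({S.name}); % filenames in natural order
for k = 1:numel(N)
    fullfile(D,N{k}) % no semicolon: displays each full path in sorted order
end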

### File Dependency ###

The natural-order sort is provided by the function NATSORT (File Exchange 34464). All of NATSORT's optional inputs are also supported by NATSORTFILES, e.g. to define a regular expression that matches the numeric substrings or to select case sensitivity.

### Explanation ###

The period character has ASCII value 46, so with a standard sort all characters with values 0:45 (including !"#$%&'()*+,- and the space character) sort before the period. Because the period is used as the file extension separator, applying a standard sort to filenames can place some longer filenames before shorter ones, as shown here:

B = {'test_new.m'; 'test-old.m'; 'test.m'};
sort(B)
ans =
'test-old.m'
'test.m'
'test_new.m'
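
The order above follows directly from the character codes, which can be checked with the built-in `double` (the variable name is just for illustration):

```matlab
% Character codes of the hyphen, period, and underscore.
codes = double('-._')   % displays 45 46 95
```

Since 45 < 46 < 95, 'test-old.m' sorts before 'test.m', which in turn sorts before 'test_new.m'.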

Sorting the filenames and file extensions separately allows the filenames to be sorted into dictionary order, with shorter names before longer names:

natsortfiles(B)
ans =
'test.m'
'test-old.m'
'test_new.m'
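
The idea of sorting names and extensions separately can be sketched with built-in functions only. This is a minimal illustration under stated assumptions, not the code used inside NATSORTFILES: it ranks the names and extensions with `unique` and orders the rows with `sortrows`, and it ignores numeric substrings entirely. The variable names are hypothetical.

```matlab
% Sketch: sort by name first, then by extension, so that the period
% never takes part in the comparison. Not the NATSORTFILES implementation.
B = {'test_new.m'; 'test-old.m'; 'test.m'};

% Split every filename into its name and its extension.
[~, names, exts] = cellfun(@fileparts, B, 'UniformOutput', false);

% Convert names and extensions to ranks in character-code order.
[~, ~, nameRank] = unique(names);
[~, ~, extRank]  = unique(exts);

% Order by name rank, breaking ties by extension rank.
[~, idx] = sortrows([nameRank(:), extRank(:)]);
sorted = B(idx)   % 'test.m', 'test-old.m', 'test_new.m'
```

Because 'test' is compared on its own, without the trailing '.m', it sorts before both longer names.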

NATSORTFILES performs this together with a natural-order sort, so the values of any numeric substrings are also taken into account in the sort:

C = {'test2.m'; 'test10-old.m'; 'test.m'; 'test10.m'; 'test1.m'};
sort(C) % Wrong numeric order:
ans =
'test.m'
'test1.m'
'test10-old.m'
'test10.m'
'test2.m'
natsortfiles(C) % Correct numeric order, shorter names before longer:
ans =
'test.m'
'test1.m'
'test2.m'
'test10.m'
'test10-old.m'
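
For readers who want to see the combined effect without the submission installed, the sketch below approximates it with a different, simpler technique: every run of digits in the name is zero-padded to a fixed width (via a dynamic expression in `regexprep`), and the padded names are then sorted normally. This is a stand-in for a true natural-order sort, not how NATSORT works. It assumes that all files share the same extension and that no number has more than ten digits, and the variable names are hypothetical.

```matlab
% Sketch: approximate a natural-order filename sort by zero-padding digits.
% A swapped-in technique for illustration, not the NATSORT algorithm.
C = {'test2.m'; 'test10-old.m'; 'test.m'; 'test10.m'; 'test1.m'};

% Drop the extension so the period does not affect the comparison.
[~, names] = cellfun(@fileparts, C, 'UniformOutput', false);

% Pad each run of digits to ten characters, e.g. 'test2' -> 'test0000000002'.
padded = regexprep(names, '(\d+)', '${sprintf(''%010d'', str2double($1))}');

% A standard sort of the padded names now follows the numeric values.
[~, idx] = sort(padded);
sorted = C(idx)   % 'test.m','test1.m','test2.m','test10.m','test10-old.m'
```

Padding breaks down for numbers with leading zeros or more digits than the chosen width, which is one reason a dedicated natural-order sort is preferable.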

### Filepaths Too! ###

This problem also affects filepaths, where the file-separator character can affect the sort order. If the file-separator character is not taken into account, longer directory names can sort before shorter ones, and files and directories are mixed together:

D = {'A2-old\test.m';'A10\test.m';'A2\test.m';'A1archive.zip';'A1\test.m'};
sort(D) % Wrong numeric order, and '-' sorts before '\':
ans =
'A10\test.m'
'A1\test.m'
'A1archive.zip'
'A2-old\test.m'
'A2\test.m'

NATSORTFILES sorts shorter directory names before longer ones (just like a dictionary), and preserves the subdirectory hierarchy:

natsortfiles(D) % Shorter names before longer:
ans =
'A1archive.zip'
'A1\test.m'
'A2\test.m'
'A2-old\test.m'
'A10\test.m'

Because '\' is treated as a character, a naive sort mixes the subdirectory hierarchy, and longer directory names may come before shorter ones.
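
The level-by-level treatment of filepaths can be sketched in the same spirit. The code below is an illustration under stated assumptions, not the NATSORTFILES implementation: it splits each path at '\' (so it assumes Windows-style separators), zero-pads digit runs to ten characters as a stand-in for a natural-order comparison, ranks each level with `unique`, and orders the rows with `sortrows`. For brevity it does not split off file extensions. All variable names and the helper `pad` are hypothetical.

```matlab
% Sketch: sort filepaths one hierarchy level at a time.
D = {'A2-old\test.m';'A10\test.m';'A2\test.m';'A1archive.zip';'A1\test.m'};

% Hypothetical helper: zero-pad digit runs so that a plain sort is numeric.
pad = @(s) regexprep(s, '(\d+)', '${sprintf(''%010d'', str2double($1))}');

parts = regexp(D, '\\', 'split');          % split each path at '\'
depth = max(cellfun(@numel, parts));       % deepest path, counting the file
dirs  = repmat({''}, numel(D), depth-1);   % missing directory levels stay ''
files = cell(numel(D), 1);
for k = 1:numel(D)
    p = parts{k};
    files{k} = p{end};                     % last part is the filename
    n = numel(p) - 1;                      % number of directory levels
    if n > 0
        dirs(k, 1:n) = p(1:n);
    end
end

% One rank column per directory level, then one for the filename.
keys = zeros(numel(D), depth);
for c = 1:depth-1
    [~, ~, r] = unique(pad(dirs(:, c)));
    keys(:, c) = r(:);
end
[~, ~, r] = unique(pad(files));
keys(:, depth) = r(:);

[~, idx] = sortrows(keys);
sorted = D(idx)   % same order as the natsortfiles(D) output shown above
```

An empty directory level sorts before any named directory, which is why 'A1archive.zip' comes first, and the separator itself never enters a comparison.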

Comments and Ratings (14)

Ayu Dyah

thanks a lot!

akb akbar

Thanks a lot,
really helped me.

hugstone

thanks very much!

Sanggon Kim

Thank you very much.

sid.sapien

Thanks, was very handy.

Stephen Cobeldick

@Will: it is not so easy "to list files completely like Windows". Windows places non-alphanumeric printing characters first, but their order is not clearly specified: do you know why Windows sorts '~' before '+', and '_' before '=', even though this is not their character code order? I don't. What other special character orders does Windows have? Can you show me where this order is specified?

Then there is the question of _which_ Windows sort order: the file explorer sort order or the powershell sort order? The Win2K sort order, the MS Excel sort order, or the Vista/Win7 sort order? They are all different... and then why just Windows OS, what about other major OS's? Which ones?

Not only is the "Windows sort order" very vaguely defined, this change would actually break NATSORT: I clearly state that NATSORT sorts according to two simple criteria: character code and numeric value. This means NATSORT can provide the same sort as many other "Natural Order Sort" functions written in other languages (do an internet search), which all sort according to the same basic rules as my submission and can provide the same sort order using the very precisely defined character code order. What you are suggesting is to replace the openly-defined and universally-known character code order with a badly-defined proprietary sort order. Interesting...

But it is certainly _possible_ to write a function that does this: show me a reference that defines the "Windows sort order" and then I can help you.

Will

This was really useful, however I've just found that where Windows will list files with "_" higher than those with numbers, natsort and natsortfiles will list "_" after numbers and letters.

E.g. Windows lists:
_File
1. File 1
2. File 2
a File
B File

And in MATLAB:

>> natsort({'1. File 1';'2. File 2';'a File';'B File';'_File'})

ans =

'1. File 1'
'2. File 2'
'a File'
'B File'
'_File'

Could you please implement this in an update? Or advise how to customise the functions to list files completely like Windows?

Ulysses C

Changjie Guan

It's really useful! Thx.

Moses

Wow, my hats off to you sir. Great function and documentation. Many Thanks!

Moses

sada shiva

Stelios

Thanks, that's exactly what I was looking for to sort namefiles formatted as
1-d...,2-d...,...18-d...etc.

Updates

1.5

* Minor help edit.

1.5

* Add HTML documentation.

1.5

* Improve input checking.
* Include NATSORT function.

1.4

* Clearer description of file dependency.
* Improve example of filepath sorting.

1.3

* Improve function description.
* Better examples.

1.2

* Update documentation only, improve examples.

1.1

* Complete acknowledgements.

MATLAB Release
MATLAB 7.11 (R2010b)
