List of files sorting

Jean on 18 Aug 2011
Edited: Stephen on 18 Apr 2021
Hello,
I have some mat files in a folder. The files are named from r_000.mat to r_1200.mat. I use the command list = dir(fullfile(cd, '*.mat')) to retrieve the names of the files. The problem is that MATLAB returns a list of files that is not in the correct order, even though Windows arranges the files correctly: for example, element 107 should be r_107.mat, but is actually r_1005.mat.
Is there a command to read the names of the files in the correct order?
Best regards, Jean
  2 Comments
Jan on 18 Aug 2011
The order in which DIR returns the file names is not well defined. Although the output was sorted alphabetically in all tests I have performed, this is not explicitly documented anywhere. The same applies to the DIR command of the OS. Therefore I do not think that there is a "correct" order, only an "expected" order that depends on personal taste.


Accepted Answer

Stephen on 20 Nov 2015
Edited: Stephen on 18 Apr 2021
You could download my FEX submission natsortfiles, which was written to deal with this exact problem:
S = dir('*.txt');
S.name
ans = '1.txt'
ans = '10.txt'
ans = '2.txt'
S = natsortfiles(S); % alphanumeric sort by filename
S.name
ans = '1.txt'
ans = '2.txt'
ans = '10.txt'
  1 Comment
Mihir Gajjar on 7 Mar 2017
Thank You, this helped me a lot. :)


More Answers (3)

Jan on 18 Aug 2011
To sort the file names according to the numerical order:
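list = dir(fullfile(cd, '*.mat'));   % all MAT files in the current folder
name = {list.name};                  % file names as a cell array of char vectors
str = sprintf('%s#', name{:});       % join the names, using '#' as separator
num = sscanf(str, 'r_%d.mat#');      % extract the number from each name
[~, index] = sort(num);              % permutation that sorts the numbers
name = name(index);                  % file names in numerical order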
See also:
For standard problems it is always a good idea to search the FEX first...
  1 Comment
ABDULRAHMAN HAJE KARIM ALNAJAR
This is the solution for my case.



Fangjun Jiang on 18 Aug 2011
That is because some of your file names have 3 digits (r_000.mat) while others have 4 digits (r_1200.mat). There is no direct way to make them list in the right order. You probably need to rename the files so that they all have the same number of digits.
Follow the example in this post.
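The renaming suggested above could look roughly like the following. This is only a sketch: it works in a scratch folder created with tempname and fills it with empty placeholder files, so the folder and the demo names are assumptions for illustration, not the poster's data. The four-digit width is chosen because the largest number in the question is 1200.

```matlab
% Demo setup (hypothetical): a scratch folder with empty placeholder files.
folder = tempname;
mkdir(folder);
demoNames = {'r_000.mat', 'r_107.mat', 'r_1005.mat', 'r_1200.mat'};
for k = 1:numel(demoNames)
    fclose(fopen(fullfile(folder, demoNames{k}), 'w'));
end

% Rename every r_<number>.mat so that the number is zero-padded to 4 digits.
list = dir(fullfile(folder, 'r_*.mat'));
for k = 1:numel(list)
    oldName = list(k).name;
    num     = sscanf(oldName, 'r_%d.mat');     % numeric part of the name
    newName = sprintf('r_%04d.mat', num);      % same number, padded to 4 digits
    if ~strcmp(oldName, newName)
        movefile(fullfile(folder, oldName), fullfile(folder, newName));
    end
end

% With equal-width numbers, a plain character sort matches the numeric order.
list   = dir(fullfile(folder, 'r_*.mat'));
padded = sort({list.name});
```

On real data it is probably safer to try this on a copy of the folder first, since movefile changes the files on disk.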
  1 Comment
Burkely Pettijohn on 20 Nov 2015
^yeah^
r_107 > r_1005 in MATLAB but r_107 < r_1005 in Windows.
Changing the file name to r_0107 will solve it.
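The comparison in this comment can be checked with a small sketch. A character sort compares names one character at a time, so the two names first differ at the fifth character, where '0' has a smaller character code than '7'. The variable names here are only illustrative.

```matlab
names  = {'r_107.mat', 'r_1005.mat'};
sorted = sort(names);                 % character-by-character (ASCII) order

% The names first differ at position 5: '7' versus '0'.
codeOf7 = double('7');                % 55
codeOf0 = double('0');                % 48

% Zero-padding makes the numeric parts equally wide, so the order flips.
sortedPadded = sort({'r_0107.mat', 'r_1005.mat'});
```

Here sorted lists r_1005.mat before r_107.mat, while sortedPadded lists r_0107.mat before r_1005.mat.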



Walter Roberson on 19 Aug 2011
dir lists the files and folders in the MATLAB current folder. Results appear in the order returned by the operating system.
So, the order is explicitly documented: it is whatever the OS returns, which is not certain to be any kind of alphanumeric order.
The order depends upon the order returned by the POSIX readdir() http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir.html which presents a lot of unknowns:
If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified.
The directory entries for dot and dot-dot are optional. This volume of IEEE Std 1003.1-2001 does not provide a way to test a priori for their existence because an application that is portable must be written to look for (and usually ignore) those entries. Writing code that presumes that they are the first two entries does not always work, as many implementations permit them to be other than the first two entries, with a "normal" entry preceding them.
Basically, then, the order returned may be whatever order the internal directory structure uses to order the files. Unix System V filesystems usually did not sort directory entries on disk, and instead would read as many blocks of the directory as needed to find the file they were after, and when writing a new directory entry would use the first available empty slot. This reflected the reality at the time that writing to hard disks was expensive and that keeping the on-disk entries sorted required rewriting half of the blocks representing the directory (presuming uniform random access.) These days with on-controller disk-write queues that optimize drive head movement, it is usually found more efficient to either sort all the entries each time anything in the directory is touched, or else to use an ordered tree system rather than a linear search system. But the ordered tree system doesn't promise alphabetical order in reading the entries: it could instead choose to return the order of the leaf nodes...
In short: if your code assumes that '.' and '..' are the first two entries in a directory, your code has a bug (even in MS Windows). If your code assumes that directory entries are returned in any sorted order, your code has a bug (in all OS.)
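One way to avoid both assumptions is to drop the dot entries by name and then sort explicitly. The following is a minimal sketch: the scratch folder and its file names are made up for the demonstration, and the sort shown is a case-insensitive character sort, not a numeric or "natural" one.

```matlab
% Demo setup (hypothetical): a scratch folder with a few empty files.
folder = tempname;
mkdir(folder);
demoNames = {'b.txt', 'A.txt', 'c.txt'};
for k = 1:numel(demoNames)
    fclose(fopen(fullfile(folder, demoNames{k}), 'w'));
end

list  = dir(folder);
isDot = ismember({list.name}, {'.', '..'});   % identify dot entries by name, not by position
list  = list(~isDot);

[~, order] = sort(lower({list.name}));        % impose an explicit, case-insensitive order
list  = list(order);
names = {list.name};
```

Whatever order the operating system returns, names is then in a known order chosen by the code itself.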
  1 Comment
Jan on 20 Aug 2011
@Walter: You are right: it is "documented", but "presents a lot of unknowns". This is more exact than my formulation "not explicitly documented".

