MATLAB Answers


List of files sorting

Asked by Jean on 18 Aug 2011
Latest activity: edited by Mihir Gajjar on 7 Mar 2017
Hello,
I have some .mat files in a folder. The files are named from r_000.mat to r_1200.mat. I use the command list = dir(fullfile(cd, '*.mat')) to retrieve the file names. The problem is that MATLAB returns the files in the wrong order, even though Windows arranges them correctly: for example, element 107 should be r_106.mat (counting from r_000.mat), but it is actually r_1005.mat.
Is there a command to read the names of the files in the correct order?
Best regards, Jean

  2 Comments

Are you sure Windows arranges the files as you wanted? I think it is the same as you got from MATLAB.
Jan on 18 Aug 2011
The order in which DIR returns the file names is not well defined. Although it was sorted alphabetically in all the tests I have performed, this is not explicitly documented anywhere. The same applies to the DIR command of the operating system. Therefore I do not think there is a "correct" order, only an "expected" order that depends on personal taste.


4 Answers

Answer by Stephen Cobeldick on 20 Nov 2015
Edited by Stephen Cobeldick on 20 Nov 2015
 Accepted Answer

My File Exchange (FEX) submission natsortfiles was written to deal with exactly this problem.
It is called just like the sort function, with a cell array of filenames/paths:
B = {'test2.m'; 'test10-old.m'; 'test.m'; 'test10.m'; 'test1.m'};
natsortfiles(B) % correct numeric order and shortest first:
ans = {
'test.m'
'test1.m'
'test2.m'
'test10.m'
'test10-old.m'}

  1 Comment

Thank you, this helped me a lot. :)



Answer by Fangjun Jiang on 18 Aug 2011

That is because some of your files have 3 digits in the file name (r_000.mat) while others have 4 digits (r_1200.mat). There is no direct way to make dir list them in the right order. You probably need to rename the files so that they all have the same number of digits.
Follow the example in this post.
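A minimal sketch of such a renaming, assuming all files sit in the current folder and are named r_<number>.mat. The variable names `folder`, `tok`, `oldName` and `newName` are illustrative and not taken from the original post:

```matlab
% Assumption: the files r_000.mat ... r_1200.mat are in the current folder
folder = cd;
list = dir(fullfile(folder, 'r_*.mat'));   % listing is taken once, before any renaming
for k = 1:numel(list)
    oldName = list(k).name;
    % Number between 'r_' and '.mat'; names that do not fit the pattern are skipped
    tok = regexp(oldName, '^r_(\d+)\.mat$', 'tokens', 'once');
    if isempty(tok)
        continue
    end
    % Zero-pad to four digits, which is enough for numbers up to 9999
    newName = sprintf('r_%04d.mat', str2double(tok{1}));
    if ~strcmp(oldName, newName)
        movefile(fullfile(folder, oldName), fullfile(folder, newName));
    end
end
```

The sketch does not check whether a padded name (e.g. r_0107.mat) already exists next to its unpadded twin (r_107.mat), so it is safest to try it on a copy of the folder first.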

  1 Comment

Yeah: r_107 > r_1005 in MATLAB, but r_107 < r_1005 in Windows.
Changing the file name to r_0107 will solve it.
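The effect described here can be reproduced without any files by rebuilding the names from the question with sprintf and sorting them as text. A small sketch; `n`, `names`, `textOrder` and `padded` are illustrative variable names:

```matlab
% Rebuild the file names from the question: r_000.mat ... r_1200.mat
n = (0:1200)';
names = arrayfun(@(k) sprintf('r_%03d.mat', k), n, 'UniformOutput', false);

% sort compares text character by character (character codes), not numerically
textOrder = sort(names);
disp(textOrder{107})   % r_1005.mat, the element reported in the question

% Zero-padding every number to four digits makes text order equal numeric order
padded = arrayfun(@(k) sprintf('r_%04d.mat', k), n, 'UniformOutput', false);
disp(isequal(sort(padded), padded))   % 1 (true)
```

With padding, r_0107.mat sorts before r_1005.mat because the comparison already differs at the third character ('0' < '1').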



Answer by Jan on 18 Aug 2011

To sort the file names according to the numerical order:
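list = dir(fullfile(cd, 'r_*.mat'));   % restrict the listing to r_*.mat files
name = {list.name};                     % cell array of file names
str  = sprintf('%s#', name{:});         % join the names, using '#' as separator
num  = sscanf(str, 'r_%d.mat#');        % read the number out of every name
[~, index] = sort(num);                 % numeric order (~ discards the sorted values)
name = name(index);                     % file names in numeric order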
See also:
For standard problems it is always a good idea to search the FEX first...
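The code above sorts the names only. A hedged sketch of a variant that also reorders the struct array returned by dir, skips .mat files that do not match the r_<number>.mat pattern (an assumption about the folder contents), and then loads the files in numeric order; `names`, `tok`, `num`, `keep` and `data` are illustrative names:

```matlab
% Assumption: the data files follow the pattern r_<digits>.mat in the current folder
list = dir(fullfile(cd, '*.mat'));
names = {list.name};

% Read the number from each name; names that do not match stay NaN
tok = regexp(names, '^r_(\d+)\.mat$', 'tokens', 'once');
num = NaN(size(names));
for k = 1:numel(tok)
    if ~isempty(tok{k})
        num(k) = str2double(tok{k}{1});
    end
end

% Drop non-matching files, then reorder the struct array numerically
keep = ~isnan(num);
list = list(keep);
[~, index] = sort(num(keep));
list = list(index);

% Load the files in numeric order
for k = 1:numel(list)
    data = load(fullfile(cd, list(k).name));
    % ... process data here ...
end
```

Unlike the sscanf version, a stray file such as results.mat in the same folder cannot shift the indices, because each number stays paired with its own name.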



Answer by Walter Roberson on 19 Aug 2011

dir lists the files and folders in the MATLAB current folder. Results appear in the order returned by the operating system.
So the order is explicitly documented: it is whatever the OS returns, which is not guaranteed to be any kind of alphanumeric order.
The order depends upon the order returned by the POSIX readdir() http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir.html which presents a lot of unknowns:
If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified.
The directory entries for dot and dot-dot are optional. This volume of IEEE Std 1003.1-2001 does not provide a way to test a priori for their existence because an application that is portable must be written to look for (and usually ignore) those entries. Writing code that presumes that they are the first two entries does not always work, as many implementations permit them to be other than the first two entries, with a "normal" entry preceding them.
Basically, then, the order returned may be whatever order the internal directory structure uses. Unix System V file systems usually did not sort directory entries on disk: to find a file they read as many blocks of the directory as needed, and when writing a new directory entry they used the first available empty slot. This reflected the reality of the time: writing to hard disks was expensive, and keeping the on-disk entries sorted would have required rewriting half of the blocks representing the directory (presuming uniform random access). These days, with on-controller disk-write queues that optimize drive head movement, it is usually more efficient either to sort all the entries each time anything in the directory is touched, or to use an ordered tree structure rather than a linear search. But an ordered tree does not promise alphabetical order when reading the entries: it could instead return them in the order of its leaf nodes...
In short: if your code assumes that '.' and '..' are the first two entries in a directory, your code has a bug (even in MS Windows). If your code assumes that directory entries are returned in any sorted order, your code has a bug (in all OS.)
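In code, the practical consequence is to filter '.' and '..' by name rather than by position, and to sort explicitly whenever the order matters. A minimal sketch, assuming the plain character-code order produced by sort (uppercase before lowercase) is acceptable for the use case:

```matlab
list = dir(cd);

% Drop '.' and '..' by name, wherever the OS placed them in the listing
list = list(~ismember({list.name}, {'.', '..'}));

% Impose an explicit order instead of relying on the order dir returned
[~, index] = sort({list.name});
list = list(index);
```

For numbered file names like the ones in the question, the sort key would be the extracted number rather than the raw name.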

  1 Comment

Jan on 20 Aug 2011
@Walter: You are right: it is "documented", but "presents a lot of unknowns". This is more exact than my formulation "not explicitly documented".
