Does 'dir' always return sorted names?

James Ramm on 16 Oct 2012
Commented: Walter Roberson on 16 May 2023
Hi, I can use listing = dir('folder') to list the files and folders in a directory, but I am not clear whether the resulting listing.name is always sorted.
I have a folder containing a number of subfolders named by date (e.g. '20121002'), and they always appear to be listed in numerical order. However, as this function will be used more generally by students, I can't be sure...

Answers (1)

Image Analyst on 16 Oct 2012
Edited: Image Analyst on 16 May 2023
The documentation says "Results appear in the order returned by the operating system." So to be 100% sure, sort() the names yourself before using them if you need them sorted.
% List everything in the current folder whose name starts with '20'.
% ('20*' rather than '20*.*', so the match does not depend on how the
% platform treats names without an extension, such as '20121002'.)
dirList = dir('20*')
% Get all names, including both files and folders.
allFilenames = {dirList.name}
% Keep only folder names, not file names.
allFolderNames = allFilenames([dirList.isdir])
% Sort alphabetically. Convert to lower case so lower case letters
% won't all come after all the upper case letters.
[~, sortOrder] = sort(lower(allFolderNames))
% Reorder the folder names case-insensitively.
allFolderNames = allFolderNames(sortOrder)
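For the date-named folders in the question, a plain sort() is enough as long as every name is a fixed-width YYYYMMDD string: the fields run from most to least significant, so character-code order and chronological order coincide. The sketch below uses a hypothetical list of names in place of real dir output, and the eight-digit filter is an assumption about what else students might leave in the folder.

```matlab
% Hypothetical names, in whatever order the operating system might return them.
names = {'20121002', 'backup', '20111215', '20120130', '20121001'};

% Keep only names made of exactly eight digits (assumed to be YYYYMMDD).
isDateName = cellfun(@(s) ~isempty(regexp(s, '^\d{8}$', 'once')), names);
names = names(isDateName);

% Fixed width plus year-month-day field order means plain
% character-code sorting is also chronological order.
sortedNames = sort(names);

% Cross-check: sorting the same names as numbers gives the same order.
[~, idx] = sort(str2double(names));
numericNames = names(idx);
```

If the names could vary in width (for example '2012-10-2'), neither ordering would be reliable, and parsing them into dates before sorting would be the safer choice.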
  4 Comments
Walter Roberson on 15 May 2023
When I chased this a few years ago:
The order returned by NTFS turned out to depend on the Language & Region settings in effect when the administrator created the NTFS file system. It was not something the person creating the file system specifically chose, and it did not depend on the Language & Region of the user. For example, is the sort order a, b, and then eventually ä? Or is it a, ä, b? That depends on what the person who created the file system was using (different languages sort in different orders).
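The "ab then ä" versus "aäb" question is the difference between ordering by character code and ordering by a language's collation rules. This sketch does not model NTFS; it only contrasts MATLAB's sort, which orders character vectors by character code, with a crude, hypothetical stand-in for a locale that treats ä as a variant of a. Real collation rules are much more involved than a single character substitution.

```matlab
% Hypothetical names; char(228) is 'ä' (U+00E4), written this way to
% avoid source-file encoding problems.
aUml = char(228);
names = {'b', aUml, 'ab'};

% Character-code order: 'ä' (228) sorts after 'b' (98), giving ab, b, ä.
codeOrder = sort(names);

% Crude locale stand-in (assumption): sort on a key where 'ä' is
% replaced by 'a', so 'ä' lands among the a's, giving ä, ab, b.
[~, idx] = sort(strrep(names, aUml, 'a'));
foldedOrder = names(idx);
```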
Walter Roberson on 16 May 2023
Historically, Unix-like file systems had fixed-length directory entries that contained information about the file size, the starting location on disk, and the file name. An unused directory entry was marked by the first character of the file name being a binary 0. To add a new file, the directory was read block by block, each block was scanned for an unused entry, and the new information was written there; if the scan reached the end of the list without finding one, the new entry was added after the last one (possibly requiring a new directory block). As a result, the entries end up stored in no particular order.
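The slot-reuse scheme just described can be modelled in a few lines to show why it produces unsorted listings. In this simplified sketch the directory is a cell array of slots, an empty name stands in for the leading binary 0, and the file names and the sequence of create/delete operations are hypothetical.

```matlab
slots = {};   % directory starts empty and grows by one slot when full

% '+name' creates a file, '-name' deletes it (hypothetical sequence).
ops = {'+bravo', '+charlie', '+delta', '-charlie', '+alpha', '+echo'};
for k = 1:numel(ops)
    name = ops{k}(2:end);
    if ops{k}(1) == '+'
        % Reuse the first unused slot, or append one if all are in use.
        free = find(cellfun(@isempty, slots), 1);
        if isempty(free)
            free = numel(slots) + 1;
        end
        slots{free} = name;
    else
        % Mark the entry unused, as zeroing the first byte of its name did.
        slots{find(strcmp(slots, name), 1)} = '';
    end
end

% A listing reports the used slots in storage order: bravo, alpha, delta, echo.
listing = slots(~cellfun(@isempty, slots));
```

'alpha' lands in the slot freed by 'charlie', so it is listed after 'bravo' even though it sorts first.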
More efficient methods were developed later, especially tree data structures, which can tell in roughly log2(n) steps whether a particular file is present, instead of sometimes having to read all n entries to find (or fail to find) it. But the odd thing about tree data structures is that if you start at the top and descend, reporting the file names as you go, the files appear to be out of order: to get sorted order you have to traverse left, back to center, right, back to center, back to center, and so on. Some Linux file systems report entries in the order they are encountered, instead of doing a full traversal to deliberately return sorted order.
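The traversal point can be seen with a small binary search tree. This is a simplified model, not how any particular file system stores directories (many use B-tree variants, some keyed on a hash of the name rather than the name itself). The names are hypothetical, and unique is used only to give each name a rank for comparisons while building the tree. Reporting names top-down as they are reached (pre-order) gives an unsorted list; the left, center, right walk (in-order) gives sorted order.

```matlab
% Hypothetical names, inserted in this order; names{1} becomes the root.
names = {'mango', 'delta', 'tango', 'alpha', 'golf', 'romeo', 'zulu'};
n = numel(names);
[~, ~, rnk] = unique(names);   % alphabetical rank of each name
left = zeros(1, n);            % index of left child (0 = none)
right = zeros(1, n);           % index of right child (0 = none)

% Build the binary search tree.
for k = 2:n
    node = 1;
    while true
        if rnk(k) < rnk(node)
            if left(node) == 0, left(node) = k; break; end
            node = left(node);
        else
            if right(node) == 0, right(node) = k; break; end
            node = right(node);
        end
    end
end

% Top-down, report as encountered (pre-order):
% mango, delta, alpha, golf, tango, romeo, zulu.
preOrder = {};
stack = 1;
while ~isempty(stack)
    node = stack(end); stack(end) = [];
    preOrder{end+1} = names{node};
    % Push right first so the left subtree is visited next.
    if right(node) > 0, stack(end+1) = right(node); end
    if left(node) > 0, stack(end+1) = left(node); end
end

% Left, back to center, right (in-order): sorted order.
inOrder = {};
stack = [];
node = 1;
while ~isempty(stack) || node > 0
    while node > 0
        stack(end+1) = node;
        node = left(node);
    end
    node = stack(end); stack(end) = [];
    inOrder{end+1} = names{node};
    node = right(node);
end
```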
Not bothering to sort is more efficient for file systems. That said, because disks are slower than memory, some file systems have a policy of optimizing the layout of the information within a block whenever that block is rewritten.
