Difficulty with using sscanf to perform natural order sort on filename list

Mike on 14 Mar 2022
Edited: Stephen23 on 14 Mar 2022
I have a list of complex file names that I am trying to sort in natural order (e.g. TC10 should come after TC1 and TC2, not before as with an alphabetical sort). I have found the natorder downloadable tool in other solutions, however the machine I need it on is not internet enabled and I'm having difficulty narrowing down the relevant portion of the code when I open the tool.
I did find an author with a solution here, whose code snippet I will paste below for viewing ease:
% create some example names
filenames = {'xx_1_yy.txt','xx_8_yy.txt','xx_10_yy.txt','xx_2_yy.txt'}
sort(filenames) % wrong!
% extract the numbers
filenum = cellfun(@(x)sscanf(x,'xx_%d_yy.txt'), filenames)
% sort them, and get the sorting order
[~,Sidx] = sort(filenum)
% use this sorting order to sort the filenames
SortedFilenames = filenames(Sidx)
My issue is that I cannot get sscanf to work properly.
My filenames are in the format rX_longExtraText. The first letter is always 'r', followed by a number X, followed by an underscore and copious amounts of alphanumeric text. I only care about getting the one- or two-digit number X.
I've tried to adapt this to the following code below with placeholder filenames
filenames = {'r10_alkx7b_kn32wikn','r8_kkasdmn0_kds','r1_acvbwb9_892dsf','r2_kak3_827d'}
filenum = cellfun(@(x)sscanf(x,'r%i_%c'), filenames)
[~,Sidx] = sort(filenum)
SortedFilenames = filenames(Sidx)
My thought was to place the r first as it is always there followed by %i for whatever integer follows, the underscore as it is always there and then %c for the rest of the chars I don't care about.
sscanf yields no output for this format. When I try the examples from the MATLAB help docs they work just fine, but other, simpler attempts to pull literally anything out of the same text still yield no results.
I feel like I'm missing something obvious here but I just can't get there on my own.
  1 Comment
Stephen23 on 14 Mar 2022
"however the machine I need it on is not internet enabled..."
"I feel like I'm missing something obvious here"
Ummm... the machine does not need to be internet connected for you to use FEX submissions: simply click on the big blue "DOWNLOAD" button on the top right-hand side of the webpage, and you will get a ZIP file with everything in it.
Unless you have a burning desire to spend your time on this, downloading will be simpler than reinventing the wheel.


Answers (2)

Stephen23 on 14 Mar 2022
Edited: Stephen23 on 14 Mar 2022
Your question makes it unclear if you are looking for a general solution or a solution that is tailored for those specific filenames (probably simpler). Nor do you make it clear if you want to sort only by the numeric part, or also the trailing alphanumeric text. Nor do you mention how many filenames you have.
If you only want to sort by the numeric part after the 'r', then this is easy using SSCANF:
F = {'r10_alkx7b_kn32wikn','r8_kkasdmn0_kds','r1_acvbwb9_892dsf','r2_kak3_827d'};
V = cellfun(@(s)sscanf(s,'r%d'),F)
V = 1×4
10 8 1 2
[~,X] = sort(V);
G = F(X)
G = 1×4 cell array
{'r1_acvbwb9_892dsf'} {'r2_kak3_827d'} {'r8_kkasdmn0_kds'} {'r10_alkx7b_kn32wikn'}
Personally I would prefer a more efficient approach without CELLFUN, e.g.:
V = sscanf(sprintf('%s/',F{:}),'r%d%*[^/]/')
V = 4×1
10 8 1 2
If you also need to sort by the alphanumeric part, then regular expressions are probably the next step: SSCANF by itself is just not made for splitting up text in that way. At that point you are most of the way to a general solution... remember to sort the split text/numeric in reverse order!
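As a rough sketch of that regular-expression route (not the natorder implementation, just one way it might look): the names are split into the number after 'r' and the trailing text, then sorted by the less significant key (the text) first and by the number second, relying on SORT being stable. The fifth filename 'r2_aaa_000' is a hypothetical addition, included only so that two names share the same number; the variable names T, N, X1 and X2 are likewise illustrative.

```matlab
% Placeholder names from the question, plus one hypothetical name with a repeated number
F = {'r10_alkx7b_kn32wikn','r8_kkasdmn0_kds','r1_acvbwb9_892dsf','r2_kak3_827d','r2_aaa_000'};
% Capture the digits after the leading 'r' and everything after the first underscore
T = regexp(F,'^r(\d+)_(.*)$','tokens','once');
T = vertcat(T{:});          % N-by-2 cell array: {digits, trailing text}
N = str2double(T(:,1));     % numeric key
% Sort by the less significant key (text) first, then by the number.
% SORT is stable, so names with equal numbers keep their text order.
[~,X1] = sort(T(:,2));
[~,X2] = sort(N(X1));
G = F(X1(X2))
```

Here the two 'r2_...' names end up next to each other, ordered by their trailing text. Note that this assumes every name matches the pattern; a name that does not match would give an empty token and would need separate handling.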
PS: SSCANF starts right at the start of the string and stops as soon as something does not match the format string. So your attempts fail at the very first character 'r', because 'r' cannot be converted to numeric using the format that you specified. You can easily add a literal 'r' to the start of the format string:
ans = 10
ans = 10
sscanf('r10_alkx7b_kn32wikn','r%c') % this returns one character, not a numeric
ans = '1'
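To make the stopping behaviour concrete, here is a small sketch comparing three format strings on one of the placeholder names. The variable names a, b and c are only for illustration. It assumes the documented SSCANF behaviour that a format mixing numeric and character specifiers returns the characters as numeric codes, which is why the original 'r%i_%c' format generally does not give the single number per file that CELLFUN expects by default.

```matlab
s = 'r10_alkx7b_kn32wikn';
a = sscanf(s,'r%d');       % literal 'r', then the digits; the rest is left unread
b = sscanf(s,'r%d_%*s');   % %*s reads and discards the remaining text
c = sscanf(s,'r%i_%c');    % mixed numeric/char format: characters are returned as codes
disp(a), disp(b), disp(numel(c))
```

If the goal is only the leading number, either of the first two formats returns one value per name, so CELLFUN can collect them into a plain numeric vector.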

Benjamin Thompson on 14 Mar 2022
You might have better luck with grep, but this appears to work with your test cases. sscanf stops at the first character that cannot be converted in accordance with your formatting string, so you have to add a %c for it to read through your 'r' character first.
>> sscanf('r10_alkx7b_kn32wikn','%c%d%c')
ans =



