Use of dir - too slow!
I have a main folder on a network containing a lot of subfolders (~1000), and each subfolder contains ~1000 DICOM files. My code needs to find a string in the DICOM header fields. All the files in a subfolder have the same field value, so I only need to check one file per subfolder. The problem is that for each subfolder I have to use the dir command, and that is time consuming.
My code is:
all_folders=dir(path_browse); %struct containing every folder
no_folders=length(all_folders)-2; %number of folders, excluding '.' and '..'
for i=1:no_folders
    name_folder=all_folders(i+2).name; %subfolder to find match
    aux_dir=dir(name_folder); %files in subfolder
    cd(name_folder) %moves to subfolder
    test_file=dicominfo(aux_dir(3,1).name); %DICOM header from first file in the folder
    search_field(i)=strcmp(lower(test_file.field),field_query); %compare fields
    cd(path_browse) %back to main folder
end
Then I would just need to find the 1s in search_field. Is there any option to open a file without using dir or ls? The code works but I want it to be more efficient.
Regards,
Sergio
1 Comment
Stephen23
on 15 Feb 2018
Edited: Stephen23
on 15 Feb 2018
"I have to use the dir command and that is time consuming."
How do you know that dir is the bottleneck? I can see two cd calls in that code: cd makes debugging harder and is slower than using relative/absolute filepaths.
"Is there any option to open a file without using dir or ls"
It is not required to use dir or ls before opening a file: it is also possible to generate filenames from some sequence. Which method to use depends on those filenames, and how much you know about them. The MATLAB documentation covers both approaches.
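As a rough illustration of generating a filename instead of listing the folder, the sketch below builds the path of the first file with sprintf and fullfile. The naming pattern IM_0001.dcm, the subfolder name subfolder_001 and the use of pwd as the main folder are all made up for this example and are not stated in the question; dicominfo needs the Image Processing Toolbox.

```matlab
% Sketch only: assumes files follow a known pattern such as IM_0001.dcm (hypothetical).
path_browse = pwd;               % placeholder for the main folder on the network
sub_name    = 'subfolder_001';   % hypothetical subfolder name

% Build the absolute path of the first file without calling dir on the subfolder.
first_file = fullfile(path_browse, sub_name, sprintf('IM_%04d.dcm', 1));

if exist(first_file, 'file') == 2        % 2 means the path is an existing file
    test_file = dicominfo(first_file);   % read the DICOM header
else
    fprintf('No such file: %s\n', first_file);
end
```

This only works when the filenames are predictable. The subfolder names themselves still have to come from somewhere, for example from a single dir call on the main folder.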
"The code works but I want it to be more efficient."
Then get rid of cd by using absolute/relative paths, and run the profiler so that you can show us which lines are taking the most time.
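A minimal sketch of running the profiler from code is shown below. Here path_browse is set to pwd only as a placeholder, and the single dir call stands in for the full loop from the question; both are assumptions for the sake of a short example.

```matlab
path_browse = pwd;               % placeholder for the main folder from the question

profile on                       % start collecting timing data
listing = dir(path_browse);      % replace this line with the loop being investigated
profile off                      % stop collecting

stats = profile('info');         % struct holding one FunctionTable entry per function

% Print up to five entries with the largest total time.
[~, order] = sort([stats.FunctionTable.TotalTime], 'descend');
for k = order(1:min(5, numel(order)))
    fprintf('%-40s %8.3f s\n', ...
        stats.FunctionTable(k).FunctionName, stats.FunctionTable(k).TotalTime);
end
```

For interactive use, calling profile viewer after profile off opens the same data as a browsable report, including time per line.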
Accepted Answer
Jan
on 15 Feb 2018
Do you have any evidence that dir is the time-consuming command? This is not likely, but it could happen if you work on a network drive with a slow connection. Even then the problem is the connection, not dir.
It is not documented that '.' and '..' are the first two entries returned by dir. A folder or file whose name starts with a character that sorts before '.', such as '!' or '+', can appear ahead of them, so it is safer to remove these special names explicitly.
% UNTESTED CODE!
all_folders = dir(path_browse);
all_folders = all_folders([all_folders.isdir]);  % keep folders only
all_folders(ismember({all_folders.name}, {'.', '..'})) = [];  % exclude '.' and '..'
no_folders = numel(all_folders);
search_field = false(1, no_folders);  % Pre-allocate!!!
for k = 1:no_folders
    name_folder = fullfile(path_browse, all_folders(k).name);  % subfolder to find match
    aux_dir = dir(name_folder);  % contents of subfolder
    aux_dir = aux_dir(~[aux_dir.isdir]);  % keep files only; this also drops '.' and '..'
    if isempty(aux_dir), continue; end  % no files: search_field(k) stays false
    test_file = dicominfo(fullfile(name_folder, aux_dir(1).name));  % header of first file
    search_field(k) = strcmpi(test_file.field, field_query);  % compare fields
end
This uses absolute paths instead of hopping through the disk with cd().
strcmpi(a,b) is faster and nicer than strcmp(lower(a), b).
I assume that this is not much faster than your version, because most of the time is spent in dicominfo. But the code is safer.
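Once search_field is filled, the matching subfolders can be picked out with logical indexing rather than by searching for the 1s in a second loop. The folder names and the contents of search_field below are invented purely to make the sketch self-contained.

```matlab
% Made-up stand-ins for the variables produced by the loop above.
all_folders  = struct('name', {'patient_A', 'patient_B', 'patient_C'});
search_field = [true false true];

matching_names = {all_folders(search_field).name};  % names of matching subfolders
matching_idx   = find(search_field);                % their positions, if needed

fprintf('%d of %d subfolders match\n', numel(matching_idx), numel(all_folders));
```

This relies on search_field being a logical array, which is what the false(1, no_folders) pre-allocation provides.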