fopen cannot read directory names in scientific format
Hi everyone, I have some data stored as .csv files in different directories. After looking through this forum, I have managed to find a way of extracting the relevant .csv file from each directory, using the code provided in this post https://uk.mathworks.com/matlabcentral/answers/278950-i-m-trying-to-write-a-code-that-import-several-data-from-multiple-folder.
The code works fine until it encounters a directory whose name is in scientific notation, at which point it stops with the error:
Error using textscan
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in multipleDirectories_subfolders_test (line 27)
MyData{end+1} = textscan(fileID, formatSpec, 'Delimiter', ',', 'HeaderLines',startRow-1); % format depends on files % this line might need changing
The code is attached. I've also tried using csvread, but I run into the same problem. After setting a few breakpoints in the code, I've identified the cause as the names of some directories being in scientific notation, as shown in the following screenshot:

The code reads the .csv files in the 0, 0.0001, 0.00102, 0.000104 and 0.000106 directories perfectly fine, but the file identifier becomes invalid for the remaining directories.
Does anyone know how to solve this? It looks like fopen can only deal with directory names in plain decimal format, so the error is not due to textscan or csvread.
I'm really at a loss as to how to import the data otherwise.
p.s. The directories are saved in that format by default by a separate piece of software, and there are hundreds of them, so renaming them manually is not really an option.
Geoff Hayes
on 3 Jun 2020
Jacqueline - on my Mac with R2014a, I was able to open a text file from a folder named 1.2e-05. Are you sure that these folders have files named opposite_01um_grad(U).csv? (I guess that this is supposed to be true for all folders?)
No problems with MATLAB R2012b on Win10:
>> D = '3.69e-05';
>> mkdir(D)
>> dlmwrite(fullfile(D,'test.csv'),[1,2,3],' ')
>> fid = fopen(fullfile(D,'test.csv'),'rt'); fscanf(fid,'%f'), fclose(fid);
ans =
1
2
3
What MATLAB version and OS are you using? Given that the folder names change number format depending on their values, perhaps you could check those filenames too.
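One way to check this is to ask fopen itself: its second output holds the system error message whenever the returned file identifier is -1. A minimal sketch, assuming a hypothetical root folder mypath and file name fname (a throwaway demo folder is built with tempname so the example runs on its own):

```matlab
% Sketch: find the subfolders in which fopen fails, and ask fopen why.
% MYPATH and FNAME are hypothetical; a throwaway demo folder is built here.
mypath = tempname;
fname  = 'test.csv';
mkdir(mypath);
mkdir(fullfile(mypath, '0.0001'));                         % folder WITHOUT the file
mkdir(fullfile(mypath, '3.69e-05'));                       % e-notation folder WITH the file
dlmwrite(fullfile(mypath, '3.69e-05', fname), [1, 2, 3]);

S = dir(mypath);
S = S([S.isdir] & ~ismember({S.name}, {'.', '..'}));      % subfolders only
missing = {};
for k = 1:numel(S)
    filetoread = fullfile(mypath, S(k).name, fname);
    [fid, msg] = fopen(filetoread, 'rt');                  % MSG is the system error text
    if fid < 0
        fprintf('Cannot open "%s": %s\n', filetoread, msg);
        missing{end+1} = S(k).name; %#ok<AGROW>
    else
        fclose(fid);
    end
end
missing
```

In this demo only the 0.0001 folder is reported, because it lacks the file; the e-notation folder opens without complaint.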
Jacqueline Mifsud
on 3 Jun 2020
Mohammad Sami
on 3 Jun 2020
You can use the dir function to get a list of all CSV files in a specified directory:
files = dir(fullfile(D,'*.csv'));
"Is there a way ... such that fopen would read any .csv files contained in the respective directories?"
File reading/writing functions all require the explicit name of one particular file. They do not accept wildcard characters.
But you can easily call dir and use its output, e.g. inside your loop:
S = dir(fullfile(mypath,SubFold(i).name,'*.csv')); % DIR with wildcard
assert(numel(S)==1,'Only one CSV file is allowed!')
filetoread = fullfile(mypath,SubFold(i).name,S.name);
If you expect multiple CSV files in those folders then you need to decide how to handle them, e.g. use a nested loop to process them all, or filter for one file using a particular name pattern, or skip that folder, etc. We cannot decide that for you.
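For the "process them all" option, a nested loop could look roughly like the sketch below. It assumes a hypothetical root folder mypath (built here with tempname) and uses csvread to match the question, although csvread is not recommended in newer releases, where readmatrix is the suggested replacement:

```matlab
% Sketch: import every CSV file in every subfolder of MYPATH.
% MYPATH is hypothetical; a throwaway demo folder with two CSV files is built here.
mypath = tempname;
mkdir(mypath);
mkdir(fullfile(mypath, '1.2e-05'));
dlmwrite(fullfile(mypath, '1.2e-05', 'a.csv'), [1, 2, 3]);
dlmwrite(fullfile(mypath, '1.2e-05', 'b.csv'), [4, 5, 6]);

SubFold = dir(mypath);
SubFold = SubFold([SubFold.isdir] & ~ismember({SubFold.name}, {'.', '..'}));
MyData = {};
for i = 1:numel(SubFold)
    S = dir(fullfile(mypath, SubFold(i).name, '*.csv'));      % DIR with wildcard
    for j = 1:numel(S)                                         % nested loop: every file
        filetoread = fullfile(mypath, SubFold(i).name, S(j).name);
        MyData{end+1} = csvread(filetoread); %#ok<AGROW>
    end
end
```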
Jacqueline Mifsud
on 3 Jun 2020
Jacqueline Mifsud
on 3 Jun 2020
"Is it possible to make the loop access subfolders sequentially..."
Given those folder names, you would have to import the names into MATLAB, convert them into numeric values or times (e.g. using datetime or duration), sort those values to get the indices, and then use those indices to sort the folder names.
One way to do that would be to download my FEX submission natsortfiles and use that, for which you will need a regular expression to match your numbers, e.g. '\d+\.?\d*(e[-+]?\d+)?'
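Without any FEX download, the same regular expression can also be used directly with regexp, followed by a plain numeric sort. A minimal sketch, where the folder names are example values in the style of the question (if the names contain nothing but the number, str2double alone is enough):

```matlab
% Sketch: sort folder names numerically using the regular expression above.
% The names below are example values in the style of the question.
names = {'0', '0.0001', '1.2e-05', '3.69e-05', '0.000104'};
tok = regexp(names, '\d+\.?\d*(e[-+]?\d+)?', 'match', 'once');  % first number in each name
vals = str2double(tok);            % str2double understands e-notation
[~, idx] = sort(vals);
sortedNames = names(idx)           % {'0','1.2e-05','3.69e-05','0.0001','0.000104'}
```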
"...or another clever way to reorganise the data?"
If you used ISO8601 timestamps then the OS would return them in the desired order, as would any trivial character sort of them. Basically, if you designed the names a bit better, your code and file processing would be much simpler (in fact, you wouldn't have to do any sorting at all).
In practice this would mean fixed-width time values complete with leading zeros and no e-notation.
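To illustrate the fixed-width idea, here is a sketch that formats example time values with sprintf. The format '%012.8f' is an assumption; the width and number of decimals would have to suit the real data:

```matlab
% Sketch: fixed-width names with leading zeros and no e-notation,
% so that a plain character sort matches the numeric order.
% The format '%012.8f' is an assumption; choose it to fit your data.
t = [0, 1.2e-05, 3.69e-05, 1e-4, 1.04e-4];
names = arrayfun(@(x) sprintf('%012.8f', x), t, 'UniformOutput', false)
isequal(sort(names), names)        % character sort gives the numeric order
```

For example, 1.2e-05 becomes '000.00001200'.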
Jacqueline Mifsud
on 3 Jun 2020
Jacqueline Mifsud
on 3 Jun 2020
One of the first steps that natsortfiles does with the input names is to use fileparts to split the names into a filename and a file extension. fileparts splits at the last period/dot character in the name. So some of your folder names are treated as consisting of two parts which are split at the decimal point (one part is the "filename", the other the "extension"), and those two parts are then sorted separately. Unfortunately your use-case is not a scenario I considered, so thank you for discovering that!
You can convert and sort the numeric values yourself, e.g. something like this:
vec = str2double(SubFoldName(1,:)); % convert to numeric
[~,idx] = sort(vec);
sortedNames = SubFoldName(1,idx)
Tips on code: assuming that the first two elements returned by dir are the folder names '.' and '..' is fragile at best and buggy at worst. You should remove those folder names explicitly using setdiff or ismember, e.g.:
S = dir(mypath);
C = {S([S.isdir]).name};
C = setdiff(C,{'.','..'});
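Putting these two snippets together, the folder names can be listed, cleaned, and sorted numerically before the import loop. A sketch, assuming a hypothetical root folder mypath (built here with tempname) whose subfolder names are plain numbers:

```matlab
% Sketch: list the subfolders, drop '.' and '..', sort them numerically,
% then loop over them in that order. MYPATH is a hypothetical root folder.
mypath = tempname;
mkdir(mypath);
for d = {'0.0001', '1.2e-05', '0'}
    mkdir(fullfile(mypath, d{1}));
end

S = dir(mypath);
C = {S([S.isdir]).name};
C = setdiff(C, {'.', '..'});       % note: setdiff also sorts alphabetically
vec = str2double(C);               % e-notation converts fine
[~, idx] = sort(vec);
SubFoldName = C(idx)               % numeric order, ready for the import loop

for k = 1:numel(SubFoldName)
    folder = fullfile(mypath, SubFoldName{k});
    % ... import the CSV file(s) in FOLDER here, in numeric order ...
end
```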
Jacqueline Mifsud
on 4 Jun 2020
Jacqueline Mifsud
on 4 Jun 2020
Yes, all you need is some indexing like what you showed in your last comment; no loops or cellfun are required.
More commonly the sort would be applied to the file/folder names before the loop, because then all of the processing and allocation of the imported data automatically occurs in the correct order, and everything matches up.
Applying the sort after the loop is also possible, but it increases the risk of different arrays getting out of sync. For example, your data and folder-name arrays are now probably in different orders. I recommend avoiding this approach.
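One way to keep names and data from drifting apart is to store each import in the same element of the structure returned by dir, and to sort that structure before the loop. A sketch, assuming a hypothetical root folder mypath and a hypothetical file name data.csv in every subfolder:

```matlab
% Sketch: store each import in the same struct element as its folder name,
% so names and data always stay together. MYPATH and data.csv are hypothetical.
mypath = tempname;
mkdir(mypath);
for d = {'0.0001', '1.2e-05'}
    mkdir(fullfile(mypath, d{1}));
    dlmwrite(fullfile(mypath, d{1}, 'data.csv'), str2double(d{1}));
end

S = dir(mypath);
S = S([S.isdir] & ~ismember({S.name}, {'.', '..'}));
[~, idx] = sort(str2double({S.name}));
S = S(idx);                                    % sort BEFORE the loop
for k = 1:numel(S)
    S(k).data = csvread(fullfile(mypath, S(k).name, 'data.csv'));  % adds a new field
end
```

Because S(k).name and S(k).data live in the same element, any later reordering of S moves both together.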
Jacqueline Mifsud
on 4 Jun 2020
Jacqueline Mifsud
on 5 Jun 2020
Edited: Jacqueline Mifsud
on 5 Jun 2020
Of course zero is also a perfectly valid value, so it will also get plotted, potentially covering up data underneath.
You could try using NaN instead of zero: NaN values are not plotted, so this is a common way to create gaps in plot lines and similar. I don't know how it will work with contour, but it is worth a try.
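A sketch of that suggestion with made-up data: the missing points are marked with a separate (hypothetical) logical mask rather than with Z == 0, because the example surface is genuinely zero at the origin and a test like Z == 0 would hide that valid value too:

```matlab
% Sketch: use NaN (not 0) for "no data" so that contour should leave a gap.
% The grid, surface, and HASDATA mask are made-up example data.
[X, Y] = meshgrid(linspace(-2, 2, 41));
Z = X.^2 + Y.^2;                   % note: Z is genuinely 0 at the origin
hasData = X <= 1;                  % hypothetical mask of points that have data
Z(~hasData) = NaN;                 % NaN instead of 0 for missing points
fig = figure('Visible', 'off');    % off-screen figure for this demo
contour(X, Y, Z);
close(fig);
```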
Jacqueline Mifsud
on 5 Jun 2020
Edited: Jacqueline Mifsud
on 5 Jun 2020
Stephen23
on 19 Jan 2022
Regarding this comment:
NATSORTFILES now supports a 'noext' option, which does not split the names at (any) final dot character.