
Test existence of files with EXIST

Asked by Jan Simon on 4 Nov 2012

The command exist(FileName, 'file') seems to be sufficient to check whether a file exists. I therefore used this code to check whether the input of a function is an existing file (thanks to David, who found the bug):

```function Hash = DataHash(Data)
...
if exist(Data, 'file') ~= 2
end
```

The help text of exist explains when the value 2 is returned:

``` 2 if A is an M-file on MATLAB's search path. It also returns 2 when A is
the full pathname to a file or when A is the name of an ordinary file on
MATLAB's search path```

But "when A is the full pathname to a file" does not hold when A is a MEX-, MDL- or P-file, because in these cases 3, 4 or 6 is returned, respectively. So let's try to improve the check:

```if ~any(exist(Data, 'file') == [2, 3, 4, 6])
end
```

But even then, exist() is smarter than expected:

```File1 = fullfile(matlabroot, '\toolbox\matlab\graph2d\plot')
File2 = fullfile(matlabroot, '\toolbox\matlab\graph2d\plot.m')
File3 = fullfile(matlabroot, '\toolbox\signal\signal\@dspdata\plot')
File4 = fullfile(matlabroot, '\toolbox\signal\signal\@dspdata\plot.m')
```
```exist(File1, 'file')  % 0 !
exist(File2, 'file')  % 2
exist(File3, 'file')  % 2 !
exist(File4, 'file')  % 2
```

I guess that File1 is not recognized because plot is a built-in function, while @dspdata\plot (File3) is not. But File3 is not an existing file:

```fopen(File1, 'r')  % -1
fopen(File2, 'r')  %  3
fopen(File3, 'r')  % -1  !! in spite of: exist(File3, 'file') ~= 0
fopen(File4, 'r')  %  4
fclose('all')
```

So how can we check the existence of a file in a simple and reliable way?

```function Ex = FileExist(FileName)
FID = fopen(FileName, 'r');
if FID == -1
Ex = false;
else
Ex = true;
fclose(FID);
end
```

But there are still exceptions, because fopen() is smart as well:

```cd(tempdir);
fopen('plot.m', 'r')  % 3, file is *found*!
```

Here fopen() searches all folders of the Matlab path, although only the current folder should be searched. As a side effect, fopen(name, 'r') is relatively slow. Another idea:

```cd(tempdir);
```

This is faster than the 'r' mode, especially if folders of the path are stored on network drives, and requesting write access restricts the search to the current folder. But it fails if the current user does not have write privileges for the file.
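The write-access idea can be sketched roughly as follows. The 'r+' mode is an assumption on my side (it opens an existing file for reading and writing and does not create a missing one), and the name FileExistRW is hypothetical. As described above, such a test reports false for files the current user may not write to.

```matlab
function Ex = FileExistRW(FileName)
% Hypothetical helper: test for a file by requesting read/write access.
% Assumption: 'r+' never creates a missing file, fopen replies -1 instead.
FID = fopen(FileName, 'r+');
Ex  = (FID ~= -1);
if Ex
   fclose(FID);  % Release the file handle again
end
```

For a fully qualified name, FileExistRW(fullfile(tempdir, 'plot.m')) would then only look at that one location.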

The next approach:

```function Ex = FileExist(FileName)
dirFile = dir(FileName);
if length(dirFile) == 1
Ex = ~(dirFile.isdir);
else
Ex = false;
end
```

I could not find a file for which this test fails. It is very slow if FileName is a folder on a network drive that contains a large number of files, because dir() then lists the whole folder. But this is a rare case, so I prefer this test.

Finally, a C-Mex function using either GetFileAttributes under Windows or _open or _wopen under Linux/MacOS is faster: by 10% for existing files and by 90% for missing files. But handling the Unicode strings is not trivial: a wchar has 2 bytes under Windows and 4 bytes under Linux and MacOS, and under Linux wchar strings are not in common use; UTF-8 encoded strings with 1-byte chars are used instead. See Answers: Matlab string to wchar under Linux. I'm going to publish the Mex functions in the FEX, together with a DirExist(), because exist(name, 'dir') has similar problems.

• Did you consider such effects caused by the smartness of exist() in your programs?
• Did a user of your programs run into troubles due to weak tests of file existence, e.g. when the resulting error messages are misleading?
• Do you or your programs profit or suffer from the smartness of exist() and fopen()?
• Do you think the behavior of these functions is explained clearly enough in the help and doc text?
• Do you want standard jobs solved reliably by simple commands in Matlab?

NOTE: Usage of the italic font: I mean that smart is not smart.


Answer by Malcolm Lidierth on 4 Nov 2012
Edited by Malcolm Lidierth on 4 Nov 2012

Easy with Java:

```File1 = fullfile(matlabroot,'toolbox','matlab','graph2d','plot');
File2 = fullfile(matlabroot, 'toolbox','matlab','graph2d','plot.m');
File3 = fullfile(matlabroot, 'toolbox','signal','signal','@dspdata','plot');
File4 = fullfile(matlabroot, 'toolbox','signal','signal','@dspdata','plot.m');
```
```
file=java.io.File(File1);
file.exists()
file=java.io.File(File2);
file.exists()
file=java.io.File(File3);
file.exists()
file=java.io.File(File4);
file.exists()
```
```
ans =
     0
ans =
     1
ans =
     0
ans =
     1
```

Jan Simon on 4 Nov 2012

File.exists() && File.isFile() is 15% faster than the ~isDirectory method.

The dirFile = dir(FileName); Ex = (length(dirFile) == 1) && ~(dirFile.isdir); method still needs only half the time and is platform-independent as well.

Malcolm Lidierth on 4 Nov 2012

@Jan

File.isFile() alone will do: it returns false if the entry does not exist or is a folder.

There will always be extra overhead with Java as the strings are passed as copies (to the java.lang.String constructor then by reference to File) not pointers (Java 9 may fix that).
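Wrapped as a Matlab function, the isFile() test might look like the sketch below. The name FileExistJava is hypothetical, and the function assumes that Matlab runs with the JVM enabled and that FileName is fully qualified: Java resolves relative names against its own working directory, which need not be Matlab's current folder.

```matlab
function Ex = FileExistJava(FileName)
% Hypothetical helper wrapping java.io.File.isFile().
% isFile() replies false for missing entries and for folders.
% Assumption: FileName is a fully qualified path and the JVM is available.
jFile = java.io.File(FileName);
Ex    = logical(jFile.isFile());
```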

Jan Simon on 5 Nov 2012

@Malcolm: Fine, now I understand your hint "File.isFile()". Here are the timings for testing the existence of 981 files with 10 repetitions, for existing / not existing files:

• File=java.io.File(Name); Ex=File.isFile(); 0.90 / 0.80 sec
• Ex = (length(dirFile) == 1) && ~(dirFile.isdir); 0.70 / 0.60 sec
• C-Mex, 0.29 / 0.21 sec

My conclusion concerning speed: These three methods are equivalent in practice, because usual applications do not test millions of files. So we have good workarounds for the weak EXIST. Anyhow, I'm still disappointed by the built-in EXIST, because it is too over-featured to serve as a simple test for the existence of a file.
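A comparison of this kind could be set up with a small harness like the following sketch. The name TimeFileTest and its interface are hypothetical, and this is not the code that produced the numbers above.

```matlab
function t = TimeFileTest(TestFcn, FileList, nRep)
% Hypothetical harness: total time in seconds for nRep passes of the
% test function TestFcn over all names in the cell string FileList.
tic;
for iRep = 1:nRep
   for iFile = 1:numel(FileList)
      TestFcn(FileList{iFile});  % Result is discarded, only time counts
   end
end
t = toc;
```

A call such as t = TimeFileTest(@(n) exist(n, 'file'), FileList, 10) then gives one number per method, which can be collected separately for lists of existing and of missing files.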

Answer by Daniel on 5 Nov 2012

I am not sure you are using EXIST the way it was intended to be used. The H1 line is: "%EXIST Check if variables or functions are defined." The documentation says little about checking if files exist. I agree that the argument names and output values are confusing. I think, however, that EXIST should not be used for checking if a file exists. Determining if a function exists seems harder than determining if a file exists, therefore I wouldn't expect it to compete in terms of speed.

Jan Simon on 5 Nov 2012

Thanks, Daniel. It is obvious that the design of EXIST does not match my needs. The documentation is clear and correct: "It also returns 2 when name is the full pathname to a file or the name of an ordinary file on your MATLAB search path." And, in fact, exist() works almost as advertised when the file name contains the extension and a full path. More than 100 of Matlab's toolbox functions use this command to check the existence of files, e.g. winopen, loadlibrary, open, run, csvread and xlswrite. In these and in user-defined functions, exist can fail in the cases I've described already.

I agree with you that exist should not be used to check the existence of files. But I hesitate to send 117 enhancement requests concerning well established toolbox functions. So at least I want to stress that user-defined functions should use a more stable test for file existence.

Daniel on 6 Nov 2012

I agree that it isn't good practice, and it is these types of bugs that make me a FOSS supporter. That said, it may not be as bad as you think. From your example it seems that the problem with EXIST is that it can sometimes erroneously say that a file without an extension exists when it actually doesn't. Therefore any function that adds an extension automatically will be okay. In other functions EXIST may be used to throw a nice error message, and the function will error later when it tries to read from or write to the file. This again is not a huge problem. The problem is for functions that do not append an extension and create a new file (or follow an alternative processing path) when the current file does not exist. I think that use case might be rare.

Jan Simon on 6 Nov 2012

I do not believe that the level of hugeness can be measured. Any unexpected behavior can have severe effects.

A user of DataHash ran into problems because the check for existence rejected P-files. Without the chance to modify the code, e.g. if DataHash were P-coded, the user would need tedious workarounds like renaming the file before calculating the hash. In the real world there can even be files like "D:\MFiles\file.m.p.mex", which should not confuse the detection of the file existence either. The reliability of a function must be proved using non-standard input, because "reliable for standard input" is a very weak label.

I assume the smartness of fopen() is more dangerous: It opens a file anywhere in the path when the file name is relative. Luckily this does not concern opening the file with write access. And again the workaround is standard good programming practice anyway: never work with relative paths, but always use fully qualified path names. This is why I spent so much time on GetFullPath.

So perhaps all I want to say is:

Do not use exist(Name, 'file') with relative paths!!!
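As a rough illustration of this advice, a relative name can be qualified with the current folder before the dir() test from the question is applied. The sketch below is not GetFullPath: the name FileExistAbs is hypothetical, java.io.File.isAbsolute() is used only to classify the name (so the JVM is assumed to be available), and special cases such as drive-relative names under Windows are not handled.

```matlab
function Ex = FileExistAbs(FileName)
% Hypothetical helper: qualify a relative name with the current folder,
% then apply the dir() test, so that no search path is involved.
% Assumption: isAbsolute() of java.io.File classifies the name correctly.
jFile = java.io.File(FileName);
if ~jFile.isAbsolute()
   FileName = fullfile(pwd, FileName);  % Matlab's current folder
end
dirFile = dir(FileName);
Ex      = (length(dirFile) == 1) && ~(dirFile.isdir);
```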