Create an empty file very fast

Celso Cruz on 15 Jul 2013
Commented: Walter Roberson on 13 Oct 2017
Hi, I need to create an empty temporary file very fast.
Currently, I'm using:
fclose(fopen(fileName, 'w'));
But it is really slow.
Thanks.
  2 Comments
Cedric on 15 Jul 2013
What is the purpose?
Matt J on 15 Jul 2013
Maybe you have a slow hard drive?


Answers (2)

Walter Roberson on 12 Oct 2017
Time, time, everywhere a time
Blockin' out the witchery, breakin' my mind
Do this, don't do that, can't you read the time?
Test code attached. The code checks several hypotheses:
  • that different access modes such as w vs W might have different timings
  • that it might be either the fopen or the fclose that are slow (slow fclose would imply that it could be better to leave a bunch of files open during the time-critical part until you have leisure to close them)
  • that order of modes tried matters
  • that order of fopen test matters
Run the routine with no arguments to get default behaviour of testing with 200 files non-verbose. First argument (optional) is number of files to test with. Second argument is whether to be verbose or not. Please be sure to run the function repeatedly to try different orders
Test conclusions on my system:
  1. fclose is much slower than fopen.
  2. deleting files takes about as long as closing them (slower than creating them.)
  3. systems might not be able to handle more than 240 simultaneous open files and normal try/catch does not work to handle the problem
  4. it is not consistent as to whether fopen with delayed fclose is faster than fopen with immediate fclose. I thought I was seeing an advantage for it, but when I randomized the order I was often seeing that the first of the tests was the slower one
  5. seeming differences in access mode timings are inconsistent over repeated runs; if there is a true difference then it is subtle
  6. for any given file, the first creation of the file within the process is about 10 times slower! All further deletions and creations of the same file in the same function call run about the same speed, but the very first one is slow, at least up to about 1000 files.
Note: you probably cannot test more than 240 files with this code as-is: if you try to open too many files simultaneously, MATLAB fracks the run beyond the ability of try/catch to handle. I tested up to 1000 files in the time before I added the delayed-close test; the extra time required for the first access to a file was pretty obvious.
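The fopen-versus-fclose hypothesis can be checked with a much smaller sketch than the attached test routine. This is not that routine: the folder layout, the file names `f1`, `f2`, ... and the count `n = 100` are assumptions chosen to stay well below the 240-file limit noted above.

```matlab
% Sketch: time fopen and fclose separately for a batch of new files.
n = 100;                          % assumption: safely below ~240 open files
workDir = tempname;               % unique scratch folder name under tempdir
mkdir(workDir);

% Build the names first so string handling is not part of the timing.
names = cell(1, n);
for k = 1:n
    names{k} = fullfile(workDir, sprintf('f%d', k));
end

fids = zeros(1, n);
tOpen = tic;
for k = 1:n
    fids(k) = fopen(names{k}, 'w');   % create, but do not close yet
end
openTime = toc(tOpen);

tClose = tic;
for k = 1:n
    fclose(fids(k));                  % delayed close of every file
end
closeTime = toc(tClose);

fprintf('fopen:  %.6f s per file\n', openTime / n);
fprintf('fclose: %.6f s per file\n', closeTime / n);

rmdir(workDir, 's');              % remove the scratch folder and its files
```

As with the full test, a single run says little; repeating it several times gives a better picture of the spread.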
So out of all of this... about the only conclusion that can be used to speed up performance is:
"If performance is an issue, then if possible, re-use file names rather than working with different names. Even if that means deleting the files -- but for better performance yet, leave them in place until you are finished (open with 'w' or 'W+' to truncate them.)"
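A rough way to see the effect of re-using names is to create a set of files and then create the same set again. The sketch below assumes a scratch folder made with `tempname`, the made-up prefix `reuse_demo_` and `n = 50`; it uses plain `tic`/`toc`, so the numbers are only indicative.

```matlab
% Sketch: first-time creation versus re-creation (truncation) of the
% same file names.
n = 50;
workDir = tempname;
mkdir(workDir);
names = cell(1, n);
for k = 1:n
    names{k} = fullfile(workDir, sprintf('reuse_demo_%d', k));
end

% Pass 1: the files do not exist yet, so each fopen creates a new file.
t1 = tic;
for k = 1:n
    fclose(fopen(names{k}, 'w'));
end
firstPass = toc(t1);

% Pass 2: the files already exist, so 'w' truncates them in place.
t2 = tic;
for k = 1:n
    fclose(fopen(names{k}, 'w'));
end
secondPass = toc(t2);

fprintf('first creation:    %.6f s per file\n', firstPass / n);
fprintf('re-use (truncate): %.6f s per file\n', secondPass / n);

% Record that every file exists before cleaning up.
allExist = all(cellfun(@(f) exist(f, 'file') == 2, names));
rmdir(workDir, 's');
```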
  2 Comments
Jan on 12 Oct 2017
Edited: Jan on 12 Oct 2017
My cheap test:
tmp = tempdir;
tic
for k = 1:1000
    % fullfile adds a separator if tempdir does not end with one
    fclose(fopen(fullfile(tmp, sprintf('dummy%d', k)), 'w'));
end
toc
tic
for k = 1:1000
    java.io.File.createTempFile('dummy', '');
end
toc
Elapsed time is 1.362463 seconds. (second run!!!)
Elapsed time is 0.898448 seconds.
This is a lazy test only. timeit would be more accurate, and the two loops do not create the same file names. The first run took 5.3 sec, so I can confirm Walter's observation: truncating an existing file is faster.
In fact, the Java function is faster. If creating temp files is the bottleneck of the code, consider using java.io.File.createTempFile() instead of fclose(fopen()).
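A somewhat more careful measurement could use `timeit`, as suggested above. This is a sketch under assumptions: MATLAB is running with a JVM, and the scratch folder and the name `timeit_demo` are made up for illustration. Because `timeit` calls the handle many times, the `fopen` figure mostly reflects truncating an existing file, while each `createTempFile` call creates a new one, so the two numbers are still not strictly comparable.

```matlab
% Sketch: measure both approaches with timeit instead of tic/toc.
workDir = tempname;
mkdir(workDir);
fileName = fullfile(workDir, 'timeit_demo');

% After the first call this measures truncating an existing file.
tFopen = timeit(@() fclose(fopen(fileName, 'w')));
fprintf('fclose(fopen()): %.6f s per call\n', tFopen);

tJava = NaN;                      % stays NaN when no JVM is available
if usejava('jvm')
    javaDir = java.io.File(workDir);
    % Every call creates a new, uniquely named file inside workDir.
    tJava = timeit(@() java.io.File.createTempFile('dummy', '.tmp', javaDir));
    fprintf('createTempFile:  %.6f s per call\n', tJava);
end

rmdir(workDir, 's');              % remove all files created by the test
```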
Walter Roberson on 13 Oct 2017
I have improved the test to include timings of two different java methods of creating files.
  1. Fastest: fopen() alone, without doing the corresponding fclose at all. About 0.000063 each. So MATLAB's fopen() operation itself is not the performance barrier; the fclose() is.
  2. Fastest complete: java.io.File.createNewFile. About 0.000088 each
  3. java.nio.file.Files.createFile. About 0.000145 each
  4. fopen() as many files as the system supports, fclose them after. Looping doing fclose() or using just fclose('all') is about the same speed. About 0.000335 each
  5. (slowest) fopen() and fclose() immediately. About 0.000360 each
However, on some runs delayed close can turn out slower than immediate close. I see more runs with delayed close slightly faster, but the ranges overlap -- which implies that on other systems delaying the fclose is not necessarily faster.
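For reference, calling java.io.File.createNewFile from MATLAB might look like the sketch below. It is not the improved test code; it assumes a JVM is available, and the name `java_demo.tmp` is made up. Note that `createNewFile` does not truncate: it returns false and leaves the file alone if it already exists, unlike `fopen` with 'w'.

```matlab
% Sketch: create an empty file through java.io.File.
workDir = tempname;
mkdir(workDir);
target = fullfile(workDir, 'java_demo.tmp');

createdFirst = false;
createdAgain = false;
if usejava('jvm')
    jFile = java.io.File(target);
    createdFirst = jFile.createNewFile();   % true: the file was created
    createdAgain = jFile.createNewFile();   % false: the file already exists
end

fileInfo = dir(target);           % empty struct array if nothing was created
rmdir(workDir, 's');
```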



Jan on 15 Jul 2013
Edited: Jan on 15 Jul 2013
Hard drive access is slow. There is no magic trick to accelerate it. But non-magic tricks help: Use a faster hard drive, preferably an SSD. But even then the management of the file system needs a lot of time. Therefore "very fast" operations should not rely on disk accesses.
  4 Comments
Walter Roberson on 12 Oct 2017
J Eduardo Mucino comments to Jan Simon:
This answer is antagonistic and not helpful. The OP has a point and shouldn't be told to use different hardware or software if he doesn't like using MATLAB.
Jan on 12 Oct 2017
Edited: Jan on 12 Oct 2017
@J Eduardo Mucino: How could I suggest a better solution if there is none? There is no magic trick that is kept secret, like:
fclose(fopen(fileName, 'w', '*muchfaster*'), '*immediately*')
Celso did not mention absolute timings or how they were measured. I cannot estimate what he considers "very slow". But whatever the speed is, the only way to influence it when using Matlab commands is to accelerate the hardware.
The OP mentioned that the creation of a file is faster in Java. So what's wrong with suggesting the use of these commands from inside Matlab? My FileExchange account is full of such functions, which use other programming languages to solve some jobs faster than Matlab functions do. This is a valid approach.
Of course my answer is not "antagonistic"; I spent the time to post the best ideas I know to solve the problem. I agree that my answer might not be satisfying, but you cannot blame me for limitations of Matlab.
What do you expect? That I post an update of Matlab's file functions that works twice as fast? I am a voluntary member of this forum. Does this give the OP the right to get a satisfying answer from me?
