fopen / fprintf: Write to same file from independent processes

Hi,
I have several MATLAB instances running, all of which need to write to the same log file (text, line by line). Unfortunately I have not yet found a way to do that without data loss.
My assumption was that fopen, or at least fprintf, would throw an error when attempting to open a file that another MATLAB instance has already opened, or when attempting to write to a file that another instance is currently writing.
Unfortunately this is not the case. Even the nbytes return value of fprintf cannot be used to detect a failed write, because it always reports the number of bytes even if they were never physically written.
So is there any simple solution for this?
PS:
I should add that I also tried ferror, but even that does not seem to help. Below is the test code I use. I call the function from one MATLAB instance with no offset. Undisturbed, it writes the numbers 1 to 200000 to the text file, which takes about 10 seconds. But if I call the function from a second instance (using an offset of e.g. 1000000) while the first one is still running, the file contains alternating numbers from both processes (distinguishable by the offset). That would be perfectly fine if no data were missing. But instead of 400000 lines I get only 394942, of which 2450 are blank, leaving 392492 data lines; so 7508 data lines are missing (across both instances). On top of that I even have 32 duplicated lines for some reason (all from one instance). So in its current implementation the whole process is useless.
function sError = FileTest(sFileName_, iOffset)
    [fID, e] = fopen(sFileName_, 'at'); % try to open text file for appending
    sError = e;
    if fID == -1
        return; % could not open the file at all
    end
    if ~exist('iOffset', 'var')
        iOffset = 0;
    end % if
    for i = 1:200000
        fprintf(fID, '%s\n', num2str(i + iOffset));
        [m, e] = ferror(fID);
        if e ~= 0 % is never true, although some data is not successfully written
            disp([num2str(i + iOffset) ': ' m]);
        end
    end % for
    fclose(fID);
end % FileTest

Answers (1)

Guillaume
Guillaume on 28 Nov 2018
Your problem is not restricted to MATLAB; you would have the same problem in any other programming language. You basically have several processes writing to the same resource at the same time. Even in a multithreaded context it is not trivial to manage; in a multiprocess context it is a nightmare. You need to introduce locks, mutexes, semaphores and, for multiple processes, interprocess communication to manage that properly. With file I/O you also have to contend with the fact that the OS will do some buffering, probably with one buffer per process, and you don't really have much control over when that buffer actually gets written to the file (at least in MATLAB).
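For illustration, one crude way to get a cross-process lock in MATLAB is a lock file created atomically via Java (a sketch only; the file names are made up, atomic creation is only reliable on a local disk, not on network shares, and a crashed process leaves a stale lock behind):

```matlab
% Advisory lock via atomic create-if-absent: java.io.File.createNewFile
% returns true only for the process that actually created the file.
lockFile = java.io.File('mylog.lock');       % illustrative lock-file name
t0 = tic;
while ~lockFile.createNewFile()              % some other process holds the lock
    if toc(t0) > 5
        error('timed out waiting for the log lock');
    end
    pause(0.01 + 0.02*rand);                 % back off briefly, then retry
end
cleanup = onCleanup(@() lockFile.delete());  % release the lock even on error

fid = fopen('mylog.txt', 'at');              % now we are the only writer
fprintf(fid, '%s\n', 'one log line');
fclose(fid);
```

Note the serialization this buys comes at a price: every writer spins while another holds the lock, which is exactly why the per-process-log or single-writer designs below scale better.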
You need to review your design. The simplest option is one log file per process (MATLAB instance): easy to code, and no interprocess communication to manage.
If you do need one common log file, then you must have a single MATLAB instance whose job is to write to the log file. For simplicity, that should be its only job. All the other MATLAB processes then communicate with that writer process to tell it what to write, so you need some sort of interprocess communication. I've never done that in MATLAB, and there aren't many tools available in MATLAB to develop it efficiently. You possibly could do the interprocess communication via UDP or TCP over the loopback adapter, or you could use memory-mapped files to emulate shared memory (see for example Share memory between applications). Whatever method you use, it's going to be a lot of work, so I'd go with option 1: one log per process.
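A minimal sketch of that single-writer idea over loopback UDP, assuming a MATLAB release that provides udpport (the port number, file name, and message text are illustrative):

```matlab
% --- Writer instance: the only process that ever touches the log file. ---
u = udpport("datagram", "LocalPort", 9090);  % 9090 is an arbitrary choice
fid = fopen('common.log', 'at');
cleanup = onCleanup(@() fclose(fid));
while true
    d = read(u, 1, "string");                % block until one datagram arrives
    fprintf(fid, '%s\n', d.Data);            % append it as one log line
end
```

```matlab
% --- Any other instance: send a line instead of writing the file itself. ---
c = udpport("datagram");
write(c, "message from client", "string", "127.0.0.1", 9090);
```

One caveat with this sketch: UDP datagrams can be dropped even on loopback, so if every line must survive, TCP (or an acknowledgement scheme) would be the safer transport.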
  3 Comments
Guillaume
Guillaume on 28 Nov 2018
"in worst case you need to close Matlab completely to solve it"
Use fclose('all') to close all files still held open by a MATLAB process instead of quitting it. Better yet, make sure that you always fclose a file after a successful fopen. The simplest way to ensure that, in a function, is with this pattern:
function dosomething
    %...
    fid = fopen(somefile, somepermissions);
    if fid == -1
        error('failed to open file');
    end
    cleanup = onCleanup(@() fclose(fid)); % ensures the file is closed even if the function errors
    %.. rest of code
    % no need to call fclose: the file is closed anyway when the function ends
end
"but that would be limited to the same environment". Are your different instances of MATLAB on different machines? If so, I don't think using lock files over a networked file system is going to work. I would think you need to implement your interprocess communication over TCP or UDP, but that is outside my expertise; I've never implemented IPC across machines. If I were to, I'd certainly try communicating with a single writer process over UDP. MATLAB doesn't really have any mechanism for locking files efficiently (other than fopen for writing).
Kristian
Kristian on 28 Nov 2018
I did not mean files I had explicitly opened, but any kind of file MATLAB had used, so fclose('all') would not have helped there. However, that's a different story.
Yes, the instances can be on different machines. And yes, the data is on a NAS, so it's even slower than a local disk. But the idea of having a kind of collector process is worth thinking about. Currently I do it manually by merging all the text files into one. This could also be done by a process (either MATLAB or just a Windows service or something).
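That manual merge step is easy to script. A sketch, assuming each instance writes to its own file and the names (instance_*.log, merged.log) are illustrative:

```matlab
% Concatenate all per-instance log files into one merged log.
out = fopen('merged.log', 'wt');
cleanup = onCleanup(@() fclose(out));
files = dir('instance_*.log');               % one log file per MATLAB instance
for k = 1:numel(files)
    txt = fileread(fullfile(files(k).folder, files(k).name));
    fwrite(out, txt);                        % append this instance's lines
end
```

Such a script could be run periodically by a scheduled task (or a Windows service, as suggested above), turning the per-process logs into the single combined log after the fact.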
