fgetl drops part of line
I've been having an issue with the fgetl function where it occasionally drops part of a line. I'm reading a 10-15 MB ASCII text file across a LAN. The file is opened in binary read mode. Each line in the text file is of variable length and is composed of both numeric and character data. There are approximately 100K lines per file. Each line has a character string associated with it. Based on this string, there's a check to make sure the line has the correct number of fields. If it doesn't, an error is generated. I get this error roughly 5% of the time I read one of these files. If I read the same file again, it works fine. It appears completely random. I don't get any errors from the fgetl function when this happens.
Has anyone ever seen something like this? I don't know if this is a MATLAB problem, a Windows problem (it's XP Pro), or a LAN/server problem.
I thought it might be related to LAN loading, but it happens in the middle of the night as well as during the day. I'm in the process of copying the files to a local drive before reading them to see if this makes any difference. Should I be opening the file using the 'rt' option?
Any help would be greatly appreciated. Thanks.
1 Comment
Wouter
on 26 Mar 2013
Strange problem. You could try copying the entire file to a local drive first (e.g. with copyfile; movefile would move it instead) and then reading it with your function.
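A minimal sketch of the copy-then-read approach suggested above. The file names are hypothetical: a temporary file stands in for the file on the LAN, and the byte-size comparison is an extra sanity check that is not part of the original suggestion.

```matlab
% Stand-in for the remote file (hypothetical content; replace with the LAN path).
remoteFile = [tempname, '.txt'];
fid = fopen(remoteFile, 'w');
fprintf(fid, 'A 1 2 3\nB 4 5\n');
fclose(fid);

% Copy to a local scratch file before parsing.
localFile = [tempname, '.txt'];
[ok, msg] = copyfile(remoteFile, localFile);
if ~ok
    error('Copy failed: %s', msg);
end

% Sanity check: source and copy should have the same size in bytes.
srcInfo = dir(remoteFile);
dstInfo = dir(localFile);
sameSize = (srcInfo.bytes == dstInfo.bytes);

% Read the local copy line by line with fgetl.
fid = fopen(localFile, 'r');
lines = {};
tline = fgetl(fid);
while ischar(tline)          % fgetl returns -1 (not char) at end of file
    lines{end+1} = tline;    %#ok<SAGROW>
    tline = fgetl(fid);
end
fclose(fid);

delete(remoteFile);
delete(localFile);
```

If the field-count error disappears when reading from the local copy, that would point toward the network path rather than fgetl itself.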
Answers (7)
Image Analyst
on 26 Mar 2013
Why are you not opening your ASCII text file in text mode instead of binary mode? I'd open it in text mode if you know it's all text. If the problem remains, upload your file and code.
1 Comment
Jan
on 26 Mar 2013
Opening a text file in binary mode is less susceptible to the side effects of non-printable characters. E.g. it is much easier to control the line breaks and to avoid stopping at ^Z characters, and the i-th file position is the i-th byte, because CHAR([13,10]) is not counted as 1 character. For the same reason, accessing files in binary mode is faster.
Jan
on 26 Mar 2013
The strangeness of the problem implies that there is a deterministic reason. Most (to be exact, all) of the "magic" problems I've seen had such a reason.
E.g. does another process write to the files while you are reading them?
Or do you use MEX-files, which can corrupt the memory manager? You can provoke any kind of non-reproducible error with C-MEX functions...
Failures in 5% of the reads is a high rate. You can check the network connection by repeatedly calculating the MD5 of some large (GB) files.
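One possible way to run the repeated MD5 check from within MATLAB, assuming the JVM is available so that java.security.MessageDigest can be used. The test file, chunk size and repeat count are placeholders; in practice testFile would point at a large file on the network share.

```matlab
% Stand-in test file (hypothetical; use a large file on the LAN instead).
testFile = [tempname, '.bin'];
fid = fopen(testFile, 'w');
fwrite(fid, uint8(mod(0:99999, 251)), 'uint8');
fclose(fid);

nRepeats  = 5;        % number of times the file is re-read and hashed
chunkSize = 2^20;     % read in 1 MB chunks to keep memory use low
hashes = cell(nRepeats, 1);

for k = 1:nRepeats
    md  = java.security.MessageDigest.getInstance('MD5');
    fid = fopen(testFile, 'r');
    while true
        chunk = fread(fid, chunkSize, '*uint8');
        if isempty(chunk)
            break;
        end
        md.update(chunk);
    end
    fclose(fid);
    digest = typecast(md.digest(), 'uint8');              % int8 -> uint8
    hashes{k} = lower(reshape(dec2hex(digest, 2)', 1, []));
end

% All reads of an unchanged file should give the same hash.
allEqual = (numel(unique(hashes)) == 1);
delete(testFile);
```

A hash that differs between runs on a file nobody is writing to would suggest corrupted transfers.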
When the reading fails, print out the line that was read, go back the required number of bytes, read the line again and print that too. Then post the difference between these lines. Does this reveal any pattern? E.g. does it depend on the line number or the contents? Do you get any entries in the error log of the operating system?
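The re-read diagnostic described above could look roughly like this. The field-count rule (expectedCount) and the demo file are invented for illustration, since the real file format was not posted; ftell records the byte offset before each fgetl so that fseek can return to it.

```matlab
% Demo file (hypothetical format: the first token decides the field count).
demoFile = [tempname, '.txt'];
fid = fopen(demoFile, 'w');
fprintf(fid, 'A 1 2 3\nB 4 5\nA 6 7 8\n');
fclose(fid);

% Assumed rule: 'A' lines have 4 fields, 'B' lines have 3.
expectedCount = @(tag) 4 * strcmp(tag, 'A') + 3 * strcmp(tag, 'B');

fid = fopen(demoFile, 'r');
lineNo    = 0;
nMismatch = 0;     % lines whose second read differs from the first
while true
    pos   = ftell(fid);          % byte offset of the start of this line
    tline = fgetl(fid);
    if ~ischar(tline)
        break;                   % end of file
    end
    lineNo = lineNo + 1;
    fields = regexp(strtrim(tline), '\s+', 'split');
    if numel(fields) ~= expectedCount(fields{1})
        fseek(fid, pos, 'bof');  % go back to the start of the line
        again = fgetl(fid);      % read the same line a second time
        fprintf('Line %d at byte %d\n  first : [%s]\n  second: [%s]\n', ...
                lineNo, pos, tline, again);
        if ~strcmp(tline, again)
            nMismatch = nMismatch + 1;
        end
    end
end
fclose(fid);
delete(demoFile);
```

Logging lineNo and pos for each failure makes it easier to see whether failures cluster at particular offsets, e.g. near buffer boundaries.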
Cedric
on 26 Mar 2013
Edited: Cedric
on 26 Mar 2013
Could you copy/paste here the first 10-20 lines of one of these files? Have you tried reading the full file in one shot and then processing it line by line, instead of reading it line by line?
The fact that it works with the same file if you re-launch the import after a failure seems to indicate that it's not a problem of file content (e.g. partly binary data that confuses FGETL). My first reaction would be to check that there is no process updating these files in the background while you are trying to read them (you'd have to manage this with a lock/semaphore mechanism).
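As a sketch of the one-shot approach: read all bytes with a single fread and split on line breaks afterwards. The demo file and its content are made up; the regular expression accepts both CR/LF and plain LF line endings.

```matlab
% Demo file with mixed line endings (hypothetical content).
demoFile = [tempname, '.txt'];
fid = fopen(demoFile, 'w');
fprintf(fid, 'A 1 2 3\r\nB 4 5\nA 6 7 8\n');
fclose(fid);

% Read the whole file in one call, as a row vector of characters.
fid = fopen(demoFile, 'r');
raw = fread(fid, Inf, '*char')';
fclose(fid);

% Split into lines in memory.
lines = regexp(raw, '\r?\n', 'split');
if ~isempty(lines) && isempty(lines{end})
    lines(end) = [];             % drop the empty piece after the final newline
end
delete(demoFile);
```

This replaces roughly 100K small reads with one large read, which may also shorten the window in which a network hiccup can affect the import.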
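A rough way to look for background updates, short of a real lock, is to compare the file's size and modification time before and after the import. This is only a heuristic and the file below is a placeholder; it would not catch a writer that restores both values.

```matlab
% Placeholder file (hypothetical; use the file on the LAN instead).
dataFile = [tempname, '.txt'];
fid = fopen(dataFile, 'w');
fprintf(fid, 'A 1 2 3\nB 4 5\n');
fclose(fid);

before = dir(dataFile);          % size and timestamp before reading

fid = fopen(dataFile, 'r');
raw = fread(fid, Inf, '*char')'; % stands in for the actual import
fclose(fid);

after = dir(dataFile);           % size and timestamp after reading

% If either value changed, something touched the file during the import.
changed = (before.bytes ~= after.bytes) || (before.datenum ~= after.datenum);
if changed
    warning('File was modified while it was being read.');
end
delete(dataFile);
```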
Todd
on 26 Mar 2013
1 Comment
Cedric
on 26 Mar 2013
Edited: Cedric
on 26 Mar 2013
Looks like some network issue; did you try copying the files to your hard drive and reading them from there? If you have one file that is fine and another that is not, you can repeat their processing e.g. 1000 times and see whether the outcome is always fine on the good file and always wrong on the bad one. If that is the case, then either something is updating these files in the background (which is not the case, according to what you said), or there is something wrong with your network. If you find no bad file after copying them to your hard drive (which should not happen if you have a network issue), or significantly fewer of them than when you slowly read them line by line, I would still think that some process is accessing the remote files while you are reading them.
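The repeated-processing test could be sketched as follows. The field-count check (exactly 3 fields per line) and the test file are assumptions standing in for the real import routine.

```matlab
% Test file (hypothetical: every line is expected to have 3 fields).
testFile = [tempname, '.txt'];
fid = fopen(testFile, 'w');
fprintf(fid, 'A 1 2\nB 3 4\nC 5 6\n');
fclose(fid);

nRuns          = 1000;
expectedFields = 3;
badRuns        = 0;              % number of runs with at least one bad line

for r = 1:nRuns
    fid = fopen(testFile, 'r');
    runIsBad = false;
    tline = fgetl(fid);
    while ischar(tline)
        nFields = numel(regexp(strtrim(tline), '\s+', 'split'));
        if nFields ~= expectedFields
            runIsBad = true;
        end
        tline = fgetl(fid);
    end
    fclose(fid);
    badRuns = badRuns + runIsBad;
end

fprintf('%d of %d runs failed\n', badRuns, nRuns);
delete(testFile);
```

Running this once against the remote file and once against a local copy gives two failure counts that can be compared directly.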