Thread Subject: Mixed text/binary data handling

Subject: Mixed text/binary data handling

From: Andriy Nych

Date: 23 May, 2011 10:09:04

Message: 1 of 4

Hi all
Recently I ran into a problem handling mixed text/binary data files.
I need to work with files that start with a textual description of the contents (much like a *.ini file) followed by binary data.
This is the code that writes the file (under Windows):
    fprintf(fid, 'DescriptionOfTheContents');
    fprintf(fid, '\n[Data]\n');
    fwrite(fid, Data, cc);
To load the data, the text part is read first with a series of FGETL calls, and then FREAD reads the binary part.
And here is the problem.
MATLAB's FPRINTF function writes \n as a single 0x0A character (under Windows too). Usually this is OK, since FGETL can read the data back.
But in some cases the binary data start with a 0x0D byte, so the "[Data]" string is followed by the bytes 0x0A and 0x0D.
For a reason that is not clear to me, FGETL treats the 0x0A+0x0D combination as a single end-of-line sequence, reading one extra byte and thus corrupting the binary data.
Is it possible to tell MATLAB's FGETL function to use only one EOL character?
Or is TEXTSCAN the only function that allows setting the EOL character directly?
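
A small check along these lines can show whether FGETL really consumes the extra byte. It is only a sketch: the file name and the four data bytes are made up for the test, and the outcome may depend on the MATLAB version and platform.

```matlab
% Hypothetical test file; the data bytes are invented for this check.
fname = 'eol_demo.bin';

% Write "[Data]" + LF (7 bytes) followed by binary data starting with 0x0D.
fid = fopen(fname, 'w');
fprintf(fid, '[Data]\n');
fwrite(fid, uint8([13 1 2 3]), 'uint8');
fclose(fid);

% Read the text line back and see where the file position ends up.
fid  = fopen(fname, 'r');
line = fgetl(fid);
pos  = ftell(fid);                 % 7 if only the LF was consumed
rest = fread(fid, Inf, 'uint8')';  % whatever is left for the binary part
fclose(fid);

fprintf('position after fgetl: %d, bytes left: %d\n', pos, numel(rest));
```

If the position comes out as 8 instead of 7, the 0x0D byte was swallowed together with the LF, which matches the one-byte loss described above.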

Thanks

Subject: Mixed text/binary data handling

From: dpb

Date: 23 May, 2011 13:34:56

Message: 2 of 4

On 5/23/2011 5:09 AM, Andriy Nych wrote:
...

> Is it possible to tell to MatLab's FGETL function to use only one EOL
> character?

...

Shouldn't be fgetl() that needs told...it's fopen() w/ the 't' option
that controls the correct newline character interpretation for the
platform. AFAIK, that works correctly.
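
A quick way to see what the 't' flag does when writing is to compare file sizes. This is a rough sketch with made-up file names, not something taken from the original code.

```matlab
% Hypothetical file names, used only for this check.
fBin = 'eol_binary.txt';
fTxt = 'eol_text.txt';

% Same fprintf call, once in binary mode and once in text mode.
fid = fopen(fBin, 'w');  fprintf(fid, 'a\n'); fclose(fid);
fid = fopen(fTxt, 'wt'); fprintf(fid, 'a\n'); fclose(fid);

dBin = dir(fBin);
dTxt = dir(fTxt);
nBinary = dBin.bytes;
nText   = dTxt.bytes;
fprintf('binary mode: %d bytes, text mode: %d bytes\n', nBinary, nText);
```

On Windows the text-mode file should come out one byte longer, because a carriage return is inserted before each newline; on other platforms the two sizes should match.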

But, if you have somehow managed to actually create a hybrid file, you
may have to either parse it all directly or open it first as text then
close it and reopen as binary (having determined the length of the text
in the first instance as well) and then fseek() past that for the stream
data portion.
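
The "parse it all directly" route could look roughly like the sketch below. It assumes the header ends with the literal "[Data]" line written with plain LF bytes, as in the posted writer code. The file name, the sample data, and the 'uint8' precision (standing in for cc) are invented for illustration.

```matlab
% --- write a small sample file (names and values are made up) ---
fname     = 'mixed_demo.dat';          % hypothetical file name
precision = 'uint8';                   % stands in for the poster's "cc"
DataOut   = uint8([13 10 65 0 255])';  % deliberately starts with 0x0D

fid = fopen(fname, 'w');
fprintf(fid, 'DescriptionOfTheContents');
fprintf(fid, '\n[Data]\n');
fwrite(fid, DataOut, precision);
fclose(fid);

% --- read it back without FGETL ---
fid = fopen(fname, 'r');                 % binary mode, no EOL translation
raw = fread(fid, Inf, 'uint8=>uint8')';  % all bytes as a row vector

% Locate the first LF + "[Data]" + LF sequence; the header precedes it.
marker = [char(10) '[Data]' char(10)];
idx    = strfind(char(raw), marker);
assert(~isempty(idx), 'marker not found');

header    = char(raw(1:idx(1)-1));       % text part, without the marker
dataStart = idx(1) + numel(marker) - 1;  % zero-based offset of binary part

% Jump straight to the binary part and read it with the real precision.
fseek(fid, dataStart, 'bof');
DataIn = fread(fid, Inf, [precision '=>' precision]);
fclose(fid);
```

Since the marker position is found by byte search, no line-reading function ever touches the binary part. For very large files it may be better to read only the first few kilobytes when searching for the marker.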

If at all possible, I'd fix the app that created the files to be
self-consistent.

--

Subject: Mixed text/binary data handling

From: Andriy Nych

Date: 23 May, 2011 15:15:20

Message: 3 of 4

dpb <none@non.net> wrote in message <irdnq1$k84$1@speranza.aioe.org>...
> On 5/23/2011 5:09 AM, Andriy Nych wrote:
> ...
> > Is it possible to tell to MatLab's FGETL function to use only one EOL
> > character?
> ...
> Shouldn't be fgetl() that needs told...it's fopen() w/ the 't' option
> that controls the correct newline character interpretation for the
> platform. AFAIK, that works correctly.
>
> But, if you have somehow managed to actually create a hybrid file, you
> may have to either parse it all directly or open it first as text then
> close it and reopen as binary (having determined the length of the text
> in the first instance as well) and then fseek() past that for the stream
> data portion.
>
> If at all possible, I'd fix the app that created the files to be self-consistent.

Thanks for your reply.
100% agree. I'll switch to HDF or another high-level container-like format.

Actually, under Windows different programs behave differently: Notepad uses CR+LF (0x0D+0x0A) while WordPad uses just LF (0x0A) as the EOL character.
Old DOS-format text files should also use the CR+LF pair.
But in this case an 0x0A+0x0D pair appears, and this is the result.
It was quite difficult to track down the error because, although all the files have the same data size, some of them come out one byte shorter when loaded.

Subject: Mixed text/binary data handling

From: dpb

Date: 23 May, 2011 15:22:10

Message: 4 of 4

On 5/23/2011 10:15 AM, Andriy Nych wrote:
...

> Actually, under Windows different programs behave differently: Notepad
> uses CR+LF (0x0D+0x0A) while WordPad uses just LF (0x0A) as the EOL character.

Why am I not surprised? :( I've never used either for anything real so
wouldn't know.

> Old DOS-format text files should also use CR+LF pair.
> But in this case 0x0A+0x0D pair appears ... And this is result.
> It was quite difficult to track the error because although all the files
> have the same data size just some of them are one byte shorter when loaded.

Yeah, those kinds of gotcha's can be a pita for sure...

Don't see any clean way around it other than to generate the files
consistently.

--
