Thread Subject: Binary file from C++ to Matlab

Subject: Binary file from C++ to Matlab

From: Venkat

Date: 16 Mar, 2006 20:55:06

Message: 1 of 19

Hi there,
I am having problems reading binary data files in MATLAB. The files
were written by C++ code as follows:

ofstream foo("vector.dat",ios::out | ios::binary);
foo.write((char *)&dim,sizeof(int));
foo.write((char *)&value,dim*sizeof(double));
foo.close();

In MATLAB I try to read it as follows:
fp = fopen('vector.dat','rb');
N = fread(fp,1,'int32');
d = fread(fp,N,'double');
fclose(fp);

Note that with my C++ compiler,
sizeof(int) = 4 bytes
sizeof(double) = 8 bytes

Thanks in advance,
Venkat

Subject: Binary file from C++ to Matlab

From: Rune Allnor

Date: 17 Mar, 2006 03:04:55

Message: 2 of 19


Venkat wrote:
> Hi there,
> I am having problems reading binary data files in Matlab. The files
> were written out in a C++ code as follows
>
> ofstream foo("vector.dat",ios::out | ios::binary);
> foo.write((char *)&dim,sizeof(int));
> foo.write((char *)&value,dim*sizeof(double));
> foo.close();

Hmmm... seems OK, but I have no C++ compiler available to
check it...

> In matlab i try to read it as follows
> fp = fopen('vector.dat','rb');
> N = fread(fp,1,'int32');
> d = fread(fp,N,'double');
> fclose(fp);
>
> Note that in C++ compiler,
> sizeof(int) = 4bytes

Are you sure about that? Have you checked it?
It *could* be 2.

> sizeof(double) = 8bytes

What is sizeof(char)? Are you sure the answer is 1?
Check the file size.

Apart from these questions, I would check if the file
is written in a little-endian format and read in a big-endian
format, or vice versa.

Rune
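
The byte-order check suggested above can be sketched in C++ roughly as follows. This is a minimal sketch and not code from the thread: the function names host_is_little_endian and swap_bytes32 are made up for illustration, and std::uint32_t assumes a C++11 compiler on a platform that provides the exact-width type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true when the host stores the least significant byte first.
bool host_is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);   // look at the lowest-addressed byte
    return first_byte == 1;
}

// Reverses the byte order of a 32-bit value (little-endian <-> big-endian).
std::uint32_t swap_bytes32(std::uint32_t v)
{
    const std::uint32_t swapped = ((v & 0x000000FFu) << 24) |
                                  ((v & 0x0000FF00u) << 8)  |
                                  ((v & 0x00FF0000u) >> 8)  |
                                  ((v & 0xFF000000u) >> 24);
    assert((swapped >> 24) == (v & 0xFFu)); // the low byte ended up on top
    return swapped;
}
```

If writer and reader turn out to disagree on byte order, a value such as the dim header would need swap_bytes32 applied on one side before it is interpreted.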

Subject: Binary file from C++ to Matlab

From: sturlamolden

Date: 17 Mar, 2006 03:25:20

Message: 3 of 19


Rune Allnor wrote:

> What is sizeof(char)? Are you sure the answer is 1?
> Check the file size.

The size of a char is 1 by definition. The size of wchar_t is usually
2, depending on the number of bits in a char.

Subject: Binary file from C++ to Matlab

From: Rune Allnor

Date: 22 Mar, 2006 01:31:12

Message: 4 of 19


sturlamolden wrote:
> Rune Allnor wrote:
>
> > What is sizeof(char)? Are you sure the answer is 1?
> > Check the file size.
>
> The size of a char is 1 by definition. The size of wchar_t is usually
> 2, depending on the number of bits in a char.

Hopefully, you are right.

The one serious objection I have against C and C++ is that the
binary representations of the various data types are specified
ambiguously. The definition is that the 'char' data type holds
AT LEAST 8 bits; it could be 16 or 32 bits. The definition of the
'int' data type is that it holds AT LEAST 16 bits; it could be 32 bits.

Rune

Subject: Binary file from C++ to Matlab

From: Peter Boettcher

Date: 22 Mar, 2006 09:40:18

Message: 5 of 19

"Rune Allnor" <allnor@tele.ntnu.no> writes:

> sturlamolden wrote:
>> Rune Allnor wrote:
>>
>> > What is sizeof(char)? Are you sure the answer is 1?
>> > Check the file size.
>>
>> The size of a char is 1 by definition. The size of wchar_t is usually
>> 2, depending on the number of bits in a char.
>
> Hopefully, you are right.
>
> The one serious objection I have against C and C++ is that the
> binary representations of the various data types are specified
> ambiguously. The definition is that the 'char' data type holds
> AT LEAST 8 bits; it could be 16 or 32 bits. The definition of the
> 'int' data type is that it holds AT LEAST 16 bits; it could be 32 bits.

ISO C99 addresses this by defining typedefs such as int8_t, int16_t,
int32_t, uint8_t, etc. in stdint.h (which inttypes.h also includes).
An implementation must provide the exact-width ones whenever it has
integer types of matching width.




--
Peter Boettcher <boettcher@ll.mit.edu>
MIT Lincoln Laboratory
MATLAB FAQ: http://www.mit.edu/~pwb/cssm/
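
As an illustration of the fixed-width typedefs mentioned above, the file layout from the first message (a 4-byte count followed by the raw doubles) could be written along these lines. This is only a sketch under stated assumptions: write_vector is a hypothetical helper name, the code assumes 8-byte doubles and a C++11 compiler, and it does nothing about byte order, so writer and reader are assumed to agree on that.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

static_assert(sizeof(double) == 8, "this sketch assumes 8-byte doubles");

// Writes a 4-byte element count followed by the raw doubles.
// Returns true if every write succeeded.
bool write_vector(const std::string& path, const std::vector<double>& values)
{
    assert(values.size() <= 2147483647u);          // count must fit in int32
    std::ofstream out(path, std::ios::out | std::ios::binary);
    if (!out) return false;

    // int32_t is exactly 32 bits wherever it exists, unlike plain int.
    const std::int32_t dim = static_cast<std::int32_t>(values.size());
    out.write(reinterpret_cast<const char*>(&dim), sizeof dim);

    // data() is the address of the first element, not of the vector object.
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(double)));
    return static_cast<bool>(out);
}
```

With ten values this produces an 84-byte file, which matches the 'int32' and 'double' precisions used in the fread calls quoted in this thread.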

Subject: Binary file from C++ to Matlab

From: sturlamolden

Date: 22 Mar, 2006 10:30:33

Message: 6 of 19


Rune Allnor skrev:

> The one serious objection I have against C and C++ is that the
> binary representations of the various data types are specified
> ambiguously. The definition is that the 'char' data type holds
> AT LEAST 8 bits; it could be 16 or 32 bits. The definition of the
> 'int' data type is that it holds AT LEAST 16 bits; it could be 32 bits.


This is an issue of portability between different platforms.
Programmatically, you never need to know the exact length of an integer
(and if you think you do, you are wrong!) You only need to know the
lower bound on its length. If you think you need to know that an int is
exactly 32 bits, you are doing something very strange. Most likely the
issue is your logic and not the way C defines its integer types. C's
diverse integer types are the key to its portability. Just look at the
Linux kernel and the variety of architectures it supports.

A char is the smallest addressable unit on a particular system, that is
a byte. Because a char is the smallest addressable unit on the system,
the size of any type can be expressed relative to the char. I.e. if
your char is 8 bits, it is impossible to address 12 bit integers
directly, because 8 bit chars disallow addressing on 12 bit boundaries.
If you do want to work with 12 bit integers, you must either pad them
to at least 16 bits or address them in pairs, using triplets of 8 bit
chars. However, there is nothing in C prohibiting 12 bit chars if the
hardware was designed to work with that. The C sizeof operator can
therefore safely return the size of a type relative to a char. Thus,
sizeof(type) will always be an integer greater than or equal to 1, and
sizeof(char) will always be 1 by definition.

In C99, you will find types such as int16_t and int_least16_t in
stdint.h. Although appealing to some, their use is mostly
pathological and better avoided.

Subject: Binary file from C++ to Matlab

From: Venkat

Date: 22 Mar, 2006 20:54:38

Message: 7 of 19

Hi All,

I still cannot read the double values correctly with the MATLAB code.

I use GNU gcc version 3.4.2 on an AMD Opteron 64-bit machine running
Fedora Linux.
The size of the example file with dim=10 and value as double[10],
written in C++, is 84 bytes.

Based on the sizeof(int) and sizeof(double) reported by the compiler, I
believe that of the 84 bytes,
4 bytes are for the initial int and
the remaining 80 bytes are for the 10 double values with 8 bytes each.

The MATLAB script
  N = fread(fp,1,'int32');
reads in N as 10, but
  d = fread(fp,N,'double');
does not result in the expected double entries. The MATLAB documentation
for fread says the 'double' precision reads 64 bits, as needed in this
case, yet the values read are not correct.
MATLAB is on an Intel Pentium 4, 32-bit Windows machine. Not sure if this
could be an issue.

~ Venkat


Subject: Binary file from C++ to Matlab

From: Peter Boettcher

Date: 23 Mar, 2006 09:15:46

Message: 8 of 19

Venkat <v_rayalu@hotmail.com> writes:

> Hi All,
>
> I still cannot read the double values correctly with the matlab code.
>
> I use gnu gcc version 3.4.2 on a AMD Opetron64 bit machine running
> Fedora Linux.
> The size of the example file with dim=10 and value as double[10],
> written in C++ is 84 bytes.
>
> Based on the sizeof(int) and sizeof(double) dumped by the compiler, i
> believe of the 84bytes
> 4 bytes are for the initial int and
> the rest 80 bytes are for the 10 double values with 8 bytes each.
>
> The matlab script
> N = fread(fp,1,'int32');
> reads in N as 10, but
> d = fread(fp,N,'double');
> doesnot result in expected double entries. Matlab documentation for
> fread says 'double' as precision reads in 64bits, as needed in this
> case, yet the values read are not correct.
> Matlab is on Intel Pentium 4, 32bit Windows machine. Not sure if this
> could be an issue.

I think you are doing this correctly. There should be no endian-ness
issues either, since both are on little-endian PCs.

Can you post your (trimmed) code both for the C-side write and the
MATLAB-side read?

You might also start playing with a hex editor to inspect the file and
see if you can figure out what is going on. Maybe read the file in
MATLAB as 84 uint8's, which you can display, and assemble by eye into
ints and things. If you force the double values all to 1.0, say, then
each one has the bit pattern 0x3ff0000000000000, which a little-endian
machine stores least significant byte first: 00 00 00 00 00 00 f0 3f

Good luck.

--
Peter Boettcher <boettcher@ll.mit.edu>
MIT Lincoln Laboratory
MATLAB FAQ: http://www.mit.edu/~pwb/cssm/
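
The hex inspection suggested above can also be done from the C++ side before the file is ever written. The following is a rough sketch, assuming IEEE 754 doubles of 64 bits; hex_bytes and bit_pattern are illustrative names, not anything used in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Returns the in-memory bytes of a double as hex, lowest address first.
// This is the order in which ofstream::write would put them in the file.
std::string hex_bytes(double x)
{
    unsigned char raw[sizeof(double)];
    std::memcpy(raw, &x, sizeof raw);
    std::string out;
    char buf[3];
    for (unsigned char b : raw) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(b));
        out += buf;
    }
    assert(out.size() == 2 * sizeof(double));      // two hex digits per byte
    return out;
}

// Returns the bit pattern of a double as a 64-bit integer value.
std::uint64_t bit_pattern(double x)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes 64-bit doubles");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

On a little-endian host hex_bytes(1.0) lists the bytes in reverse of the way the bit pattern is usually written, which is worth keeping in mind when comparing against a hex editor.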

Subject: Binary file from C++ to Matlab

From: Michael Wild

Date: 23 Mar, 2006 19:11:09

Message: 9 of 19

Peter Boettcher wrote:
> Venkat <v_rayalu@hotmail.com> writes:
>
>> Hi All,
>>
>> I still cannot read the double values correctly with the matlab code.
>>
>> I use gnu gcc version 3.4.2 on a AMD Opetron64 bit machine running
>> Fedora Linux.
>> The size of the example file with dim=10 and value as double[10],
>> written in C++ is 84 bytes.
>>
>> Based on the sizeof(int) and sizeof(double) dumped by the compiler, i
>> believe of the 84bytes
>> 4 bytes are for the initial int and
>> the rest 80 bytes are for the 10 double values with 8 bytes each.
>>
>> The matlab script
>> N = fread(fp,1,'int32');
>> reads in N as 10, but
>> d = fread(fp,N,'double');
>> doesnot result in expected double entries. Matlab documentation for
>> fread says 'double' as precision reads in 64bits, as needed in this
>> case, yet the values read are not correct.
>> Matlab is on Intel Pentium 4, 32bit Windows machine. Not sure if this
>> could be an issue.
>
> I think you are doing this correctly. There should be no endian-ness
> issues either, since both are on little-endian PCs.
>
> Can you post your (trimmed) code both for the C-side write and the
> MATLAB-side read?
>
> You might also start playing with a hex editor to inspect the file and
> see if you can figure out what is going on. Maybe read the file in
> MATLAB as 84 uint8's, which you can display, and assemble by eye into
> ints and things. If you force the double values all to 1.0, say, then
> the bytes will look like: 0x3ff0000000000000
>
> Good luck.
>

how do you write the file in c++? are you using iostream? show us the
code, because writing binary files in c++ is not trivial (despite its
nature as a rather low level language...).

you might try 'b' as the last argument to fread, in order to get it to
use big-endian, in case you're using strange compiler switches.


michael

Subject: Binary file from C++ to Matlab

From: sturlamolden

Date: 23 Mar, 2006 14:55:18

Message: 10 of 19

#include <fstream>
using namespace std;
int main ()
{
  double value[10] = {1.0, 2.0, 3.0, 4.0, 5.0,
                      6.0, 7.0, 8.0, 9.0, 10.0};
  int dim = 10;
  /* venkat's code below */
  ofstream foo("vector.dat",ios::out | ios::binary);
  foo.write((char *)&dim,sizeof(int));
  foo.write((char *)&value,dim*sizeof(double));
  foo.close();
  /* quit */
  return 0;
}

and then

fp = fopen('vector.dat','rb')
N = fread(fp,1,'int32')
d = fread(fp,N,'double')
fclose(fp)

Here is what happens:

fp =

     3


N =

    10


d =

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10


ans =

     0

As you can see, it works perfectly fine. Let me ask a very silly
question: are you sure your "values" are an array of double and not,
say, an array of float?

As others have noted, you need to post some real code now.


Sturla Molden

Subject: Binary file from C++ to Matlab

From: Venkat

Date: 23 Mar, 2006 21:20:45

Message: 11 of 19

sturlamolden wrote:
>
>
> #include <fstream>
> using namespace std;
> int main ()
> {
> double value[10] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0,
> 10.0};
> int dim = 10;
> /* venkat's code below */
> ofstream foo("vector.dat",ios::out | ios::binary);
> foo.write((char *)&dim,sizeof(int));
> foo.write((char *)&value,dim*sizeof(double));
> foo.close();
> /* quit */
> return 0;
> }

I had a mistake in the C++ code. value was a pointer to double, so
the C++ code should have been
  foo.write((char *)value,dim*sizeof(double));

Done this way, things work fine.

Thank you for the help and suggestions.
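
The difference between the two casts can be made visible without touching a file. In the sketch below (helper names are hypothetical), bytes_of_data copies what (char *)value refers to, while bytes_of_pointer copies what (char *)&value refers to when value is a double*: the bytes of the pointer variable itself. Writing dim*sizeof(double) bytes from the latter address would also read past the end of the pointer object, which is undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies `count` doubles starting at the address held IN the pointer.
// This is the data that the corrected write call sends to the file.
std::vector<char> bytes_of_data(const double* value, std::size_t count)
{
    assert(value != nullptr);
    const char* first = reinterpret_cast<const char*>(value);
    return std::vector<char>(first, first + count * sizeof(double));
}

// Copies the bytes of the pointer variable itself (a local copy of it),
// i.e. a memory address rather than any of the doubles it points to.
std::vector<char> bytes_of_pointer(const double* value)
{
    const char* first = reinterpret_cast<const char*>(&value);
    return std::vector<char>(first, first + sizeof value);
}
```

For a pointer to ten doubles the first helper yields 80 bytes of data, while the second yields only as many bytes as a pointer occupies on the platform, none of which are the stored values.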

Subject: Binary file from C++ to Matlab

From: sturlamolden

Date: 24 Mar, 2006 08:24:57

Message: 12 of 19


Venkat wrote:

> I had a mistake in the C++ code. value was a pointer to double and
> the c++ code should have been
> foo.write((char *)value,dim*sizeof(double));


Yes, pointers and arrays are conflated in C/C++, yet very different.
This is one very common way C and C++ allow you to shoot yourself in
the foot. By conflating array and pointer syntax, C and C++ introduce
an entire class of bugs that are very difficult to trace and impossible
to create with the majority of other languages.

Take a look at this:

    double arr[] = {1.0, 2.0, 3.0}; /* this is an array variable */
    double *ptr = {1.0, 2.0, 3.0};  /* this is a pointer and an array literal */


1. Analysis of the variable arr:

arr is an array of doubles, but in most expressions it also evaluates to
the same address as &arr (the two differ only in type). &arr is the
address of the array. The size of arr is the number of elements in the
array times the size of a double. You can safely overwrite the content
of arr:

    arr[0] = 0.0; /* not a bug */


2. Analysis of the variable ptr:

ptr is a pointer to a constant array literal. ptr is not a synonym for
&ptr. &ptr is the address of the pointer. The size of ptr is the size of
a pointer, e.g. 4 on 32-bit systems. You *cannot* safely overwrite the
content of the array pointed to by ptr. Although the compiler will
allow write operations to the memory pointed to by ptr, it is still an
error to do so. It will produce undefined behaviour according to the
standard:

    *ptr = 0.0; /* undefined behaviour */

My declaration should have been:

    const double *ptr = {1.0, 2.0, 3.0};

Now the compiler will disallow the statement *ptr = 0.0; But since the
C and C++ languages are not strongly typed, the compiler will silently
ignore the type declaration mismatch in

    double *ptr = {1.0, 2.0, 3.0};

and generate code with undefined action if write operations are
attempted.


3. Different meaning of the & (address of) operator:

It is very easy to confuse the meaning of &arr and &ptr, which is how
your problem originated. &arr and arr yield the same address. &ptr and
ptr do not.


4. Different meaning of double arr[] depending on context:

If double arr[] occurs in the declaration of a local variable, arr is
an array. If double arr[] occurs in the declaration of a function
argument, arr is actually a pointer and not an array. Thus, &arr will
also have a different meaning depending on context.
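
Points 1, 3 and 4 above can be checked with a few lines of C++. This is a sketch rather than the poster's code: the helper names are invented, and in the accompanying usage the pointer is initialized from the array, because a braced list of several values is not a valid initializer for a single pointer in standard C++.

```cpp
#include <cassert>
#include <type_traits>

// True when the object's value, viewed as an address, equals the object's
// own address. Holds for arrays (arr vs. &arr); for a pointer it would
// only hold if the pointer happened to point at itself.
template <typename T>
bool value_equals_own_address(T& x)
{
    return static_cast<const void*>(x) == static_cast<const void*>(&x);
}

// A parameter declared as double a[] is adjusted to double*, so inside
// the function there is no array left, only a pointer.
bool array_parameter_is_pointer(double a[])
{
    assert(a != nullptr);
    return std::is_same<decltype(a), double*>::value;
}
```

Given double arr[] = {1.0, 2.0, 3.0}; and double *ptr = arr;, sizeof arr is three times sizeof(double) while sizeof ptr is the size of a pointer, value_equals_own_address(arr) is true, and value_equals_own_address(ptr) is false.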

Subject: Binary file from C++ to Matlab

From: noel

Date: 24 Mar, 2006 12:38:13

Message: 13 of 19

if you want unmodifiable pointers and unmodifiable elements, this is
your statement:

double const * const ptr = {1.0, 2.0, 3.0};

> But since the C and C++ languages are not strongly typed, the
> compiler will silently ignore the type declaration mismatch in
>
> double *ptr = {1.0, 2.0, 3.0};
>
> and generate code with undefined action if
> write operations are attempted.

what does strongly or weakly typed have to do with anything?


Subject: Binary file from C++ to Matlab

From: sturlamolden

Date: 24 Mar, 2006 09:50:57

Message: 14 of 19


noel wrote:
> if you want unmodifiable pointers and unmodifiable elements, this is
> your statement:
>
> double const * const ptr = {1.0, 2.0, 3.0};

> what does strongly or weakly typed have to do with anything?

If {1.0, 2.0, 3.0} does not initialize a non-const array, the type of
{1.0, 2.0, 3.0} is const double[3] regardless of what comes before the
assignment. The type of ptr does not affect the type of {1.0, 2.0,
3.0}. double *ptr points to double, {1.0, 2.0, 3.0} is const
double[3]. Thus there is a type mismatch. Weak typing means that the
compiler does not warn about this. The problem with {1.0, 2.0, 3.0}
being const double[3] is that the compiler is free to do whatever it
wants with it, including optimizing it away. So after a statement like

double * const ptr = {1.0, 2.0, 3.0};

you may attempt a write to something that does not even reside in
memory. It may segfault or it may work; the standard says the compiler
is free to do whatever it wants.

But since this is a Matlab and not a C group, this is getting very off
topic.

Subject: Binary file from C++ to Matlab

From: Rune Allnor

Date: 29 Mar, 2006 00:26:59

Message: 15 of 19


sturlamolden wrote:
> Rune Allnor skrev:
>
> > The one serious objection I have against C and C++ is that the
> > binary representations of the various data types are specified
> > ambiguously. The definition is that the 'char' data type holds
> > AT LEAST 8 bits; it could be 16 or 32 bits. The definition of the
> > 'int' data type is that it holds AT LEAST 16 bits; it could be 32 bits.
>
>
> This is an issue of portability between different platforms.

Indeed.

> Programatically, you never need to know the exact length of an integer

I need to know.

> (and if you think you do, you are wrong!)

That's a rather bold statement, that might come back and haunt you.
If I were you, I would insert a "probably" and avoid the exclamation
mark.

I have more than once been in the situation of the OP, where I have
some binary file format where the file format is specified in terms of
8-bit chars, 16-bit ints, or 32-bit ints and so on. If one wants to
write portable code to access those kinds of files -- and I do want to
do that -- one needs exact control of the internal binary data formats
of the various data types in the C program.

> You only need to know the
> lower bound on its length. If you think you need to know that an int is
> exacty 32 bits, you are doing something very strange. Most likely the
> issue is your logic and not the way C defines its integer types.

Yeah, right. Why don't you write a piece of C code that solves the OP's
problem where you do NOT rely on exact information of how the various
data types are represented internally? Even if you do use the technique
you indicate below, you do need the unequivocal definition of a char
as exactly 8 bits. That sort of stringent definition is totally out of
character (pun intended) with the rest of C, so I would prefer to see
a reference to an ANSI specification to that effect.

> C's
> diverse integer types is the key to its portability. Just look at the
> Linux kernel and the variety of architectures it supports.

What does that have to do with C? Linux is a UNIX derivative and C was
originally developed to work under UNIX. That happened pretty early,
before UNIX developed into all sorts of system-specific dialects and
variations. I am tempted to suggest that one can not get a portable
UNIX while avoiding getting a portable C as a by-product.

> A char is the smallest adressable unit on a particular system, that is
> a byte. Because a char is the smallest addressable unit on the system,
> the size of any type can be expressed relative to the char. I.e. if
> your char is 8 bits, it is impossible to address 12 bit integers
> directly, because 8 bit chars disallow addressing on 12 bit boundaries.

This is trivial. The key statement is "if your char is 8 bits", note
the "if." If the char was explicitly defined as 8 bits on all systems,
there would be no problem. The problem is that a char is defined as
"at least 8 bits", allowing for a 16 bit or 32 bit representation on
some systems.

> If you do want to work with 12 bit integers, you must either pad them
> to at least 16 bits or address them in pairs, using triplets of 8 bit
> chars. However, there is nothing in C prohibiting 12 bit chars if the
> hardware was designed to work with that. The C sizeof-operator can
> therefore safely returns the size of a type relative to a char. Thus,
> sizeof(type) will always be an integer larger of equal to 1, and
> sizof(char) will always be 1 by definition.

Sure. Why not go one step further and avoid the obfuscating "char"
and work directly with bits?

While that is possible -- people did that 50 years ago -- it is not a
particularly efficient way of working. That's why assembly-level
code and later high-level coding languages were developed. No one
is interested in working at a lower abstraction level than necessary.

So either one avoids type definitions altogether, or they are
unambiguous.

> In C99, you will find types such as int16_t and int_least16_t in
> stdint.h. Although appealing to some, the use of them are mostly
> pathological and better avoided.

I'll check if they are part of the ANSI C specification. If they are,
I'll find use for them.

Rune

Subject: Binary file from C++ to Matlab

From: sturlamolden

Date: 29 Mar, 2006 04:10:49

Message: 16 of 19


Rune Allnor wrote:

> I have more than once been in the situation of the OP, where I have
> some binary file format where the file fomat is specified in terms of
> 8-bit chars, 16-bit ints, or 32-bit ints and so on. If one wants to
> write
> portable code to access those kinds of files -- and I do want to do
> that --
> one needs exact control of the internal binary data fomats of the
> various
> data types in the C program.


You can read a binary file without knowing the exact size of an
integer. If you e.g. know that a char is at least 8 bits, you know it's
safe to read 8 bits from a file into that char. If you know an int is
at least 16 bits, you also know it's safe to read 8 bits from your file
into an int.

Second, "exact control of the internal binary data formats" can be
achieved using bitfields. However, it is often inefficient and not
portable at all. Reading and writing binary files are commonly done
using bitfields. Using bitfields, you can specify exactly the size of
the data being read or written. A bitfield is a packed struct where the
exact size (in bits) of each member is specified:

http://publications.gbdirect.co.uk/c_book/chapter6/bitfields.html



> you indicate below, you do need the unequivocal definition of a char
> as exactly 8 bits.

No. There are also cases where a char is 16 bits. There are also cases
where a hardware byte is 4 bits and a C byte (char) is 8.

> If the char was explicitly defined as 8 bits on all systems, there
> would
> be no problem.

You need an efficient mapping between a hardware byte and a C char. If
a hardware byte is 12 bits, an 8-bit char will be very inefficient.
All that C tells you is that a byte is at least 8 bits. If a hardware
byte is 6 bits, a C char would most likely be 12 bits.


> Sure. Why not go one step further and avoid the obfuscating "char"
> and work directly with bits?

That is why C has bitfields.


> I'll check if they are part of the ANSI C specification. If they are,
> I'll find use for them.

They are not a part of the ANSI C specification (C89). They are a part
of the ISO C specification (C99).

Subject: Binary file from C++ to Matlab

From: Christopher Hulbert

Date: 29 Mar, 2006 07:34:45

Message: 17 of 19

sturlamolden wrote:
> Rune Allnor wrote:
>
>
>>I have more than once been in the situation of the OP, where I have
>>some binary file format where the file fomat is specified in terms of
>>8-bit chars, 16-bit ints, or 32-bit ints and so on. If one wants to
>>write
>>portable code to access those kinds of files -- and I do want to do
>>that --
>>one needs exact control of the internal binary data fomats of the
>>various
>>data types in the C program.
>
>
>
> You can read a binary file without knowing the exact size of an
> integer. If you e.g. know that a char is at least 8 bits, you know it's
> safe to read 8 bits from a file into that char. If you know an int is
> at least 16 bits, you also know it's safe to read 8 bits from your file
> into an int.

How do you read N bits in C? The C standard only allows for reading in terms of
bytes, and a byte is defined as sizeof(char). So, on systems with say
CHAR_BIT=10, reading a byte from a file would read 10 bits.
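
The number of bits in a byte is available at compile time as the CHAR_BIT macro from <climits>, so the guarantees discussed in this thread can be written down and checked by the compiler. A small sketch follows; bits_in is a made-up helper name and constexpr assumes C++11.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Both of these are guaranteed by the language on every conforming platform.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Number of bits occupied by an object of type T on this platform.
template <typename T>
constexpr std::size_t bits_in()
{
    return sizeof(T) * CHAR_BIT;
}
```

Code that depends on 8-bit bytes could add static_assert(CHAR_BIT == 8, "...") so that it refuses to compile on a platform where that assumption does not hold, instead of silently misreading files.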

>
> Second, "exact control of the internal binary data fomats" can be
> achieved using bitfields. However, it is often inefficient and not
> portable at all. Reading and writing binary files are commonly done
> using bitfields. Using bitfields, you can specify exactly the size of
> the data being read or written. A bitfield is a packed struct where the
> exact size (in bits) of each member is specified:
>
> http://publications.gbdirect.co.uk/c_book/chapter6/bitfields.html
>

Bitfields can be less portable than binary files! As your reference states, you
can't get pointers to the bitfields, so how would you read an 8-bit char into an
8-bit bitfield?
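
One possible answer to the question above, sketched under stated assumptions: read the byte into an ordinary variable first, then assign its parts to the bit-field members. The struct and function names here are hypothetical, and the byte is split with shifts and masks because the in-memory layout of bit-fields is implementation-defined.

```cpp
#include <cassert>

// Two 4-bit fields; their placement inside the struct is up to the compiler.
struct PackedHeader {
    unsigned low  : 4;
    unsigned high : 4;
};

// Splits one octet (value 0..255) into the two bit-field members.
// No pointer to a bit-field is needed, only plain assignment.
PackedHeader unpack_header(unsigned char octet)
{
    PackedHeader h;
    h.low  = octet & 0x0Fu;          // bits 0..3
    h.high = (octet >> 4) & 0x0Fu;   // bits 4..7
    assert(h.low < 16u && h.high < 16u);
    return h;
}
```

The octet itself would come from an ordinary byte read, for example into an unsigned char buffer, so the bit-field is only ever written by assignment.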

>
>
>
>>you indicate below, you do need the unequivocal definition of a char
>>as exactly 8 bits.
>
>
> No. There are also cases where a char is 16 bits. There are also cases
> where a hardware byte is 4 bits and a C byte (char) is 8.
>
>
>>If the char was explicitly defined as 8 bits on all systems, there
>>would
>>be no problem.
>
>
> You need an effeicient mapping between a hardware byte and a C char. If
> a hardware byte is 12 bits, an 8 bits char will be very inefficient.
> All that C tells you is that a byte is at lest 8 bits. If a hardware
> byte is 6 bits, a C char would most likely be 12 bits.
>
>
>
>>Sure. Why not go one step further and avoid the obfuscating "char"
>>and work directly with bits?
>
>
> That is why C has bitfields.
>
>
>
>>I'll check if they are part of the ANSI C specification. If they are,
>>I'll find use for them.
>
>
> They are not a part of the ANSI C specification (C89). They are a part
> of the ISO C specification (C99).
>

Subject: Binary file from C++ to Matlab

From: Andy Johnson

Date: 29 Mar, 2006 08:23:55

Message: 18 of 19

Christopher Hulbert wrote:
>
>
> sturlamolden wrote:
>> Rune Allnor wrote:
>>
>>
>>>I have more than once been in the situation of the OP, where
I
> have
>>>some binary file format where the file fomat is specified in
> terms of
>>>8-bit chars, 16-bit ints, or 32-bit ints and so on. If one
wants
> to
>>>write
>>>portable code to access those kinds of files -- and I do
want to
> do
>>>that --
>>>one needs exact control of the internal binary data fomats
of the
>>>various
>>>data types in the C program.
>>
>>
>>
>> You can read a binary file without knowing the exact size of an
>> integer. If you e.g. know that a char is at least 8 bits, you
> know it's
>> safe to read 8 bits from a file into that char. If you know an
> int is
>> at least 16 bits, you also know it's safe to read 8 bits from
> your file
>> into an int.
>
> How do you read N bits in C? The C standard only allows for
> reading in terms of
> bytes, and a byte is defined as sizeof(char). So, on systems with
> say
> CHAR_BITS=10, reading a byte from a file would read 10 bits.
>
>>
>> Second, "exact control of the internal binary data fomats" can
be
>> achieved using bitfields. However, it is often inefficient and
> not
>> portable at all. Reading and writing binary files are commonly
> done
>> using bitfields. Using bitfields, you can specify exactly the
> size of
>> the data being read or written. A bitfield is a packed struct
> where the
>> exact size (in bits) of each member is specified:
>>
>> <http://publications.gbdirect.co.uk/c_book/chapter6/bitfields.html>
>>
>
> Bitfields can be less portable than binary files! As your
> reference states, you
> can't get pointers to the bitfields, so how would you read an 8-bit
> char into an
> 8-bit bitfield?
>
>>
>>
>>
>>>you indicate below, you do need the unequivocal definition
of a
> char
>>>as exactly 8 bits.
>>
>>
>> No. There are also cases where a char is 16 bits. There are also cases
>> where a hardware byte is 4 bits and a C byte (char) is 8.
>>
>>
>>>If the char was explicitly defined as 8 bits on all systems, there
>>>would be no problem.
>>
>>

I can't help but step in here and put in my two cents. If I write code
that must be compiled to run on an "unknown" system and I need to
read/write a binary file with 8-bit integer data (for instance), I
have to be very concerned with the size of chars on the machine it
will be compiled on. I could easily (and incorrectly) write a simple
function that reads/writes the data as chars, only to find that it is
NOT portable across machines, precisely because of the loose
definition of char. I would agree that the loose definition allows
for creating code that runs efficiently across machines (code will
run "nicely" on an 8-bit or 12-bit native machine when compiled for
that machine), but as soon as you need to interface with a binary
data file that may have been created on a different machine, you
have potentially serious problems. There's a big difference between
reading/writing files always on the same machine vs. reading/writing
files on different machines, and that's where the trap lies. Maybe
you don't need to know the exact size of a char (for internal
portions of the code), but you must have a type that is
precisely 8 bits in order to read a file like this. That's when it's
too easy to use char and get burned.
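
One way to avoid getting burned, sketched here under some assumptions: state the 8-bit requirement in the code so that a machine with a wider char refuses to compile it instead of silently misreading the file. The helper name read_octets is made up for illustration. std::uint8_t from <cstdint> is only provided on platforms that actually have an exact 8-bit type, so the static_assert is mostly there for a readable error message.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uint8_t (exists only where an exact 8-bit type exists)
#include <cstdio>    // std::fopen, std::fread, std::fclose
#include <vector>

// Fail at compile time on a platform whose char is not exactly 8 bits.
static_assert(CHAR_BIT == 8, "this reader assumes 8-bit bytes");

// Hypothetical helper: read up to `count` 8-bit values from `path`.
// Returns fewer elements (possibly none) if the file is short or
// cannot be opened.
std::vector<std::uint8_t> read_octets(const char* path, std::size_t count) {
    std::vector<std::uint8_t> data(count);
    std::FILE* fp = std::fopen(path, "rb");
    if (fp == nullptr || count == 0) {
        if (fp != nullptr) std::fclose(fp);
        data.clear();
        return data;
    }
    // Element size is 1 byte, which the static_assert pins to 8 bits.
    const std::size_t got = std::fread(data.data(), 1, count, fp);
    std::fclose(fp);
    data.resize(got);
    return data;
}
```

A caller would write something like `auto samples = read_octets("data.bin", 1024);` and compare `samples.size()` against the expected count to detect a short file.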

Subject: Binary file from C++ to Matlab

From: Rune Allnor

Date: 31 Mar, 2006 01:39:06

Message: 19 of 19


sturlamolden wrote:
> Rune Allnor wrote:
> > If the char was explicitly defined as 8 bits on all systems, there
> > would be no problem.
>
> You need an efficient mapping between a hardware byte and a C char.

Exactly. As a software programmer, I would appreciate being relieved
of that effort. The compiler designer, who made the ANSI/ISO C compiler
that runs on my local HW system, ought to take on the task of handling
that mapping. I am not interested in handling all those sorts of
details on each and every system, I would just want to handle a
specified number of bits in the binary file.

> If
> a hardware byte is 12 bits, an 8 bits char will be very inefficient.

Sure. But that's a problem I leave for the HW people. I deal with binary
data formats that (usually) specify data fields in multiples of 8 bits.

I need to be able to access such fields in one line of code, not three
pages of conditional compiler directives and bit manipulations.
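
As a rough sketch of what that one-line access could look like: the helpers below rebuild 16-bit and 32-bit fields from individual octets with shifts. The names load_u16_le and load_u32_le are hypothetical, and the sketch assumes the file stores fields least-significant byte first and that each buffer element holds one 8-bit value. Because the shifts operate on values rather than on memory layout, the result does not depend on the byte order of the machine running the code.

```cpp
#include <cstdint>  // std::uint16_t, std::uint32_t

// Hypothetical helper: assemble a 16-bit unsigned field from two octets
// stored least-significant byte first (an assumption about the file format).
std::uint16_t load_u16_le(const unsigned char* p) {
    return static_cast<std::uint16_t>(
        static_cast<unsigned>(p[0]) | (static_cast<unsigned>(p[1]) << 8));
}

// Hypothetical helper: the same idea for a 32-bit unsigned field.
// Each octet is widened to 32 bits before shifting so no bits are lost.
std::uint32_t load_u32_le(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

With these in place, reading a field really is one line, e.g. `std::uint32_t n = load_u32_le(header);` where `header` points at four bytes already read from the file. A format that stores the most significant byte first would need the shift amounts reversed.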

> > I'll check if they are part of the ANSI C specification. If they are,
> > I'll find use for them.
>
> They are not a part of the ANSI C specification (C89). They are a part
> of the ISO C specification (C99).

OK. What is the relation between ANSI C and ISO C? Is ANSI C a
subset of ISO C? Will the ANSI specification be updated in the future?
Or the ISO specification? Which is/will be the more portable version?

Rune
