Thread Subject: Working with huge binary files

From: Christoph

Date: 15 Oct, 2009 03:43:01

Message: 1 of 3

Hello everyone,

I've got a 1.3 GB binary file full of data, and no idea how to work
with it. It's made of 16-bit chunks: bits 0-13 hold numerical data, and
bits 14 and 15 are each a status bit, so I have 3 categories of data.
Ideally I would like to have this data in an Nx3 matrix.

So, problem #1: How do I read the data and parse it meaningfully?
I have tried fopen and fread, and I can convert each whole 16-bit chunk
into a number, but that's not what I want. I also briefly played with
memmapfile, which leads me to problem #2:
The file is so large that I can't load it all at once, or assign m.Data
(in the memmapfile case) to another variable, without running out of
memory.

Any ideas? Help is very much appreciated!!

Christoph

From: Sebastiaan

Date: 15 Oct, 2009 07:59:05

Message: 2 of 3

Open the file with fopen, read the 16-bit chunks with fread, and store them in the appropriate datatype (e.g. uint16, by passing 'uint16=>uint16' as the precision to fread).
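A minimal sketch of that reading step, assuming the file is a plain sequence of uint16 words in the machine's native byte order: it reads a fixed number of words at a time with fread, so the 1.3 GB file never has to fit in memory at once. The temporary file, the four demo values, blockSize and count15 are illustrative assumptions, not part of the original question.

```matlab
% Sketch (assumption): process a large file of uint16 words block by block,
% so the whole file never has to be held in memory at once.
% For a self-contained demo, a small temporary file is written first;
% the four values are made up, not taken from the real data.
fname = [tempname() '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint16([36698 3930 16384 1]), 'uint16');
fclose(fid);

blockSize = 2;      % words per block; something like 1e6 suits a real file
nRead = 0;          % total number of words processed
count15 = 0;        % how many words have status bit 15 set
fid = fopen(fname, 'r');
while true
    % 'uint16=>uint16' keeps the values as uint16 instead of double
    block = fread(fid, blockSize, 'uint16=>uint16');
    if isempty(block)
        break;      % end of file reached
    end
    nRead = nRead + numel(block);
    % per-block processing goes here; as an example, count bit-15 flags
    count15 = count15 + sum(bitand(block, uint16(32768)) > 0);
end
fclose(fid);
delete(fname);
```

For a real file you would drop the demo-writing lines, open the actual file, and replace the counting line with whatever per-block processing you need.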

Then use bitwise functions such as bitand to pick out the fields. Say you have this chunk:
1000111101011010

This is stored in uint16 as the following value:
>> a = uint16(bin2dec('1000111101011010'))
  36698

i.e. you read this number from your file.

To extract bits 0-13 and check bits 14 and 15:
>> check14 = uint16(bin2dec('0100000000000000'));
>> check15 = uint16(bin2dec('1000000000000000'));
>> mask = uint16(bin2dec('0011111111111111'));
>> MyMatrix = zeros(N, 3);

>> MyMatrix(j, 1) = bitand(a, mask);
>> MyMatrix(j, 2) = bitand(a, check14) > 0;
>> MyMatrix(j, 3) = bitand(a, check15) > 0;

Here N is the number of chunks and j is the index of the current chunk; the "> 0" turns the status bits into 0/1 flags.
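A vectorized variant of the same masking may be handier for large data: it decodes a whole block of words at once instead of looping over j. This is a sketch, not the poster's code; 'words' is a hypothetical stand-in for one block returned by fread, and bitget is swapped in for the check14/check15 masks.

```matlab
% Sketch: decode a whole vector of uint16 words at once instead of
% looping over j. 'words' is a hypothetical stand-in for one block
% returned by fread(fid, blockSize, 'uint16=>uint16').
words = uint16([36698; 3930; 16384; 1]);

mask = uint16(bin2dec('0011111111111111'));   % bits 0-13
value = bitand(words, mask);                  % numerical data
% bitget counts positions from 1, so bit 14 (counting from 0) is
% position 15 and bit 15 is position 16
flag14 = bitget(words, 15);
flag15 = bitget(words, 16);

MyMatrix = double([value, flag14, flag15]);   % N-by-3
```

On memory: the full file is about 1.3e9 bytes, i.e. roughly 6.5e8 words, so an N-by-3 double matrix would need about 15.6 GB. Keeping the columns as uint16 (about 3.9 GB) or processing each block as it is read is much lighter.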

Check:
>> dec2bin(MyMatrix(j,1), 16)
The two leftmost bits should be zero.
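As a worked check on the example chunk above (a sketch; the variable names are illustrative), masking 1000111101011010 should leave 0000111101011010, i.e. 3930, with bit 14 clear and bit 15 set:

```matlab
% Sketch: decode the example chunk 1000111101011010 and check it with dec2bin.
a = uint16(bin2dec('1000111101011010'));      % 36698
mask = uint16(bin2dec('0011111111111111'));   % bits 0-13
value = bitand(a, mask);                      % 3930
bits = dec2bin(double(value), 16);            % '0000111101011010'
flag14 = bitget(a, 15);                       % bit 14 (from 0) -> 0
flag15 = bitget(a, 16);                       % bit 15 (from 0) -> 1
```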

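Another note: I assumed that bit 0 is the least significant bit (the rightmost digit in the binary string), so your bits 0-13 read from right to left.

If the bit numbering is the other way around, you will need something custom to flip the bits. Byte order (endianness) is a separate issue: if the file was written big-endian, open it with fopen(filename, 'r', 'ieee-be') so that fread combines the two bytes of each chunk in the right order.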
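To illustrate that byte order only changes how the two bytes of each word are combined, here is a sketch that writes the example value little-endian to a temporary file and reads it back both ways. The temporary file is an assumption for the demo; swapbytes is one way to repair a mismatch after reading.

```matlab
% Sketch: byte order affects how the two bytes of a word are combined,
% not the numbering of bits inside the word.
fname = [tempname() '.bin'];              % temporary demo file
fid = fopen(fname, 'w', 'ieee-le');       % write little-endian
fwrite(fid, uint16(36698), 'uint16');     % 0x8F5A, bytes on disk: 5A 8F
fclose(fid);

fid = fopen(fname, 'r', 'ieee-le');       % read with the matching order
le = fread(fid, 1, 'uint16=>uint16');     % 36698
fclose(fid);

fid = fopen(fname, 'r', 'ieee-be');       % read with the wrong order
be = fread(fid, 1, 'uint16=>uint16');     % bytes swapped: 0x5A8F = 23183
fclose(fid);
delete(fname);

% swapbytes undoes a byte-order mismatch after the fact
fixed = swapbytes(be);                    % back to 36698
```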

Sebastiaan

From: Christoph

Date: 15 Oct, 2009 15:09:45

Message: 3 of 3

Thanks Sebastiaan! That's how I would have done it in assembly; I guess
I just didn't make the connection.
Appreciate it!

Christoph
