Thread Subject: matlab typecast function very slow

Subject: matlab typecast function very slow

From: roger

Date: 3 Nov, 2009 11:30:46

Message: 1 of 7

Hi,

I have a function that reads large binary files by reading a couple of
numbers per time in a complicated loop using fread.

Since the function isn't too fast I thought I'd read the entire file
at once as uint8 and then typecast byte by byte to the relevant
datatypes.

To my surprise the typecast method is actually a _lot_ slower than
consecutively using fread, on inspection the profiler shows that the
typecast function is the main culprit.

The function that serves the typecast.m function is a mex file called
typecastc.mexw32 (on my 32 bit system that is). It seems to be a
function which is compiled from an accompanying c file that is located
in the same private directory: typecastc.c

When I look at the c source file I see that the function itself
doesn't do any typecasting, it just creates a relevant datatype with
the mex mxCreateNumericMatrix and then leaves the actual typecasting
to the mx libraries.

So all in all the typecasting slowness seems a result of matlab's
inability to deal with c-style pointers from m-files and the penalty
for this is the mx library overhead.

An odd situation.

Anyone have a better method (than the mathworks have ;) for reading/
typecasting?

best regards R

Subject: matlab typecast function very slow

From: Bruno Luong

Date: 3 Nov, 2009 12:49:02

Message: 2 of 7

roger <northsolomonsea@gmail.com> wrote in message <05e24510-1633-4d87-9cbb-39230a6efeec@n35g2000yqm.googlegroups.com>...
> Hi,
>
> I have a function that reads large binary files by reading a couple of
> numbers per time in a complicated loop using fread.
>
> Since the function isn't too fast I thought I'd read the entire file
> at once as uint8 and then typecast byte by byte to the relevant
> datatypes.

Why do it byte by byte? Use typecast on the whole array. Many Matlab functions are not designed to be called many times, but rather once on an array. Example:

b=zeros(8*1024^2,1,'uint8');

tic;
d=typecast(b,'double');
toc % Elapsed time is 0.012408 seconds.

tic;
% d already allocated
k=1;
for i=1:8:length(b)
    d(k)=typecast(b(i+(0:7)),'double');
    k=k+1; % advance the output index
end
toc % 40 sec

>
> To my surprise the typecast method is actually a _lot_ slower than
> consecutively using fread, on inspection the profiler shows that the
> typecast function is the main culprit.
>
> The function that serves the typecast.m function is a mex file called
> typecastc.mexw32 (on my 32 bit system that is). It seems to be a
> function which is compiled from an accompanying c file that is located
> in the same private directory: typecastc.c
>
> When I look at the c source file I see that the function itself
> doesn't do any typecasting, it just creates a relevant datatype with
> the mex mxCreateNumericMatrix and then leaves the actual typecasting
> to the mx libraries.
>
> So all in all the typecasting slowness seems a result of matlab's
> inability to deal with c-style pointers from m-files and the penalty
> for this is the mx library overhead.
>
> An odd situation.
>
> Anyone have a better method (than the mathworks have ;) for reading/
> typecasting?
>

Alternatively use MEMMAPFILE to read binary data.

or MEX it, but first try to use TYPECAST on the whole array.

Bruno
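
A minimal sketch of the MEMMAPFILE suggestion, assuming a hypothetical file of fixed-size records (a uint32 id followed by a single value). The record layout and the field names 'id' and 'value' are made up for illustration. A file with a less regular layout would need a different Format specification, or several maps with different offsets.

```matlab
% Write a small temporary file so the sketch is self-contained.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
for k = 1:3
    fwrite(fid, uint32(k), 'uint32');      % hypothetical field: id
    fwrite(fid, single(k / 2), 'single');  % hypothetical field: value
end
fclose(fid);

% Map the file as repeated records; each row of the Format cell array
% describes one field as {type, size, name}.
m = memmapfile(fname, 'Format', {'uint32', [1 1], 'id'; 'single', [1 1], 'value'});
d = m.Data;             % struct array with one element per record
ids = [d.id];           % all ids as a uint32 row vector
values = [d.value];     % all values as a single row vector

clear m                 % release the mapping before deleting the file
delete(fname);
```

Whether this beats fread for a given file is something to measure rather than assume.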

Subject: matlab typecast function very slow

From: roger

Date: 3 Nov, 2009 13:18:39

Message: 3 of 7

On Nov 3, 1:49 pm, "Bruno Luong" <b.lu...@fogale.findmycountry> wrote:
> roger <northsolomon...@gmail.com> wrote in message <05e24510-1633-4d87-9cbb-39230a6ef...@n35g2000yqm.googlegroups.com>...
> > Hi,
>
> > I have a function that reads large binary files by reading a couple of
> > numbers per time in a complicated loop using fread.
>
> > Since the function isn't too fast I thought I'd read the entire file
> > at once as uint8 and then typecast byte by byte to the relevant
> > datatypes.
>
> Why do it byte by byte? Use typecast on the whole array. Many Matlab functions are not designed to be called many times, but rather once on an array. Example:
>
> b=zeros(8*1024^2,1,'uint8');
>
> tic;
> d=typecast(b,'double');
> toc % Elapsed time is 0.012408 seconds.
>
> tic;
> % d already allocated
> k=1;
> for i=1:8:length(b)
>     d(k)=typecast(b(i+(0:7)),'double');
>     k=k+1; % advance the output index
> end
> toc % 40 sec
>
> > To my surprise the typecast method is actually a _lot_ slower than
> > consecutively using fread, on inspection the profiler shows that the
> > typecast function is the main culprit.
>
> > The function that serves the typecast.m function is a mex file called
> > typecastc.mexw32 (on my 32 bit system that is). It seems to be a
> > function which is compiled from an accompanying c file that is located
> > in the same private directory: typecastc.c
>
> > When I look at the c source file I see that the function itself
> > doesn't do any typecasting, it just creates a relevant datatype with
> > the mex mxCreateNumericMatrix and then leaves the actual typecasting
> > to the mx libraries.
>
> > So all in all the typecasting slowness seems a result of matlab's
> > inability to deal with c-style pointers from m-files and the penalty
> > for this is the mx library overhead.
>
> > An odd situation.
>
> > Anyone have a better method (than the mathworks have ;) for reading/
> > typecasting?
>
> Alternatively use MEMMAPFILE to read binary data.
>
> or MEX it, but first try to use TYPECAST on the whole array.
>
> Bruno

Hi Bruno,
Thanks for your reply.

The trouble is that the file contains a number of different datatypes
in nested structures so that it is not really possible to cast the
entire file or parts of it. I guess the same issue applies to using
memmapfile and I doubt it is faster that way.

I guess the fastest reading strategy would just be to transfer the
code to a C++ mex function and do all the reading/converting there and
then transfer the whole lot back. I was just a bit surprised that
something as elementary as a typecast would take so long in matlab.

R
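
If at least some parts of the file consist of repeated fixed-size records, whole-array typecast can still apply even with mixed datatypes: reshape the bytes so that each column is one record, then cast each field across all records at once. A sketch, assuming a hypothetical 10-byte record (a uint16 id followed by a double value); it does not cover variable-length or deeply nested layouts.

```matlab
% Build example bytes in memory; in practice they would come from
% something like fread(fid, inf, '*uint8').
ids    = uint16([1 2 3]);          % hypothetical field: id
values = [0.5 -1.25 1000];         % hypothetical field: value
n = numel(ids);
raw = zeros(10, n, 'uint8');       % one column per 10-byte record
raw(1:2, :)  = reshape(typecast(ids, 'uint8'), 2, n);
raw(3:10, :) = reshape(typecast(values, 'uint8'), 8, n);
bytes = raw(:);                    % flat byte stream, as read from file

% Decode: one typecast call per field instead of one per value.
rec = reshape(bytes, 10, []);      % column k holds record k
decodedIds    = typecast(reshape(rec(1:2, :), [], 1), 'uint16');
decodedValues = typecast(reshape(rec(3:10, :), [], 1), 'double');
```

The number of typecast calls then grows with the number of fields, not with the number of values.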

Subject: matlab typecast function very slow

From: James Tursa

Date: 3 Nov, 2009 14:26:02

Message: 4 of 7

roger <northsolomonsea@gmail.com> wrote in message <05e24510-1633-4d87-9cbb-39230a6efeec@n35g2000yqm.googlegroups.com>...
>
> Since the function isn't too fast I thought I'd read the entire file
> at once as uint8 and then typecast byte by byte to the relevant
> datatypes.
>
> To my surprise the typecast method is actually a _lot_ slower than
> consecutively using fread, on inspection the profiler shows that the
> typecast function is the main culprit.

If you want a fast typecast function, use my FEX submission that can be found here:

http://www.mathworks.com/matlabcentral/fileexchange/17476-typecast-c-mex-function

The MATLAB built-in typecast does a data copy to generate the result. My version of typecast does not do a data copy, but does a shared data copy, so it runs faster. The speed improvement will depend on the size of the data. My files are named typecast.m and typecast.c, but you could rename them to something else if desired.

James Tursa

Subject: matlab typecast function very slow

From: roger

Date: 3 Nov, 2009 14:54:32

Message: 5 of 7

On Nov 3, 3:26 pm, "James Tursa"
<aclassyguy_with_a_k_not_...@hotmail.com> wrote:
> roger <northsolomon...@gmail.com> wrote in message <05e24510-1633-4d87-9cbb-39230a6ef...@n35g2000yqm.googlegroups.com>...
>
> > Since the function isn't too fast I thought I'd read the entire file
> > at once as uint8 and then typecast byte by byte to the relevant
> > datatypes.
>
> > To my surprise the typecast method is actually a _lot_ slower than
> > consecutively using fread, on inspection the profiler shows that the
> > typecast function is the main culprit.
>
> If you want a fast typecast function, use my FEX submission that can be found here:
>
> http://www.mathworks.com/matlabcentral/fileexchange/17476-typecast-c-...
>
> The MATLAB built-in typecast does a data copy to generate the result. My version of typecast does not do a data copy, but does a shared data copy, so it runs faster. The speed improvement will depend on the size of the data. My files are named typecast.m and typecast.c, but you could rename them to something else if desired.
>
> James Tursa

James,
Many thanks, although I'll probably write the whole reading/converting
part in C++ and transfer the lot back to matlab via mx, the shared data
copy is very interesting!
regards, R

Subject: matlab typecast function very slow

From: James Tursa

Date: 3 Nov, 2009 16:31:01

Message: 6 of 7

roger <northsolomonsea@gmail.com> wrote in message <0598df9f-fdca-4bb3-bbb0-22b6ff33331c@a32g2000yqm.googlegroups.com>...
>
> ... although I'll probably write the whole reading/converting
> part in C++ and transfer the lot back to matlab via mx the shared data
> copy is very interesting!

Tip if you go this route. First create your return variables with the mxCreate___ functions, then get the data pointers to them with the mxGetPr function. Then use *these* pointers in your reading. That way the data gets read directly into the mxArray variable that you want returned to MATLAB and you will avoid an extra data copy.

James Tursa

Subject: matlab typecast function very slow

From: roger

Date: 3 Nov, 2009 16:54:58

Message: 7 of 7

On Nov 3, 5:31 pm, "James Tursa"
<aclassyguy_with_a_k_not_...@hotmail.com> wrote:
> roger <northsolomon...@gmail.com> wrote in message <0598df9f-fdca-4bb3-bbb0-22b6ff333...@a32g2000yqm.googlegroups.com>...
>
> > ... although I'll probably write the whole reading/converting
> > part in C++ and transfer the lot back to matlab via mx the shared data
> > copy is very interesting!
>
> Tip if you go this route. First create your return variables with the mxCreate___ functions, then get the data pointers to them with the mxGetPr function. Then use *these* pointers in your reading. That way the data gets read directly into the mxArray variable that you want returned to MATLAB and you will avoid an extra data copy.
>
> James Tursa

Thanks.

What I was thinking of is an approach that lets me extract/interpret
data either directly from file or from a "raw" data array depending on
available memory. If there's enough memory I can read everything at
once into the raw data array and extract/interpret from there, if
there's not I'll have to read part by part from file which may be a
bit slower. The extracted data will go directly into the relevant
mxArray to avoid yet another copy of the data.

In any case, it should be a lot faster than m-code; I've noticed in
the past that especially loops are _much_ quicker in compiled C code.
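
The part-by-part strategy can also be sketched in m-code, as a rough illustration rather than the planned C++ implementation. It assumes a hypothetical file containing only doubles and a made-up chunk size; the result is preallocated once and each chunk is written straight into it.

```matlab
% Write a small temporary file of doubles so the sketch is self-contained.
fname = [tempname '.bin'];
expected = (1:1000).';
fid = fopen(fname, 'w');
fwrite(fid, expected, 'double');
fclose(fid);

chunk = 256;                  % hypothetical chunk size, in elements
info = dir(fname);
n = info.bytes / 8;           % number of doubles in the file
out = zeros(n, 1);            % preallocate the result once
fid = fopen(fname, 'r');
pos = 0;
while pos < n
    part = fread(fid, min(chunk, n - pos), 'double=>double');
    if isempty(part)
        break;                % stop on an unexpected short read
    end
    out(pos + (1:numel(part))) = part;   % store the chunk in place
    pos = pos + numel(part);
end
fclose(fid);
delete(fname);
```

Setting chunk to n reads everything in one call, which corresponds to the case where enough memory is available.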
