From: roger <northsolomonsea@gmail.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: matlab typecast function very slow
Date: Tue, 3 Nov 2009 05:18:39 -0800 (PST)
Message-ID: <7ea6bffe-47f8-4b60-aab0-18a1766eae71@v30g2000yqm.googlegroups.com>
References: <05e24510-1633-4d87-9cbb-39230a6efeec@n35g2000yqm.googlegroups.com> 
	<hcp8ru$d11$1@fred.mathworks.com>


On Nov 3, 1:49 pm, "Bruno Luong" <b.lu...@fogale.findmycountry> wrote:
> roger <northsolomon...@gmail.com> wrote in message <05e24510-1633-4d87-9cbb-39230a6ef...@n35g2000yqm.googlegroups.com>...
> > Hi,
>
> > I have a function that reads large binary files by reading a couple of
> > numbers at a time in a complicated loop using fread.
>
> > Since the function isn't too fast I thought I'd read the entire file
> > at once as uint8 and then typecast byte by byte to the relevant
> > datatypes.
>
> Why do it byte by byte? Use typecast on the whole array. Many MATLAB functions are not designed to be called many times, but rather to operate on a whole array at once. Example:
>
> b=zeros(8*1024^2,1,'uint8');
>
> tic;
> d=typecast(b,'double');
> toc % Elapsed time is 0.012408 seconds.
>
> tic;
> % d already allocated above
> k=1;
> for i=1:8:length(b)
>     d(k)=typecast(b(i+(0:7)),'double');
>     k=k+1; % advance the output index
> end
> toc % 40 sec
>
> > To my surprise the typecast method is actually a _lot_ slower than
> > consecutively using fread. On inspection, the profiler shows that the
> > typecast function is the main culprit.
>
> > The function that serves the typecast.m function is a mex file called
> > typecastc.mexw32 (on my 32 bit system that is). It seems to be
> > compiled from an accompanying c file that is located
> > in the same private directory: typecastc.c
>
> > When I look at the c source file I see that the function itself
> > doesn't do any typecasting, it just creates a relevant datatype with
> > the mex mxCreateNumericMatrix and then leaves the actual typecasting
> > to the mx libraries.
>
> > So all in all the typecasting slowness seems a result of matlab's
> > inability to deal with c-style pointers from m-files and the penalty
> > for this is the mx library overhead.
>
> > An odd situation.
>
> > Anyone have a better method (than the mathworks have ;) for reading/
> > typecasting?
>
> Alternatively use MEMMAPFILE to read binary data.
>
> or MEX it, but first try to use TYPECAST on the whole array.
>
> Bruno

Hi Bruno,
Thanks for your reply.

The trouble is that the file contains a number of different datatypes
in nested structures, so it is not really possible to cast the
entire file, or even large parts of it, in one call. I guess the same
issue applies to memmapfile, and I doubt it would be faster that way.
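
For reference, memmapfile can describe records that mix datatypes through a cell-array 'Format'. Below is a minimal sketch, assuming fixed-size records; the record layout (id, pos, t) and the file name are made up for illustration and are not the actual file format. Variable-length or deeply nested structures would not map this directly, and whether it beats fread would have to be timed.

```matlab
% Hypothetical record layout (an assumption, not the real file format):
%   uint16 id, 1x2 single pos, double t  ->  2 + 8 + 8 = 18 bytes per record
fname = fullfile(tempdir, 'mixed_records_demo.bin');   % hypothetical file name
nRec  = 1000;

% Write a small test file so the sketch is self-contained.
fid = fopen(fname, 'w');
for r = 1:nRec
    fwrite(fid, r,       'uint16');
    fwrite(fid, [r, -r], 'single');
    fwrite(fid, r/10,    'double');
end
fclose(fid);

% Map the file as nRec repeated records with mixed field types.
m = memmapfile(fname, 'Format', { ...
        'uint16', [1 1], 'id';  ...
        'single', [1 2], 'pos'; ...
        'double', [1 1], 't'},  ...
    'Repeat', nRec);

rec = m.Data;        % nRec-by-1 struct array with fields id, pos, t
ids = [rec.id];      % 1-by-nRec uint16
t   = [rec.t];       % 1-by-nRec double
p3  = rec(3).pos;    % 1-by-2 single from the third record

clear m              % release the mapping before deleting the file
delete(fname);
```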

I guess the fastest reading strategy would simply be to move the
code to a C++ mex function, do all the reading/converting there, and
then pass the whole lot back. I was just a bit surprised that
something as elementary as a typecast would take so long in matlab.
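
A possible middle ground before going to mex, sketched here under the assumption that at least part of the file consists of fixed-size records: reshape the uint8 buffer to one column per record, then call typecast once per field rather than once per value. The 10-byte record layout (uint16 id followed by double t) is hypothetical, and the test buffer stands in for what fread(fid, inf, '*uint8') would return.

```matlab
% Hypothetical fixed-size record: uint16 id (bytes 1-2), double t (bytes 3-10)
recLen = 10;
nRec   = 1000;

% Build a test byte buffer in native byte order (stand-in for the file contents).
ids = uint16(1:nRec);
t   = (1:nRec) / 10;
raw = [reshape(typecast(ids, 'uint8'), 2, nRec); ...
       reshape(typecast(t,   'uint8'), 8, nRec)];
raw = raw(:);                          % column of recLen*nRec bytes

% One column per record, then a single typecast call per field.
B      = reshape(raw, recLen, nRec);
idsOut = typecast(reshape(B(1:2,  :), 1, []), 'uint16');
tOut   = typecast(reshape(B(3:10, :), 1, []), 'double');
```

This keeps the number of typecast calls proportional to the number of fields, not the number of values. It does not help where record sizes vary from one record to the next.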

R