Path: news.mathworks.com!newsfeed-00.mathworks.com!newsfeed2.dallas1.level3.net!news.level3.com!postnews.google.com!r29g2000hsg.googlegroups.com!not-for-mail
From: Phil Winder <philipwinder@googlemail.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Matlab Vectorisation Speed - How is it done in c++?
Date: Mon, 17 Dec 2007 06:47:25 -0800 (PST)
Organization: http://groups.google.com
Lines: 133
Message-ID: <3190b6f8-9525-4430-b582-9b9a5b00475b@r29g2000hsg.googlegroups.com>
References: <eb177713-6655-4454-bbf6-92d2c91bb6a6@s19g2000prg.googlegroups.com> 
NNTP-Posting-Host: 150.237.47.38
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Trace: posting.google.com 1197902846 4249 127.0.0.1 (17 Dec 2007 14:47:26 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Mon, 17 Dec 2007 14:47:26 +0000 (UTC)
Complaints-To: groups-abuse@google.com
Injection-Info: r29g2000hsg.googlegroups.com; posting-host=150.237.47.38; 
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.8.1.11) 
X-HTTP-Via: 1.1 new-wwwcache2.hull.ac.uk:3128 (squid/2.6.STABLE16)
Xref: news.mathworks.com comp.soft-sys.matlab:442756



On Dec 17, 1:42 pm, "Tim Davis" <da...@cise.ufl.edu> wrote:
> sturlamolden <sturlamol...@yahoo.no> wrote in message
>
> <08100346-586e-41fb-bb41-9d9342d26...@w40g2000hsb.googlegroups.com>...
>
> > On 17 Des, 01:05, Phil Winder
> <philipwin...@googlemail.com> wrote:
>
> > > Im currently porting some matlab algorithms to c++ code.
>  The test
> > > code I have is testing the vector math capabilities and
> how fast they
> > > can go.  I have found that it can be very, very fast and
> I am
> > > strugglling to reproduce the speed in c++. How does
> matlab do it? And
> > > how can it be reproduced in c++?
>
> > Beating the performance of vectorized Matlab code is very
> hard, and
> > usually not worth the effort.
>
> > Matlab makes calls to optimized C and Fortran libraries
> such as blas/
> > atlas, lapack and fftw. You cannot duplicate their
> efficacies in C++
> > for at least two reasons:
>
> > 1. There are issues related to the language syntax that
> makes Fortran
> > particularly easy to optimize for compilers, such as lack
> of pointer
> > aliasing. This is particularly important for optimal
> allocation of
> > registers when the CPU goes into a tight loop.
>
> > 2. A lot of effort have been put into making these
> libraries as fast
> > as possible. This includes optimal use of cache and branch
> prediction.
> > Duplicating these efforts on your own is going to take the
> rest of
> > your life to complete.
>
> > My advice would be this:
>
> > If you want speed in your C++ app, link and call the same
> libraries as
> > Matlab do. Most of them are available for free. Give the
> C++ compiler
> > pointer aliasing hints wherever possible.
>
> > In addition:
>
> > Use optimization level 3 on numerical code and level 2 on non-
> > numerical code. Process the data in chunks that fit in
> your L1 cache.
> > Force the CPU to prefetch if you know it will help.
> Page-align your
> > arrays in RAM.  Memory access is terribly slow, traverse
> as few times
> > as possible. Never use strided memory access. Manually
> unroll tight
> > loops. Exploit arithmetic pipelining of four subsequent
> operations.
> > Avoid divisions, transform to a multiplcation. Exploit
> multiple CPUs:
> > use MPI or OpenMP, forkjoin with labour-sharing
>
> threadpools, etc. Use
>
>
>
> > inline assembly to access SIMD parallel registers.
>
> > Also remember Hoare's statement about optimization, quoted
> by D.
> > Knuth: "Premature optimization is the root of all evil in
> computer
> > programming." Profile your code. Direct your optimizations
> to the
> > important bottlenecks. They are likely to be few. Do as
> much as you
> > can with the bottlenecks, and never mind the reminding 90%
> of your
> > code.
>
> > But before you begin: ask yourself if the hard work is
> goint to be
> > worth the effort.
>
> Regarding (1):  I write in C and I haven't found (1) to be
> that much of an issue (although I do worry about it and it's
> well worth it for you to mention here).  I think the more
> recent versions of gcc are able to work around this issue.
> More serious for C is the abuse of pointers (indirect
> addressing, which requires lots of memory traffic).  Memory
> traffic is more of a problem than register allocation,
> anyway (which you point out too, regarding the stride issue)..
>
> Regarding (2):  Yes, that's definitely true.  One could
> write an m-file script that does x=A\b without backslash, in
> maybe 50 lines of M (LU factorization if square, QR with
> Householder if rectangular).  Backslash itself has maybe
> 250,000 lines of code (that's a guess, but an educated one
> since I wrote about half that).
>
> Some vector operations are trivial (a = b+c) to write in C
> or Fortran.  If you write a=b*c where b and c are matrices,
> then there's no way you'll match performance in an optimized
> BLAS library.
>
> Rule of thumb: if the work is O(n) where n is the size of
> the data, then there's a decent chance that simple C or
> Fortran code can match (not beat) MATLAB.  If the work is
> higher than O(n) than you probably can't beat MATLAB with
> simple C.  Matrix add fits in the former category; matrix
> multiply doesn't.
>
> You can always call the BLAS / LAPACK yourself, in the dense
> case, or use available C code for the sparse case.  Lots of
> the code in x=a*b, x=A\b, etc is open source.

Tim: Thanks for the info, I think the way to go is to look into the
open source libraries you talk about. Presumably someone has done all
this before, so I don't think it should be too hard to find the
libraries I am looking for.
Steve: Good point, but I am still looking to move away from Matlab
code.

Thanks,

Phil Winder