From: Phil Winder <philipwinder@googlemail.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Matlab Vectorisation Speed - How is it done in c++?
Date: Mon, 17 Dec 2007 02:09:45 -0800 (PST)
Message-ID: <e9d409cb-5623-4c07-83d6-45bf2bd0df7b@s8g2000prg.googlegroups.com>
References: <eb177713-6655-4454-bbf6-92d2c91bb6a6@s19g2000prg.googlegroups.com> 

On Dec 17, 1:36 am, sturlamolden <sturlamol...@yahoo.no> wrote:
> On 17 Dec, 01:05, Phil Winder <philipwin...@googlemail.com> wrote:
>
> > I'm currently porting some Matlab algorithms to C++ code. The test
> > code I have is testing the vector math capabilities and how fast they
> > can go. I have found that it can be very, very fast and I am
> > struggling to reproduce the speed in C++. How does Matlab do it? And
> > how can it be reproduced in C++?
>
> Beating the performance of vectorized Matlab code is very hard, and
> usually not worth the effort.
>
> Matlab makes calls to optimized C and Fortran libraries such as
> BLAS/ATLAS, LAPACK and FFTW. You cannot easily duplicate their
> efficiency in C++, for at least two reasons:
>
> 1. There are issues related to the language rules that make Fortran
> particularly easy for compilers to optimize, such as the assumption
> that procedure arguments do not alias one another. This is particularly
> important for optimal allocation of registers when the CPU goes into a
> tight loop.
>
> 2. A lot of effort has been put into making these libraries as fast
> as possible. This includes optimal use of cache and branch prediction.
> Duplicating these efforts on your own is going to take the rest of
> your life to complete.
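To illustrate point 1: the nearest C++ counterpart to Fortran's no-aliasing rule is the non-standard __restrict qualifier, which GCC, Clang and MSVC accept as an extension. The following is a minimal sketch assuming one of those compilers; axpy_restrict is a hypothetical name for an element-wise y += alpha*x loop, not a library function.

```cpp
#include <cstddef>

// Hypothetical helper: y[i] += alpha * x[i] (the BLAS "axpy" operation).
// __restrict is a compiler extension, not standard C++. It promises that
// x and y do not overlap, so the compiler need not reload x[i] after each
// store to y[i] and is free to keep values in registers and vectorise.
void axpy_restrict(std::size_t n, double alpha,
                   const double* __restrict x, double* __restrict y) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

Passing overlapping x and y breaks that promise and is undefined behaviour, so the qualifier belongs only where the arrays are known to be distinct.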
>
> My advice would be this:
>
> If you want speed in your C++ app, link to and call the same libraries
> as Matlab does. Most of them are available for free. Give the C++
> compiler non-aliasing hints (such as restrict) wherever possible.
>
> In addition:
>
> Use optimization level 3 (e.g. -O3 with GCC) on numerical code and
> level 2 on non-numerical code. Process the data in chunks that fit in
> your L1 cache. Force the CPU to prefetch if you know it will help.
> Page-align your arrays in RAM. Memory access is terribly slow, so
> traverse your data as few times as possible. Avoid strided memory
> access. Manually unroll tight loops. Exploit arithmetic pipelining by
> issuing four independent operations in a row. Avoid divisions;
> transform them into multiplications. Exploit multiple CPUs: use MPI or
> OpenMP, fork-join with labour-sharing thread pools, etc. Use inline
> assembly to access the SIMD registers.
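Several of the tips above (cache-sized chunks, manual unrolling with independent accumulators, replacing division by multiplication, OpenMP) can be sketched in plain C++ as below. The function names and the 32x32 tile size are illustrative assumptions, not tuned values; measure on your own hardware before relying on any of them.

```cpp
#include <algorithm>
#include <cstddef>

// Cache blocking: transpose an n x n row-major matrix tile by tile, so the
// rows being read and the columns being written both stay in cache.
// BLOCK = 32 (two 8 KB tiles of doubles) is an assumed size; tune it.
void transpose_blocked(const double* in, double* out, std::size_t n) {
    const std::size_t BLOCK = 32;
    for (std::size_t ii = 0; ii < n; ii += BLOCK)
        for (std::size_t jj = 0; jj < n; jj += BLOCK) {
            const std::size_t i_end = std::min(ii + BLOCK, n);
            const std::size_t j_end = std::min(jj + BLOCK, n);
            for (std::size_t i = ii; i < i_end; ++i)
                for (std::size_t j = jj; j < j_end; ++j)
                    out[j * n + i] = in[i * n + j];
        }
}

// Dot product unrolled by four with four independent accumulators, so
// consecutive additions do not wait on each other. Compilers will not
// reorder floating-point additions like this themselves unless told to
// (e.g. -ffast-math), and the changed summation order can alter the last bits.
double dot_unrolled(const double* a, const double* b, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i) s0 += a[i] * b[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}

// Divide every element by d: one division up front, then cheap
// multiplications. Results may differ from true division in the last bit.
void scale_by_inverse(double* x, std::size_t n, double d) {
    const double inv = 1.0 / d;
    for (std::size_t i = 0; i < n; ++i) x[i] *= inv;
}

// Element-wise add spread over several cores with OpenMP. When OpenMP is
// not enabled at compile time (e.g. -fopenmp for GCC), _OPENMP is not
// defined, the pragma is skipped and the loop runs serially.
void add_parallel(const double* a, const double* b, double* out, long n) {
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
```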
>
> Also remember Hoare's statement about optimization, quoted by D.
> Knuth: "Premature optimization is the root of all evil in computer
> programming." Profile your code. Direct your optimizations to the
> important bottlenecks. They are likely to be few. Do as much as you
> can with the bottlenecks, and never mind the remaining 90% of your
> code.
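Before optimizing anything, it helps to measure. A dedicated profiler (gprof, perf, VTune and similar) shows where the time goes across a whole program; the hypothetical helper below is only a minimal sketch for timing one suspected bottleneck with std::chrono, assuming a C++11 compiler.

```cpp
#include <chrono>

// Hypothetical helper: call f() `reps` times and return the average
// wall-clock time per call in milliseconds, measured with a monotonic clock.
template <typename F>
double time_ms(F&& f, int reps = 10) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int r = 0; r < reps; ++r) f();
    const std::chrono::duration<double, std::milli> elapsed =
        clock::now() - start;
    return elapsed.count() / reps;
}
```

For example, time_ms([&] { total = std::accumulate(v.begin(), v.end(), 0.0); }) times one summation of a vector v. Make sure the result of the timed code is used afterwards, or the optimizer may remove the work entirely.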
>
> But before you begin: ask yourself if the hard work is going to be
> worth the effort.

Great reply. Thanks for the detail.
Would it not also be easier to compile my Matlab code into a DLL and
link that from my C++ program? That way I would be using Matlab's
optimisations before I even call it.

Thanks,

Phil