From: "Steven G. Johnson" <stevenj@alum.mit.edu>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Matlab Vectorisation Speed - How is it done in c++?
Date: Mon, 17 Dec 2007 13:22:28 -0800 (PST)
Organization: http://groups.google.com
Lines: 54
Message-ID: <825523e3-b124-44a4-b82f-7b01b3495029@f3g2000hsg.googlegroups.com>
References: <eb177713-6655-4454-bbf6-92d2c91bb6a6@s19g2000prg.googlegroups.com> 



On Dec 17, 8:42 am, "Tim Davis" <da...@cise.ufl.edu> wrote:
> > 1. There are issues related to the language syntax that makes
> > Fortran particularly easy to optimize for compilers, such as lack of
> > pointer aliasing. This is particularly important for optimal
> > allocation of registers when the CPU goes into a tight loop.
>
> Regarding (1):  I write in C and I haven't found (1) to be that much
> of an issue (although I do worry about it and it's well worth it for
> you to mention here).  I think the more recent versions of gcc are
> able to work around this issue.  More serious for C is the abuse of
> pointers (indirect addressing, which requires lots of memory
> traffic).  Memory traffic is more of a problem than register
> allocation, anyway (which you point out too, regarding the stride
> issue).

The old canard about pointer aliasing semantics being weaker in C than
in Fortran hasn't been an issue even in principle for almost 10 years
now, since the 1999 C standard introduced the "restrict" keyword.  In
practice, I've never found aliasing to be a major issue in highly
optimized code: for key loops you often want to partially unroll them
yourself anyway, and in any case higher-level memory-access patterns
are usually more important for performance.
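
To make the "restrict" point concrete, here is a minimal sketch (the
function name axpy and its signature are just illustrative, not taken
from any particular library).  The qualifier is a promise from the
programmer that, inside the function, the data reached through x and y
do not overlap, which gives the compiler roughly the freedom a Fortran
compiler assumes for dummy array arguments:

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += a * x[i] for i = 0..n-1.
 * "restrict" (C99) asserts that x and y do not alias, so the compiler
 * may keep values in registers and reorder loads/stores across
 * iterations.  Calling this with overlapping x and y is undefined. */
void axpy(size_t n, double a, const double *restrict x,
          double *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Whether this changes the generated code at all depends on the compiler
and flags, so inspect the assembly or time it before assuming a win.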

Regarding the "abuse of pointers", I'm not sure what you're talking
about.  Array access in C, properly implemented, requires no more or
less pointer indirection than in Fortran or any other language.
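
A sketch of what I mean by "properly implemented" (the function names
here are made up for illustration).  The first version stores the
matrix contiguously and computes the offset, which is essentially what
a Fortran compiler does for you.  The second uses an array of row
pointers; I am only guessing that this is the kind of thing meant by
"abuse", since it adds one extra pointer load per row:

```c
#include <assert.h>
#include <stddef.h>

/* Sum an m-by-n row-major matrix stored in one contiguous block:
 * each element is one base pointer plus a computed offset. */
double sum_flat(size_t m, size_t n, const double *a)
{
    assert(m == 0 || n == 0 || a != NULL);
    double s = 0.0;
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            s += a[i * n + j];
    return s;
}

/* Same sum through an array of row pointers: rows[i] must be loaded
 * from memory before the row's elements can be addressed. */
double sum_rows(size_t m, size_t n, const double *const *rows)
{
    assert(m == 0 || n == 0 || rows != NULL);
    double s = 0.0;
    for (size_t i = 0; i < m; ++i) {
        const double *row = rows[i];   /* the extra indirection */
        for (size_t j = 0; j < n; ++j)
            s += row[j];
    }
    return s;
}
```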

It's a good learning exercise, by the way, to implement a matrix
multiply yourself and compare it to a fast BLAS implementation.  Even
if you turn off things like SSE2 instructions, the BLAS is probably a
factor of 6 faster than your first try, for a decent-sized matrix.  On
the other hand, matrix multiplication is simple enough that it's not
*too* hard to get at least reasonably close to a fast BLAS if you have
some notion of what you are doing.  (A few years ago I had a class
where there was a contest to write a dgemm as fast as possible, and at
least one student beat the fastest free BLAS at the time for at least
one matrix size.)
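
If you want a starting point for that exercise, something like the
following would do: a typical "first try" and one common next step,
cache blocking.  This is only a sketch under my own assumptions
(square row-major matrices, made-up function names, a block size bs
that you would have to tune); it is not how any particular BLAS is
written, and a real one does a good deal more:

```c
#include <assert.h>
#include <stddef.h>

/* First try: C = A*B for n-by-n row-major matrices, three plain loops. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (size_t k = 0; k < n; ++k)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
}

/* Next step: the same arithmetic done in bs-by-bs blocks, so that the
 * pieces of A, B, and C being worked on can stay in cache. */
void matmul_blocked(size_t n, size_t bs, const double *a,
                    const double *b, double *c)
{
    assert(bs > 0);
    for (size_t t = 0; t < n * n; ++t)
        c[t] = 0.0;
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                /* clip the block at the matrix edge */
                size_t imax = (ii + bs < n) ? ii + bs : n;
                size_t kmax = (kk + bs < n) ? kk + bs : n;
                size_t jmax = (jj + bs < n) ? jj + bs : n;
                for (size_t i = ii; i < imax; ++i)
                    for (size_t k = kk; k < kmax; ++k) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < jmax; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

Time both against dgemm from whatever BLAS you have installed, for a
range of n, and the gap should be instructive.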

I once had an old Fortran programmer remark to me, "A matrix multiply
is just three loops!  How many possible ways can there be to implement
it?"  Recently, I told that story to an old compiler engineer, and he
immediately responded "Six ways (3 factorial), and I once wrote a
compiler that automatically found the best loop order."  The correct
answer (neglecting exotic algorithms like Strassen etc. that no one
uses) is closer to (n^3)!, i.e. the factorial of n^3, since the n^3
multiply-add updates can be performed in any order.  Programming was
simpler when floating-point arithmetic
dominated the runtime and all you had to worry about was the operation
count.
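
For the record, the compiler engineer's six variants can be written as
one routine.  This is just an illustrative sketch (matmul_order and its
"order" argument are my own invention): order[d] says which of the
indices i, j, k (coded 0, 1, 2) runs at loop depth d.  All six
permutations compute the same product, up to floating-point rounding,
but they walk through memory very differently:

```c
#include <assert.h>
#include <stddef.h>

/* C = A*B for n-by-n row-major matrices, with the three loops nested
 * in the order given by "order", a permutation of {0,1,2} where
 * 0 means i, 1 means j, 2 means k.  E.g. {0,2,1} is the i-k-j order. */
void matmul_order(size_t n, const int order[3], const double *a,
                  const double *b, double *c)
{
    /* check that order really is a permutation of 0, 1, 2 */
    unsigned seen = 0;
    for (int d = 0; d < 3; ++d) {
        assert(order[d] >= 0 && order[d] <= 2);
        seen |= 1u << order[d];
    }
    assert(seen == 7u);

    size_t idx[3];                 /* idx[0]=i, idx[1]=j, idx[2]=k */
    size_t *o0 = &idx[order[0]];   /* outermost loop variable */
    size_t *o1 = &idx[order[1]];
    size_t *o2 = &idx[order[2]];   /* innermost loop variable */

    for (size_t t = 0; t < n * n; ++t)
        c[t] = 0.0;
    for (*o0 = 0; *o0 < n; ++*o0)
        for (*o1 = 0; *o1 < n; ++*o1)
            for (*o2 = 0; *o2 < n; ++*o2)
                c[idx[0] * n + idx[1]] +=
                    a[idx[0] * n + idx[2]] * b[idx[2] * n + idx[1]];
}
```

The (n^3)! orderings go well beyond this, of course: blocking,
unrolling, and so on are all just other ways of scheduling the same
n^3 updates.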

Regards,
Steven G. Johnson