Thread Subject: Matlab Vectorisation Speed - How is it done in c++?

From: Phil Winder

Date: 17 Dec, 2007 00:05:32

Message: 1 of 11

Hi,
I'm currently porting some Matlab algorithms to C++ code. The test
code I have is testing the vector math capabilities and how fast they
can go. I have found that Matlab can be very, very fast, and I am
struggling to reproduce the speed in C++. How does Matlab do it? And
how can it be reproduced in C++?

Thanks,
Phil Winder

From: sturlamolden

Date: 17 Dec, 2007 01:36:05

Message: 2 of 11

On 17 Dec, 01:05, Phil Winder <philipwin...@googlemail.com> wrote:

> [...] How does matlab do it? And
> how can it be reproduced in c++?

Beating the performance of vectorized Matlab code is very hard, and
usually not worth the effort.

Matlab makes calls to optimized C and Fortran libraries such as
BLAS/ATLAS, LAPACK and FFTW. You cannot duplicate their efficiency in
C++ for at least two reasons:

1. There are issues related to the language syntax that make Fortran
particularly easy to optimize for compilers, such as the lack of pointer
aliasing. This is particularly important for optimal allocation of
registers when the CPU goes into a tight loop.

2. A lot of effort has been put into making these libraries as fast
as possible. This includes optimal use of cache and branch prediction.
Duplicating these efforts on your own is going to take the rest of
your life.

My advice would be this:

If you want speed in your C++ app, link and call the same libraries as
Matlab does. Most of them are available for free. Give the C++ compiler
pointer aliasing hints wherever possible.

In addition:

Use optimization level 3 on numerical code and level 2 on
non-numerical code. Process the data in chunks that fit in your L1
cache. Force the CPU to prefetch if you know it will help. Page-align
your arrays in RAM. Memory access is terribly slow, so traverse your
data as few times as possible. Never use strided memory access.
Manually unroll tight loops. Exploit arithmetic pipelining by
interleaving four independent operations. Avoid divisions; transform
them into multiplications. Exploit multiple CPUs: use MPI or OpenMP,
fork-join with labour-sharing thread pools, etc. Use inline assembly to
access the SIMD registers.
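A minimal C sketch of two items from the list above, assuming plain C99 and no particular compiler: a dot product unrolled by four with four independent partial sums (one reading of "pipelining of four operations"), and a division replaced by one reciprocal followed by multiplications. The function names are hypothetical, and whether either trick helps on a given CPU and compiler is something to measure, not assume.

```c
#include <stddef.h>

/* Dot product with the loop manually unrolled by four. The four
   independent partial sums let several additions be in flight at once
   instead of every iteration waiting on a single accumulator. */
double dot_unrolled4(const double *x, const double *y, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)          /* leftover elements when n % 4 != 0 */
        s0 += x[i] * y[i];

    return (s0 + s1) + (s2 + s3);
}

/* Divide every element by d: one division to form the reciprocal,
   then only multiplications inside the loop. */
void scale_by_inverse(double *x, size_t n, double d)
{
    const double inv = 1.0 / d;
    size_t i;

    for (i = 0; i < n; i++)
        x[i] *= inv;
}
```

Both change rounding slightly: the unrolled sum adds terms in a different order, and `x * (1/d)` is not always bit-identical to `x / d`.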

Also remember Hoare's statement about optimization, quoted by D.
Knuth: "Premature optimization is the root of all evil in computer
programming." Profile your code. Direct your optimizations to the
important bottlenecks. They are likely to be few. Do as much as you
can with the bottlenecks, and never mind the remaining 90% of your
code.

But before you begin: ask yourself if the hard work is going to be
worth the effort.

From: Phil Winder

Date: 17 Dec, 2007 10:09:45

Message: 3 of 11

On Dec 17, 1:36 am, sturlamolden <sturlamol...@yahoo.no> wrote:
> Beating the performance of vectorized Matlab code is very hard, and
> usually not worth the effort.
> [...]

Great reply. Thanks for the detail.
Would it not also be easier to compile my Matlab code into a DLL and
link that from my C++ program, thus using Matlab's optimisations before
I even call it?

Thanks,

Phil

From: Tim Davis

Date: 17 Dec, 2007 13:42:47

Message: 4 of 11

sturlamolden <sturlamolden@yahoo.no> wrote in message
<08100346-586e-41fb-bb41-9d9342d269ed@w40g2000hsb.googlegroups.com>...
> [...]
> 1. There are issues related to the language syntax that makes Fortran
> particularly easy to optimize for compilers, such as lack of pointer
> aliasing. This is particularly important for optimal allocation of
> registers when the CPU goes into a tight loop.
>
> 2. A lot of effort have been put into making these libraries as fast
> as possible. This includes optimal use of cache and branch prediction.
> Duplicating these efforts on your own is going to take the rest of
> your life to complete.
> [...]

Regarding (1): I write in C and I haven't found (1) to be
that much of an issue (although I do worry about it and it's
well worth it for you to mention here). I think the more
recent versions of gcc are able to work around this issue.
More serious for C is the abuse of pointers (indirect
addressing, which requires lots of memory traffic). Memory
traffic is more of a problem than register allocation,
anyway (which you point out too, regarding the stride issue).
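To make the stride point concrete, here is a sketch (hypothetical function names) that sums the same row-major matrix two ways; only the loop order differs, so the arithmetic is identical but the memory traffic is not. Note that Matlab stores arrays column-major, so for code ported from Matlab the fast and slow orders are swapped unless the data layout is changed.

```c
#include <stddef.h>

/* A is an m-by-n matrix stored row-major in one contiguous block:
   element (i, j) lives at A[i*n + j]. */

/* Inner loop walks along a row: consecutive addresses (unit stride). */
double sum_unit_stride(const double *A, size_t m, size_t n)
{
    double s = 0.0;
    size_t i, j;

    for (i = 0; i < m; i++)
        for (j = 0; j < n; j++)
            s += A[i * n + j];
    return s;
}

/* Inner loop walks down a column: each access jumps n doubles ahead,
   so for large n almost every access lands on a different cache line. */
double sum_strided(const double *A, size_t m, size_t n)
{
    double s = 0.0;
    size_t i, j;

    for (j = 0; j < n; j++)
        for (i = 0; i < m; i++)
            s += A[i * n + j];
    return s;
}
```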

Regarding (2): Yes, that's definitely true. One could
write an m-file script that does x=A\b without backslash, in
maybe 50 lines of M (LU factorization if square, QR with
Householder if rectangular). Backslash itself has maybe
250,000 lines of code (that's a guess, but an educated one
since I wrote about half that).

Some vector operations are trivial (a = b+c) to write in C
or Fortran. If you write a=b*c where b and c are matrices,
then there's no way you'll match performance in an optimized
BLAS library.

Rule of thumb: if the work is O(n) where n is the size of
the data, then there's a decent chance that simple C or
Fortran code can match (not beat) MATLAB. If the work is
higher than O(n), then you probably can't beat MATLAB with
simple C. Matrix add fits in the former category; matrix
multiply doesn't.
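The rule of thumb can be illustrated with a sketch, assuming n-by-n row-major matrices and hypothetical function names. The add does work proportional to the data it touches (n*n additions on n*n elements), while the textbook multiply does n*n*n multiply-adds on the same amount of data. That extra reuse of each element is what an optimized BLAS exploits and what a simple triple loop leaves on the table.

```c
#include <stddef.h>

/* C = A + B: n*n additions on n*n data, work proportional to data size.
   Simple code like this is usually limited by memory bandwidth. */
void mat_add(double *C, const double *A, const double *B, size_t n)
{
    size_t k;

    for (k = 0; k < n * n; k++)
        C[k] = A[k] + B[k];
}

/* C = A * B, the textbook triple loop: n*n*n multiply-adds on n*n data.
   Each element of A and B is reused n times, which a tuned dgemm
   arranges to happen from cache and registers; this loop does not. */
void mat_mul_naive(double *C, const double *A, const double *B, size_t n)
{
    size_t i, j, k;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            double s = 0.0;
            for (k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}
```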

You can always call the BLAS / LAPACK yourself, in the dense
case, or use available C code for the sparse case. Lots of
the code in x=a*b, x=A\b, etc. is open source.

From: Steve Amphlett

Date: 17 Dec, 2007 13:55:30

Message: 5 of 11

Phil Winder <philipwinder@googlemail.com> wrote in message
<eb177713-6655-4454-bbf6-92d2c91bb6a6@s19g2000prg.googlegroups.com>...
> [...] How does matlab do it? And
> how can it be reproduced in c++?

If you can work "in place" you'll get a speedup of 10x or more:

   myfunc(x);

rather than

   x=myfunc(x);

All that memory allocation and copying is a waste.
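The same trade-off, sketched in C with hypothetical function names: the out-of-place version allocates and fills a new array on every call, while the in-place version reuses the caller's buffer. How much this saves depends on the array size and the allocator; the speedup quoted above is the poster's figure, not something this sketch measures.

```c
#include <stddef.h>
#include <stdlib.h>

/* Out of place: allocates a new buffer on every call and writes the
   result there. The caller must free() it. Returns NULL on failure. */
double *scaled_copy(const double *x, size_t n, double a)
{
    double *y = malloc(n * sizeof *y);
    size_t i;

    if (y == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        y[i] = a * x[i];
    return y;
}

/* In place: overwrites x, so there is no allocation and one fewer
   array streaming through the cache. */
void scale_in_place(double *x, size_t n, double a)
{
    size_t i;

    for (i = 0; i < n; i++)
        x[i] *= a;
}
```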

From: Phil Winder

Date: 17 Dec, 2007 14:47:25

Message: 6 of 11

On Dec 17, 1:42 pm, "Tim Davis" <da...@cise.ufl.edu> wrote:
> [...]
> You can always call the BLAS / LAPACK yourself, in the dense
> case, or use available C code for the sparse case. Lots of
> the code in x=a*b, x=A\b, etc is open source.

Tim: Thanks for the info, I think the way to go is to look into the
open source libraries you talk about. Presumably someone has done all
this before, so I don't think it should be too hard to find the
libraries I am looking for.
Steve: Good point, but I am still looking to move away from Matlab
code.

Thanks,

Phil Winder

From: Steven G. Johnson

Date: 17 Dec, 2007 21:22:28

Message: 7 of 11

On Dec 17, 8:42 am, "Tim Davis" <da...@cise.ufl.edu> wrote:
> > 1. There are issues related to the language syntax that makes Fortran
> > particularly easy to optimize for compilers, such as lack of pointer
> > aliasing. This is particularly important for optimal allocation of
> > registers when the CPU goes into a tight loop.
>
> Regarding (1): I write in C and I haven't found (1) to be
> that much of an issue (although I do worry about it and it's
> well worth it for you to mention here). I think the more
> recent versions of gcc are able to work around this issue.
> More serious for C is the abuse of pointers (indirect
> addressing, which requires lots of memory traffic). Memory
> traffic is more of a problem than register allocation,
> anyway (which you point out too, regarding the stride issue)..

The old canard about pointer aliasing semantics being weaker in C than
in Fortran hasn't been an issue even in principle for almost 10 years
now, since the 1999 C standard introduced the "restrict" keyword. In
practice, I've never found it to be a major practical issue in highly
optimized code, since for key loops you often want to partially unroll
them yourself anyway, and in any case higher-level memory-access
patterns are usually more important for performance.

Regarding the "abuse of pointers" I'm not sure what you're talking
about. Array access in C, properly implemented, requires no more or
less pointer indirection than in Fortran or any other language.

It's a good learning exercise, by the way, to implement a matrix
multiply yourself and compare it to a fast BLAS implementation. Even
if you turn off things like SSE2 instructions, the BLAS is probably a
factor of 6 faster than your first try, for a decent-sized matrix. On
the other hand, matrix multiplication is simple enough that it's not
*too* hard to get at least reasonably close to a fast BLAS if you have
some notion of what you are doing. (I had a class once a few years ago
where there was a contest to write a dgemm as fast as possible, and at
least one student beat the fastest free BLAS at the time for at least
one matrix size.)
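As a starting point for that exercise, here is a sketch of cache blocking (loop tiling), one of the usual first steps from the naive triple loop toward a fast dgemm. The tile size `BLOCK = 32` and the function name are assumptions for illustration; a real BLAS also packs tiles, blocks for registers and uses SIMD, none of which is attempted here.

```c
#include <stddef.h>

#define BLOCK 32  /* assumed tile edge; tune for the target cache sizes */

/* C = A * B for n-by-n row-major matrices, computed tile by tile so the
   parts of A, B and C being reused stay resident in cache. */
void mat_mul_blocked(double *C, const double *A, const double *B, size_t n)
{
    size_t ii, jj, kk, i, j, k;

    for (i = 0; i < n * n; i++)
        C[i] = 0.0;

    for (ii = 0; ii < n; ii += BLOCK)
        for (kk = 0; kk < n; kk += BLOCK)
            for (jj = 0; jj < n; jj += BLOCK) {
                /* clamp tile bounds when n is not a multiple of BLOCK */
                size_t i_end = ii + BLOCK < n ? ii + BLOCK : n;
                size_t k_end = kk + BLOCK < n ? kk + BLOCK : n;
                size_t j_end = jj + BLOCK < n ? jj + BLOCK : n;

                for (i = ii; i < i_end; i++)
                    for (k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Timing this against the naive loop and against a BLAS `dgemm` on the same machine is the comparison the post suggests.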

I once had an old Fortran programmer remark to me, "A matrix multiply
is just three loops! How many possible ways can there be to implement
it?" Recently, I told that story to an old compiler engineer, and he
immediately responded "Six ways (3 factorial), and I once wrote a
compiler that automatically found the best loop order." The correct
answer (neglecting exotic algorithms like Strassen etc. that no one
uses) is closer to n^3 factorial, since the n^3 multiplications all
commute. Programming was simpler when floating-point arithmetic
dominated the runtime and all you had to worry about was the operation
count.
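To illustrate the compiler engineer's point, here are two of the 3! = 6 loop orders, sketched with hypothetical names. Both perform exactly the same n^3 multiply-adds, but with row-major storage the i-k-j order walks memory with unit stride in its inner loop, while j-k-i strides by n; with column-major storage (Fortran, Matlab) the roles reverse.

```c
#include <stddef.h>

/* i-k-j order, row-major: the inner loop runs along a row of B and a
   row of C, i.e. consecutive addresses. */
void mat_mul_ikj(double *C, const double *A, const double *B, size_t n)
{
    size_t i, j, k;

    for (i = 0; i < n * n; i++)
        C[i] = 0.0;
    for (i = 0; i < n; i++)
        for (k = 0; k < n; k++) {
            double a = A[i * n + k];
            for (j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
}

/* j-k-i order, row-major: the inner loop runs down a column of A and a
   column of C, jumping n doubles per step. With column-major storage,
   where element (i, j) sits at i + j*n, this would be the unit-stride order. */
void mat_mul_jki(double *C, const double *A, const double *B, size_t n)
{
    size_t i, j, k;

    for (i = 0; i < n * n; i++)
        C[i] = 0.0;
    for (j = 0; j < n; j++)
        for (k = 0; k < n; k++) {
            double b = B[k * n + j];
            for (i = 0; i < n; i++)
                C[i * n + j] += A[i * n + k] * b;
        }
}
```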

Regards,
Steven G. Johnson

From: sturlamolden

Date: 17 Dec, 2007 22:03:40

Message: 8 of 11

On 17 Dec, 22:22, "Steven G. Johnson" <stev...@alum.mit.edu> wrote:

> The old canard about pointer aliasing semantics being weaker in C than
> in Fortran hasn't been an issue even in principle for almost 10 years
> now, since the 1999 C standard introduced the "restrict" keyword. I

That is true. But notice that most C code is still written as ANSI C.
C++ does not allow the restrict keyword either.

Microsoft's C compiler does not support ISO C (aka C99). But the most
recent Microsoft C compiler supports restrict semantics, spelled
__restrict, as an extension to ANSI C. Previously one would have to use
the compiler switch '/Oa' or '#pragma optimize("a", on)' to assume no
aliasing in MSVC. In GCC one would use the GNU extension __restrict__
to ANSI C, unless compiling with -std=c99, in which case restrict would
be defined. So GCC would often require non-standard syntax, and MSVC
would not allow control of aliasing at the level of single variables.
One would then end up with C code cluttered with preprocessor
conditionals to allow compilation on more than a single platform.

A typical pathological case in ANSI C and ISO C++ would be:

void add(double *c, const double *a, const double *b, int n)
{
   int i;
   for (i = 0; i < n; i++)
      *c++ = *a++ + *b++; /* aliasing? c may overlap a or b */
}

Which is easily solved in ISO C:

typedef double *restrict arrayptr;

void add(arrayptr c, arrayptr a, arrayptr b, int n)
{
   int i;
   for (i = 0; i < n; i++)
      *c++ = *a++ + *b++; /* no aliasing */
}
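A sketch of the kind of preprocessor conditional described above, assuming the spellings the post mentions (C99 `restrict`, GCC `__restrict__`) plus MSVC's `__restrict` extension. The `RESTRICT` macro and `vec_add` names are hypothetical; the fallback simply drops the hint, which is always correct but gives the optimizer less to work with.

```c
#include <stddef.h>

/* Map RESTRICT onto whatever this compiler understands. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L && !defined(__cplusplus)
#  define RESTRICT restrict          /* C99 keyword */
#elif defined(__GNUC__)
#  define RESTRICT __restrict__      /* GCC extension, also in C++ */
#elif defined(_MSC_VER)
#  define RESTRICT __restrict        /* MSVC extension */
#else
#  define RESTRICT                   /* no aliasing hint available */
#endif

/* The caller promises that c does not overlap a or b. */
void vec_add(double *RESTRICT c, const double *RESTRICT a,
             const double *RESTRICT b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```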

From: roberson@ibd.nrc-cnrc.gc.ca (Walter Roberson)

Date: 17 Dec, 2007 22:50:01

Message: 9 of 11

In article <9f100466-2e16-48cd-a4df-5c5cab65def4@l32g2000hse.googlegroups.com>,
sturlamolden <sturlamolden@yahoo.no> wrote:
>On 17 Dec, 22:22, "Steven G. Johnson" <stev...@alum.mit.edu> wrote:

>> The old canard about pointer aliasing semantics being weaker in C than
>> in Fortran hasn't been an issue even in principle for almost 10 years
>> now, since the 1999 C standard introduced the "restrict" keyword. I

>That is true. But notice that most C code is still written as ANSI C. C
>++ does not allow the restrict keyword either.

>Microsoft's C compiler does not support ISO C (aka C99).

I believe you are slightly confused about the C standards.

In 1989, ANSI published ANSI X3.159-1989, "Programming Language - C".
In 1990, ISO adopted X3.159-1989 as ISO/IEC 9899:1990, mostly just
renumbering some sections of the standard document. C89 and C90 denote
essentially the same language and are spoken of interchangeably
even in the standards-fussy newsgroup comp.lang.c.

In 1999, ISO published ISO/IEC 9899:1999. In 2000, ANSI adopted
the ISO 1999 standard.

*Officially*, the C89 and C90 standards are "obsolete", and the 1999
ISO standard (also adopted by ANSI) was "C" [*]. So "ANSI C"
and "ISO C" refer to the same standard, the 1999 standard (plus TCs).
And even before the 1999 standard was published, "ANSI C" and "ISO C"
were so close that people only distinguished them when talking about
the section numbers of the relevant documents.

When people wish to distinguish between the 1989/1990 standard
and the 1999 standard, to say something such as that most C code
is still written to the 1989/1990 standard, they normally
refer to C89 (or, less often, C90) and C99.


[*] "was" because "C" is currently the 1999 standard together
with some technical corrigenda: Technical Corrigendum 1 (TC1, 2001)
and Technical Corrigendum 2 (TC2, 2004).

http://www.open-std.org/jtc1/sc22/wg14/www/standards
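In code, the usual way to tell C89/C90 from C99 is the predefined macro `__STDC_VERSION__`: C90 does not define it, C90 with Amendment 1 (1995) sets it to `199409L`, and C99 sets it to `199901L`. A small sketch, with a hypothetical function name:

```c
/* Report which C standard the compiler claims to implement, based on
   the predefined macro __STDC_VERSION__. */
const char *c_standard_name(void)
{
#if !defined(__STDC_VERSION__)
    return "C89/C90";                    /* macro absent in C90 mode */
#elif __STDC_VERSION__ >= 199901L
    return "C99 or later";
#elif __STDC_VERSION__ >= 199409L
    return "C90 + Amendment 1";
#else
    return "unknown";
#endif
}
```

Compiling the same file with, for example, `gcc -std=c89` and then `gcc -std=c99` should make it report different strings.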
--
   "History is a pile of debris" -- Laurie Anderson

From: Tim Davis

Date: 18 Dec, 2007 11:25:50

Message: 10 of 11

See my replies, interleafed below (for a definition of
interleaf posting, see
http://www.cise.ufl.edu/~davis/Horror_matrices.html#composting
  )

"Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
<825523e3-b124-44a4-b82f-7b01b3495029@f3g2000hsg.googlegroups.com>...
> On Dec 17, 8:42 am, "Tim Davis" <da...@cise.ufl.edu> wrote:
> > > 1. There are issues related to the language syntax that makes Fortran
> > > particularly easy to optimize for compilers, such as lack of pointer
> > > aliasing. This is particularly important for optimal allocation of
> > > registers when the CPU goes into a tight loop.
> >
> > Regarding (1): I write in C and I haven't found (1) to be
> > that much of an issue (although I do worry about it and it's
> > well worth it for you to mention here). I think the more
> > recent versions of gcc are able to work around this issue.
> > More serious for C is the abuse of pointers (indirect
> > addressing, which requires lots of memory traffic). Memory
> > traffic is more of a problem than register allocation,
> > anyway (which you point out too, regarding the stride issue)..
>
> The old canard about pointer aliasing semantics being weaker in C than
> in Fortran hasn't been an issue even in principle for almost 10 years
> now, since the 1999 C standard introduced the "restrict" keyword. In
> practice, I've never found it to be a major practical issue in highly
> optimized code, since for key loops you often want to partially unroll
> them yourself anyway, and in any case higher-level memory-access
> patterns are usually more important for performance.
>
> Regarding the "abuse of pointers" I'm not sure what you're talking
> about. Array access in C, properly implemented, requires no more or
> less pointer indirection than in Fortran or any other language.

Right - I agree with you completely.

For "abuse of pointers", I mean data structures that use an
unnecessary amount of indirection (pointers to pointers to
pointers to ...). I mean that "C gives you enough rope to
hang yourself". Yes, simple arrays require no more or less
indirection than any other language.
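A sketch of the difference, with hypothetical function names: a pointer-to-pointer matrix needs an extra load of the row pointer for each row and spreads the rows over separate allocations, whereas the flat layout computes every address arithmetically from one base pointer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Pointer-to-pointer layout: an array of row pointers, each row a
   separate allocation. A[i][j] first loads A[i], and rows may be
   scattered across memory. */
double **alloc_rows(size_t m, size_t n)
{
    double **A = malloc(m * sizeof *A);
    size_t i;

    if (A == NULL)
        return NULL;
    for (i = 0; i < m; i++) {
        A[i] = calloc(n, sizeof **A);
        if (A[i] == NULL) {              /* roll back on failure */
            while (i > 0)
                free(A[--i]);
            free(A);
            return NULL;
        }
    }
    return A;
}

void free_rows(double **A, size_t m)
{
    size_t i;

    for (i = 0; i < m; i++)
        free(A[i]);
    free(A);
}

/* Flat layout: one allocation, element (i, j) at A[i*n + j]. Release
   it with a single free(). */
double *alloc_flat(size_t m, size_t n)
{
    return calloc(m * n, sizeof(double));
}
```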

> It's a good learning exercise, by the way, to implement a matrix
> multiply yourself and compare it to a fast BLAS implementation. Even
> if you turn off things like SSE2 instructions, it is probably a factor
> of 6 faster than your first try, for a decent-sized matrix. On the
> other hand, matrix multiplication is simple enough that it's not *too*
> hard to get at least reasonably close to a fast BLAS if you have some
> notion of what you are doing. (I had a class once a few years ago
> where there was a contest to write a dgemm as fast as possible, and at
> least one student beat the fastest free BLAS at the time for at least
> one matrix size.)

Yes, that is a good exercise. It's a lot more difficult
than it looks.

> I once had an old Fortran programmer remark to me, "A matrix multiply
> is just three loops! How many possible ways can there be to implement
> it?" Recently, I told that story to an old compiler engineer, and he
> immediately responded "Six ways (3 factorial), and I once wrote a
> compiler that automatically found the best loop order."

That's hilarious!

> The correct answer (neglecting exotic algorithms like Strassen etc.
> that no one uses) is closer to n^3 factorial, since the n^3
> multiplications all commute. Programming was simpler when
> floating-point arithmetic dominated the runtime and all you had to
> worry about was the operation count.

Yup, I would guess n^3 factorial, maybe more because you can
do a flop in so many ways (fused mult-adds, SSE3 or not, etc.).

A similar question I sometimes get:

"Gaussian elimination is just a few loops, how many lines of
code can it possibly take?" ... backslash includes probably
250,000 lines of code (C and Fortran; an educated guess,
since I wrote about half of it but haven't seen the other
half). It can be done in maybe 20 or so lines of code in C
or Fortran, in a naive implementation of Gaussian
elimination with partial pivoting, but then it will be 10 or
20 times slower than x=A\b in the dense case, and quite
literally up to millions of times slower in the sparse case.
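For comparison, a sketch of the naive dense solver described above, in C with a hypothetical function name: Gaussian elimination with partial pivoting followed by back substitution. It assumes a square row-major matrix, does no blocking, scaling or condition estimation, and is meant only to show how little code the textbook algorithm needs, not how backslash works internally.

```c
#include <stddef.h>

/* Solve A x = b for an n-by-n row-major A by Gaussian elimination with
   partial pivoting. A and b are overwritten; the solution ends up in b.
   Returns 0 on success, -1 if an exactly zero pivot is met. */
int gauss_solve(double *A, double *b, size_t n)
{
    size_t i, j, k;

    for (k = 0; k < n; k++) {
        /* choose the row at or below k with the largest |A[i][k]| */
        size_t p = k;
        double best = A[k * n + k] < 0 ? -A[k * n + k] : A[k * n + k];
        for (i = k + 1; i < n; i++) {
            double v = A[i * n + k] < 0 ? -A[i * n + k] : A[i * n + k];
            if (v > best) { best = v; p = i; }
        }
        if (best == 0.0)
            return -1;
        if (p != k) {                        /* swap rows k and p */
            double t;
            for (j = 0; j < n; j++) {
                t = A[k * n + j];
                A[k * n + j] = A[p * n + j];
                A[p * n + j] = t;
            }
            t = b[k]; b[k] = b[p]; b[p] = t;
        }
        for (i = k + 1; i < n; i++) {        /* eliminate below the pivot */
            double f = A[i * n + k] / A[k * n + k];
            for (j = k; j < n; j++)
                A[i * n + j] -= f * A[k * n + j];
            b[i] -= f * b[k];
        }
    }
    for (i = n; i-- > 0; ) {                 /* back substitution */
        double s = b[i];
        for (j = i + 1; j < n; j++)
            s -= A[i * n + j] * b[j];
        b[i] = s / A[i * n + i];
    }
    return 0;
}
```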

Matrix multiply is not quite so extreme, but not far off.
Readers, if they're curious, should take a look at the ATLAS
or Goto BLAS source code (both are available). They are
quite lengthy codes - but very fast.

Ditto for FFT (see FFTW for example). Fast codes are not
(always) short codes; elegant codes are the fast ones, which
are not always short.

From: Tim Davis

Date: 18 Dec, 2007 11:55:58

Message: 11 of 11

"Tim Davis" <davis@cise.ufl.edu> wrote in message
...
> Ditto for FFT (see FFTW for example). Fast codes are not
> (always) short codes; elegant codes are the fast ones, which
> are not always short.
>

Steve - since you and I are clearly on the same page, I was
writing more to the other readers of this thread. So I
tossed out the example of FFTW as a fast, elegant, but not
short, code. I know about the FFTW ... but I didn't know
off the top of my head who the authors were.

Then I looked up the FFTW after I posted my note, just out
of curiosity, and found that you're one of the two co-authors.

So in my reply to you I'm using your own code as an example
... :-D !!

Public Submission Policy

NOTICE: Any content you submit to MATLAB Central, including personal information, is not subject to the protections which may be afforded information collected under other sections of The MathWorks, Inc. Web site. You are entirely responsible for all content that you upload, post, e-mail, transmit or otherwise make available via MATLAB Central. The MathWorks does not control the content posted by visitors to MATLAB Central and, does not guarantee the accuracy, integrity, or quality of such content. Under no circumstances will The MathWorks be liable in any way for any content not authored by The MathWorks, or any loss or damage of any kind incurred as a result of the use of any content posted, e-mailed, transmitted or otherwise made available via MATLAB Central. Read the complete Disclaimer prior to use.

Contact us at files@mathworks.com