Thread Subject: FOR loops performance - 32bit vs 64 bit Matlab

Subject: FOR loops performance - 32bit vs 64 bit Matlab

From: uC

Date: 2 Aug, 2008 08:58:54

Message: 1 of 10

Hi all,

Yesterday I wrote the post entitled "density matrix conversion -
vectorization", where I asked for help with vectorization to improve
the speed. It turned out that in fact I don't need to vectorize it and
the performance problem is somewhere else.

There are three loops:

for n=1:col_no
    for m=1:row_no
        for k=1:D(m, n)
            % [SOME SIMPLE ARITHMETIC OPERATIONS]
        end
        if rand() < D(m, n)-floor(D(m, n))
            % [SOME SIMPLE ARITHMETIC OPERATIONS]
        end
    end
end

On a 32-bit system (XP Pro) with Matlab 2006a it takes ~17 seconds to finish
these operations, but on a 64-bit system (Win 2003 Serv. End. Ed.) with 64-bit
Matlab 2007b it takes 370 seconds!

Now the most interesting part: when I substitute exact numbers for "col_no" and
"row_no" (6700 and 7100 in the tested case), 64-bit Matlab needs only 8 seconds
to finish, while on the 32-bit system the timing remains almost unchanged!

32 bit machine is dual core Athlon64x2 2GHz with 2 GB RAM
64 bit machine is dual socket dual core Opteron 2218 2.6 GHz with 8 GB RAM

Does anyone have an idea if it is a Matlab bug or anything else?

Best wishes,
uC

Subject: FOR loops performance - 32bit vs 64 bit Matlab

From: Rune Allnor

Date: 2 Aug, 2008 10:12:26

Message: 2 of 10

On 2 Aug, 10:58, "uC" <bla....@uc.uc> wrote:
> Does anyone have an idea if it is a Matlab bug or anything else?

Yes, it is a bug. There is no reason why there should be an
overhead on the order of 50x or 6 minutes just to handle some
tens of millions of loop iterations.
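
As a rough sanity check on the figures reported in the original post, the overhead per iteration can be worked out directly. This is only a sketch: it counts executions of the body of the two outer loops (6700 x 7100) and ignores the inner k loop, whose length depends on D and was not posted. The variable names are illustrative.

```matlab
col_no = 6700;                       % loop bounds reported in the original post
row_no = 7100;
t_slow = 370;                        % seconds, 64-bit Matlab with variable bounds
t_fast = 8;                          % seconds, 64-bit Matlab with literal bounds

iterations  = col_no * row_no;       % executions of the two outer loops' body
slowdown    = t_slow / t_fast;       % ratio between the two 64-bit timings
per_iter_us = (t_slow - t_fast) / iterations * 1e6;   % extra microseconds per iteration

fprintf('%d iterations, %.1fx slower, %.2f us extra per iteration\n', ...
        iterations, slowdown, per_iter_us);
```

Under these assumptions the extra cost is several microseconds for each pass through the two outer loops.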

Rune

Subject: FOR loops performance - 32bit vs 64 bit Matlab

From: Steven Lord

Date: 4 Aug, 2008 14:33:54

Message: 3 of 10


"uC" <bla.bla@uc.uc> wrote in message
news:g717k3$kb3$1@news.dialog.net.pl...
> Does anyone have an idea if it is a Matlab bug or anything else?

You need to post the rest of the code, including what col_no, row_no, D, and
the two instances of "[SOME SIMPLE ARITHMETIC OPERATIONS]" are.

Do you grow a matrix inside "[SOME SIMPLE ARITHMETIC OPERATIONS]", like
this:

d = zeros(1, 0);
for k = 1:10
    d(k) = k;
end

If so, preallocating that matrix would prevent MATLAB from having to
reallocate memory for that matrix each and every iteration through the loop.
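
A minimal sketch of the preallocated version of the loop above, assuming the final length (called n here, an illustrative name) is known before the loop starts:

```matlab
n = 10;              % number of iterations, known before the loop starts

d = zeros(1, n);     % allocate the full vector once, up front
for k = 1:n
    d(k) = k;        % fills an existing element; no reallocation needed
end
```

The result is the same vector as before; only the number of memory allocations changes.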

Secondly, since you're using RAND, make sure you initialize the state to the
same value before each of your calculations. If during one execution of
your program you encounter a long run of small random numbers, but during the
second you encounter a run of large random numbers, the second "[SOME SIMPLE
ARITHMETIC OPERATIONS]" block will execute a different number of times.
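
One way to do this is sketched below, using the legacy rand('twister', seed) syntax, which should be available in the releases discussed in this thread; newer MATLAB releases offer rng(seed) for the same purpose. The seed value is arbitrary and only illustrative.

```matlab
seed = 5489;                     % arbitrary illustrative seed value

rand('twister', seed);           % reset the generator before the first timed run
a = rand(1, 5);

rand('twister', seed);           % reset it again before the second timed run
b = rand(1, 5);

same_sequence = isequal(a, b);   % true when both runs drew identical numbers
```

With the generator reset this way, both runs take the IF branch the same number of times, so the timings compare like with like.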

Third, if possible generate matrices of random numbers and index into them
instead of calling the RAND function many times, each time generating a
scalar. You will need more memory for that approach, but by reducing the
number of times you call RAND, you'll likely save on function call overhead.
[There is some, though usually it's small -- but if you call the function
thousands or millions of times, it can add up.]
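
A sketch of that idea follows, with small illustrative sizes, a made-up D, and a counter standing in for the arithmetic that was not posted:

```matlab
row_no = 4;  col_no = 3;             % small illustrative sizes
D = 2.5 * ones(row_no, col_no);      % hypothetical density matrix

R    = rand(row_no, col_no);         % one call instead of row_no*col_no scalar calls
frac = D - floor(D);                 % fractional parts, computed once
hits = 0;
for n = 1:col_no
    for m = 1:row_no
        if R(m, n) < frac(m, n)
            hits = hits + 1;         % stand-in for the elided arithmetic
        end
    end
end
```

At the sizes mentioned in the original post (7100 by 6700 doubles), R alone would take roughly 380 MB, so generating one column at a time with rand(row_no, 1) inside the outer loop may be a more practical compromise.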

Fourth, after you "substitute exact numbers" for the limits of your FOR
loops, do you make sure to run the function/script again with a clean
slate -- i.e. "clear all", "clear functions", etc? If not, the improvement
you see may be caused because you're no longer growing your matrix inside
the loops (it's already reached its largest size during the first run
through.)
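
The effect described here can be reproduced with a small experiment. This is a sketch, with n and d as illustrative names: the first loop runs from a clean slate and grows d, while the second reuses the d left behind by the first.

```matlab
n = 1e5;                 % illustrative size

clear d                  % clean slate: discard any d left from an earlier run
tic
for k = 1:n
    d(k) = k;            % d grows on every iteration
end
t_grow = toc;

tic
for k = 1:n
    d(k) = k;            % d is already full size; nothing is reallocated
end
t_reuse = toc;

fprintf('first run (growing): %.4f s, second run (reused): %.4f s\n', ...
        t_grow, t_reuse);
```

If the second timing is noticeably smaller, a speed-up seen after editing the code may come from the leftover variable rather than from the edit itself.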

--
Steve Lord
slord@mathworks.com


Subject: FOR loops performance - 32bit vs 64 bit Matlab

From: Rune Allnor

Date: 4 Aug, 2008 14:43:56

Message: 4 of 10

On 4 Aug, 16:33, "Steven Lord" <sl...@mathworks.com> wrote:
> You need to post the rest of the code, including what col_no, row_no, D, and
> the two instances of "[SOME SIMPLE ARITHMETIC OPERATIONS]" are.

[technical considerations snipped]

I agree with your sentiments, but I interpret the OP's post
to mean that the same code runs in 17 seconds on the 32-bit
computer and in 370 seconds on the 64-bit computer.

If any of the arguments you mention were valid, they ought to
impact the performance on the 32 bit system as well, not only
on the 64-bit system.

Seems to me there is a bug in the 64-bit version.

Rune

Subject: FOR loops performance - 32bit vs 64 bit Matlab

From: Mark

Date: 13 Oct, 2008 00:35:03

Message: 5 of 10

Rune Allnor <allnor@tele.ntnu.no> wrote in message
>
> I agree with your sentiments, but I interpret the OPs post
> such that the same code runs in 17 seconds on the 32-bit
> computer and in 370 seconds on the 64-bit computer.
>
> If any of the arguments you mention were valid, they ought to
> impact the performance on the 32 bit system as well, not only
> on the 64-bit system.
>
> Seems to me there is a bug in the 64-bit version.
>
> Rune

I can concur with this from results I have seen. I ran tests on the following machines:
Desktop: XP SP2 (32-bit), 2.4GHz Core 2 Duo, R2007b 32-bit
Server: Win2k3 (64-bit), 2x3GHz Xeon 5450 quad core HP blade, R2007b 32-bit
Workstation: Vista Business (64-bit), 1x3GHz Xeon quad core HP xw8600, R2007b 64-bit

The desktop had 3GB of memory, the server 16GB and the workstation 8GB.
The 64-bit blade server running 32-bit Matlab (i.e. it's emulating a 32-bit environment) gave a 25% performance increase over the desktop. However, the all-64-bit workstation was, on average, around 60-70% slower than the desktop, including when running Matlab's own bench function.
I realise we're not comparing apples with apples here, but for a 3GHz quad-core Xeon workstation to underperform a 2.4GHz dual-core desktop is quite a difference, and it is the direct 32-bit to 64-bit comparison. The workstation also has faster memory and a faster FSB.
I cannot believe this performance is down to the OS alone (it's too easy to point the finger at Vista), and I suspect that 32-bit Matlab is better optimised than its 64-bit version.

Subject: FOR loops performance - 32bit vs 64 bit Matlab

From: Matt Fig

Date: 13 Oct, 2008 01:25:06

Message: 6 of 10

"uC" <bla.bla@uc.uc> wrote in message :

> On 32 bit system (XP Pro) and Matlab (2006a) it takes ~17 seconds to finish
> these operations but on 64 bit (Win 2003 Serv. End. Ed.) and 64 bit Matlab

> 32 bit machine is dual core Athlon64x2 2GHz with 2 GB RAM
> 64 bit machine is dual socket dual core Opteron 2218 2.6 GHz with 8 GB RAM
>
> Does anyone have an idea if it is a Matlab bug or anything


I have a very loopy code (Islands on the FEX) which I have run on 32-bit XP SP3 in 2006a (P4 2.5GHz, 3 GB), and also on 64-bit XP in 64-bit 2007b (Xeon 2.66 GHz, 16 GB). The 64-bit setup is repeatably faster by about 12% with large matrices. Whether that is because of the RAM or the slightly faster CPU, I don't know.

Subject: FOR loops performance - 32bit vs 64 bit Matlab

From: Marcus M. Edvall

Date: 13 Oct, 2008 02:35:20

Message: 7 of 10

One could wonder if it's related to the 32-bit version using 32-bit ints
to index, while the 64-bit version uses 64-bit ints.

Best wishes, Marcus
Tomlab Optimization Inc.
http://tomopt.com/
http://tomsym.com/

Subject: FOR loops performance - 32bit vs 64 bit Matlab

From: Steve Amphlett

Date: 13 Oct, 2008 13:27:01

Message: 8 of 10

<snip>

I understood that the M$ implementation of "long long" (64-bit integer) was very slow. Maybe this is where to point the finger?

Subject: FOR loops performance - 32bit vs 64 bit Matlab

From: Jan Simon

Date: 13 Oct, 2008 19:06:02

Message: 9 of 10

> Steve Amphlett wrote:
>
> I understood that the M$ implementation of "long long" (64-bit integer) was very slow. Maybe this is where to point the finger?

What happens if the loop indices are defined as UINT32?

> for n=1:col_no
> for m=1:row_no
> for k=1:D(m, n)

-->

for n=uint32(1):uint32(col_no)
    for m=uint32(1):uint32(row_no)
        for k=uint32(1):uint32(floor(D(m, n)))   % floor first: uint32() rounds to nearest

This would be the way I investigate such questions in C.
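
One caveat when trying this: D evidently holds non-integer values (the IF test uses its fractional part), and uint32() rounds to the nearest integer whereas 1:D(m, n) stops at floor(D(m, n)). A small check with made-up values for D (a hypothetical matrix, not the OP's data) confirms that the two forms iterate the same number of times once floor is applied:

```matlab
D = [2.7 0.4; 1.5 3.2];      % hypothetical densities with fractional parts

count_double = 0;
count_uint32 = 0;
for n = 1:size(D, 2)
    for m = 1:size(D, 1)
        for k = 1:D(m, n)                           % colon stops at floor(D(m, n))
            count_double = count_double + 1;
        end
        for k = uint32(1):uint32(floor(D(m, n)))    % floor before converting
            count_uint32 = count_uint32 + 1;
        end
    end
end
```

Without the floor, uint32(2.7) becomes 3 and uint32(1.5) becomes 2, so the typed loops would run more often than the original ones and the timings would not be comparable.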

Jan

Subject: FOR loops performance - 32bit vs 64 bit Matlab

From: Sakhr

Date: 6 Nov, 2008 16:58:02

Message: 10 of 10

I too have recently experienced the same problem. I tested
the same Matlab code on a Xeon X5460 3.16GHz workstation and on a laptop with a T2500 processor. The workstation was 60% slower.

Has anybody found the reason for this strange behavior?

Thank you in advance.

Best regards, Sakhr.
