Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Can this run any faster?

Subject: Can this run any faster?

From: Elnaz

Date: 9 Apr, 2013 22:30:28

Message: 1 of 37

Hi all,
Is there any way to make this excerpt of the code run faster?
In my script I call the function including this loop many times and "length" is in the order of 30K or more. "transitions" is a stored 32 by 4 matrix.
I have turned that function into C and am using it as a mex file but it's still prohibitively slow.
Using tic-toc I see that this loop is the most time consuming loop in that function. Is there anyway to make this run faster?

a=zeros(16,length);
a=a-inf;
a(1,1)=0;
for i= 2:length
     for j= 1:32
           A = a(transitions(j,2),i);
           B = a(transitions(j,1),i-1) + ug(j, i-1) + eg(j, i-1);
           if(A == -inf && B == -inf)
              a(transitions(j,2),i) = -inf;
           else
              a(transitions(j,2),i) = max(A,B) + log(1+exp(-abs(A-B)));
           end
      end
end

Thanks,
Elnaz

Subject: Can this run any faster?

From: Roger Stafford

Date: 10 Apr, 2013 04:46:09

Message: 2 of 37

"Elnaz " <ebsadeghian@gmail.com> wrote in message <kk24q4$3j5$1@newscl01ah.mathworks.com>...
> Hi all,
> Is there any way to make this excerpt of the code run faster?
> In my script I call the function including this loop many times and "length" is in the order of 30K or more. "transitions" is a stored 32 by 4 matrix.
> I have turned that function into C and am using it as a mex file but it's still prohibitively slow.
> Using tic-toc I see that this loop is the most time consuming loop in that function. Is there anyway to make this run faster?
>
> a=zeros(16,length);
> a=a-inf;
> a(1,1)=0;
> for i= 2:length
> for j= 1:32
> A = a(transitions(j,2),i);
> B = a(transitions(j,1),i-1) + ug(j, i-1) + eg(j, i-1);
> if(A == -inf && B == -inf)
> a(transitions(j,2),i) = -inf;
> else
> a(transitions(j,2),i) = max(A,B) + log(1+exp(-abs(A-B)));
> end
> end
> end
>
> Thanks,
> Elnaz
- - - - - - - - -
  You might try the following. (I am assuming 'ug' and 'eg' are always finite.)

 a=zeros(16,length); % <-- Don't use "length". It's the name of a matlab function
 a=a-inf;
 a(1,1)=0;
 t1 = transitions(j,1); %<-- Avoid needless indexing in the loops
 t2 = transitions(j,2);
 ue = ug+eg; % <-- Do this addition ahead of time
 for i= 2:size(a,2)
   for j= 1:32
     k = t2(j);
     A = a(k,i);
     B = a(t1(j),i-1);
     if B ~= -inf
       B = B + ue(j,i-1);
       if A ~= -inf
         a(k,i) = max(A,B) + log(1+exp(-abs(A-B)));
       else
         a(k,i) = B;
       end
     end
   end
 end

  The reasoning here is that in case B is equal to -inf, your code will not alter a(k,i) at all and thus there is nothing to do. In case B is not equal to -inf but A equals -inf, then a(k,i) will be set to B. Hence there is only need to use the time consuming max-log-exp calculation when A and B are both finite.

  Whether all of this saves you a significant amount of time depends on how prevalent an index value of t2 is paired with more than one index value of t1, which are the only cases in which the max-log-exp call can be made, and on the number of columns of 'a' that tend to become finite on the average as the outer loop proceeds.

Roger Stafford

Subject: Can this run any faster?

From: Steven_Lord

Date: 10 Apr, 2013 15:11:16

Message: 3 of 37



"Elnaz " <ebsadeghian@gmail.com> wrote in message
news:kk24q4$3j5$1@newscl01ah.mathworks.com...
> Hi all, Is there any way to make this excerpt of the code run faster?
> In my script I call the function including this loop many times and
> "length" is in the order of 30K or more. "transitions" is a stored 32 by 4
> matrix. I have turned that function into C and am using it as a mex file
> but it's still prohibitively slow. Using tic-toc I see that this loop is
> the most time consuming loop in that function. Is there anyway to make
> this run faster?

Rather than forcing people to reverse engineer what you're trying to do from
your code, please describe IN WORDS NOT CODE OR EQUATIONS what you are
providing to this code as "input" and what you expect as "output" and
someone may have suggestions for how to implement it more efficiently.

*snip*

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: Can this run any faster?

From: dpb

Date: 10 Apr, 2013 15:42:15

Message: 4 of 37

On 4/9/2013 11:46 PM, Roger Stafford wrote:
> "Elnaz " <ebsadeghian@gmail.com> wrote in message
> <kk24q4$3j5$1@newscl01ah.mathworks.com>...
...

>> ...Is there anyway to make this run faster?
>>
>> a=zeros(16,length);
>> a=a-inf;
>> a(1,1)=0; for i= 2:length
>> for j= 1:32
>> A = a(transitions(j,2),i);
>> B = a(transitions(j,1),i-1) + ug(j, i-1) + eg(j, i-1);
...

> You might try the following. (I am assuming 'ug' and 'eg' are always
> finite.)
>
> a=zeros(16,length); % <-- Don't use "length". It's the name of a matlab
> function
> a=a-inf;
> a(1,1)=0;
> t1 = transitions(j,1); %<-- Avoid needless indexing in the loops
> t2 = transitions(j,2);
...

Roger, these need to be

t1 = transitions(:,1); %<-- Avoid needless [2D] indexing in the loops
t2 = transitions(:,2);

don't they?

--

Subject: Can this run any faster?

From: Roger Stafford

Date: 10 Apr, 2013 17:54:07

Message: 5 of 37

dpb <none@non.net> wrote in message <kk418i$anr$1@speranza.aioe.org>...
> Roger, these need to be
>
> t1 = transitions(:,1); %<-- Avoid needless [2D] indexing in the loops
> t2 = transitions(:,2);
- - - - - - - -
  Thanks, dpb. That is indeed what I meant to write.

Roger Stafford

Subject: Can this run any faster?

From: Bruno Luong

Date: 10 Apr, 2013 19:17:07

Message: 6 of 37

"Steven_Lord" <slord@mathworks.com> wrote in message <kk3vel$g4b$1@newscl01ah.mathworks.com>...

>
> Rather than forcing people to reverse engineer what you're trying to do from
> your code, please describe IN WORDS NOT CODE OR EQUATIONS what you are
> providing to this code as "input" and what you expect as "output" and
> someone may have suggestions for how to implement it more efficiently.
>

It looks like he wants to find the maximum likelihood of a Markov's chain, but I agree, it is better if we have a description of the code and its goal.

Bruno

Subject: Can this run any faster?

From: Elnaz

Date: 10 Apr, 2013 19:29:05

Message: 7 of 37

Thanks Roger; that is a clever substitute. It runs 0.01 second faster than the original one.

Steve:
The code above is an excerpt of BCJR algorithm. It is the forward recursion part which sweeps over the trellis structure from the beginning to the end in forward direction.
It is difficult to explain in words what that excerpt is trying to do for the general audience since it requires a lot of backgroun math to set the scene.

I have another loop similar to the first one which is a backward recursion. It starts from the end working its way to the beginning:

% Backwards recursion
b=zeros(16,length);
b=b-inf;
b(1,length)=0;
    for i= length-1:-1:1
       for j= 1:32
           A = b(transitions(j,1),i);
           B = b(transitions(j,2),i+1) + ug (j, i+1) + eg(j, i+1);
           if(A == -inf && B == -inf)
              b(transitions(j,1),i) = -inf;
           else
              b(transitions(j,1),i)= max(A,B) + log(1+exp(-abs(A-B)));
           end
       end
    end

If I apply Roger's idea I get 0.02 sec improvement in one calling. However, the main problem that costs time here is the big length of the signal (the variable i here).
But thanks Roger, that was clever.

Subject: Can this run any faster?

From: Elnaz

Date: 10 Apr, 2013 19:44:12

Message: 8 of 37

I meant 0.02 sec in total in one function calling. 0.01 sec for the first loop and 0.01 sec for the second one. I call that function thousands of times in my scriprt so it is indeed effective but unfortunately not enough. My simulations are still very long.

Elnaz

Subject: Can this run any faster?

From: Bruno Luong

Date: 10 Apr, 2013 20:10:12

Message: 9 of 37

IMO, the only way to speed up significantly is to program this piece of code in MEX, assuming you own a decent C/fortran compiler.

Bruno

Subject: Can this run any faster?

From: Elnaz

Date: 10 Apr, 2013 20:19:08

Message: 10 of 37

Bruno,
I've done that. To my surprise it only made it 1.5 times faster. Still, it takes days.

Subject: Can this run any faster?

From: Eric Sampson

Date: 10 Apr, 2013 22:09:10

Message: 11 of 37

"Elnaz " <ebsadeghian@gmail.com> wrote in message <kk4hfs$g2v$1@newscl01ah.mathworks.com>...
> Bruno,
> I've done that. To my surprise it only made it 1.5 times faster. Still, it takes days.

It would be interesting to see how the MATLAB Coder product fares with this, maybe you could ask TMW for a trial and see?

What compiler are you using for the C test? Did you make sure it wasn't emitting debug code, and appropriate optimization flags were used?

Subject: Can this run any faster?

From: Bruno Luong

Date: 10 Apr, 2013 23:18:10

Message: 12 of 37

"Elnaz " <ebsadeghian@gmail.com> wrote in message <kk4hfs$g2v$1@newscl01ah.mathworks.com>...
> Bruno,
> I've done that. To my surprise it only made it 1.5 times faster. Still, it takes days.

Many TMW's employees will be happy to read that.

Subject: Can this run any faster?

From: Bruno Luong

Date: 10 Apr, 2013 23:29:14

Message: 13 of 37

Few more ideas, starting from Roger's code:

a=-inf(16,m); % m is your length
a(1,1)=0;
[n m] = size(a);
t1 = transitions(:,1);
t2 = transitions(:,2);
ue = ug+eg;
for i= 1:m-1
    for j = 1:32
        k = t2(j) + i*n;
        A = a(k);
        B = a(t1(j),i);
        if B ~= -inf
            B = B + ue(j,i);
            if A < B
                a(k) = B + log(1+exp(A-B));
            else
                a(k) = A + log(1+exp(B-A));
            end
        end
    end
end

% Bruno

Subject: Can this run any faster?

From: Bruno Luong

Date: 11 Apr, 2013 07:37:08

Message: 14 of 37

Here is a mex file, about three time faster on my PC.

/* compile
 CALLING: BCJR_loop_mex(a, t1, t2, ue);
 mex BCJR_loop_mex.c
 *ATTENTION A is modified inplace
 */
#include "mex.h"
#include <math.h>

unsigned char MINF[8] = {0, 0, 0, 0, 0, 0, 240, 255};

#define A_INOUT prhs[0]
#define T1_UINT32 prhs[1]
#define T2_UINT32 prhs[2]
#define UE prhs[3]
void mexFunction(int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[])
{
    mxArray *Var;
    int m, n, i, j, k, in;
    int *t1_p, *t2_p, *t1, *t2;
    double A, B, *a, *ue, *minf;
    
    n = mxGetM(A_INOUT);
    m = mxGetN(A_INOUT);
    t1_p = (int*)mxGetData(T1_UINT32);
    t2_p = (int*)mxGetData(T2_UINT32);
    a = mxGetPr(A_INOUT);
    ue = mxGetPr(UE);
    minf = (double*)MINF;
    
    in = -1;
    for (i=0; i<m-1; i++)
    {
        t1 = t1_p;
        t2 = t2_p;
        for (j=0; j<32; j++)
        {
            k = *(t2++) + in + n;
            A = a[k];
            B = a[*(t1++) + in];
            if (B != *minf)
            {
                B += *ue;
                a[k] = ((A < B)?
                        B + log(1+exp(A-B)) :
                        A + log(1+exp(B-A)));
            }
            ue++;
        }
        in += n;
    }
}

m = 3e4;

transitions=ceil(16*rand(32,2));
ug = rand(32,m);
eg = rand(32,m);

tic
a=-inf(16,m);
a(1,1)=0;
for i= 2:m
    for j= 1:32
        A = a(transitions(j,2),i);
        B = a(transitions(j,1),i-1) + ug(j, i-1) + eg(j, i-1);
        if(A == -inf && B == -inf)
            a(transitions(j,2),i) = -inf;
        else
            a(transitions(j,2),i) = max(A,B) + log(1+exp(-abs(A-B)));
        end
    end
end
toc
bb = a;
clear a

tic
a=-inf(16,m); % m is your length
a(1,1)=0;
t1 = uint32(transitions(:,1));
t2 = uint32(transitions(:,2));
ue = ug+eg;
BCJR_loop_mex(a, t1, t2, ue);
toc
aa = a;


max(max(abs(aa-bb)))

Subject: Can this run any faster?

From: Heinrich Acker

Date: 11 Apr, 2013 09:11:13

Message: 15 of 37

"Elnaz " <ebsadeghian@gmail.com> wrote in message <kk24q4$3j5$1@newscl01ah.mathworks.com>...
> Hi all,
> Is there any way to make this excerpt of the code run faster?
> In my script I call the function including this loop many times and "length" is in the order of 30K or more. "transitions" is a stored 32 by 4 matrix.
> I have turned that function into C and am using it as a mex file but it's still prohibitively slow.
> Using tic-toc I see that this loop is the most time consuming loop in that function. Is there anyway to make this run faster?
>
> a=zeros(16,length);
> a=a-inf;
> a(1,1)=0;
> for i= 2:length
> for j= 1:32
> A = a(transitions(j,2),i);
> B = a(transitions(j,1),i-1) + ug(j, i-1) + eg(j, i-1);
> if(A == -inf && B == -inf)
> a(transitions(j,2),i) = -inf;
> else
> a(transitions(j,2),i) = max(A,B) + log(1+exp(-abs(A-B)));
> end
> end
> end
>
> Thanks,
> Elnaz

My experience is that using inf or nan can slow down calculations significantly. I didn't try your code, nor did I try it recently on one of my programs, but since that discovery I went back to the old practice to use an extremly large value instead of inf. It is not always possible, but in many algorithms you will know a finite value that is larger than all finite values that can appear in that place, e.g. 1e99. So the suggestion is to use this value as a metaphor for inf, even if this seems to be less refined.

Of course, the result depends on the architecture of the system you are using. It may not help on your system at all. Your disappointing result when mexing the function is an indicator of an inf or nan speed problem, though. When I came across it, I briefly googled the problem and found out that some architectures raise an execption to process inf or nan, instead of producing an IEEE754 result directly by the FPU.

Heinrich

Subject: Can this run any faster?

From: Bruno Luong

Date: 11 Apr, 2013 11:51:05

Message: 16 of 37

I have optimized the MEX code:

/* MATLAB MEX BCJR_loop_mex.c
 
 CALLING: >> BCJR_loop_mex(a, t1, t2, ue);
 COMPILE: >>mex -O BCJR_loop_mex.c
 ATTENTION: a() will be modified inplace
 */
#include "mex.h"
#include <math.h>

unsigned char MINF[8] = {0, 0, 0, 0, 0, 0, 240, 255};
double *minf = (double*)MINF;

#define A_INOUT prhs[0]
#define T1_UINT32 prhs[1]
#define T2_UINT32 prhs[2]
#define UE prhs[3]
void mexFunction(int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[])
{
    mxArray *Var;
    int m, n, i, j, p;
    int *t1_p, *t2_p, *t1, *t2;
    double A, B, *a, *ak, *ue;
    
    n = mxGetM(A_INOUT);
    m = mxGetN(A_INOUT);
    t1_p = (int*)mxGetData(T1_UINT32);
    t2_p = (int*)mxGetData(T2_UINT32);
    a = mxGetPr(A_INOUT);
    ue = mxGetPr(UE);
    minf = (double*)MINF;
    
    p = -1;
    for (i=m-1; i--; )
    {
        t1 = t1_p;
        t2 = t2_p;
        for (j=32; j--; )
        {
            B = a[*(t1++) + p];
            if (B != *minf)
            {
                ak = a + *t2 + p + n;
                A = *ak;
                B += *ue;
                *ak = ((A < B)?
                       B + log(1+exp(A-B)) :
                       A + log(1+exp(B-A)));
            }
            t2++;
            ue++;
        }
        p += n;
    }
} /* End of BCJR_loop_mex.c */

% MATLAB test:
m = 3e4;

transitions=ceil(16*rand(32,2));
ug = rand(32,m);
eg = rand(32,m);

tic
a=-inf(16,m);
a(1,1)=0;
for i= 2:m
    for j= 1:32
        A = a(transitions(j,2),i);
        B = a(transitions(j,1),i-1) + ug(j, i-1) + eg(j, i-1);
        if(A == -inf && B == -inf)
            a(transitions(j,2),i) = -inf;
        else
            a(transitions(j,2),i) = max(A,B) + log(1+exp(-abs(A-B)));
        end
    end
end
toc
bb = a;
clear a

%% MEX engine
tic
a=-inf(16,m); % m is your length
a(1,1)=0;
t1 = uint32(transitions(:,1));
t2 = uint32(transitions(:,2));
ue = ug+eg;
BCJR_loop_mex(a, t1, t2, ue);
toc

% Check accuracy
aa = a;
max(max(abs(aa-bb)))

% Bruno

Subject: Can this run any faster?

From: Yair Altman

Date: 11 Apr, 2013 14:13:09

Message: 17 of 37

"Heinrich Acker" wrote in message <kk5unh$gc1$1@newscl01ah.mathworks.com>...
[snip]
> My experience is that using inf or nan can slow down calculations significantly. I didn't try your code, nor did I try it recently on one of my programs, but since that discovery I went back to the old practice to use an extremly large value instead of inf. It is not always possible, but in many algorithms you will know a finite value that is larger than all finite values that can appear in that place, e.g. 1e99. So the suggestion is to use this value as a metaphor for inf, even if this seems to be less refined.
>
> Of course, the result depends on the architecture of the system you are using. It may not help on your system at all. Your disappointing result when mexing the function is an indicator of an inf or nan speed problem, though. When I came across it, I briefly googled the problem and found out that some architectures raise an execption to process inf or nan, instead of producing an IEEE754 result directly by the FPU.
>
> Heinrich


I do not know which platform/ML release you are using Heinrich, but at least on Windows R2013a your statement seems to be incorrect. See below:

   >> tic, a=intmax; for idx=1:1e6; if (a==intmax) b=1; else, b=2; end, end, toc
   Elapsed time is 5.929147 seconds.

   >> tic, a=0; for idx=1:1e6; if (a==inf) b=1; else, b=2; end, end, toc
   Elapsed time is 0.013034 seconds.

   >> tic, a=inf; for idx=1:1e6; if (a==inf) b=1; else, b=2; end, end, toc
   Elapsed time is 0.013131 seconds.

   >> tic, a=0; for idx=1:1e6; if isinf(a) b=1; else, b=2; end, end, toc
   Elapsed time is 0.006696 seconds.

   >> tic, a=inf; for idx=1:1e6; if isinf(a) b=1; else, b=2; end, end, toc
   Elapsed time is 0.007951 seconds.

Yair Altman
http://UndocumentedMatlab.com
 

Subject: Can this run any faster?

From: Eric Sampson

Date: 11 Apr, 2013 17:08:07

Message: 18 of 37

"Yair Altman" wrote in message <kk6gdl$8jp$1@newscl01ah.mathworks.com>...
> "Heinrich Acker" wrote in message <kk5unh$gc1$1@newscl01ah.mathworks.com>...
> [snip]
> > My experience is that using inf or nan can slow down calculations significantly. I didn't try your code, nor did I try it recently on one of my programs, but since that discovery I went back to the old practice to use an extremly large value instead of inf. It is not always possible, but in many algorithms you will know a finite value that is larger than all finite values that can appear in that place, e.g. 1e99. So the suggestion is to use this value as a metaphor for inf, even if this seems to be less refined.
> >
> > Of course, the result depends on the architecture of the system you are using. It may not help on your system at all. Your disappointing result when mexing the function is an indicator of an inf or nan speed problem, though. When I came across it, I briefly googled the problem and found out that some architectures raise an execption to process inf or nan, instead of producing an IEEE754 result directly by the FPU.
> >
> > Heinrich
>
>
> I do not know which platform/ML release you are using Heinrich, but at least on Windows R2013a your statement seems to be incorrect. See below:
>
(snip)
> Yair Altman
> http://UndocumentedMatlab.com
>

In my experience what Heinrich said can definitely be true - if you're talking about denormalized numbers, not inf/nans. Although it might be math on denorms I'm thinking about, and not necessarily conditionals.

Subject: Can this run any faster?

From: Bruno Luong

Date: 11 Apr, 2013 17:52:08

Message: 19 of 37

"Eric Sampson" wrote in message <kk6qln$c62$1@newscl01ah.mathworks.com>...
>
> In my experience what Heinrich said can definitely be true - if you're talking about denormalized numbers, not inf/nans. Although it might be math on denorms I'm thinking about, and not necessarily conditionals.

I second that. The slownest of NaN (worst) and Inf has been discussed few years back. I couldn't find the thread on this topic.

Bruno

Subject: Can this run any faster?

From: Steven_Lord

Date: 11 Apr, 2013 19:01:29

Message: 20 of 37



"Eric Sampson" <ericDOTsampson@gmail.com> wrote in message
news:kk6qln$c62$1@newscl01ah.mathworks.com...
> "Yair Altman" wrote in message <kk6gdl$8jp$1@newscl01ah.mathworks.com>...
>> "Heinrich Acker" wrote in message
>> <kk5unh$gc1$1@newscl01ah.mathworks.com>...
>> [snip]
>> > My experience is that using inf or nan can slow down calculations
>> > significantly. I didn't try your code, nor did I try it recently on one
>> > of my programs, but since that discovery I went back to the old
>> > practice to use an extremly large value instead of inf. It is not
>> > always possible, but in many algorithms you will know a finite value
>> > that is larger than all finite values that can appear in that place,
>> > e.g. 1e99. So the suggestion is to use this value as a metaphor for
>> > inf, even if this seems to be less refined.
>> >
>> > Of course, the result depends on the architecture of the system you are
>> > using. It may not help on your system at all. Your disappointing result
>> > when mexing the function is an indicator of an inf or nan speed
>> > problem, though. When I came across it, I briefly googled the problem
>> > and found out that some architectures raise an execption to process inf
>> > or nan, instead of producing an IEEE754 result directly by the FPU.
>> >
>> > Heinrich
>>
>>
>> I do not know which platform/ML release you are using Heinrich, but at
>> least on Windows R2013a your statement seems to be incorrect. See below:
>>
> (snip)
>> Yair Altman
>> http://UndocumentedMatlab.com
>
> In my experience what Heinrich said can definitely be true - if you're
> talking about denormalized numbers, not inf/nans. Although it might be
> math on denorms I'm thinking about, and not necessarily conditionals.

The Wikipedia page uses both the terms "handling" and "computations" in the
performance issues section of the article.

http://en.wikipedia.org/wiki/Denormal_number

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: Can this run any faster?

From: Steven_Lord

Date: 11 Apr, 2013 19:01:33

Message: 21 of 37



"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message
news:kk6t88$kmp$1@newscl01ah.mathworks.com...
> "Eric Sampson" wrote in message <kk6qln$c62$1@newscl01ah.mathworks.com>...
>>
>> In my experience what Heinrich said can definitely be true - if you're
>> talking about denormalized numbers, not inf/nans. Although it might be
>> math on denorms I'm thinking about, and not necessarily conditionals.
>
> I second that. The slownest of NaN (worst) and Inf has been discussed few
> years back. I couldn't find the thread on this topic.

Are you thinking of this?

http://www.mathworks.com/matlabcentral/newsreader/view_thread/45565

My search-fu is strong today ;)

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: Can this run any faster?

From: Bruno Luong

Date: 11 Apr, 2013 19:05:10

Message: 22 of 37

Slighly faster (about 4 times than MATLAB code)

/* MATLAB MEX BCJR_loop_mex.c
 
 CALLING: >> BCJR_loop_mex(a, t1, t2, ue);
 COMPILE: >>mex -O BCJR_loop_mex.c
 ATTENTION: a() will be modified inplace
 */

#include <math.h>
#include "mex.h"

unsigned char MINF_CHAR[8] = {0, 0, 0, 0, 0, 0, 240, 255};
double *MINF = (double*)MINF_CHAR;

#define A_INOUT prhs[0]
#define T1_UINT32 prhs[1]
#define T2_UINT32 prhs[2]
#define UE prhs[3]
void mexFunction(int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[])
{
    int m, n, i, j, p;
    int *t1_p, *t2_p, *t1, *t2;
    double A, B, *a, *ak, *an, *ue;
    
    n = mxGetM(A_INOUT);
    m = mxGetN(A_INOUT);
    t1_p = (int*)mxGetData(T1_UINT32);
    t2_p = (int*)mxGetData(T2_UINT32);
    a = mxGetPr(A_INOUT);
    ue = mxGetPr(UE);
    
    p = -1;
    an = a + n;
    for (i=m-1; i--; )
    {
        t1 = t1_p;
        t2 = t2_p;
        for (j=32; j--; )
        {
            B = a[*(t1++) + p];
            if (B != *MINF)
            {
                ak = an + *t2 + p;
                A = *ak;
                B += *ue;
                if (A != *MINF) /* avoid calculation with inf */
                {
                    *ak = ((A < B)?
                           B + log(1+exp(A-B)) :
                           A + log(1+exp(B-A)));
                }
                else *ak = B;
            }
            t2++;
            ue++;
        }
        p += n;
    }
} /* End of BCJR_loop_mex.c */

% MATLAB
    a=-inf(16,m); % m is your length
    a(1,1)=0;
    t1 = uint32(transitions(:,1));
    t2 = uint32(transitions(:,2));
    ue = ug+eg;
    avoid_Inf(a, t1, t2, ue);

% Bruno

Subject: Can this run any faster?

From: Bruno Luong

Date: 11 Apr, 2013 19:56:08

Message: 23 of 37

"Steven_Lord" <slord@mathworks.com> wrote in message <kk71ae$4d3$1@newscl01ah.mathworks.com>...

>
> Are you thinking of this?
>
> http://www.mathworks.com/matlabcentral/newsreader/view_thread/45565
>
> My search-fu is strong today ;)
>

Not that one. I was yet not participated. But this one is good too.

Bruno

Subject: Can this run any faster?

From: Roger Stafford

Date: 12 Apr, 2013 19:04:07

Message: 24 of 37

"Elnaz " <ebsadeghian@gmail.com> wrote in message <kk24q4$3j5$1@newscl01ah.mathworks.com>...
> In my script I call the function including this loop many times and "length" is in the order of 30K or more. "transitions" is a stored 32 by 4 matrix.
> ........
> a=zeros(16,length);
> a=a-inf;
> a(1,1)=0;
> for i= 2:length
> for j= 1:32
> A = a(transitions(j,2),i);
> B = a(transitions(j,1),i-1) + ug(j, i-1) + eg(j, i-1);
> if(A == -inf && B == -inf)
> a(transitions(j,2),i) = -inf;
> else
> a(transitions(j,2),i) = max(A,B) + log(1+exp(-abs(A-B)));
> end
> end
> end
- - - - - - - - -
  It suddenly occurred to me early this morning that your code involving values of the array 'a' can be greatly simplified if instead you deal with values of e = exp(a). In those terms each step within your two nested for-loops would consist of merely added some multiple (determined by 'ug' and 'eg') of an e value at the i-1 level to an e value at the i level. There would be no need to repeatedly call on the 'log' and 'exp' functions. The original initial values of -inf and 0 assigned to 'a' would be replaced by 0 and 1 assigned to 'e', respectively.

  To simplify the discussion, suppose that the 'a' array has only 3 rows instead of 16 and the transition vectors are only 6 elements long. Then the result of one pass through the inner loop might be the equivalent of some operation like this:

 e(1,i) = c(1,i-1)*e(2,i-1) + c(2,i-1)*e(3,i-1);
 e(2,i) = c(3,i-1)*e(1,i-1);
 e(3,i) = c(4,i-1)*e(1,i-1) + c(5,i-1)*e(2,i-1) + c(6,i-1)*e(3,i-1);

That is, the e vector would be subjected to a homogeneous linear transformation in which the same locations of six nonzero coefficients were involved but with their values changing at each step. Such an operation should be much faster than using repeated calls on 'log' and 'exp'. After all, linear combinations are what matlab does best.

  Of course there is a potential difficulty here. Such values of 'e' could easily lead to much earlier overflows to infinity than with 'a'. However, I speculate that this could be avoided by dealing appropriately with the variable coefficients derived from 'ug' and 'eg', that is, the 'c' values above. Conceivably those coefficients could be uniformly rescaled in such a way that overflow will not occur, though I have no way of saying just how this could be done because I don't know how 'ug' and 'eg' are defined. The point is that the pass through the entire 30K "length" of a could be replaced by a pass through 'e' with the understanding that 'a' would be some known constant value plus the logarithm of 'e' at the end of the computation.

  I would bet that if you turned Bruno loose using such an approach he would produce some amazingly faster mex computations equivalent to your present ones.

  To show my reasoning in the above claims, when A and B are finite and A > B we have

 a(k,i) = max(A,B) + log(1+exp(-abs(A-B))) = A+log(1+exp(B-A))
 exp(a(k,i)) = exp(A+log(1+exp(B-A))) = exp(A)*(1+exp(B)/exp(A)) =
    exp(A) + exp(B)

Similarly if A and B are finite with A <= B we still have

 exp(a(k,i)) = exp(B) + exp(A)

In the case where A or B or both are -inf this same equality holds. In the step where ug(j,i-1) and eg(j,i-1) are added, this is equivalent to a multiplication of exp(a) by a factor to obtain exp(B), which is what I denoted by the coefficient 'c' values above.

Roger Stafford

Subject: Can this run any faster?

From: Bruno Luong

Date: 12 Apr, 2013 20:21:09

Message: 25 of 37

I see your point Roger. It's very clever.

Bruno

Subject: Can this run any faster?

From: Roger Stafford

Date: 13 Apr, 2013 00:10:14

Message: 26 of 37

"Roger Stafford" wrote in message <kk9lr7$782$1@newscl01ah.mathworks.com>...
> It suddenly occurred to me early this morning that your code involving values of the array 'a' can be greatly simplified if instead you deal with values of e = exp(a). .........
> ..........
- - - - - - -
For the e's I referred to earlier, here is a scheme of rescaling so as to maintain the sum of those e's in each column equal to one in case there is danger of overflow otherwise.

 e=zeros(16,N);
 e(1,1)=1;
 t1 = transitions(:,1);
 t2 = transitions(:,2);
 c = exp(ug(:,1:N-1)+eg(:,1:N-1));
 s = zeros(1,N);
 for i= 2:N
   for j= 1:32
     e(t2(j),i) = e(t2(j),i) + c(j,i-1)*e(t1(j),i-1);
   end
   t = sum(e(:,i),1);
   e(:,i) = e(:,i)/t; <-- % Here is the rescaling part
   s(i) = s(i-1)+log(t); % Keep track of the accumulated scaling factor
 end

It would be understood that at any i the corresponding values of 'a' would be

 a(:,i) = log(e(:,i))+s(i);

wherever they were needed.

  Clearly the c computation as given here would involve much processing time with 'exp'. Hopefully these values might already be available in some part of the surrounding program. Actually I halfway suspect this problem had its origin in such exponential values as e and c and was probably converted to their logarithms just so as to avoid overflow.

  To tell the truth I think something a lot faster than the above inner loop should be possible. Converting the above c's to multiplication by a 16 by 16 matrix varying at each step in place of the inner for-loop could be done but that matrix would have a lot of zeros, since there are at most 32 nonzero coefficients in it. I leave that problem to wiser heads.

Roger Stafford

Subject: Can this run any faster?

From: Bruno Luong

Date: 13 Apr, 2013 09:28:09

Message: 27 of 37

I test Roger's exponential transformation, and the loop simplifies greatly.

Unfortunattely, the EXP (on ug, eg) and LOG (on "e" o get back on "a") cancels on the advantage, and it turns out with the examples we test, exponential algo running even slower than the original algorithm.

Typical results are:

Original algo: 3.240 s
BL MEX: 0.254 s
Roger's exponential: 7.038 s
RE MEX: 0.661 s

Might be I miss code something.

Here is Roger's exponetial MEX coding:

/* MATLAB MEX BCJR_exp_loop_mex.c
 
 CALLING: >> s = BCJR_exp_loop_mex(e, t1, t2, c);
 COMPILE: >>mex -O BCJR_exp_loop_mex.c
 ATTENTION: e() will be modified inplace
 */

#include <math.h>
#include "mex.h"

#define E_INOUT prhs[0]
#define T1_UINT32 prhs[1]
#define T2_UINT32 prhs[2]
#define C prhs[3]
#define S_OUT plhs[0]
void mexFunction(int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[])
{
    int m, n, i, j;
    int *t1_p, *t2_p, *t1, *t2;
    double *e, *ei, *en, *c, *s, r, t;
    
    n = mxGetM(E_INOUT);
    m = mxGetN(E_INOUT);
    t1_p = (int*)mxGetData(T1_UINT32);
    t2_p = (int*)mxGetData(T2_UINT32);
    e = mxGetPr(E_INOUT);
    c = mxGetPr(C);
    
    S_OUT = mxCreateDoubleMatrix(1, m, mxREAL);
    s = mxGetPr(S_OUT);
    ei = e + n;
    en = ei-1;
    e = e-1;
    for (i=m-1; i--; )
    {
        t1 = t1_p;
        t2 = t2_p;
        t = 0;
        for (j=32; j--; )
        {
            r = *(c++) * *(e + *(t1++));
            *(en + *(t2++)) += r;
            t += r;
        }
        
        for (j=n; j--; ) *(ei++) /= t;
        
        *(s+1) = *s + log(t);
        
        s++;
        e += n;
        en += n;

    }
} /* End of BCJR_exp_loop_mex.c */

% Here is the Test

% MATLAB test:
N = 3e4;
ntries = 10;
times = zeros(ntries,4);

for k=1:ntries
    transitions=ceil(16*rand(32,2));
    ug = rand(32,N);
    eg = rand(32,N);
    
    tic
    a=-inf(16,N);
    a(1,1)=0;
    for i= 2:N
        for j= 1:32
            A = a(transitions(j,2),i);
            B = a(transitions(j,1),i-1) + ug(j, i-1) + eg(j, i-1);
            if(A == -inf && B == -inf)
                a(transitions(j,2),i) = -inf;
            else
                a(transitions(j,2),i) = max(A,B) + log(1+exp(-abs(A-B)));
            end
        end
    end
    times(k,1) = toc;
    a1 = a;
    clear a
    
    %% MEX engine
    tic
    a=-inf(16,N);
    a(1,1)=0;
    t1 = uint32(transitions(:,1));
    t2 = uint32(transitions(:,2));
    ue = ug+eg;
    BCJR_loop_mex(a, t1, t2, ue);
    times(k,2) = toc;
    a2 = a;
    clear a
    
    %% Roger's engine
    tic
    e = zeros(16,N);
    e(1,1) = 1;
    m = size(e,1);
    t1 = transitions(:,1);
    t2 = transitions(:,2);
    c = exp(ug(:,1:N-1)+eg(:,1:N-1));
    s = zeros(1,N);
    for i= 2:N
        im1 = i-1;
        k = t2 + im1*m;
        for j= 1:32
            e(k(j)) = e(k(j)) + c(j,im1)*e(t1(j),im1);
        end
        t = sum(e(:,i),1);
        e(:,i) = e(:,i)/t;
        s(i) = s(i-1)+log(t); % Keep track of the accumulated scaling factor
    end
    a = bsxfun(@plus, log(e), s);
    times(k,3) = toc;
    a3 = a;
    clear a
    
    %% Roger's mex engine
    tic
    e = zeros(16,N);
    e(1,1) = 1;
    m = size(e,1);
    t1 = uint32(transitions(:,1));
    t2 = uint32(transitions(:,2));
    c = exp(ug+eg); % faster without discard the last column
    s = BCJR_exp_loop_mex(e, t1, t2, c);
    a = bsxfun(@plus, log(e), s);
    times(k,4) = toc;
    a4 = a;
    clear a
    
    % Check accuracy
    err = max(abs(a1(:)-a4(:)));
    disp(err)
end

times = sum(times,1);
fprintf('Original algo: %1.3f s\n', times(1));
fprintf('BL MEX: %1.3f s\n', times(2));
fprintf('Roger''s exponential: %1.3f s\n', times(3));
fprintf('RE MEX: %1.3f s\n', times(4));

% Bruno

Subject: Can this run any faster?

From: Bruno Luong

Date: 13 Apr, 2013 09:44:08

Message: 28 of 37

Sorry I make a wrong assumption in the normalization. Here is the correct one:

/* MATLAB MEX BCJR_exp_loop_mex.c
 
 CALLING: >> s = BCJR_exp_loop_mex(e, t1, t2, c);
 COMPILE: >>mex -O BCJR_exp_loop_mex.c
 ATTENTION: e() will be modified inplace
 */

#include <math.h>
#include "mex.h"

#define E_INOUT prhs[0]
#define T1_UINT32 prhs[1]
#define T2_UINT32 prhs[2]
#define C prhs[3]
#define S_OUT plhs[0]
void mexFunction(int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[])
{
    int m, n, i, j;
    int *t1_p, *t2_p, *t1, *t2;
    double *e, *ei, *en, *c, *s, r, t;
    
    n = mxGetM(E_INOUT);
    m = mxGetN(E_INOUT);
    t1_p = (int*)mxGetData(T1_UINT32);
    t2_p = (int*)mxGetData(T2_UINT32);
    e = mxGetPr(E_INOUT);
    c = mxGetPr(C);
    
    S_OUT = mxCreateDoubleMatrix(1, m, mxREAL);
    s = mxGetPr(S_OUT);
    ei = e + n;
    en = ei-1;
    e = e-1;
    for (i=m-1; i--; )
    {
        t1 = t1_p;
        t2 = t2_p;

        for (j=32; j--; )
        {
            r = *(c++) * *(e + *(t1++));
            *(en + *(t2++)) += r;
            t += r;
        }
        
        t = 0;
        for (j=n; j--; ) t += ei[j];
        for (j=n; j--; ) *(ei++) /= t;
        
        *(s+1) = *s + log(t);
        
        s++;
        e += n;
        en += n;

    }
} /* End of BCJR_exp_loop_mex.c */

Subject: Can this run any faster?

From: Bruno Luong

Date: 13 Apr, 2013 09:48:08

Message: 29 of 37

I also observed now and then a difference in the result that might indicate overflowing in exponential methods, despite the normalization by column suggested by Roger.

Bruno

Subject: Can this run any faster?

From: Bruno Luong

Date: 13 Apr, 2013 10:11:09

Message: 30 of 37

I have removed unnecessary statements in Roger's exponetional MEX.

/* MATLAB MEX BCJR_exp_loop_mex.c
 
 CALLING: >> s = BCJR_exp_loop_mex(e, t1, t2, c);
 COMPILE: >>mex -O BCJR_exp_loop_mex.c
 ATTENTION: e() will be modified inplace
 */

#include <math.h>
#include "mex.h"

#define E_INOUT prhs[0]
#define T1_UINT32 prhs[1]
#define T2_UINT32 prhs[2]
#define C prhs[3]
#define S_OUT plhs[0]
void mexFunction(int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[])
{
    int m, n, i, j;
    int *t1_p, *t2_p, *t1, *t2;
    double *e, *ei, *en, *c, *s, r, t;
    
    n = mxGetM(E_INOUT);
    m = mxGetN(E_INOUT);
    t1_p = (int*)mxGetData(T1_UINT32);
    t2_p = (int*)mxGetData(T2_UINT32);
    e = mxGetPr(E_INOUT);
    c = mxGetPr(C);
    
    S_OUT = mxCreateDoubleMatrix(1, m, mxREAL);
    s = mxGetPr(S_OUT);
    ei = e + n;
    en = ei-1;
    e = e-1;
    for (i=m-1; i--; )
    {
        t1 = t1_p;
        t2 = t2_p;
        for (j=32; j--; ) *(en + *(t2++)) += *(c++) * *(e + *(t1++));
        t = 0;
        for (j=n; j--; ) t += ei[j];
        for (j=n; j--; ) *(ei++) /= t;
        *(s+1) = *s + log(t);
        s++;
        e += n;
        en += n;
    }
} /* End of BCJR_exp_loop_mex.c */

Subject: Can this run any faster?

From: Bruno Luong

Date: 13 Apr, 2013 12:49:14

Message: 31 of 37

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <kkb9ko$p4s$1@newscl01ah.mathworks.com>...
> I also observed now and then a difference in the result that might indicate overflowing in exponential methods, despite the normalization by column suggested by Roger.

I think it's rather some sort of instability that manifests.

In the original algorithm This expression is used:

max(A,B) + log(1+exp(-abs(A-B))).

Without numerical error, this expression is equal to

A + log(1 + exp(B-A))
or
B + log(1 + exp(A-B)).

However the first formula ensures the numerical stability in the recursion.

What is the equivalent in exponential formula?

Bruno

Subject: Can this run any faster?

From: Roger Stafford

Date: 14 Apr, 2013 05:48:07

Message: 32 of 37

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <kkbk8a$m33$1@newscl01ah.mathworks.com>...
> "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <kkb9ko$p4s$1@newscl01ah.mathworks.com>...
> > I also observed now and then a difference in the result that might indicate overflowing in exponential methods, despite the normalization by column suggested by Roger.
> I think it's rather some sort of instability that manifests.
> In the original algorithm This expression is used:
>
> max(A,B) + log(1+exp(-abs(A-B))).
>
> Without numerical error, this expression is equal to
>
> A + log(1 + exp(B-A))
> or
> B + log(1 + exp(A-B)).
>
> However the first formula ensures the numerical stability in the recursion.
>
> What is the equivalent in exponential formula?
>
> Bruno
- - - - - - - - -
  The equivalent in exponential form is a simple addition: E + F where E = exp(A), F = exp(B), as can be seen by

 new value of E = exp(new version of A) = exp(A+log(1+exp(B-A)))
   = exp(A)*exp(log(1+exp(B-A))) = exp(A)*(1+exp(B)/exp(A))
   = exp(A) + exp(B) = E + F

The same holds true for the reversed form B+log(1+exp(A-B)) which is identically equal to the above expression, and for the cases when A or B is minus infinity. It remains a simple addition E + F in all cases. Thus a more complicated transformation formula is necessary for the forms 'a' than would be necessary for their exponential 'e' forms.

  You have said that the time used to convert to the exponentials of ug+eg and to convert back to the logarithms of the e's more than cancels out the time saved in this more simple transformation. This does not surprise me. My hope was that the user might already be able to furnish these coefficients of the linear transformations without doing exponential operations on individual values of ug+eg and would find the exponential forms 'e' as suitable as their logarithm versions and a conversion back to their logarithms would therefore not be needed, or if the logarithm forms were necessary perhaps they would not need all 16 x 30000 of them but only a select few.

  You also mentioned observing some sort of "instability" using the exponential forms. If this were due to overflow to plus infinity even though a rescaling using 's' were done at each step, there would have to be some extremely large values given by ug+eg in a single step, something in the 700's to produce such an effect. If such are present, it could be compensated for directly in the ug+eg values before taking their exponentials by subtracting a normalizing constant from each of the values at a given step if the appropriate compensating adjustment were made in the 's' calculation I gave.

Roger Stafford

Subject: Can this run any faster?

From: Bruno Luong

Date: 14 Apr, 2013 08:02:16

Message: 33 of 37

"Roger Stafford" wrote in message <kkdfun$op2$1@newscl01ah.mathworks.com>...
> "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <kkbk8a$m33$1@newscl01ah.mathworks.com>...
> > "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <kkb9ko$p4s$1@newscl01ah.mathworks.com>...
> > > I also observed now and then a difference in the result that might indicate overflowing in exponential methods, despite the normalization by column suggested by Roger.
> > I think it's rather some sort of instability that manifests.
> > In the original algorithm This expression is used:
> >
> > max(A,B) + log(1+exp(-abs(A-B))).
> >
> > Without numerical error, this expression is equal to
> >
> > A + log(1 + exp(B-A))
> > or
> > B + log(1 + exp(A-B)).
> >
> > However the first formula ensures the numerical stability in the recursion.
> >
> > What is the equivalent in exponential formula?
> >
> > Bruno
> - - - - - - - - -
> The equivalent in exponential form is a simple addition: E + F where E = exp(A), F = exp(B), as can be seen by
>
> new value of E = exp(new version of A) = exp(A+log(1+exp(B-A)))
> = exp(A)*exp(log(1+exp(B-A))) = exp(A)*(1+exp(B)/exp(A))
> = exp(A) + exp(B) = E + F
>
> The same holds true for the reversed form B+log(1+exp(A-B)) which is identically equal to the above expression,

Roger,

For finite precision calculation, A+log(1+exp(B-A)) is not equal to B+log(1+exp(A-B)) , and the original algorithm prefer one form to another depending on A larger than B or not.

Now in the exponential transformation E + F and F + E gives exactly the same result in IEEE752 regardless E and F.

I observe a divergent of exponential-transformation and original method some time. Might be my random example do not respect some condition required by the trellis.

Bruno

Subject: Can this run any faster?

From: Elnaz

Date: 8 May, 2013 23:09:09

Message: 34 of 37

Hi all,

After a long time while I 'm still suffering from long simulations I came back to this thread and vow!
Thanks so much Bruno, Roger, and Steve.
To sum everything up let me go over the problem and let's agree over the final solution:
Here is the original excerpt from MATLAB:
a=zeros(16,length);
a=a-inf;
a(1,1)=0;
for i= 2:length
     for j= 1:32
           A = a(transitions(j,2),i);
           B = a(transitions(j,1),i-1) + ug(j, i-1) + eg(j, i-1);
           if(A == -inf && B == -inf)
              a(transitions(j,2),i) = -inf;
           else
              a(transitions(j,2),i) = max(A,B) + log(1+exp(-abs(A-B)));
           end
      end
end
The main time-consuming part is the line with "max(A,B) + log(1+exp(-abs(A-B))); " which is the exact calculation of Jacobian Logarithm. We want to minimize the overall time either by reducing the number of times this calculation is required or by manipulating the calculation itself.
In parantheses, if I substitute the Jac Log with a LUT it'll make it much faster but then the accuracy is lost. So, we knowingly avoid that solution.
Bruno and/or Roger, could you please give the final solution in MATLAB script (not in mex) in the following?

Thanks,
Elnaz

Subject: Can this run any faster?

From: Marc

Date: 9 May, 2013 02:28:09

Message: 35 of 37

"Elnaz " <ebsadeghian@gmail.com> wrote in message <kmelul$5ku$1@newscl01ah.mathworks.com>...
> Hi all,
>
> After a long time while I 'm still suffering from long simulations I came back to this thread and vow!
> Thanks so much Bruno, Roger, and Steve.
> To sum everything up let me go over the problem and let's agree over the final solution:
> Here is the original excerpt from MATLAB:
> a=zeros(16,length);
> a=a-inf;
> a(1,1)=0;
> for i= 2:length
> for j= 1:32
> A = a(transitions(j,2),i);
> B = a(transitions(j,1),i-1) + ug(j, i-1) + eg(j, i-1);
> if(A == -inf && B == -inf)
> a(transitions(j,2),i) = -inf;
> else
> a(transitions(j,2),i) = max(A,B) + log(1+exp(-abs(A-B)));
> end
> end
> end
> The main time-consuming part is the line with "max(A,B) + log(1+exp(-abs(A-B))); " which is the exact calculation of Jacobian Logarithm. We want to minimize the overall time either by reducing the number of times this calculation is required or by manipulating the calculation itself.
> In parantheses, if I substitute the Jac Log with a LUT it'll make it much faster but then the accuracy is lost. So, we knowingly avoid that solution.
> Bruno and/or Roger, could you please give the final solution in MATLAB script (not in mex) in the following?
>
> Thanks,
> Elnaz

What is wrong with mex? Seems like you have several final answers. Now it is up to you to find out which one works best.

Believe it or not, Bruno has a day job.

Subject: Can this run any faster?

From: Bruno Luong

Date: 9 May, 2013 06:41:07

Message: 36 of 37

"Elnaz " <ebsadeghian@gmail.com> wrote in message <kmelul$5ku$1@newscl01ah.mathworks.com>...

> Bruno and/or Roger, could you please give the final solution in MATLAB script (not in mex) in the following?

We did provide several solutions, Mex and Matlab. The advantage and drawback of each solution is even discussed. It's really up to you to pick a final solution, suitable for your specific simulation.

Bruno

Subject: Can this run any faster?

From: Elnaz

Date: 9 May, 2013 16:04:09

Message: 37 of 37

Bruno,
Since I'm not familiar with writing the mex directly, I write C code and then compile it to get the mex version. Also, those excerpts are one part of my code. I did implement Roger's and your MATLAB excerpts into my overall C code and then run the resulted mex file. It did not make a notable difference, only 17% improvement. Obviously, the main culprit is the number of times I have to do jac logarithm. So, I think, the more efficient the jac function is implemented the less time consuming the code will be.
One thing is that in C I can not do ue = ug+eg; ahead of time and outside the loops since I need to add element by element there (ug and eg are pointers).
Roger's:
a=zeros(16,length); % <-- Don't use "length". It's the name of a matlab function
 a=a-inf;
 a(1,1)=0;
 t1 = transitions(j,1); %<-- Avoid needless indexing in the loops
 t2 = transitions(j,2);
 ue = ug+eg; % <-- Do this addition ahead of time
 for i= 2:size(a,2)
   for j= 1:32
     k = t2(j);
     A = a(k,i);
     B = a(t1(j),i-1);
     if B ~= -inf
       B = B + ue(j,i-1);
       if A ~= -inf
         a(k,i) = max(A,B) + log(1+exp(-abs(A-B)));
       else
         a(k,i) = B;
       end
     end
   end
 end

Yours:
a=-inf(16,m); % m is your length
a(1,1)=0;
[n m] = size(a);
t1 = transitions(:,1);
t2 = transitions(:,2);
ue = ug+eg;
for i= 1:m-1
    for j = 1:32
        k = t2(j) + i*n;
        A = a(k);
        B = a(t1(j),i);
        if B ~= -inf
            B = B + ue(j,i);
            if A < B
                a(k) = B + log(1+exp(A-B));
            else
                a(k) = A + log(1+exp(B-A));
            end
        end
    end
end

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us