|
Building on from your example, I wrote another MEX code (shown below) to perform a matrix multiplication. Timing wise there seems to be no difference between this and the matlab equivalent (Out = C.*exp(-A*Res./B));. Is this the most efficient (fastest) way to do this in C? I am very new to MEX and C; therefore, apologies in advance if this is an obvious question.
thanks,
Joshua
#include "mex.h"
#include "math.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
mwSize i, n;
double Res;
double *A, *B, *C;
double *Out, *Out0;
n = mxGetNumberOfElements(prhs[0]);
A = mxGetPr(prhs[0]);
B = mxGetPr(prhs[1]);
C = mxGetPr(prhs[2]);
Res = mxGetScalar(prhs[3]);
plhs[0] = mxCreateDoubleMatrix(1,n,mxREAL);
Out = mxGetPr(plhs[0]);
Out0 = Out;
for( i=0; i<n; i++ ) {
*Out++ = *C * exp(*A * Res * (-1) / *B);
A++;
B++;
C++;
}
mxSetN(plhs[0],Out-Out0);
}
"Joshua" wrote in message <jdcg1k$fkq$1@newscl01ah.mathworks.com>...
> Dear James,
>
> Thanks for this. This was very helpful indeed!
>
> Regards,
> Joshua
>
>
> "James Tursa" wrote in message <jdc0kg$2m1$1@newscl01ah.mathworks.com>...
> > "Joshua" wrote in message <jdar6c$he3$1@newscl01ah.mathworks.com>...
> > > "James Tursa" wrote in message <jdako3$t49$1@newscl01ah.mathworks.com>...
> > > > "Joshua" wrote in message <jda5to$hni$1@newscl01ah.mathworks.com>...
> > > > > Hi,
> > > > >
> > > > > I am trying to write a simple mex code which extracts a subvector from a vector. Does anyone know the mex formatting (or has a simple example) when using vector<double> X.
> > > >
> > > > Could you explain this in more detail? I don't see any mex details at all in your code, so I have no idea what to comment on. What, exactly, are you trying to get the mex function to do?
> > > >
> > > > James Tursa
> > >
> > > Hi James,
> > >
> > > I am trying to write a mex function which imports the vector iv and pumps out the shortened form of the same vector. I have it working know using the code below; however, it is extremely slow especially with large vectors. Any ideas how to speed this up would be greatly appreciated.
> > >
> > > // Example C = A(A >= B(1) & A < B(2))
> > > /*
> > > * This is a MEX-file for MATLAB.
> > > */
> > >
> > > #include "mex.h"
> > > #include <iostream>
> > > #include <vector>
> > > #include <algorithm>
> > > using namespace std;
> > >
> > > double Vecs(double Out[], double In[],double MIN, double MAX, double Size)
> > >
> > > {
> > >
> > > vector<int> iv;
> > > unsigned int i;
> > > double k;
> > >
> > > // vector<int> iv;
> > > //int min = 5;
> > > //int max = 20;
> > >
> > > // Push arbitrary values to the vector
> > > for ( i=0; i<Size; i++ ) {
> > > k = In[i];
> > > iv.push_back(k);
> > > }
> > >
> > >
> > > // Remove elements that have values less than the min
> > > // cout << "Removing values less than " << min << endl;
> > > for ( i=0; i<iv.size(); i++ ) {
> > > if ( iv[i] < MIN ) {
> > > iv.erase( iv.begin()+i );
> > > i--;
> > > }
> > > }
> > >
> > > // Remove elements that have values greater than the max
> > > // cout << "Removing values less than " << max << endl;
> > > for ( i=0; i<iv.size(); i++ ) {
> > > if ( iv[i] > MAX ) {
> > > iv.erase( iv.begin()+i );
> > > i--;
> > > }
> > > }
> > >
> > > for ( i=0; i<iv.size(); i++ ) {
> > >
> > > Out[i] = iv[i];
> > >
> > >
> > > }
> > >
> > > }
> > >
> > >
> > >
> > > void mexFunction( int nlhs, mxArray *plhs[],
> > > int nrhs, const mxArray *prhs[] )
> > > {
> > > double *In;
> > > double *Out;
> > > double MIN;
> > > double MAX;
> > > double Size;
> > >
> > > int size_t, mrows,ncols;
> > >
> > >
> > >
> > > mrows = mxGetM(prhs[0]);
> > > ncols = mxGetN(prhs[0]);
> > >
> > > /* Create matrix for the return argument. */
> > > plhs[0] = mxCreateDoubleMatrix((mwSize)mrows, (mwSize)ncols, mxREAL);
> > >
> > > /* Assign pointers to each input and output. */
> > > In = mxGetPr(prhs[0]);
> > > MIN = mxGetScalar(prhs[1]);
> > > MAX = mxGetScalar(prhs[2]);
> > > Size = mxGetScalar(prhs[3]);
> > >
> > > Out = mxGetPr(plhs[0]);
> > >
> > > /* Call the timestwo subroutine. */
> > >
> > > Vecs(Out, In, MIN, MAX, Size);
> > > }
> >
> > The reason it is taking so long is because you are spending 99% (or more) of your time doing repeated wasteful data copying. For instance:
> >
> > // Push arbitrary values to the vector
> > for ( i=0; i<Size; i++ ) {
> > k = In[i];
> > iv.push_back(k);
> > }
> >
> > The above lines drag the entire input array through the high level cache to do the copy. In addition, the push_back method in all likelihood has to do multiple reallocations and copying to get an increasingly large array to hold the data. That is, the implementation of vector in your compiler probably allocates a block of memory up front to hold the data and then you start filling it. At some point you exceed the memory size so the implementation allocates a larger block, copies the current data to the new block and then adds the new item. This repeats *several* times and you end up copying the same data over and over again, wasting a tremendous amount of time.
> >
> > Then you iteratively delete items using the lines:
> >
> > iv.erase( iv.begin()+i );
> >
> > Every time you delete an item, my guess is the implementation copies the trailing data back by one element. Again you do this *several* times so the same data gets repeatedly copied over and over again. There may be several reallocations (i.e., entire data block copying) involved in this effort also. This again wastes a tremendous amount of time.
> >
> > Instead of going about it this way, try to access the data only once (i.e., drag it through the high level cache only once) and copy it only once. E.g., here is a simple mex routine that accomplishes what I think you are trying to do (change the >= and <= to suit your actual needs):
> >
> > // File: gele.c
> > // B = gele(A,minval,maxval)
> > // returns the same result as the MATLAB m-code:
> > // B = A( A>=minval & A<=maxval )
> > // Single pass through A, wasted memory on tail of B
> > // This is fast, but B will have wasted memory at the tail since it will
> > // take the same amount of physical memory as A even if it is smaller.
> > // CAUTION: Bare bones code, no argument checking.
> > // Programmer: James Tursa
> > #include "mex.h"
> > void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
> > {
> > mwSize i, n;
> > double minval, maxval;
> > double *Apr, *Bpr, *Bpr0;
> > n = mxGetNumberOfElements(prhs[0]);
> > minval = mxGetScalar(prhs[1]);
> > maxval = mxGetScalar(prhs[2]);
> > plhs[0] = mxCreateDoubleMatrix(1,n,mxREAL);
> > Apr = mxGetPr(prhs[0]);
> > Bpr = mxGetPr(plhs[0]);
> > Bpr0 = Bpr;
> > for( i=0; i<n; i++ ) {
> > if( *Apr >= minval && *Apr <= maxval ) {
> > *Bpr++ = *Apr;
> > }
> > Apr++;
> > }
> > mxSetN(plhs[0],Bpr-Bpr0); // Does *not* free up the trailing memory!
> > }
> >
> > The above is just one way to do it that attempts to optimize the result for speed at the expense of memory. Basically it allocates a variable that is the same size as the input and then fills in the result in a single pass through both variables. The downside is that it retains the entire memory block even if the resulting dimensions are small. If you don't like this, then one could either count the result size ahead of time and then allocate only that much memory for the result up front (would require two passes through the input A), or perhaps reallocate the result to a smaller memory block once the result size is known (requires two passes through the result B), or some variation of either of these.
> >
> > James Tursa
|