I want to in each voxel fit this 10 points by specific biexponencial curve (so I receive parametric maps of brain)
Do not fit one voxel at a time in a loop. Instead, write the model as an N-dimensional curve where N is the number of voxels and apply lsqcurvefit to this model. Do not let lsqcurvefit use its default finite difference derivative calculations. Instead, supply your own Jacobian*x operator using the JacobianMultiplyFcn option (EDIT: or provide a sparse block diagonal Jacobian matrix computation).
Your problem is a good example of what I was discussing with Joss: both the model function and the JacobianMultiplyFcn could be GPU-optimized if lsqcurvefit could work with gpuArray variables without constantly doing GPU/CPU transfers. However, I expect the fit will still be reasonably fast if you use appropriate Matlab vectorization techniques.