Parallelizing multiple iterations and randperm for a Monte Carlo simulation

3 views (last 30 days)
I have a function which randomly picks stocks to buy, subject to the constraint of a fixed number of holding periods and fixed number of stocks that can be bought at any one time. The actual function is as follows:
function profit=monteCarloSim(stockPrices,numHoldingPeriods,numStocks,numSimuls)
[rows,cols] = size(stockPrices);
stockReturns = diff(stockPrices);
profit = zeros(numSimuls,1);
parfor i=1:numSimuls
buyWeights = zeros(rows-1,cols);
buySignal = randperm(rows-1,numHoldingPeriods);
buyRandomAssetPicks = randperm(cols, numStocks);
buyWeights(buySignal,buyRandomAssetPicks)=1/numStocks;
portfolioReturns = sum(stockReturns.*buyWeights,2);
profit(i) = sum(portfolioReturns);
end
end
I was planning to limit the number of operations in the loop by assigning each iteration of buyWeights to a 3D matrix. This way, I would take the portfolioReturns and profits(i) steps out of the loop, and completely vectorize them.
However, this approach becomes complicated when I have a huge number of simulations and a large data set, because there's not enough memory to store the whole 3D matrix. So I have to come up with a few extra steps to manage memory, e.g. build smaller chunks of the 3D matrix iteratively.
Even if I had the memory to do the 3D matrix approach, I'll still have to iterate the randperm function. I can't find a function to repeat this more effectively and I'm still a novice at using bsxfun.
Is there a way to speed this function up? I also have a good number of GPU cores, if a gpuArray method helps.
Thanks in advance!

Accepted Answer

Jan
Jan on 25 Sep 2012
Edited: Jan on 25 Sep 2012
You could use the faster FEX: Shuffle instead of RANDPERM. But unfortunately this function is not threadsafe currently and the replied random numbers will be equal in each thread.
Allocating buyWeights repeatedly consumes time. Better set it to zero:
buyWeights = zeros(rows-1,cols);
parfor i=1:numSimuls
buyWeights(:) = 0;
...
But here the PARFOR is a problem also. I cannot test this currently, but I guess this could help for both problems:
block = round(linspace(1, numSimuls + 1, numberOfCores + 1));
parfor iCore = 1:numberOfCores
buyWeights = zeros(rows-1,cols);
Shuffle(rand(1,4) * 2^32, 'seed');
for iS = block(iCore):block(iCore + 1) - 1
buyWeights(:) = 0;
buySignal = Shuffle(rows-1, 'index', numHoldingPeriods);
buyRandomAssetPicks = Shuffle(cols, 'index', numStocks);
buyWeights(buySignal,buyRandomAssetPicks) = 1/numStocks;
profit(iS) = sum(stockReturns(:) .* buyWeights(:));
end
end
In addition it is avoided to calculate the sum over the 2nd dimension at first, because this means accessing elementes spread through wide sections of the RAM. Better create the sum over the first dimension at first:
b = sum(stockReturns .* buyWeights, 1);
profit(iS) = sum(b);
Or join it to one SUM command as in the example above.
NOTE: Please test and debug this, because it is written from the scratch in a text editor.
  3 Comments
Edric Ellis
Edric Ellis on 25 Sep 2012
Unfortunately, Jan's PARFOR loop no longer follows the rules of what's allowed in terms of indexing. Results from PARFOR must either be reduction variables like this:
x = 0;
parfor ii=1:7
x = x + rand();
end
or 'sliced' outputs where the variable is indexed using the loop variable, like this:
x = zeros(1,7);
parfor ii=1:7
x(ii) = rand();
end
To fix Jan's code, we need to work around the PARFOR constraints, like so:
parfor iCore = 1:matlabpool('size') % this gets the number of workers
profitTmp = zeros(1, block(iCore+1) - block(iCore));
for iS = ...
...
profitTmp(iS) = sum ...
end
profit{iCore} = profitTmp;
end
profit = [ profit{:} ]; % might need some dimension fixing here.
where we have ensured that 'profit' is indexed correctly for a PARFOR output ('sliced').
I would also suggest you might wish to chunk the work up into more blocks than the number of workers if there is any possibility of load imbalance between the blocks.
Ephedyn
Ephedyn on 25 Sep 2012
Awesome, I understand how it works now. This ran wonderfully. In case someone else needs the solution on this Q&A, here's the extra (minor) debugging to get it running:
numberOfCores = matlabpool('size');
profit = cell(numberOfCores,1);
...
parfor iCore = 1:numberOfCores
indexOffset = block(iCore) - 1;
...
for iS = block(iCore):block(iCore + 1) - 1
profitTmp(iS-indexOffset) = sum(stockReturns(:) .* weights(:), 1);
end
profit{iCore} = profitTmp;
end
profit = [profit{:}]';
Cheers!

Sign in to comment.

More Answers (1)

Edric Ellis
Edric Ellis on 25 Sep 2012
If you take computations out of the PARFOR loop, then even if they are vectorized, they are no longer operating in parallel so it may not be faster. The two cases that you are considering both appear to involve transferring much more data from outside the PARFOR loop to be used inside it. This again will add overhead. Consider simply the 'profit(i)' calculation. Currently, the PARFOR loop only has to return a numSimuls-by-1 array. If you could return 'portfolioReturns' for summation outside the loop, you'd need to return numSimuls-by-(rows-1) elements. This extra data transfer might easily outweigh the benefit of performing a single call to SUM.
  1 Comment
Ephedyn
Ephedyn on 25 Sep 2012
Ah, got it.
P.S.: Noticed you're on the development team for the parallel computing toolbox. Want to thank you guys a lot for the work, it's been extremely useful and put together in an intuitive way.

Sign in to comment.

Categories

Find more on Parallel Computing Fundamentals in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!