vectorizing a for loop when nested loop varies in size

Question

0 votes

i have a cell array firmsandC, that is 1x123. each of these cells has a table that is NxL, where N ranges from 1 to 4 and L ranges from 8 to 60000.

funcomplete is a function that takes the indexes for each of these entries in firmsandC, and each of the columns in each cell, and spits out a Nx5 table.

it currently takes 1.3 seconds per iteration, and would like to vectorize this. the main problem i have is that the inner loop has different size for each k, and have been trying to vectorize it with little success

for k=1:length(firmsandC)
    for i=1:size(firmsandC{1,k},2)-2
        tic
        tempfinal{k,i}=funcomplete(k,i);
        toc
    end
end

any suggestions?

thank you

7 Comments
Show 5 older comments Hide 5 older comments

John on 28 May 2020

Edited: John on 28 May 2020

Open in MATLAB Online

is the loop claim true? because I have seen this claim in the actual matlab documentation.

yes, tempfinal is preallocated.

the funcomplete function consists of a meshgrid command, and several data manipulations (innerjoin, join, outerjoin, getting unique values, etc)

i update the code above, with new changes

 tempfinal=cell(1,length(firmsandC));
for k=1:length(firmsandC)
    templist2=firmsandC{1,k}(:,2);
    templist3=firmsandC{1,k}(:,1:2);
for i=1:size(firmsandC{1,k},2)-2
    tic
    templist1=firmsandC{1,k}(:,2+i);
    tempfinal{1,k}{i}=funcomplete(k,templist1,templist2,templist3);
    toc
end
end

here is the funcomplete function if you want to see it

function tempfinal=funcomplete(k,templist1,templist2,templist3)
global  A2 A372 A462 matrixofcensustractsandlatlongtable censustractdistancestable  U  lat_new long_new temp28 lambda1 lambda2 rho
%here we create the meshgrid for census tracts for locations and census tracts for firms
[F,G]=meshgrid(A2,templist1);
c=cat(2,F',G');
d=reshape(c,[],2);
geoid2=d(:,1);
altgeoid2=d(:,2);
%here we create a similar meshgrid, mostly to keep firm license numbers
[H,J]=meshgrid(A2,templist2);
f=cat(2,H',J');
g=reshape(f,[],2);
firm_ids=g(:,2);
%we create the first table, to be used in this code
newJoin= join(table(geoid2,altgeoid2,firm_ids),matrixofcensustractsandlatlongtable);
altgeoid2=A2;
newJoin=join(newJoin,table(altgeoid2, lat_new, long_new));
%we compute the distances
newJoin=[newJoin table(lldistkm(newJoin{:,4:5},newJoin{:,6:7}))];
newJoin.Properties.VariableNames(8)={'dist1km'};
%this is the radius i constructed that in the original data, it ensures 3
%firms per census tract. varies by census tract
newJoin=join(newJoin,censustractdistancestable);
%we get rid of those firms that are outside the given radius
newJoin=[newJoin table((newJoin{:,8}  <= newJoin{:,9}))];
newJoin(newJoin{:, 10}== 0, :)= [];
newJoin.Properties.VariableNames(10)={'dummyvar'};
newJoin=innerjoin(newJoin, U);
newJoin=join(newJoin,temp28);
%%%%%%%%%%%%%%%%%% here we compute the inner terms, those related to consumers in specific census tracts
 newJoin=[newJoin table(log(newJoin{:,33}),lambda1*newJoin{:,8}+lambda2*newJoin{:,8}.*log(newJoin{:,36}))];
 newJoin.Properties.VariableNames([37 38])={'lninner','distterm'};
 newJoin=[newJoin table(exp(newJoin{:,38}+(1-rho)*newJoin{:,37}))];
 newJoin.Properties.VariableNames(39)={'indtopterm'};
 %now we have to do the bottom term of the ind expression
   [D31,~,ic]=unique( [newJoin(:,1) newJoin(:,3) newJoin(:,17) ],'rows','stable');
   out = splitapply(@(rows) sum(rows, 1), newJoin{:,39}, ic);  %sum rows with identical id
   out=out+1; % i define the outside option as a payoff of 0, so exp(0)=1.
Result_temp=table(D31{:,1},D31{:,2},D31{:,3},out);
Result_temp.Properties.VariableNames([1 2 3 4]) = {'geoid2','firm_ids','modate','indbottterm'};
 newJoin=join(newJoin,Result_temp);
newJoin=[newJoin table(newJoin{:,39}./newJoin{:,40})];
 newJoin.Properties.VariableNames(41)={'indshare'};
%we multiply the share(inner by outer) by the census tract-level market size 
newJoin=[newJoin table(newJoin{:,34}.*newJoin{:,41}.*newJoin{:,35})];
[D,~,ic]=unique( [newJoin(:,3) newJoin(:,17) newJoin(:,29)],'rows','stable');
out = splitapply(@(rows) sum(rows, 1), newJoin{:,42}, ic);  %sum rows with identical id
Result_temp=table(D{:,1},D{:,2},D{:,3},out);
Result_temp.Properties.VariableNames([1 2 3 4]) = {'firm_ids','modate','product_ids','quantitiesmodel'};
 newJoin=join(newJoin,Result_temp);
 
%%%%now that we got the quantities sold at the jft level, we can 'drop
%%%%back' to the U dataset.
   [D,~,~]=unique( [newJoin(:,3) newJoin(:,17) newJoin(:,29) newJoin(:,43)],'rows','stable');
Unew=join(U,D);
%%%%%i need to update the prices by the pricing equation. i dont have the
%%%%%elasticities yet though. i need to program the demand for that. good
%%%%%thing for performance is that it can be done outside of this loop.
[D,~,ic]=unique( [Unew(:,1) Unew(:,8)],'rows','stable');
   out = splitapply(@(rows) sum(rows, 1), (Unew{:,12}-Unew{:,13}).*Unew{:,26}, ic);  %sum rows with identical id
Result_temp=table(D{:,1},D{:,2},out);
Result_temp.Properties.VariableNames([1 2 3 ]) = {'firm_ids','modate','monthlyvarprofits'};
Unew=join(Unew,Result_temp);
[D,~,ic]=unique( Unew(:,1),'rows','stable');
   out = splitapply(@(rows) sum(rows, 1), Unew{:,27}, ic);  %sum rows with identical id
Result_temp=table(D{:,1},out);
Result_temp.Properties.VariableNames([1 2 ]) = {'firm_ids','totalprofits'};
Unew=join(Unew,Result_temp);
temp=[Unew(:,1) Unew(:,8)];
[~,uniqueYearIndex] = unique(Unew{:,1},'last');
out = temp(uniqueYearIndex,:);
out.Properties.VariableNames(2)={'totalmonths'};
Unew=join(Unew,out);
 Unew=[Unew table(Unew{:,28}./Unew{:,29})];
temp3=table(A2,A372.*log(A462));
temp3.Properties.VariableNames([1 2])={'altgeoid2','prelimcostrealestate'};
%%%%%%%%%%%%%
[D,~,~]=unique(newJoin(:,2:3),'rows','stable');
Unew=join(Unew,D);
Unew=outerjoin(Unew,temp3);
Unew( any(isnan(Unew{:,2}),2),:) = [];
temp=unique(templist3,'rows','stable');
temp=table(temp(:,1),temp(:,2));
temp.Properties.VariableNames([1 2])={'location_set_firm','firm_ids'};
Unew=join(Unew,temp);
tempfinal=unique([Unew(:,1) Unew(:,30) Unew(:,31) Unew(:,33) Unew(:,34)],'rows','stable');
 tempfinal(tempfinal{:, 5}~= k, :)= [];
 tempfinal=sortrows(tempfinal,1);
end

my main goal that led me to ask this question was that since each k and i iteration is independent of each other, if there was a way to vectorize the code in the OP, or to parallelize it.

trying to vectorize it (probably the wrong way) is doing the meshgrid all in one matrix (and i need a different meshgrid matrix for each k and i.

also, trying to parallelize it has gave me errors that parfor does not support the 'join' command. i have tried to do the workarounds that i see on other questions about this but i have been unable to get it to work

Stephen23 on 28 May 2020

Edited: Stephen23 on 28 May 2020

"Am I interpreting this wrong?"

You assumed a dilemma: that code is either fast or slow.

If your Bugatti Chiron is faster than your Porsche 911, then is the Porsche slow?

We agree that the Bugatti is much faster than the Porsche, but I still rather doubt that the police would accept your excuse that you can't possibly be speeding whilst driving your Porsche because your Bugatti is much faster.

Note that not all code can be vectorized, and not all vectorized code is faster than loops: naive code vectorization can easily lead to large intermediate data arrays which require a lot of memory, and this definitely slows things down (or even makes the vectorization untenable) (this is not just a hypothetical situation, it really does happen).

That line really says more about code vectorization than it does about the speed of loops.

John on 28 May 2020

Brent: I will look into profiling, I did not know of this concept. Thank you

Stephen: I agree, I did assume that. Thank you for the clarification

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Vaibhav Tomar on 28 May 2020

0 votes

Hey John,

Be aware of background processes that share computational resources and decrease the performance of your MATLAB® code.

While organizing your code:

Use functions instead of scripts. Functions are generally faster.
Prefer local functions over nested functions. Use this practice especially if the function does not need to access variables in the main function.
Use modular programming. To avoid large files and files with infrequently accessed code, split your code into simple and cohesive functions. This practice can decrease first-time run costs.

You may refer to the following link for more information:

https://www.mathworks.com/help/matlab/matlab_prog/techniques-for-improving-performance.html

1 Comment
Show -1 older comments Hide -1 older comments

John on 28 May 2020

Thank you, but I know this link already

Sign in to comment.

vectorizing a for loop when nested loop varies in size

7 Comments
Show 5 older comments Hide 5 older comments

Answers (1)

1 Comment
Show -1 older comments Hide -1 older comments

Categories

Tags

Community Treasure Hunt

vectorizing a for loop when nested loop varies in size

7 Comments Show 5 older comments Hide 5 older comments

Answers (1)

1 Comment Show -1 older comments Hide -1 older comments

Categories

Tags

See Also

Community Treasure Hunt

7 Comments
Show 5 older comments Hide 5 older comments

1 Comment
Show -1 older comments Hide -1 older comments