# How can I improve Matlab perfomance for this code?

6 views (last 30 days)
Helen on 25 Aug 2014
Adn and custo are same size vectors, with a length of 10252 elements (length(Adn)=10252). Every row value in custo corresponds to the cost of the same row value in Adn . I'm trying to get the sum of all the possible combinations for the values in Adn and analyze the cost of that combination. From the Adn combinations, I am just interested in the ones that sum to a specific interval (which is lower than sum(A) and greater than sum(A)*0.9). A is previously written in the code, so it doesn't matter now. Then I want to store the combinations and their equivalent costs ( custoAll ) in two new vectors.
for k1=1:nA
C=sum(B,2);
E=combntns(custo,k1);
F=sum(E,2);
D=find(((C<(sum(A)))&(C>(sum(A)*0.9))));
if ~isempty(D)
combinations=[combinations; B(D,:)];
custoAll=[custoAll; F(D)];
end
end
This code is taking very long though. I know I should preallocate the variables combinations and custoAll otherwise they will change size in every iteration and it'll take longer to run, but I haven't been able to find a good way to do it, since Adn is already so huge. Does anybody have an insight on this?

Matz Johansson Bergström on 25 Aug 2014
Edited: Matz Johansson Bergström on 25 Aug 2014
It seems to me that combntns (which is the same as nchoosek) is taking a long time to finish. So, the appending of the combinations is not the bottleneck in this case.
Consider your example with 10252 elements. Then during the first loop nchoosek(10252, 1) = 10252. For the next loop we have nchoosek(10252,2)= 52546626. This will be the number of rows of B.
So B will then become nchoosek(10252,2)*8*2 bytes, because we use double precision, which is nchoosek(10252,2)*8*2/2^20 = 801 MB.
Ok, just for the fun of it, for the next loop B will become nchoosek(10252,3)*3*8/2^20 which is 4 TB. This is just to store the answer to B, we are not even considering the computational time.
So, you will never be able to compute this using matrices in the current way because you cannot possibly store them. I would suggest to use another approach like looping all combinations in some way. I will have to think about it.
The computation "N choose k" will increase and become smaller, sort of like a parabola. The largest number of combinations you will get from
nchoosek(10252, 10252/2)
In which we get from Matlab
Warning: Result may not be exact. Coefficient is greater than 9.007199e+15 and is only accurate to 15 digits
Remember, this is just the number of rows in B.
So I would suggest to use something else to solve the problem. I'm thinking about binary numbers, but I'm not sure. I will have to get back to this question later today.
##### 1 CommentShowHide None
Matz Johansson Bergström on 25 Aug 2014
Edited: Matz Johansson Bergström on 25 Aug 2014
I believe this problem is NP-complete. It can be seen as a relaxed version of the "Subset sum" problem, but they are really the same, see http://en.wikipedia.org/wiki/Subset_sum_problem.
This really means that this kind of problem does not have a program that can compute the answer in a quick way. However, this does not mean it is not solvable, you just have to find another way.
Even if you managed to speed up the code by creating a mask of binary numbers for instance, you will have to find all combinations and there are many iterations.
I would look at the subset sum and see if your question can be modified to that problem and use that code.