Numerical packing

Question

2 votes

Hi all, here's a quick one. I've got an array and a limit:

A = [235 235 235 234 235 235 234 234 234 220 280 215 234]
lim = 1000

I want to group the elements of A so that:

1. The sum of each group's elements is below my lim

2. The group numbers of consecutive elements of A never decrease

3. A minimum number of groups is used. Ie, if a valid solution exists using 3 groups, all solutions using 4 or more groups are inferior.

4. The maximum of the sums within each group is minimized (clarified below)

As an example:

grps = zeros(size(A));
while any(grps==0)
 openIdxs = find(grps==0);
 mask = cumsum(A(openIdxs))<lim;
 grps(openIdxs(mask)) = max(grps)+1;
end

The above returns:

grps =
     1  1  1  1  2  2  2  2  3  3  3  3  4

This satisfies (1), (2) and (3), but note that the 4th group has only one element, and the sums of each group are:

grpSums = arrayfun(@(i)sum(A(grps==i)),1:max(grps))
grpSums =
   939   938   949   234

Can anyone think of a nice way to "balance" the group sums so that the maximum group sum is minimised? Note that I want to keep the group count minimised (at 4 in this case)... I don't want to just make lots of groups of 1 element.

Bonus votes for simplicity of code over speed and/or exactness of solution... I don't mind if something is close to optimal but not quite there, and I'm only working with tiny arrays similar to the example above.

Clarification:

Point (4) refers to the grpSums variable calculated above. The command max(grpSums) in this example returns 949. If move some elements from group 3 to group 4:

grps = [1 1 1 1 2 2 2 2 3 3 4 4 4]

my grpSums becomes [939 938 734 449], making max(grpSums) give 939. An even better (and perhaps optimal for this case?) solution is:

grps = [1 1 1 2 2 2 3 3 3 3 4 4 4]

which gives grpSums = [705 704 922 729]. Note that it's not so much the range of numbers in grpSums that I care about - it's specifically its maximum.

8 Comments
Show 6 older comments Hide 6 older comments

Sven on 19 Dec 2011

Sean I've added this point specifically above. Don't worry, I'm not moving goal posts... that point was part of my original example, just not explicitly put down with the other rules :)

Sean de Wolski on 19 Dec 2011

Oh it's fine, and a great simplification (if what I'm doing makes any sense at all (which it might (probably does) not)).

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Sven on 29 Dec 2011

Open in MATLAB Online

0 votes

Thank you Sean, Elige, for your responses, and thanks Walter for naming the problem I'd posed. Even after seeing it as a "knapsack" problem, I had difficulty adopting one of the FEX knapsack solutions to this particular problem out of the box, so I also made a solution of my own (below).

Here are some codes, times, and comments:

My solution: (sorry about the line wrapping... I use a wider MATLAB page size than this website)

function grps = orderedMinOfMaxKnapsackSums(A, lim)
% Discover how many knapsacks (or groups) to use
grps = zeros(size(A));
while any(grps==0)
    openIdxs = find(grps==0);
    mask = cumsum(A(openIdxs))<lim;
    grps(openIdxs(mask)) = max(grps)+1;
end
numGrps = max(grps);
% Build all possible indice pairs for each group
maxInds = arrayfun(@(i)find(grps==i,1,'last'),1:numGrps);
grpIndPairs = arrayfun(@(mn,mx)[mn mn; nchoosek(mn:mx,2); mx mx], 1:numGrps, maxInds, 'Un',0);
% Ensure that the first group "starts" at 1, and the last group "ends" at the length of A
grpIndPairs{1}(grpIndPairs{1}(:,1)>1,:) = [];
grpIndPairs{end}(grpIndPairs{end}(:,2)<length(A),:) = [];
% Remove the "possible" indice pairs that over-fill that knapsack
grpIndPairs = cellfun(@(pair)pair(arrayfun(@(fr,to)sum(A(fr:to)), pair(:,1),pair(:,2))<lim, :), grpIndPairs, 'Un',0);
% Reduce the "possible" indice pairs to those with valid adjacent knapsack possibilities. For
% example, if knapsack 1 only holds indices 1:2 or 1:3, knapsack 2 must start at element 3 or 4.
while 1
    oldNumels = cellfun(@numel, grpIndPairs);
    grpIndPairs(1:end-1) = cellfun(@(this,next)this(ismember(this(:,2), next(:,1)-1),:),...
        grpIndPairs(1:end-1), grpIndPairs(2:end), 'Un',0);
    grpIndPairs(2:end) = cellfun(@(this,last)this(ismember(this(:,1), last(:,2)+1),:)  ,...
        grpIndPairs(2:end), grpIndPairs(1:end-1), 'Un',0);
    if isequal(oldNumels, cellfun(@numel, grpIndPairs)), break;  end
end
% Build the set of all possible combinations from each of the groups
elNumsCell = cellfun(@(x)1:size(x,1), grpIndPairs, 'UniformOutput', false);
grpIndsCell = cell(size(elNumsCell));
[grpIndsCell{:}] = ndgrid(elNumsCell{:});
allCombs = cell2mat(arrayfun(@(i)grpIndPairs{i}(grpIndsCell{i}(:),:),1:numGrps,'Un',0));
% Remove combinations where knapsack elements don't immediately follow the previous knapsack
combsDiff = diff(allCombs,[],2);
allCombs = allCombs(all(combsDiff(:,2:2:end)==1, 2), :);
% Calculate the size of each knapsack for all the combinations that remain
combSums = zeros(size(allCombs,1),numGrps);
for i = 1:numGrps
    combSums(:,i) = arrayfun(@(fr,to)sum(A(fr:to)), allCombs(:,2*(i-1)+1), allCombs(:,2*(i-1)+2));
end
% Choose the best combination
[~, bestCombIdx] = min(max(combSums,[],2));
grps = cell2mat(arrayfun(@(i)i*ones(1,1+diff(allCombs(bestCombIdx,2*(i-1)+(1:2)))), 1:numGrps,'Un',0));
end

And here's Elige's solution:

function best_combo = bruteForce(A, lim)
intmax = prod(A);
% WEIGHT 
%   >0.5 weight STD of sum more heavily 
%   <0.5 weight difference of lim and MEAN of sum more heavily 
weight = 0.5; 
best_combo = zeros(1,length(A));
best_fit = intmax;
for i = 1 : length(A)-1
  B = nchoosek(1:length(A)-1,i);
  for j = 1 : nchoosek(length(A)-1,i)
      C = (i+1)*ones(1,length(A));
      for k = sort(1 : i,'descend')
          C(1:length(A)<=B(j,k))=k;
      end
      fit = (std(arrayfun(@(k)sum(A(C==k)),1:i+1))*(weight))^2 + ...
            ((lim-mean(arrayfun(@(k)sum(A(C==k)),1:i+1)))*(1-weight))^2;
      if sum(arrayfun(@(k)sum(A(C==k)),1:i+1)>lim) == 0 && fit < best_fit
          best_fit = fit;
          best_combo = C;
      end
  end
end
end

Here are some timings:

A = [235 235 235 234 235 235 234 234 234 220 280 215 234]
lim = 1000
tic, for i = 1:10
orderedMinOfMaxKnapsackSums(A, lim);
     end, toc
tic, for i = 1:10
bruteForce(A, lim);
     end, toc

Elapsed time is 0.053719 seconds.

Elapsed time is 338.265262 seconds.

Elige, I know I stated that I was dealing with small data sets so efficiency wasn't an issue - but I think the brute force method here ran for a bit longer than I was anticipating.

Sean, I don't have the GO toolbox so I can't test out your solution. Can you post a little speed summary here for us?

Walter, I had a bit of trouble fitting my particular problem into one of the FEX knapsack entries. It seems that they had in-built criterion to optimise (whereas my criterion was to minimise the maximum knapsack size), so I went ahead making my own solution as an exercise. Did I miss something obvious that could have saved time?

So anyway Sean, you get the accepted answer... I'm tempted to accept my own but it doesn't meet my own criteria (of short and sweet) and want to at least give you guys the credit for tackling someone else's problem :)

3 Comments
Show 1 older comment Hide 1 older comment

Sean de Wolski on 30 Dec 2011

For the example above, timing for mine on R2011b 64bit Windows:

Elapsed time is 0.313661 seconds.

You could also speed your's up a little bit by using 'prodofsize' as the function for cellfun. ( http://www.mathworks.com/matlabcentral/answers/10966-is-there-any-process-to-convert-cell-type-array-to-numarical-array Jan's comment and my time test)

(And don't worry about the Answer; at least you didn't delete this question :) and ask it again)

Sean de Wolski on 30 Dec 2011

Ans 0.55 seconds for A = [A;A];

Afetr that it's not obeying constraints.

Sign in to comment.

Answer 2

Walter Roberson on 19 Dec 2011

3 votes

After the clarifications, this is a knapsack problem. Knapsack problems are generally NP-complete (exponential time) to solve exactly, but in the restricted case of non-negative integer "weights", one can use Dynamic Programming; the algorithm is hinted at over here

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Answer 3

Seth DeLand on 10 Mar 2014

Open in MATLAB Online

3 votes

Seeing how Optimization Toolbox has a mixed-integer solver in R2014a, I figured I would post this alternate solution that uses intlinprog to solve this problem. If you examine the output structure from intlinprog, the absolute gap is zero, meaning the solver has proven that this is the optimal solution to this problem.

A few things to note:

This finds the same solution found in the brute force approach, with a max group size of 922.
This code runs in a few hundredths of a second, likely much faster than brute force or global optimization methods for this problem.

%%Data
AA = [235 235 235 234 235 235 234 234 234 220 280 215 234];
lim = 1000; % maximum allowable in any group  
ng = 4; % number of groups
%%Useful sizes
na = length(AA);
nx = ng*na;
nz = 1;
%%Variables
% There are two types of variables in this formulation: 
% x: ng x na matrix of binary variables. x_i,j = 1 if A(i) is in group j. 
% z: auxiliary variable used to keep track of maximum size of any group.
% The goal is to minimize z.
%%Objective function coefficients
f = zeros(nx + nz, 1);
f(end) = 1; % Minimize z, which is last in f
%%Sum of each group is less than limit
  % A*x <= b
  A = zeros(ng,nx+nz);
  for ii = 1:na
    A(:,ng*(ii-1)+1:ng*ii) = diag(repmat(AA(ii),ng,1));
end
b = repmat(lim,ng,1);
%%Minimax formulation
% Variable z changes the minimization problem into a minimax problem:
% min z  s.t.  Sum Ax - z <=0 (z is greater than or equal to size of each 
% group)
Atmp =  A;
Atmp(:,end) = -1;
A = [A; Atmp];
b = [b; zeros(ng,1)];
%%Groups of consecutive elements never increase (linear inequality)
Atmp = zeros(na-1,nx+nz);
for ii = 1:na-1
    Atmp(ii,ng*(ii-1)+1:ng*ii) = 1:ng;
    Atmp(ii,ng*ii+1:ng*(ii+1)) = -(1:ng)';
end
A = [A; Atmp];
b = [b; zeros(na-1,1)];
%%Equality constraint that assigns each member to exactly 1 group
Aeq = zeros(na,nx+nz);
for ii = 1:na
    Aeq(ii,ng*(ii-1)+1:ng*ii) = 1;
end
beq = ones(na,1); % exaclty 1 in each group
%%Bounds
lb = zeros(nx+nz,1); % all variables positive
ub = ones(nx+nz,1); % upper bounds for binary variables
ub(end) = Inf; % no upper bound on z
%%Integer variables
intIdxs = 1:nx; % All x variables are binary integers
%%Solve
[xopt,fval,eflag,output] = intlinprog(f, intIdxs, A, b, Aeq, beq, lb, ub);
% If the problem is infeasible (negative eflag), might need to add more
% bins.
%%Post-process
x = xopt(1:end-1); % Extract x from solution vector
grpMax = xopt(end); % grpMax is z
x = round(reshape(x,ng,na));
optBins = zeros(size(AA)); % assigned bins
for ii = 1:na
    optBins(ii) = find(x(:,ii));
end
binVals = A(1:ng,:)*xopt; % grpMax is z

The found solution:

binVals =

1 Comment
Show -1 older comments Hide -1 older comments

Sven on 14 May 2015

By the way, Seth, this answer is simply excellent. I had encountered a larger problem (same scenario, more groups to the input data) that would hit memory limits on my brute force method but was still just fractions of a second via this integer linprog call.

Thanks!

Sign in to comment.

Answer 4

Dr. Seis on 19 Dec 2011

Open in MATLAB Online

1 vote

UGLY/BRUTE-FORCE: I assumed you were trying to minimize the standard deviation of your group sum AND the difference between your lim and mean group sum. I also assumed that there would always be at least 2 groups. I check all possible group sizes/combinations. Here goes:

A = [235 235 235 234 235 235 234 234 234 220 280 215 234];
lim = 1000;
intmax = prod(A);
% WEIGHT 
%   >0.5 weight STD of sum more heavily 
%   <0.5 weight difference of lim and MEAN of sum more heavily 
weight = 0.5; 
best_combo = zeros(1,length(A));
best_fit = intmax;
for i = 1 : length(A)-1
  B = nchoosek(1:length(A)-1,i);
  for j = 1 : nchoosek(length(A)-1,i)
      C = (i+1)*ones(1,length(A));
      for k = sort(1 : i,'descend')
          C(1:length(A)<=B(j,k))=k;
      end
      fit = (std(arrayfun(@(k)sum(A(C==k)),1:i+1))*(weight))^2 + ...
            ((lim-mean(arrayfun(@(k)sum(A(C==k)),1:i+1)))*(1-weight))^2;
      if sum(arrayfun(@(k)sum(A(C==k)),1:i+1)>lim) == 0 && fit < best_fit
          best_fit = fit;
          best_combo = C;
      end
  end
end
grpsum=arrayfun(@(k)sum(A(best_combo==k)),1:max(best_combo));

This results in:

best_combo =
     1     1     1     2     2     2     3     3     3     3     4     4     4
grpsum =
   705   704   922   729

2 Comments
Show None Hide None

Sven on 19 Dec 2011

Nice one Elige - you get a vote for correctness but I'll hold off on accepting this one as final. Even though it's a small dataset so brute force won't be taxing on resources, I'm still holding out for "neat and tidy". I'm wondering if some accumarray() or diag() trick is lurking beyond my reach that some clever answer can bring closer :)

And a little note that "k=sort(1:i,'descend')" can be simplified as "k=i:-1:1" to avoid a sort.

Dr. Seis on 20 Dec 2011

Oooh, I learn a neat trick every day. Thanks!

Sign in to comment.

Answer 5

Sean de Wolski on 20 Dec 2011

Open in MATLAB Online

1 vote

Alright, here goes!! accumarray is certainly a part of it, as is ga() the awesome genetic algorithm function. So yes, Global Optimization Toolbox is required :)

main.m:

%12/19/2011
%
%Data
A = [235 235 235 234 235 235 234 234 234 220 280 215 234]'; %column vector (because they're cool)
lim = 1000; %upper limit.
szA = size(A);
% Get bounds and number of variables:
[bounds, n] = getValidGroup(A,lim);
% Engine:
ObjectiveFunction = @(x)binWeight(A,index2groups(x,szA),lim); %get the weight
[x,fval] = ga(ObjectiveFunction,n,[],[],[],[],ones(1,n),bounds(2,:),[],1:n);
index = index2groups(x,szA);
fprintf('Groups: %s \n',num2str(index));
fprintf('Weight is: %i!\n',binWeight(A,index,lim));

binWeight.m:

function weight = binWeight(A,index,lim)
%
%data nx1 vector of values
%index = nx1 vector of groupings
%lim = threshold of group sum
%
groups = accumarray(index,A); %sum bins
groups(groups>lim) = lim*10; %lim is soft constraint with stiff penalty.
weight = max(groups); %worst one

getValidGroup.m:

function [bounds cnt] = getValidGroup(A,lim)
%
%inputs:
%A  - nx1 vector of values
%lim - the group limit
%
%outputs:
%bounds - 2x(cnt) lower/upper bounds
%cnt - number of bins
%
cnt = 0;
szA = size(A);
index = zeros(szA);
A2 = A;
while ~isempty(A)
    cnt = cnt+1;
    index(cnt) = find(cumsum(A)<lim,1,'last');
    A(1:index(cnt)) = [];
end
index = index(1:cnt);
index(end) = find(cumsum(flipud(A2))<lim,1,'last');
bounds = 1:cnt; %lower bounds
bounds(2,:) = index; %upper bounds

index2groups.m:

function groups = index2groups(index,szA)
%index is a vector cumulative max values
index = index(:);
index = cumsum([1;index(1:end-1)]);
groups = zeros(szA(1),1);
groups(index) = 1; %min(index,szA(1))
groups = cumsum(groups);

And executing main.m:

Optimization terminated: average change in the penalty fitness value less than options.TolFun and constraint violation is less than options.TolCon. Groups: 1112223333444 Weight is: 922!

I have to say this is one of the most fun/educational/rewarding MATLAB problems I've ever worked on. Thanks!

2 Comments
Show None Hide None

Sven on 20 Dec 2011

@Sean:

1. I agree that column vecs are much cooler than row vecs.

2. Thanks for posting a great answer.

3. I don't have the GOT, but from your answer I can see other problems that I tackle where it may be helpful.

@all: I'm out tomorrow but will select an answer when I get back on a computer.

To give some background, this problem came up because I'm taking a set of ~20 small figure snapshots and trying to stitch them into one "summary" image. My limit of 1000 comes from the max px height that I have to display things. I want my summary to be as compact as possible (laterally first, then vertically) but keep my small figures in order. My current solution as given in the question does a great job of getting the number of columns correct, but I often see the final column of stitched images containing one measly little image. No matter how hard I shake my fist at it, it doesn't change. I'm about to go through around 10,000 of these sets remaking the stitched images so I thought I should ask if anyone knows a nicer way to organise them :)

Walter Roberson on 20 Dec 2011

Yup, a knapsack problem ;-)

Sign in to comment.

Numerical packing

8 Comments
Show 6 older comments Hide 6 older comments

Accepted Answer

3 Comments
Show 1 older comment Hide 1 older comment

More Answers (4)

0 Comments
Show -2 older comments Hide -2 older comments

1 Comment
Show -1 older comments Hide -1 older comments

2 Comments
Show None Hide None

2 Comments
Show None Hide None

Categories

Tags

Community Treasure Hunt

Numerical packing

8 Comments Show 6 older comments Hide 6 older comments

Accepted Answer

3 Comments Show 1 older comment Hide 1 older comment

More Answers (4)

0 Comments Show -2 older comments Hide -2 older comments

1 Comment Show -1 older comments Hide -1 older comments

2 Comments Show None Hide None

2 Comments Show None Hide None

Categories

Tags

See Also

Community Treasure Hunt

8 Comments
Show 6 older comments Hide 6 older comments

3 Comments
Show 1 older comment Hide 1 older comment

0 Comments
Show -2 older comments Hide -2 older comments

1 Comment
Show -1 older comments Hide -1 older comments

2 Comments
Show None Hide None

2 Comments
Show None Hide None