Random elements of a vector which sum near a specific number

Question

0 votes

This is a fun question. I have a large vector which I need to take out 1000 random sub-vectors whose elements all add up to around the same value.

The length of this vector is 33268 with an average value of 0.028743, max of 6.0637, min of 0.006063, and std of 0.040495. The total sum is 956.23 and I need subvectors which add to 20, 30, and so on... So I think this should be possible while being fairly random. This is my attempt:

for j=1:1000
 while 1
     randoms = (randi(2,[1 length(parentweight)])-1);
     sumweights(i) = sum(parentweight(find(randoms)));
     if (sumweights>=n-mean(parentweight))&&(sumweights<=n+mean(parentweight))
        break;
     end
 end
 randsubinds{j} = randoms; 
end
end

I let this run for 86000 iterations and the minimum sum was 462.19 and the max was 495.35 with a 4.5 std. In other words it seems to be always choosing almost exactly half of the elements and will never go down to around 20. That's because it's a uniform distribution I assume but there's no poisson distributed integer function.

While typing this out I came up with this:

 for j=1:1000
broken = 0;
while 1
    for i = 1:length(parentweight)
        randoms(i) = randi(length(parentweight),1);
        subweights(i) = parentweight(randoms(i));
        if (sum(subweights)>=n-mean(parentweight)/2)&&(sum(subweights)<=n+mean(parentweight)/2)
            broken = 1;
            break;
        end
    end
    if broken
        break;
    end
    clear randoms subweights
end
   randsubinds{j} = randoms; 
   clear randoms subweights
end

This is very slow. Anyone have ideas on a better way to do this?

2 Comments
Show None Hide None

Jim Hokanson on 2 Jul 2013

Do you care about the uniqueness of the solutions both within a target value and between target values? For the between example would a vector of ids for 20 be fine and that same vector plus a numeric value of 10 be fine for 30 (assuming 10 were actually present). The main point being that the two vectors only differ by a small subset of numbers.

Jon on 2 Jul 2013

They should be found separately and be more unique than that for sure.

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Jim Hokanson on 2 Jul 2013

Open in MATLAB Online

0 votes

Here's a solution.

Some notes: # Remove calculations from your loops, like the mean of the data, this doesn't change # sorting random data is a nice way of shuffling your indices

N_RAND_PER_ITERATION = 1000; %Make this larger to go quicker, but it also
%takes a lot more memory
N_ITERATIONS_MAX     = 10; %You could make this really large to force
%obtaining a solution
N_SOLUTIONS_GOAL = 1000; %You might want to go higher if you want to trim
%the solutions afterwards
%This is just to avoid overflow on the solution cell array
N_SOLUTIONS_MAX  = N_SOLUTIONS_GOAL + 1000;
%This roughly matches the statistics in the post
n    = 33268;
data = gamrnd(0.05,1,1,n);
target_vectors  = 20:10:100;
%This code doesn't need to change
%---------------------------------------------------------
n_targets       = length(target_vectors);
solution_count  = zeros(1,n_targets); %This keeps track of how many
%valid vectors we've found for each target length
final_solutions = cell(N_SOLUTIONS_MAX,n_targets); %Indices go back into
%the original data vector
%
%i.e. in this example
%temp = final_solutions{1,1}
%sum(temp) => will be about 20
tolerance       = mean(data);
for iRand = 1:N_ITERATIONS_MAX
    [~,I_shuffle] = sort(rand(N_RAND_PER_ITERATION,n),2);
    shuffled_data = data(I_shuffle);
    cum_data      = cumsum(shuffled_data,2);
      for iTarget = 1:n_targets
          cur_target      = target_vectors(iTarget);
          %This is the value we are trying to hit
          [min_val,I2]    = min(abs(cum_data - cur_target),[],2);
          %Find the set of values which bring us closest to the target
          valid_entries_I = find(min_val < tolerance);
          %Does this closest value come within our tolerance?
          n_entries       = length(valid_entries_I);
          %This bit of code goes through each of the solutions
          %and assigns it to the final output
          cur_solution_index = solution_count(iTarget);
          for iMatch = 1:n_entries
              cur_solution_index = cur_solution_index + 1;
              cur_entry = valid_entries_I(iMatch);
              cur_end   = I2(cur_entry);
              final_solutions{cur_solution_index,iTarget} = I_shuffle(cur_entry,1:cur_end);
          end
          solution_count(iTarget) = cur_solution_index;
      end
      if all(solution_count > N_SOLUTIONS_GOAL)
          break
      end
  end

1 Comment
Show -1 older comments Hide -1 older comments

Jon on 2 Jul 2013

Interesting solution. Thanks.

Sign in to comment.

Answer 2

Image Analyst on 2 Jul 2013

0 votes

I'm not sure I understand why you think that random elements taken from a vector should ever add up to a specific number, unless they're integers and you just got lucky.

2 Comments
Show None Hide None

Jon on 2 Jul 2013

Edited: Jon on 2 Jul 2013

That's why it's "add up to around the same value" and I use this as criteria for an acceptable subvector if (sum(subweights)>=n-mean(parentweight))&&(sum(subweights)<=n+mean(parentweight)). I changed it slighly than what's above. Based upon the statistics of the values of the parent vector I should be able to get pretty random vectors which sum to aournd 20 or more because these numbers are orders of magnitude larger than the average element of the parent vector.

This weight that I'm summing over is the measure of how many single "events" that value counts as. It can be a lot less than one because this is a Monte Carlo simulation. I need to take a random sampling of a specific number of events. My code works. It's just very slow.

Image Analyst on 2 Jul 2013

OK, you can check if the sum is within some tolerance value, like shown in the FAQ.

Sign in to comment.

Random elements of a vector which sum near a specific number

2 Comments
Show None Hide None

Accepted Answer

1 Comment
Show -1 older comments Hide -1 older comments

More Answers (1)

2 Comments
Show None Hide None

Categories

Tags

Community Treasure Hunt

Random elements of a vector which sum near a specific number

2 Comments Show None Hide None

Accepted Answer

1 Comment Show -1 older comments Hide -1 older comments

More Answers (1)

2 Comments Show None Hide None

Categories

Tags

See Also

Community Treasure Hunt

2 Comments
Show None Hide None

1 Comment
Show -1 older comments Hide -1 older comments

2 Comments
Show None Hide None