The problem of removing the data from the matrix
Show older comments
I have an array including about 100 numbers.We assume the number is from 1 to 100.I want to get 7 numbers from the array, then calculate the sum of these numbers. If the sum value is in the range of two known number.We get these seven number and dispaly.oterwise we go on to find the sum value between the known number until all the arrangement method have been tried. I want to get the matlab programming code to complete it,i really don't have a good method. Look forward to you help.Thank you!
4 Comments
Jan
on 7 Nov 2011
This sounds like a homework question. Because it is neither wanted nor useful if we solve your homework, please follow the standard procedure: Show us, what you have done so far and ask if any specific problem occurs.
Jan
on 8 Nov 2011
@chengxuan yu: It is enough, if you post the messages here. Please consider my profile and do *not* send me emails. And if you send an email (to somebody who like this), then include a link to the question. It is not convenient to search for the topic you are talking about.
Walter Roberson
on 17 Nov 2011
This is a technical forum for civil discussions. Personal arguments or justifications will be removed.
Jan
on 17 Nov 2011
@Walter: To be exact, this is a technical forum for questions related to Matlab.
@chengxuan yu: I do not answer questions sent by email. I do answer here in this forum.
Of course you are allowed to say what you want. And the editors have the power to remove it according to these guidelines:
1. Personal attacs
2. Off-topic content not related to MATLAB or any MathWorks products
Walter keeps the forum clean. He is assiduous and careful. I like it.
Answers (6)
Andrei Bobrov
on 7 Nov 2011
variant
d = randperm(100).';
idx = bsxfun(@plus,1:7,(0:numel(d)-7)');
d1 = d(idx);
ou1 = sum(d1,2);
bnd = [250 300] % Interval borders
[c c] = histc(ou1,bnd);
out = d1(find(c,1,'first'),:)
simple variant
d = randperm(100).'; % your data
n = 7;
bnd = [250 300];
n1 = 0:n-1;
for j1 = 1:numel(d)-n+1
i1 = j1 + n1;
s1 = sum(d(i1));
if s1 >= bnd(1) && s1 <= bnd(2)
out = d(i1);
break
end
end
2 Comments
Jan
on 7 Nov 2011
It will not be easy for yu to explain this solution of the homework.
chengxuan yu
on 8 Nov 2011
Fangjun Jiang
on 8 Nov 2011
The following code could solve your problem if you have enough memory. N is set at 10 first so you can try and understand the code. nchoosek(10,7) is 120 combinations. Among them, anywhere from 20 to 80 combinations might meet the criteria depending on the random data. nchoosek(100,7) will be 1.6008e+010 combinations. I don't think a personal computer at this age will have 16*8*8G memory so maybe this solution is not practical. In the meantime, take a look to see if this is a solution (for a small scale). I'll think about if there is a way to not to use the huge array, maybe use a long for-loop. Someone else might chip in too.
N=10;
K=7;
V_min=3;
V_max=4;
Data=rand(N,1);
Index=nchoosek(1:N,K);
M=size(Index,1);
Flag=false(M,1);
for j=1:M
Select=Data(Index(j,:));
Total=sum(Select);
Flag(j)=and(V_min<Total,Total<V_max);
end
SelectedData=Data(Index(Flag,:))
size(SelectedData,1)
6 Comments
chengxuan yu
on 8 Nov 2011
Jan
on 8 Nov 2011
@Fangjun: You do not need 16*8*8GB memory, if you create a tree and store only the last combination. For brute-force searching the tree will be too large: it has 1.6E10 nodes in the 1st level (100over7), 9.47E9 in the 2nd level (93over7), etc. Therefore you have to cut branches, which violates the conditions. Even then an exhaustive search will use a lot of resources, such that running the program might be more expensive than buying matching battery packs. Therefore I hoped, it is a homework only...
@chengxuan yu: To minimize the total costs (programming time + run time + costs of the batteries) I suggest either a GA method or a simple gunshot technique: Choose 15 packs randomly, swap 1, 2, or 3 random elements of the packs with the worst voltage.
BTW.: It is ridiculous to determine the voltage of a batterie as 10.7201 V. The voltage depends at first on the intensity of the current due to the internal resistance, at second on the temperature (which changes during usage), at third on the life-time of the battery (it will change during your optimization program runs!). If this is really not a homework, this is an ugly design error.
Jan
on 8 Nov 2011
@chengxuan yu: I've implemented the naive approach: Choose 7 batteries randomly (see RANDPERM or the faster SHUFFLE from the FEX) and accept them, if they match. Remove them from the pool and proceed with the next pack.
There are a lot of solutions for 16 packs, but I did not find one for 17.
chengxuan yu
on 8 Nov 2011
Fangjun Jiang
on 8 Nov 2011
@chengxuan, you didn't fully understand the code. N is the number of data points you have. Data=rand(N,1) is the data points in my code, which should correspond to your 127x1 voltage data.
Fangjun Jiang
on 8 Nov 2011
@Simon: Jan, I found this problem interesting. I doubt it is a homework as a professor might give a nchoosek(10,7) but not a nchoosek(100,7) problem.
Jan
on 8 Nov 2011
There are no solutions with more than 16 batteries, because the mean voltage rate is above the upper limit 7.85. One of the many solutions:
V = [ ...
10.866, 11.0957, 10.9235, 10.9357, 11.121, 11.0172, 11.204, 10.7411, 11.233, ...
10.7411, 11.304, 11.43, 11.3116, 11.4252, 11.2561, 11.2136, 11.2529, 11.288, ...
11.353, 11.2135, 11.274, 11.3288, 13.0326, 13.0389, 13.021, 13.0716, 13.0399, ...
13.0177, 13.0297, 13.0366, 13.0173, 13.0006, 13.0061, 13.7852, 13.8199, ...
13.8602, 13.8133, 13.7394, 13.6925, 13.6472, 13.7704, 11.3288, 12.1985, ...
12.1771, 12.2468, 12.1428, 12.2267, 10.9119, 11.018, 9.8768, 9.9391, 10.9587, ...
10.8442, 11.002, 10.8963, 10.7968, 10.9584, 10.9174, 11.0768, 10.3929, 10.6948, ...
10.8723, 11.0513, 11.1445, 11.1231, 11.0267, 11.0055, 11.1285, 11.0819, ...
11.0273, 11.0783, 11.0346, 11.1789, 11.0578, 11.0693, 10.8994, 11.0868, ...
11.2318, 10.9432, 9.6885, 8.9472, 11.6253, 11.6791, 11.6072, 11.666, 11.746, ...
11.6042, 9.5838, 9.6636, 8.9982, 11.6352, 11.6809, 11.6358, 11.6418, 11.5941, ...
11.6731, 10.3308, 10.4433, 9.7064, 9.0447, 10.6631, 10.6594, 10.5641, 10.5523, ...
10.543, 10.6785, 10.4593, 10.5995, 10.6576, 10.7239, 10.6486, 11.1344, 11.1412, ...
11.0046, 10.7201, 11.1397, 10.7368, 11.0692, 11.1128, 10.4428, 10.9322, ...
11.0642, 11.0944, 11.1896, 11.329, 11.1137, 11.0596];
Index = [ ...
81, 71, 42, 96, 11, 45, 41, 75, 56, 118, 76, 43, 98, 93, 101, 114; ...
25, 90, 62, 14, 79, 107, 20, 103, 84, 33, 91, 126, 87, 12, 13, 27; ...
70, 80, 5, 63, 95, 69, 97, 59, 17, 94, 58, 73, 74, 106, 55, 50; ...
19, 47, 85, 120, 109, 77, 110, 2, 104, 3, 48, 57, 86, 8, 78, 102; ...
99, 116, 7, 10, 83, 122, 52, 6, 88, 49, 113, 61, 123, 54, 64, 1; ...
115, 31, 105, 112, 121, 66, 65, 51, 30, 72, 82, 60, 119, 127, 15, 68; ...
32, 92, 67, 22, 111, 53, 89, 26, 4, 100, 108, 124, 117, 18, 125, 9];
disp(sum(V(Index), 1));
Using the gunshot technique finding the solutions is even faster, if the limits 77.7 < sum(V) & sumV < 77.85 are used. Then my program needs about 0.1 to 2 seconds, average 1.1 seconds. In addition increasing the lower limit let the batteries remain longer in the wanted voltage range.
The unused batteries have a too high voltage and you can even create a pack of 6 to get the required limits. Is this allowed?
Sorry for my idea that this is a homework. The values look extremly artificial and it is a very bad idea to connect batteries with 9.1V and 13.8V in series. What will happen if the weakest battery is exhausted?! So please be careful if you apply the solution and replace weak batteries as soon as possible.
As far as I understand, one solution is enough for you. But here is the program, if anybody else is interested:
% V = [... values from above]
Idx = uint8(1):uint8(length(V));
R = zeros(7, 18, 'uint8');
iR = 0;
for i = 1:1e5 % No infinite loops
test = Shuffle(length(Idx), 'index', 7); % or "randperm(length(Idx), 7)"
sumV = sum(V(Idx(test)));
if and(77.7 < sumV, sumV < 77.85) % Actually "77.05 < sumV"
iR = iR + 1;
R(:, iR) = Idx(test);
Idx(test) = [];
% Stop early, if you have proven, that there is no 17th solution:
% if iR == 16, break; end
end
end
Jan
on 8 Nov 2011
@Fangjun: UNIQUE is slow. A tiny change let the program rum 11 times faster (70 sec -> 6 sec):
N = 8
K = 7;
V_min = 3.5;
V_max = 3.8;
Data = rand(N, 1);
% fid=fopen('output.csv', 'w');
Fmt = [repmat('%d,',1,K-1),'%d\n'];
for j1=1:N
for j2=1:N
for j3=1:N
for j4=1:N
for j5=1:N
for j6=1:N
for j7=1:N
% CHANGED: UNIQUE->SORT
Index = sort([j1 j2 j3 j4 j5 j6 j7]);
if all(diff(Index)) == 0
continue;
end
Total = sum(Data(Index));
if V_min < Total && Total < V_max % AND->&&
fprintf(Fmt, Index);
end
end
end
end
end
end
end
end
% fclose(fid)
But it would be fast to cut the branches early and exclude non-unique indices directly after choosing them:
for j1=1:N
for j2=1:N
if j1~=j2
for j3=1:N
if j3~=j1 && j3~=j2
for j4=1:N
if j4 ~= j1 && j4 ~= j2 && j4 ~= j3
for j5=1:N
if j5 ~= j1 && j5 ~= j2 && j5 ~= j3 && j5 ~= j4
for j6=1:N
if j6 ~= j1 && j6 ~= j2 && j6 ~= j3 && ...
j6 ~= j4 && j6 ~= j5
for j7=1:N
if j7 ~= j1 && j7 ~= j2 && j7 ~= j3 && ...
j7 ~= j4 && j7 ~= j5 && j7 ~= j6
Index = [j1 j2 j3 j4 j5 j6 j7];
Total=sum(Data(Index));
if V_min<Total && Total<V_max
% fprintf(Fmt, Index);
end
end
end
end
end
end
end
end
end
end
end
end
end
end
0.24 sec. About 300 times faster - then the full program runs for 6 month only.
[EDITED: 09Nov2011 11:07 UTC] The above solutions create duplicate answers - 5040 for N=8. This can be avoided if the indices are created in a sorted order already:
for j1=1:N
for j2=j1+1:N
for j3=j2+1:N
for j4=j3+1:N
for j5=j4+1:N
for j6=j5+1:N
for j7=j6+1:N
Index = [j1 j2 j3 j4 j5 j6 j7];
Total = sum(Data(Index));
if V_min<Total && Total<V_max
fprintf(Fmt, Index);
end
end
end
end
end
end
end
end
Runtime for N=8: 0.000865 seconds
But currently the program creates a list of all packages with the desired property. But the author does not want a single package, but as much packages as possible. Therefore the main part remains unsolved by this method: Find a maximum set of non-overlapping packages.
8 Comments
Fangjun Jiang
on 8 Nov 2011
Great improvement, Jan! Now we are at 15 years!
Would "if ~all(diff(Index))" help?
Fangjun Jiang
on 8 Nov 2011
I thought about that but was too lazy to type in those logic. You did and it paid off!
Fangjun Jiang
on 9 Nov 2011
Now it's time to look at the voltage data. If the target for 7 batteries is 77.05 to 77.85, we are looking at the average from 11 to 11.12. If we can narrow down the individual voltage from 10.5 to 11.5, the number of qualified batteries will be 77. If further narrow down to 10.8 to 11.3, the number will be 53. We are really looking at a possibility here.
Jan
on 9 Nov 2011
@Fangjun: Have you seen my cruel gunshot method? Its charm is the power to be applicable to 10'000 entities also. It is fast and if you obtain 1000 of its good solutions, it is very likely, that the best one is fairly good - although it is usually not optimal. Well, it is a gunshot method - it has a lot of Wummms.
Fangjun Jiang
on 9 Nov 2011
Great job, Jan! I though about a similar loop but didn't get it completely right. My initial though was below but missed the j1+1 part.
%%
N=8;
count=0;
for j1=1:N
for j2=j1:N
for j3=j2:N
count=count+1;
end
end
end
count
Fangjun Jiang
on 9 Nov 2011
@Jan, I don't fully understand your "gunshot method". There is no "gunshot method" entry in wikipedia. You should enter one at your convenience. My guess is it randomly picks 7 batteries and check the voltage sum. Running the random selection large enough should return a good result. I explained this in my other answer too. But I can't understand in your answer why the number 16 is involved. And I can't find the reference of Shuffle() function although I can guess.
chengxuan yu
on 10 Nov 2011
Jan
on 17 Nov 2011
@Fangjun: I've posted my gunshot code above on the 8th of Nov. "Gunshot" means, that the program is simple, free of intelligence, and even processes the inputs randomly - but it hits the mark at least near to the target. The best matching Wikipedia article is http://en.wikipedia.org/wiki/Anti-pattern . Sorry, I do not post the specific anchor - this technique is called "gunshot linking". ;-)
The 16 is a "magic number" (see anti-patterns): There ist simply no 17th soltuion, therefore the search can stop after finding 16 battery packs. You can insert a smarter break condition: If the sum of the 7 batteries with the smallest capacity is higher than the lower limit, stop the process.
There is a link to the Shuffle function on the bottom of my other answer above, so I do not think, that you have to _guess_.
chengxuan yu
on 8 Nov 2011
0 votes
3 Comments
Fangjun Jiang
on 8 Nov 2011
What do you mean "all the possible array numbers"? All the combination picking 7 numbers out of 100, or nchoosek(100,7) combinations?
chengxuan yu
on 8 Nov 2011
chengxuan yu
on 8 Nov 2011
Fangjun Jiang
on 8 Nov 2011
So this is a nchoosek() combination problem.
The theoretical solution:
Use index=nchoosek(1:100,7) to generate all the possible combinations, use index to select the batteries, sum the voltage and check against the range. The problem is that there is not enough memory to hold the index, which is a very large matrix, not even mention nchoosek(1:127,7).
The practical solution:
You don't really need all the possible solution. You just need the sufficiently enough solutions. You can generate a random index such as index=1+round(99*rand(7,1)). Use this index to select the battery and then check the sum. This is similar to Jan's solution. You can achieve the same even without shuffle() or the new randperm(N,M) function. Repeat the random selection large enough in a for-loop should give you a good result. Remember to check every time whether the index has duplicated numbers.
I think seeking the theoretical solution using MATLAB programming is still possible.
Use the following code, go through 7 levels of nested for-loop to get all the possible combinations. If the voltage sum is within range, write the battery index to a text file. Duplicated index within each combination has been taken care of. But for every combination that meets the criteria, it will be duplicated 5040 times (which is 7! or prod(1:7)). That is also the number of repeats that are executed in the code inside the for-loops. So the code is inefficient but it runs without hitting memory limits. For N=8, it takes about 100 seconds on my laptop. For N=100, it will take 100*(100^7/8^7) seconds which is 151 years! Darn! I thought it would be practical. But at least, it is not light-years. Oh, wait, light-year is not about time.
One more thing, the output .csv file contains 5040 duplicated copies for every valid selection, but that could be easily handled using Excel.
N=8;
K=7;
V_min=3.5;
V_max=3.8;
Data=rand(N,1);
fid=fopen('output.csv','wt');
for j1=1:N
for j2=1:N
for j3=1:N
for j4=1:N
for j5=1:N
for j6=1:N
for j7=1:N
Index=unique([j1 j2 j3 j4 j5 j6 j7]);
if size(Index,2)<K
continue;
end
Total=sum(Data(Index));
if and(V_min<Total,Total<V_max)
fprintf(fid,[repmat('%d,',1,K-1),'%d\n'],Index);
end
end
end
end
end
end
end
end
fclose(fid);
3 Comments
chengxuan yu
on 9 Nov 2011
Jan
on 9 Nov 2011
Capacity is less dangerous, because I assume (hope) that the batteries are connected in parallel.
chengxuan yu
on 10 Nov 2011
Categories
Find more on Loops and Conditional Statements in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!