MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

How to enable the optimizer to determine the order of some elements

Asked by William on 25 May 2013

For a little background I am working on an iGEM project (genetic engineering) and I have a bunch of DNA sequences for proteins. I need to figure out the optimal order for them so that I can break them up so that the first chunk is 500 bp long and each following chunk is 20 bp from the previous part + 480 bp that are next. I need to order them so that we use our 500 bp chunks as optimally as possible and so that proteins span as few of these 500 bp chunks as possible.

I know how to write a costing function so that given an order for these proteins I can determine how good that order is but I am currently at a loss for how to get the global optimizer to be able to change this order. The only thing that matters is that I end up with something I can order so I could take 20 arguments with each one being a number for its position, a number that I could put in order, or a vector that just had the order of the proteins in it. I just don't know how to generate that structure.

If I use a number for each one the optimizer could try to put more than one at the same value and if I give the optimizer a very large cost for those so that it gets thrown out I suspect it won't be able to find a good solution since most of the solutions it tries will be not possible. As for having it just shuffle the order of a vector I have no idea how to do that.

Any help would be appreciated. Thank you

Here is my costing function The function gives a cost (where lower is better) based on the order chosen. The only input that needs to change is order and it is just a vector that contains the indexes into the genes cell array.

For the data I have right now I would need a vector that is 17 elements consisting of the integers 1 to 17 in any order but without duplication and allow the optimizer to try various permutations to come up with an optimal solution.

```    function [ cost ] = gene_cost( genes, order )
%UNTITLED Summary of this function goes here
%   Detailed explanation goes here
% genes is a cell array where column 1 is the gene name and column 2 is the
% DNA sequence,  and column 3 is the length of the gene there is one row for each gene
%order is just a vector that defines the order of the genes
%ex [10 7 5 3...]
temp = genes(order,:);```
```upper_interval = zeros(1,length(temp));
for i=1:length(temp)
upper_interval(i)= sum(cell2mat(temp(1:i,3)));
end
lower_interval = [0 upper_interval(1:length(upper_interval)-1)];
```
```min_cost = transpose(ceil(cell2mat(temp(:,3))/500));
```
```dna = strjoin(transpose(temp(:,2)),'');
dna_length = length(dna);
%get first 500 bp and put it in the first chunk
block = {};
piece = dna(1:500);
block = vertcat(block, piece);
genes_in_block = 1 < upper_interval & lower_interval < 500;
cost = sum(genes_in_block,1);
```
```for i = 481:480:dna_length
if dna_length -i < 500
chunk_size = dna_length -i;
else
chunk_size = 499;
end
piece = dna(i:i+chunk_size);
block = vertcat(block, piece);
genes_in_block = i < upper_interval & lower_interval < i+chunk_size;
cost = cost + sum(genes_in_block,1);
end
cost = sum((cost-min_cost).^4); %the power is used to give a high penalty for taking up more too many more blocks that necessary
```

Jonathan Epperl on 25 May 2013

"Shuffling" a vector can be done with perms or randperm.

I don't think I completely understand what you are trying to do though, could you have another go at explaining your problem, maybe without referring to genetics at all?

Matt J on 25 May 2013

I know how to write a costing function so that given an order for these proteins I can determine how good that order

Please write that for us so that we can see it, too.