How to enable the optimizer to determine the order of some elements

For a little background: I am working on an iGEM (genetic engineering) project and have a set of DNA sequences that code for proteins. I need to find the best order in which to concatenate them before splitting the result into chunks: the first chunk is 500 bp long, and each following chunk consists of the last 20 bp of the previous chunk plus the next 480 bp. The order should use the 500 bp chunks as efficiently as possible, so that each protein spans as few chunks as possible.
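Editor's sketch of the chunking arithmetic described above: each chunk after the first starts 480 bp after the previous one, so consecutive chunks share 20 bp. The total length of 1500 bp is a made-up value for illustration, and dropping a trailing chunk that adds no new bases is an assumption, not something stated in the question.

```matlab
% Sketch only: start/stop positions of overlapping 500 bp chunks.
dna_length = 1500;                 % hypothetical total length in bp
chunk_len  = 500;                  % size of a full chunk
overlap    = 20;                   % bases shared with the previous chunk
step       = chunk_len - overlap;  % 480 new bases per chunk

starts = 1:step:dna_length;                        % 1, 481, 961, ...
stops  = min(starts + chunk_len - 1, dna_length);  % clip the last chunk

% Assumption: skip a trailing chunk if the previous one already reached the end
keep   = [true, stops(1:end-1) < dna_length];
starts = starts(keep);
stops  = stops(keep);

disp([starts(:) stops(:)]);        % one row per chunk: [start stop]
```

For 1500 bp this gives four chunks, the last one only 60 bp long.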
I know how to write a cost function that, given an order for these proteins, tells me how good that order is, but I am at a loss for how to let a global optimizer change the order. All that matters is that I end up with something I can turn into an ordering, so the input could be 20 arguments, each giving one protein's position; 20 numbers that I then sort to get the order; or a single vector holding the order of the proteins. I just don't know how to generate that structure.
If I use one number per protein, the optimizer could assign the same value to more than one of them. If I penalize those cases with a very large cost so that they are thrown out, I suspect the optimizer won't find a good solution, since most of the candidates it tries will be infeasible. As for having it simply shuffle the order of a vector, I have no idea how to do that.
Any help would be appreciated. Thank you.
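Editor's sketch of the "numbers that I could put in order" idea, sometimes called a random-key encoding: the optimizer varies a vector of real numbers, and sorting that vector yields the order. Every real vector decodes to a valid permutation, so the duplicate-position problem does not arise. The cost function toy_cost is a hypothetical placeholder, and the simple random search stands in for whatever solver is actually used; neither comes from the question.

```matlab
% Sketch only: random-key encoding with a basic random search.
n = 17;                                     % number of genes
toy_cost = @(order) sum(abs(diff(order)));  % hypothetical placeholder cost

rng(1);                                     % fixed seed for repeatability
best_keys = rand(1, n);                     % continuous variables to optimize
[~, best_order] = sort(best_keys);          % decode keys into a permutation
best_cost = toy_cost(best_order);
initial_cost = best_cost;

for iter = 1:2000
    keys = best_keys + 0.1*randn(1, n);     % perturb the continuous keys
    [~, order] = sort(keys);                % always a valid order of 1..n
    c = toy_cost(order);
    if c < best_cost                        % keep only improvements
        best_keys  = keys;
        best_order = order;
        best_cost  = c;
    end
end
```

With the real data, toy_cost would be replaced by something like @(order) gene_cost(genes, order). The same decoding step could in principle sit inside the objective passed to a solver that works on real vectors, assuming such a solver is available.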
Here is my cost function. It returns a cost (lower is better) for the chosen order. The only input that needs to change is order, which is just a vector of row indices into the genes cell array.
For my current data I need a 17-element vector containing the integers 1 to 17 in any order, without duplicates, and I want the optimizer to try different permutations to find an optimal one.
function cost = gene_cost(genes, order)
%GENE_COST Cost of a gene ordering (lower is better).
%   genes is a cell array with one row per gene: column 1 is the gene name,
%   column 2 is the DNA sequence, and column 3 is the length of the gene.
%   order is a vector that defines the order of the genes, e.g. [10 7 5 3 ...].
temp = genes(order,:);
gene_lengths = transpose(cell2mat(temp(:,3)));
% Gene k occupies positions lower_interval(k)+1 .. upper_interval(k)
upper_interval = cumsum(gene_lengths);
lower_interval = [0 upper_interval(1:end-1)];
% Fewest blocks each gene could possibly span
min_cost = ceil(gene_lengths/500);
dna = strjoin(transpose(temp(:,2)),'');
dna_length = length(dna);
% Get the first 500 bp (or less, if the sequence is shorter) as the first chunk
block = {};
piece = dna(1:min(500,dna_length));
block = vertcat(block, piece);
genes_in_block = 1 <= upper_interval & lower_interval < 500;
cost = double(genes_in_block);   % per-gene count of blocks touched so far
for i = 481:480:dna_length
    % The last chunk may be shorter than 500 bp
    chunk_size = min(499, dna_length - i);
    piece = dna(i:i+chunk_size);
    block = vertcat(block, piece);
    % <= so that a gene whose last base sits exactly at position i is counted
    genes_in_block = i <= upper_interval & lower_interval < i+chunk_size;
    cost = cost + genes_in_block;
end
% The fourth power heavily penalizes genes that span more blocks than necessary
cost = sum((cost-min_cost).^4);
end
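Editor's sketch of one way to search over orderings directly: start from a random permutation and repeatedly swap two positions, keeping a swap whenever the cost does not get worse. A swap can never create duplicates, so every candidate is a valid order. This is a plain local search, not a global optimizer, and toy_cost is a hypothetical placeholder so that the snippet runs without the gene data.

```matlab
% Sketch only: swap-based local search over permutations of 1..n.
n = 17;
toy_cost = @(order) sum(abs(diff(order)));  % hypothetical placeholder cost

rng(2);                        % fixed seed for repeatability
order = randperm(n);           % random starting order without duplicates
cost = toy_cost(order);
start_cost = cost;

for iter = 1:5000
    ij = randperm(n, 2);               % two distinct positions
    cand = order;
    cand(ij) = cand(fliplr(ij));       % swap the two entries
    c = toy_cost(cand);
    if c <= cost                       % accept equal or better candidates
        order = cand;
        cost  = c;
    end
end
```

To use it with the function above, toy_cost would become @(order) gene_cost(genes, order). Restarting from several random permutations is a cheap way to reduce the risk of stopping in a poor local minimum.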
  2 Comments
Jonathan Epperl on 25 May 2013
"Shuffling" a vector can be done with perms or randperm.
I don't think I completely understand what you are trying to do though, could you have another go at explaining your problem, maybe without referring to genetics at all?
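Editor's sketch of the two functions mentioned in this comment. perms lists every ordering, which is only practical for small inputs; for 17 items the number of orderings is 17!, far too many to enumerate, so randperm (one random ordering per call) is the usable one here.

```matlab
% Sketch only: perms enumerates all orderings, randperm draws one at random.
small   = perms(1:3);          % all orderings of [1 2 3], one per row
n_small = size(small, 1);      % number of rows, i.e. 3! orderings

n_full  = factorial(17);       % number of orderings for 17 genes

rng(0);                        % fixed seed for repeatability
one_order = randperm(17);      % a single random order of 1..17
```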
Matt J on 25 May 2013
"I know how to write a costing function so that given an order for these proteins I can determine how good that order is"
Please write that for us so that we can see it, too.


Answers (0)
