How to enable the optimizer to determine the order of some elements

Asked by William on 25 May 2013

For a little background, I am working on an iGEM project (genetic engineering), and I have a set of DNA sequences that code for proteins. I need to find the optimal order in which to concatenate them before splitting the result into chunks: the first chunk is 500 bp long, and each following chunk consists of the last 20 bp of the previous chunk plus the next 480 bp. The order should use the 500 bp chunks as efficiently as possible, so that each protein spans as few chunks as possible.
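To make the chunk layout concrete, here is a small sketch of the arithmetic. The value of total_length is made up for illustration, and stopping the chunk starts at total_length - overlap (so that no chunk consists only of overlap) is an assumption rather than something stated above.

```matlab
% Sketch: start and end positions of the overlapping chunks.
% total_length is a made-up example value.
total_length = 1500;
chunk_len = 500;                     % size of a full chunk in bp
overlap = 20;                        % bp shared with the previous chunk
step = chunk_len - overlap;          % 480 new bp per chunk after the first

% Assumption: stop early enough that every chunk adds at least 1 new bp.
chunk_start = 1:step:max(1, total_length - overlap);
chunk_end = min(chunk_start + chunk_len - 1, total_length);

disp([chunk_start(:) chunk_end(:)]); % one row per chunk: [start end]
```

For a 1500 bp sequence this gives four chunks: 1-500, 481-980, 961-1460, and 1441-1500.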

I know how to write a cost function that, given an order for these proteins, tells me how good that order is, but I am at a loss for how to let the global optimizer change the order. All that matters is that I end up with something I can turn into an ordering. I could take 20 arguments, each one a number giving that protein's position; a set of numbers that I then sort to get the order; or a single vector that holds the order of the proteins. I just don't know how to set up that structure.

If I use one position number per protein, the optimizer could assign the same value to more than one of them. If I give those cases a very large cost so that they get thrown out, I suspect it won't find a good solution, since most of the candidates it tries will be infeasible. As for having it shuffle the order of a vector, I have no idea how to do that.
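The "numbers that get put in order" option can be sketched as follows. This encoding is sometimes called random keys: the optimizer varies one real number per protein, and the sort indices of those numbers are used as the order. The key values below are made up, and this is only one possible encoding, not a recommendation from the thread.

```matlab
% Sketch of decoding a vector of real-valued keys into an ordering.
% keys is what a continuous optimizer would vary; the values are made up.
keys = [0.42 0.07 0.93 0.55 0.18];

% The second output of sort is always a permutation of 1:numel(keys),
% even when two keys are equal, so duplicates cannot occur.
[~, order] = sort(keys);

disp(order);   % 2 5 1 4 3
```

An objective for the optimizer would decode the keys this way and then pass order on to the cost function, so every candidate the optimizer tries corresponds to a feasible ordering.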

Any help would be appreciated. Thank you

Here is my cost function. It returns a cost (lower is better) for the chosen order. The only input that needs to change is order, which is a vector of indices into the rows of the genes cell array.

For the data I have right now, I need a vector of 17 elements containing the integers 1 to 17 in any order, without duplication, and I want the optimizer to try various permutations to find an optimal solution.
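One way to search over such vectors without ever producing duplicates is to start from a valid permutation and move only to other permutations, for example by swapping two entries. The sketch below illustrates that idea with a plain local search. It is not a toolbox feature, and toy_cost and target are made-up stand-ins for the real cost function.

```matlab
% Sketch: local search over permutations using pairwise swaps.
% toy_cost is a hypothetical stand-in for @(order) gene_cost(genes, order).
n = 17;
target = n:-1:1;                            % made-up "best" order
toy_cost = @(order) sum(order ~= target);   % number of misplaced entries

order = randperm(n);                        % random valid starting order
best = toy_cost(order);

improved = true;
while improved
    improved = false;
    for i = 1:n-1
        for j = i+1:n
            trial = order;
            trial([i j]) = trial([j i]);    % a swap keeps it a permutation
            c = toy_cost(trial);
            if c < best                     % accept only strict improvements
                order = trial;
                best = c;
                improved = true;
            end
        end
    end
end

disp(order);
disp(best);
```

On this toy cost the search always reaches the target order. With a real cost function it would in general stop at a local optimum, so repeating it from several randperm starting points and keeping the best result is a reasonable precaution.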

    function cost = gene_cost(genes, order)
    %GENE_COST Cost of a gene ordering (lower is better).
    %   genes is a cell array with one row per gene: column 1 is the gene
    %   name, column 2 is the DNA sequence, and column 3 is the gene length.
    %   order is a vector that defines the order of the genes,
    %   e.g. [10 7 5 3 ...].
    temp = genes(order,:);
    gene_lengths = transpose(cell2mat(temp(:,3)));

    % Gene k occupies positions lower_interval(k)+1 through
    % upper_interval(k) of the concatenated sequence.
    upper_interval = cumsum(gene_lengths);
    lower_interval = [0 upper_interval(1:end-1)];

    % Fewest 500 bp chunks each gene could occupy
    min_cost = ceil(gene_lengths/500);

    dna = strjoin(transpose(temp(:,2)),'');
    dna_length = length(dna);

    % Get the first 500 bp and put it in the first chunk.
    % block collects the chunk sequences; it is not used in the cost.
    block = {};
    piece = dna(1:min(500, dna_length));
    block = vertcat(block, piece);
    genes_in_block = 1 <= upper_interval & lower_interval < 500;
    cost = double(genes_in_block);   % per-gene count of chunks touched

    % Each later chunk starts 480 bp after the previous one (20 bp overlap)
    for i = 481:480:dna_length
        if dna_length - i < 500
            chunk_size = dna_length - i;   % last chunk may be shorter
        else
            chunk_size = 499;
        end
        piece = dna(i:i+chunk_size);
        block = vertcat(block, piece);
        % A gene is in this chunk if it overlaps positions i..i+chunk_size
        genes_in_block = i <= upper_interval & lower_interval < i+chunk_size;
        cost = cost + genes_in_block;
    end

    % The fourth power gives a high penalty to genes that take up many
    % more chunks than necessary
    cost = sum((cost - min_cost).^4);

2 Comments

Jonathan Epperl on 25 May 2013

"Shuffling" a vector can be done with perms or randperm.

I don't think I completely understand what you are trying to do, though. Could you have another go at explaining your problem, maybe without referring to genetics at all?
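For reference, a short sketch of what perms and randperm return. The sizes 17 and 3 are chosen only for illustration.

```matlab
% randperm(n) returns one random ordering of 1:n with no repeats.
order = randperm(17);
disp(order);

% perms lists every ordering, one per row; only practical for small n.
P = perms(1:3);
disp(size(P));                      % 6 rows, 3 columns

% For 17 items there are factorial(17) orderings.
num_orders = factorial(17);
fprintf('%.0f\n', num_orders);      % 355687428096000
```

With roughly 3.6e14 orderings for 17 items, listing them all with perms is not feasible, so randperm (one random ordering at a time) is the more practical of the two for this size.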

Matt J on 25 May 2013

"I know how to write a costing function so that given an order for these proteins I can determine how good that order is"

Please write that for us so that we can see it, too.

0 Answers
