Code covered by the BSD License  

Highlights from
Primer Design GUI

image thumbnail

Primer Design GUI



16 Jun 2011 (Updated )

A GUI for designing primers

function PrimerList = PrimerDesignFcn(sequence,TargetLength,GCRange,TmRange,noSelfDimerization,noHairpinFormation,requireGCClamp,noNucleotideRepeats,handles)

%% Exploring Primer Design
% This demonstration uses the Bioinformatics Toolbox to find
% potential primers that can be used for automated DNA
% sequencing.

%   Copyright 2005-2007 The MathWorks, Inc. $Revision:
% $  $Date: 2007/01/02 18:24:29 $

%   Modified (functionalized) by Brett Shoelson
%       for incorporation in PrimerDesignGUI
%   6/19/2007
% Copyright 2007 - 2011 MathWorks, Inc.

%% Introduction
% Primer design for PCR can be a daunting task. A good
% primer should usually meet the following criteria:
% * Length is 18-30 bases. 
% * Melting temperature is 50-60 degrees Celsius. 
% * GC content is between 45 and 55
% * Does not form stable hairpins
% * Does not self dimerize.
% * Does not cross dimerize with other primers in the reaction.
% * Has a GC clamp at the 3' end of the primer.
% This demo uses MATLAB and Bioinformatics Toolbox to find
% PCR primers for the human hexosaminidase gene.

% First retrieve the hexosaminidase sequence data file from
% GenBank. The information from the GenBank file for human
% hexosaminidase is stored in a structure. The DNA sequence
% that you want to find primers for is in the Sequence field
% of the structure.

%humanHEXA = getgenbank('NM_000520');
% load hexosaminidase
% sequence = humanHEXA.Sequence;

%% Finding All Potential Forward Primers
% Your goal is to create a list of all potential forward
% primers of length 20. You can do this either by iterating
% over the entire sequence and taking subsequences at every
% possible position or by using a matrix of indices. The
% following example shows how you can set a matrix of
% indices and then use it to create all possible forward
% subsequences of length 20, in this case N-19 subsequences
% where N is the length of the target hexosaminidase
% sequence. Then you can use the *oligoprop* function to get
% properties for each of the potential primers in the
% list.

N = length(sequence); % length of the target sequence
M = TargetLength;  % desired primer length
    sprintf('Finding all potential FORWARD primers (OLIGOPROP)...\nSequence Length: %d; Target Primer Length: %d',N,M));

index = repmat((0:N-M)',1,M)+repmat(1:M,N-M+1,1);
fwdprimerlist = sequence(index);

for i = N-(TargetLength-1):-1:1 % reverse order to pre-allocate structure
        sprintf('Finding all potential FORWARD primers (OLIGOPROP)...\nSequence Length: %d; Target Primer Length: %d\nEvaluating subsequence %d of %d.',N,M,N-(TargetLength-1)-i+1,N-(TargetLength-1)));
    fwdprimerprops(i) = oligoprop(fwdprimerlist(i,:));

%% Finding All Potential Reverse Primers
% After you have all potential forward primers, you can
% search for reverse primers in a similar fashion. Reverse
% primers are be found on the complementary strand. To
% obtain the complementary strand use the *seqcomplement*
% function. Use the same index as above to retrieve the
% reverse primers.

    sprintf('Finding all potential REVERSE primers (OLIGOPROP)...\nSequence Length: %d; Target Primer Length: %d',N,M));

comp_sequence = seqcomplement(sequence); 
revprimerlist = seqreverse(comp_sequence(index));

for i = N-(TargetLength-1):-1:1 % reverse order to preallocate structure
        sprintf('Finding all potential REVERSE primers (OLIGOPROP)...\nSequence Length: %d; Target Primer Length: %d\nEvaluating subsequence %d of %d.',N,M,N-(TargetLength-1)-i+1,N-(TargetLength-1)));
    revprimerprops(i) = oligoprop(revprimerlist(i,:));

%% Filtering Primers Based on GC Content
% The GC content information for the primers is in a
% structure with the field GC. To eliminate all potential
% primers that do not meet the criteria stated above (a GC
% content of 45% to 55%), you can make a logical indexing
% vector that indicates which primers have GC content
% outside the acceptable range. Extract the GC field from
% the structure and convert it to a numeric vector.

fwdgc = [fwdprimerprops.GC]';
revgc = [revprimerprops.GC]';

if ~isempty(GCRange)
        'Filtering for GC range...');
    bad_fwdprimers_gc = fwdgc < GCRange(1) | fwdgc > GCRange(2);
    bad_revprimers_gc = revgc < GCRange(1) | revgc > GCRange(2);
    bad_fwdprimers_gc = false(size(fwdgc));
    bad_revprimers_gc = false(size(revgc));

%% Filtering Primers Based on Their Melting Temperature
% The melting temperature is significant when you are
% designing PCR protocols. The melting temperatures from
% *oligoprop* are estimated in a variety of ways (basic,
% salt-adjusted, nearest-neighbor). Create another logical
% indexing vector to keep track of primers with bad melting
% temperatures. The following example uses the
% nearest-neighbor estimates for melting temperatures with
% parameters established by SantaLucia, Jr. [1]. These are
% stored in the fifth element of the field Tm returned by
% *oligoprop*. The other elements of this field represent
% other methods to estimate the melting temperature, you can
% also use the  *mean* function from the Statistics Toolbox
% to compute an average over all the estimates.

fwdtm = cell2mat({fwdprimerprops.Tm}');
revtm = cell2mat({revprimerprops.Tm}');

if ~isempty(TmRange)
        'Filtering for Tm range...');
    bad_fwdprimers_tm = fwdtm(:,5) < TmRange(1) | fwdtm(:,5) > TmRange(2);
    bad_revprimers_tm = revtm(:,5) < TmRange(1) | revtm(:,5) > TmRange(2);
    bad_fwdprimers_tm = false(size(fwdtm(:,5)));
    bad_revprimers_tm = false(size(revtm(:,5)));

%% Finding Primers With Self-Dimerization and Hairpin Formation
% Self-dimerization and hairpin formation can prevent the
% primer from binding to the target sequence. As above, you
% can create logical indexing vectors to indicate whether
% the potential primers do or do not form self-dimers or
% hairpins. The *cellfun* function allows string lengths to
% be calculated in the cell array.

if noSelfDimerization
        'Filtering self-dimerizing primers...');
    bad_fwdprimers_dimers  = ~cellfun('isempty',{fwdprimerprops.Dimers}');
    bad_revprimers_dimers  = ~cellfun('isempty',{revprimerprops.Dimers}');
    bad_fwdprimers_dimers  = false(size(fwdtm(:,1)));
    bad_revprimers_dimers  = false(size(fwdtm(:,1)));

if noHairpinFormation
        'Filtering stable-hairpin-forming primers...');
    bad_fwdprimers_hairpin = ~cellfun('isempty',{fwdprimerprops.Hairpins}');
    bad_revprimers_hairpin = ~cellfun('isempty',{revprimerprops.Hairpins}');
    bad_fwdprimers_hairpin = false(size(fwdtm(:,1)));
    bad_revprimers_hairpin = false(size(fwdtm(:,1)));

%% Finding Primers Without a GC Clamp
% A strong base pairing at the 3' end of the primer helps in
% PCR. Find all the primers that do not end in a G or C.
% Remember that all the sequences in our lists are 5'->3'.

if requireGCClamp
        'Discarding primers absent a 3'' GC Clamp...');
    bad_fwdprimers_clamp = lower(fwdprimerlist(:,end)) == 'a' | lower(fwdprimerlist(:,end)) == 't';
    bad_revprimers_clamp = lower(revprimerlist(:,end)) == 'a' | lower(revprimerlist(:,end)) == 't';
    bad_fwdprimers_clamp = false(size(fwdtm(:,1)));
    bad_revprimers_clamp = false(size(fwdtm(:,1)));

%% Finding Primers With Nucleotide Repeats
% Primers that have stretches of repeated nucleotides can
% give poor PCR results. These are sequences with low
% complexity. To eliminate any that have stretches of four
% or more repeated bases, use the function *regexp*. Since
% *regexp* only handles strings or cell arrays of strings
% you can use the function *cellstr* to turn the
% subsequences list into a cell array of strings.

if noNucleotideRepeats
        'Discarding primers with nucleotide repeats...');
    fwdrepeats = regexpi(cellstr(fwdprimerlist),'a{4,}|c{4,}|g{4,}|t{4,}','ONCE');
    revrepeats = regexpi(cellstr(revprimerlist),'a{4,}|c{4,}|g{4,}|t{4,}','ONCE');
    bad_fwdprimers_repeats = ~cellfun('isempty',fwdrepeats);
    bad_revprimers_repeats = ~cellfun('isempty',revrepeats);
    bad_fwdprimers_repeats = false(size(fwdtm(:,1)));
    bad_revprimers_repeats = false(size(fwdtm(:,1)));

%% Find the Primers That Satisfy All the Criteria
% The rows of the original list of subsequences correspond
% to the base number where each subsequence starts. You can
% use all of the logical indexing vectors collected so far
% and create a new list of primers that only has the
% criteria that you are looking for. The figure shows how
% the forward primers have been filtered, red indicates bad
% primers and blue indicates good primers.

    'Applying all restrictions to select primers...');

bad_fwdprimers = [bad_fwdprimers_tm,bad_fwdprimers_gc,... 
bad_revprimers = [bad_revprimers_tm,bad_revprimers_gc,... 
good_fwdpos = find(all(~bad_fwdprimers,2));
good_fwdprimers = fwdprimerlist(good_fwdpos,:);
good_fwdprop = fwdprimerprops(good_fwdpos);
N_good_fwdprimers = numel(good_fwdprop);

good_revpos = find(all(~bad_revprimers,2));
good_revprimers = revprimerlist(good_revpos,:);
good_revprop = revprimerprops(good_revpos);
N_good_revprimers = numel(good_revprop);

imagesc([bad_fwdprimers any(bad_fwdprimers,2)]);
title('Filtering candidate forward primers','fontweight','b');
ylabel('Primer location');
set(handles.axes1,'XtickLabel',char({'    Tm','   %GC','Hairpin',' Dimers','Repeats','GC clamp','    ALL'}));

annotation(handles.figure1,'textbox','String','Good primers','Color','w',...
  'Position',[0.85 0.75 0.125 0.05],'BackgroundColor',[0 0 0.6275],...
annotation(handles.figure1,'textbox','String','Bad primers','Color','w',...
  'Position',[0.85 0.675 0.125 0.05],'BackgroundColor',[0.502 0 0],...

    sprintf('Computation complete...\nFound %d Good Forward Primers, %d Good Reverse Primers.',N_good_fwdprimers,N_good_revprimers));

%% Checking For Cross Dimerization
% Cross dimerization can occur between the forward and
% reverse primer if they have a significant amount of
% complementarity. The primers will not function properly if
% they dimerize with each other. To check for dimerization,
% align every forward primer against every reverse dimer,
% using the *swalign* function, and keep the low-scoring
% pairs of primers. This information can be stored in a
% matrix with rows representing forward primers and columns
% representing reverse primers. This calculation is quite
% time consuming, but you can reduce the time taken by
% noticing that there is no point in performing this
% calculation on primer pairs where the reverse primer is
% upstream of the forward primer. The image in the figure
% shows the pairwise scores before being thresholded, low
% scores (dark blue) represent primer pairs that do not
% dimerize. 

    sprintf('Checking for cross-dimerization (SWALIGN)...'));

scr_mat = [-1,-1,-1,1;-1,-1,1,-1;-1,1,-1,-1;1,-1,-1,-1;];
scr = zeros(N_good_fwdprimers,N_good_revprimers);
for i = 1:N_good_fwdprimers
    for j = 1:N_good_revprimers
            sprintf('Checking cross-dimerization (SWALIGN) for Primer Pair %d of %d...',i,N_good_fwdprimers));
        if good_fwdpos(i) < good_revpos(j)
            scr(i,j) = swalign(good_fwdprimers(i,:), good_revprimers(j,:), ...
            scr(i,j) = 13; % give a high score to ignore forward primers 
                           % that are after reverse primers

title('Cross dimerization scores')
xlabel('Candidate reverse primers')
ylabel('Candidate forward primers')

% Low scoring primer pairs are identified a logical one in
% an indicator matrix.
pairedprimers = scr<=3; 

%% Visualizing Potential Pairs of Primers in the Sequence Domain
% An alternative way to present this information is to look
% at all potential combinations of primers in the sequence
% domain. Each dot in the plot represents a possible
% combination between the forward and reverse primers after
% filtering out all those cases with potential cross
% dimerization.

    sprintf('Visualizing potential primer pairs in Sequence Domain...'));

[f r] = find(pairedprimers);
axis([1 N 1 N])
title('Primer selection graph')
xlabel('Reverse primer positions')
ylabel('Forward primer positions')

%% Selecting a Primer Pair to Amplify a Specific Region
% You can use the information calculated so far to find the
% best primer pairs that allow amplification of the 220bp
% region from position 880 to 1100. First, you find all
% pairs that can cover the required region, taking into
% account the length of the primer (M). Then, you calculate
% the Euclidean distance of the actual positions to the
% desired ones, and re-order the list starting with the
% closest distance. 

    sprintf('Selecting a Primer Pair to amplify a specific region...'));

pairs = find(good_fwdpos(f)<(880-M) & good_revpos(r)>1100);
dist = (good_fwdpos(f(pairs))-(880-M)).^2 + (good_revpos(r(pairs))-(1100)).^2;
[dist,h] = sort(dist);
pairs = pairs(h);

hold on 
plot([1100 1100],[1 880-M],'g','linewidth',3)
plot([1100 N],[880-M 880-M],'g','linewidth',3)

%% Retrieve Primer Pairs
% Use the *sprintf* function to prepare a report with the
% ten best pairs and associated information. These primer
% pairs can then be verified experimentally. These primers
% can also be 'BLASTed' using the *blastncbi* function to
% check specificity. 

    sprintf('Retrieving 10 best Primer Pairs...'));

TenBestFwdPrimers = cell(10,1);
TenBestFwdCutters = cell(10,1);
TenBestRevPrimers = cell(10,1);
TenBestRevCutters = cell(10,1);

Primers = sprintf('Fwd/Rev Primers      Start End   %%GC   mT   Length\n\n');
for i = 1:10
        sprintf('Calculating primer and cutter information for %d of 10...',i));

    fwd = f(pairs(i));
    rev = r(pairs(i));
    Primers = sprintf('%s%-21s%-6d%-6d%-4.4g%-4.4g\n%-21s%-6d%-6d%-4.4g%-7.4g%-6d\n\n', ...
        Primers, good_fwdprimers(fwd,:),good_fwdpos(fwd),good_fwdpos(fwd)+M-1,good_fwdprop(fwd).GC,good_fwdprop(fwd).Tm(5), ...
        good_revprimers(rev,:),good_revpos(rev)+M-1,good_revpos(rev),good_revprop(rev).GC,good_revprop(rev).Tm(5), ...
        good_revpos(rev) - good_fwdpos(fwd) );

    TenBestFwdPrimers{i} = good_fwdprimers(f(pairs(i)),:);
    TenBestFwdCutters{i} = rebasecuts(TenBestFwdPrimers{i});

    TenBestRevPrimers{i} = good_revprimers(r(pairs(i)),:);
    TenBestRevCutters{i} = rebasecuts(TenBestRevPrimers{i});

%% Find Restriction Enzymes That Cut Inside the Primer
% Use the *rebasecuts* function to list all the restriction
% enzymes from the REBASE database [2] that will cut a
% primer. These restriction enzymes can be used in the
% design of cloning experiments. For example, you can use
% this on the first pair of primers from the list of
% possible primers that you just calculated.

    sprintf('Finding restriction enzymes that cut inside the Primer (REBASECUTS)...'));

fwdprimer = good_fwdprimers(f(pairs(1)),:)
fwdcutter = rebasecuts(fwdprimer)

revprimer = good_revprimers(r(pairs(1)),:)
revcutter = rebasecuts(revprimer)

%% Analysis done...put results in a structure
    sprintf('Analysis complete. Use EXPORT button to write results to workspace...'));

% Construct results in a structure
PrimerList.GoodFwdPrimers = good_fwdprimers;
PrimerList.GoodFwdStartPositions = good_fwdpos;
PrimerList.GoodFwdEndPositions = good_fwdpos+M-1;
PrimerList.GoodFwdProperties = good_fwdprop;
PrimerList.GoodRevPrimers = good_revprimers;
PrimerList.GoodRevStartPositions = good_revpos;
PrimerList.GoodRevEndPositions = good_revpos+M-1;
PrimerList.GoodRevProperties = good_revprop;
PrimerList.TenBestFwdPrimers = TenBestFwdPrimers;
PrimerList.TenBestFwdCutters = TenBestFwdCutters;
PrimerList.TenBestRevPrimers = TenBestRevPrimers;
PrimerList.TenBestRevCutters = TenBestRevCutters;


%% References
% [1] SantaLucia, J. Jr. "A unified view of polymer,
% dumbbell, and oligonucleotide DNA nearest-neighbor
% thermodynamics"  Proc. Natl. Acad. Sci.  Vol. 95 pp
% 1460-1465 Feb. 1998
% [2] Roberts, R.J., Vincze, T., Posfai, J., Macelis, D.
% "The Restriction Enzyme data BASE" Nucleic Acids Research
% 33: D230-D232 (2005). 

Contact us