image thumbnail
from Fractal visualisation of genomic sequences by Sam Roberts
Visualise n-mer nucleotide sequences using a fractal visualisation based on a Sierpinski gasket.

nmervisdemo.m
%% Fractal visualisation of genomic base counts

%% Introduction
% Genomic researchers are often interested in counts of the nucleotide base
% in a genetic sequence. The Bioinformatic Toolbox contains functions for
% counting sequences of length one, and two, ie individual bases and dimers.
%
% We demonstrate these using a mitochondrial DNA sequence.

load mitochondria

mitochondria(1:10)

basecount(mitochondria)

dimercount(mitochondria)

%% Visualising the counts
%
% These functions also have options for visualising the counts.

figure
subplot(1,2,1)
basecount(mitochondria,'Chart','pie');

subplot(1,2,2)
dimercount(mitochondria,'Chart','bar');

%% Longer sequences
%
% If we want to count longer sequences, we can use nmercount:

counts = nmercount(mitochondria,3);
for i = 1:10
    disp([counts{i,1},': ',num2str(counts{i,2})]);
end

%%
% But there is currently no way of visualising these counts. We turn to
% some work by Carr [1] that advocates using a fractal visualisation, which
% is extensible (in theory) to arbitrarily long sequences. To visualise
% counts of length n, we construct a Sierpinski gasket of depth n.

%% What is a Sierpinski gasket?
%
% A Sierpinski gasket is a simple fractal. A gasket of depth 0 is simply a
% tetrahedron. To construct a gasket of depth 1, the tetrahedron is
% replaced by four smaller tetrahedra, each of half the side length of the
% original, placed at each corner of the original. This leaves a hole in the 
% center. To construct deeper gaskets, keep replacing each tetrahedron with 
% four smaller ones.

%% How does this relate to genetic sequences?
%
% There are four nucleotide bases: A, C, G and T - conveniently, the same
% number as the number of vertices of a tetrahedron. Base sequences of
% length n are mapped onto subgaskets as follows: The first base in the
% sequence maps onto one of the four tetrahedra of depth 1. This contains
% four sub-subgaskets. The second base in the sequence maps onto one of
% these sub-subgaskets. And so on. We color the gasket by the count of that
% sequence, and set all the gaskets to have a transparency of 0.1.

%%
% For length 3:
nmervis(mitochondria,3)

%%
% And for length 4:

nmervis(mitochondria,4)

%%
% Sequences of length 5 and 6 are fine; 7 might be a task for DCT.
%
% This is nicely rotatable. The Sierpinski gasket has a nice property
% that when it is viewed from above, all the tetrahedra line up to form a
% square. You can achieve this by manually rotating the previous plot, or by
% calling as follows.

nmervis(mitochondria,3,'View','flat')

%%
% For longer sequences, it is useful to also scale the transparency of the
% tetrahedra by the count, to focus sttention on the common sequences.

nmervis(mitochondria,5,'ScaleAlpha','on')

%% Reference
% Carr DB, 2002: 2D and 3D coordinates for m-mers and dynamic graphics.
% Computing Science and Statistics: Proceedings of the 34th Symposium on
% the Interface.

Contact us at files@mathworks.com