%% Fractal visualisation of genomic base counts
%% Introduction
% Genomic researchers are often interested in counts of the nucleotide base
% in a genetic sequence. The Bioinformatic Toolbox contains functions for
% counting sequences of length one, and two, ie individual bases and dimers.
%
% We demonstrate these using a mitochondrial DNA sequence.
load mitochondria
mitochondria(1:10)
basecount(mitochondria)
dimercount(mitochondria)
%% Visualising the counts
%
% These functions also have options for visualising the counts.
figure
subplot(1,2,1)
basecount(mitochondria,'Chart','pie');
subplot(1,2,2)
dimercount(mitochondria,'Chart','bar');
%% Longer sequences
%
% If we want to count longer sequences, we can use nmercount:
counts = nmercount(mitochondria,3);
for i = 1:10
disp([counts{i,1},': ',num2str(counts{i,2})]);
end
%%
% But there is currently no way of visualising these counts. We turn to
% some work by Carr [1] that advocates using a fractal visualisation, which
% is extensible (in theory) to arbitrarily long sequences. To visualise
% counts of length n, we construct a Sierpinski gasket of depth n.
%% What is a Sierpinski gasket?
%
% A Sierpinski gasket is a simple fractal. A gasket of depth 0 is simply a
% tetrahedron. To construct a gasket of depth 1, the tetrahedron is
% replaced by four smaller tetrahedra, each of half the side length of the
% original, placed at each corner of the original. This leaves a hole in the
% center. To construct deeper gaskets, keep replacing each tetrahedron with
% four smaller ones.
%% How does this relate to genetic sequences?
%
% There are four nucleotide bases: A, C, G and T - conveniently, the same
% number as the number of vertices of a tetrahedron. Base sequences of
% length n are mapped onto subgaskets as follows: The first base in the
% sequence maps onto one of the four tetrahedra of depth 1. This contains
% four sub-subgaskets. The second base in the sequence maps onto one of
% these sub-subgaskets. And so on. We color the gasket by the count of that
% sequence, and set all the gaskets to have a transparency of 0.1.
%%
% For length 3:
nmervis(mitochondria,3)
%%
% And for length 4:
nmervis(mitochondria,4)
%%
% Sequences of length 5 and 6 are fine; 7 might be a task for DCT.
%
% This is nicely rotatable. The Sierpinski gasket has a nice property
% that when it is viewed from above, all the tetrahedra line up to form a
% square. You can achieve this by manually rotating the previous plot, or by
% calling as follows.
nmervis(mitochondria,3,'View','flat')
%%
% For longer sequences, it is useful to also scale the transparency of the
% tetrahedra by the count, to focus sttention on the common sequences.
nmervis(mitochondria,5,'ScaleAlpha','on')
%% Reference
% Carr DB, 2002: 2D and 3D coordinates for m-mers and dynamic graphics.
% Computing Science and Statistics: Proceedings of the 34th Symposium on
% the Interface.