seqprofile - Calculate sequence profile from set of multiply aligned sequences

Syntax

Profile = seqprofile(Seqs)
[Profile, Symbols] = seqprofile(Seqs)

seqprofile(Seqs, ...'Alphabet', AlphabetValue, ...)
seqprofile(Seqs, ...'Counts', CountsValue, ...)
seqprofile(Seqs, ...'Gaps', GapsValue, ...)
seqprofile(Seqs, ...'Ambiguous', AmbiguousValue, ...)
seqprofile(Seqs, ...'Limits', LimitsValue, ...)

Arguments

Seqs

Set of multiply aligned sequences represented by any of the following:.

  • Array of strings

  • Cell array of strings

  • Array of structures containing the field Sequence

AlphabetValue

String specifying the sequence alphabet. Choices are:

  • 'NT' — Nucleotides

  • 'AA' — Amino acids (default)

  • 'none' — No alphabet

When Alphabet is 'none', the symbol list is based on the observed symbols. Each character can be any symbol, except for a hyphen (-) and a period (.), which are reserved for gaps.

CountsValue

Controls returning frequency (ratio of counts/total counts) or counts. Choices are true (counts) or false (frequency). Default is false.

GapsValue

String that controls the counting of gaps in a sequence. Choices are:

  • 'all' — Counts all gaps

  • 'noflanks' — Counts all gaps except those at the flanks of every sequence

  • 'none' — Default. Counts no gaps.

AmbiguousValue

Controls counting ambiguous symbols. Enter 'Count' to add partial counts to the standard symbols.

LimitsValue

Specifies whether to use part of the sequence. Enter a [1x2] vector with the first position and the last position to include in the profile. Default is [1,SeqLength].

Description

Profile = seqprofile(Seqs) returns Profile, a matrix of size [20 (or 4) x SequenceLength] with the frequency of amino acids (or nucleotides) for every column in the multiple alignment. The order of the rows is given by

[Profile, Symbols] = seqprofile(Seqs) returns Symbols, a unique symbol list where every symbol in the list corresponds to a row in Profile, the profile.

seqprofile(Seqs, ...'PropertyName', PropertyValue, ...) calls seqprofile with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:


seqprofile(Seqs, ...'Alphabet', AlphabetValue, ...)
selects a nucleotide alphabet, amino acid alphabet, or no alphabet.

seqprofile(Seqs, ...'Counts', CountsValue, ...) when Counts is true, returns the counts instead of the frequency.

seqprofile(Seqs, ...'Gaps', GapsValue, ...) appends a row to the bottom of a profile (Profile) with the count for gaps.

seqprofile(Seqs, ...'Ambiguous', AmbiguousValue, ...) when Ambiguous is 'count', counts the ambiguous amino acid symbols (B Z X) and nucleotide symbols (R Y K M S W B D H V N) with the standard symbols. For example, the amino acid X adds a 1/20 count to every row while the amino acid B counts as 1/2 at the D and N rows.

seqprofile(Seqs, ...'Limits', LimitsValue, ...) specifies the start and end positions for the profile relative to the indices of the multiple alignment.

Examples

   seqs = fastaread('pf00002.fa');
   [P,S] = seqprofile(seqs,'limits',[50 60],'gaps','all')

See Also

Bioinformatics Toolbox functions: fastaread, multialignread, multialignwrite, seqconsensus, seqdisp, seqlogo

  


 © 1984-2008- The MathWorks, Inc.    -   Site Help   -   Patents   -   Trademarks   -   Privacy Policy   -   Preventing Piracy   -   RSS