seqprofile

Calculate sequence profile from set of multiply aligned sequences

Syntax

Profile = seqprofile(Seqs) [Profile, Symbols] = seqprofile(Seqs) seqprofile(Seqs, ...'Alphabet', AlphabetValue, ...) seqprofile(Seqs, ...'Counts', CountsValue, ...) seqprofile(Seqs, ...'Gaps', GapsValue, ...) seqprofile(Seqs, ...'Ambiguous', AmbiguousValue, ...) seqprofile(Seqs, ...'Limits', LimitsValue, ...)

Arguments

`Seqs`	Set of multiply aligned sequences represented by any of the following:. Character array Cell array of character vectors String vector Array of structures containing the field `Sequence`
`AlphabetValue`	Character vector or string specifying the sequence alphabet. Choices are: `'NT'` — Nucleotides `'AA'` — Amino acids (default) `'none'` — No alphabet When `Alphabet` is `'none'`, the symbol list is based on the observed symbols. Each character can be any symbol, except for a hyphen (-) and a period (.), which are reserved for gaps.
`CountsValue`	Controls returning frequency (ratio of counts/total counts) or counts. Choices are `true` (counts) or `false` (frequency). Default is `false`.
`GapsValue`	Character vector or string that controls the counting of gaps in a sequence. Choices are: `'all'` — Counts all gaps `'noflanks'` — Counts all gaps except those at the flanks of every sequence `'none'` — Default. Counts no gaps.
`AmbiguousValue`	Controls counting ambiguous symbols. Enter `'Count'` to add partial counts to the standard symbols.
`LimitsValue`	Specifies whether to use part of the sequence. Enter a `[1x2]` vector with the first position and the last position to include in the profile. Default is `[1,SeqLength]`.

Description

Profile = seqprofile(Seqs) returns Profile, a matrix of size [20 (or 4) x SequenceLength] with the frequency of amino acids (or nucleotides) for every column in the multiple alignment. The order of the rows is given by

4 nucleotides — A C G T/U
20 amino acids — A R N D C Q E G H I L K M F P S T W Y V

[Profile, Symbols] = seqprofile(Seqs) returns Symbols, a unique symbol list where every symbol in the list corresponds to a row in Profile, the profile.

seqprofile(Seqs, ...'PropertyName', PropertyValue, ...) calls seqprofile with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

seqprofile(Seqs, ...'Alphabet', AlphabetValue, ...) selects a nucleotide alphabet, amino acid alphabet, or no alphabet.

seqprofile(Seqs, ...'Counts', CountsValue, ...) when Counts is true, returns the counts instead of the frequency.

seqprofile(Seqs, ...'Gaps', GapsValue, ...) appends a row to the bottom of a profile (Profile) with the count for gaps.

seqprofile(Seqs, ...'Ambiguous', AmbiguousValue, ...) when Ambiguous is 'count', counts the ambiguous amino acid symbols (B Z X) and nucleotide symbols (R Y K M S W B D H V N) with the standard symbols. For example, the amino acid X adds a 1/20 count to every row while the amino acid B counts as 1/2 at the D and N rows.

seqprofile(Seqs, ...'Limits', LimitsValue, ...) specifies the start and end positions for the profile relative to the indices of the multiple alignment.

Examples

collapse all

Calculate Sequence Profile

Open Live Script

Create an array of structures representing a multiple alignment of amino acids:

seqs = fastaread('pf00002.fa');

Return the sequence profile and symbol list from position 50 through 55 of the set of multiply aligned sequences, counting all gaps.

[Profile2,Symbols2] = seqprofile(seqs,'limits',[50 55],'gaps','all')

Profile2 = 21×6

    0.0312    0.0312    0.1562    0.4375    0.1250    0.2188
         0         0    0.3750         0         0         0
         0         0    0.0938    0.1562         0         0
         0         0         0    0.0312         0         0
         0    0.0625         0         0    0.0312         0
         0         0         0    0.0312         0         0
         0         0         0    0.1250         0         0
    0.0312         0    0.0625         0         0         0
         0         0         0         0         0         0
    0.4688    0.0625         0         0    0.3125    0.1562
    0.1250    0.6250    0.0312         0    0.2188    0.1875
         0         0    0.1250    0.0312         0         0
    0.1250    0.0625         0         0         0    0.0312
    0.1562    0.0312         0         0    0.0312         0
         0         0         0         0         0         0
      ⋮

Symbols2 = 
'ARNDCQEGHILKMFPSTWYV-'

Version History

Introduced before R2006a