Count number of occurrences of word in sequence
Seq: A nucleotide or amino acid sequence of characters. You can also enter a structure with the field Sequence.
Word: A short sequence of characters.
seqwordcount(Seq, Word) returns the number of times that Word occurs in Seq.
If Word contains nucleotide or amino acid symbols that represent multiple possible symbols (ambiguous characters), then seqwordcount counts all matches. For example, the symbol R represents either G or A (purines), so if Word is 'ART', then seqwordcount counts occurrences of both 'AAT' and 'AGT'.
seqwordcount does not count overlapping patterns multiple times. In the following example, seqwordcount reports three matches: one for the TATA near the start of the sequence and two for the run TATATATA near the end. That run is counted as two distinct matches, not three overlapping occurrences.
seqwordcount('GCTATAACGTATATATAT','TATA')
ans = 3
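The difference between overlapping and non-overlapping counting can be illustrated with base MATLAB functions. This is only a sketch and makes no claim about how seqwordcount is implemented: strfind reports every start position, including overlapping ones, while regexp resumes scanning after the end of each match. The variable names are illustrative.

```matlab
seq  = 'GCTATAACGTATATATAT';
word = 'TATA';

% strfind returns every start position, so overlapping hits are included.
overlappingStarts = strfind(seq, word);      % [3 10 12 14]

% regexp continues after the end of each match, so hits never overlap.
nonOverlappingStarts = regexp(seq, word);    % [3 10 14]

fprintf('overlapping: %d, non-overlapping: %d\n', ...
    numel(overlappingStarts), numel(nonOverlappingStarts));
% prints "overlapping: 4, non-overlapping: 3"
```

The non-overlapping count of 3 agrees with the seqwordcount result shown above.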
The following example reports two matches ('TAGT' and 'TAAT'). B is the ambiguous code for G, T, or C, while R is the ambiguous code for G or A.
seqwordcount('GCTAGTAACGTATATATAAT','BART')
ans = 2
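To see which substrings an ambiguous word matches, each ambiguous symbol can be expanded by hand into a regular expression character class. The sketch below assumes that this expansion mirrors the matching rule described above. The pattern is written manually for 'BART' and is not produced by seqwordcount itself.

```matlab
seq = 'GCTAGTAACGTATATATAAT';

% Hand-expanded form of 'BART': B -> [GTC], R -> [GA] (assumed mapping
% taken from the description above).
pattern = '[GTC]A[GA]T';

% Request both the start positions and the matched substrings.
[starts, matches] = regexp(seq, pattern, 'start', 'match');

disp(starts)     % 3 17
disp(matches)    % 'TAGT' and 'TAAT'
```

The two matched substrings are the same ones the example above reports.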