Counting occurrences of a specific character in a cell array

Hi guys,
I want to count repeated occurrences of characters in a cell array,
e.g.
AAA AAT AAG AAT AGC ACG
I want something that automatically identifies and counts the occurrences.
Could anyone give me some help?
  1 Comment
Walter Roberson on 19 Jan 2013
So 'AAA' would be 3 'A's? Or do you mean that you want to count the number of 'AAA', the number of 'AAT', and so on?
The cell arrays: is there one entry per cell, or are they blank-separated strings that need to be broken up?


Answers (1)

Cedric on 19 Jan 2013
Edited: Cedric on 26 Jan 2013
Assuming that these are codons (3 uppercase letters each), here are three "not-very-orthodox" solutions, just for fun. But keep in mind that with bioinformatics being a hot topic, there are quite a few very specialized libraries out there (e.g. http://www.mathworks.com/help/bioinfo/functionlist.html) that would do the job in a much better fashion. You might also get a more orthodox version from someone else once you answer Walter's comment.
Assuming, for the example (but it works for any cell array of 3-uppercase-letter codes):
C = {'AAA','AAT','AAG','AAT','AGC','ACG'} ;
n = numel(C) ;
1. Probably the most efficient of these non-orthodox solutions (~0.58s for processing 1 million codons on my poor laptop):
D = accumarray([[C{:}]-64; reshape([1;1;1]*(1:n), 1, [])].', 1, [26 n]) ;
2. Closely followed by a "sparse" version:
D = sparse([C{:}]-64, reshape([1;1;1]*(1:n), 1, []), ones(1,3*n), 26, n) ;
3. And finally a much less efficient cell2mat/cellfun:
D = cell2mat(cellfun(@(code)accumarray(code.'-64, 1, [26,1]), C, ...
'UniformOutput', false)) ;
All three produce a 26 x #codes matrix whose columns are the distributions of the 26 letters of the alphabet for each code, with row index = letter ID, A=1,..,Z=26 (the sparse version produces a sparse matrix):
>> D
D =
3 2 2 2 1 1
0 0 0 0 0 0
0 0 0 0 1 1
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 1 0 1 1
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 1 0 1 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
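If what is needed in the end is the total count of each letter over the whole cell array, the matrix D can simply be summed along its rows. This is only a minimal sketch built on solution 1; the names totals, present, letters, and counts are illustrative and not part of the original solutions:

```matlab
C = {'AAA','AAT','AAG','AAT','AGC','ACG'} ;
n = numel(C) ;
% Solution 1: 26 x n matrix of per-code letter counts.
D = accumarray([[C{:}]-64; reshape([1;1;1]*(1:n), 1, [])].', 1, [26 n]) ;
totals  = sum(D, 2) ;             % 26 x 1, total count of each letter A..Z
present = find(totals) ;          % letter IDs that occur at least once
letters = char(present.' + 64) ;  % 'ACGT' for this example
counts  = totals(present).' ;     % [11 2 3 2] for this example
```

With the sparse version, wrapping the sum as full(sum(D, 2)) should give the same totals as a regular (non-sparse) vector.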
Note that the 3rd version doesn't assume 3-letter codes and would work with codes of arbitrary length. The first 2 versions could be adapted to have this flexibility.
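One possible way to adapt the accumarray version to codes of unequal length is to build the column indices from the actual code lengths instead of the fixed [1;1;1] block. This is a sketch under a few assumptions: the example codes and the names lens, rows, and cols are made up for illustration, and repelem requires R2015a or later:

```matlab
C = {'AAA','AT','AAGC','T'} ;      % hypothetical codes of unequal length
n = numel(C) ;
lens = cellfun(@numel, C) ;        % number of letters in each code
rows = [C{:}] - 64 ;               % letter IDs, A=1,..,Z=26
cols = repelem(1:n, lens) ;        % code index, repeated once per letter
D = accumarray([rows; cols].', 1, [26 n]) ;
```

The same rows/cols vectors could be passed to sparse(rows, cols, 1, 26, n) for a sparse result, since sparse also accumulates values that share the same subscripts.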
Cheers,
Cedric
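For the other reading raised in Walter's comment (counting how many times each whole code such as 'AAT' appears, rather than individual letters), a more conventional sketch could combine unique and accumarray. It assumes one code per cell; the names codes, idx, and counts are illustrative:

```matlab
C = {'AAA','AAT','AAG','AAT','AGC','ACG'} ;
[codes, ~, idx] = unique(C) ;      % distinct codes (sorted) and, for each entry, its index in codes
counts = accumarray(idx(:), 1) ;   % number of occurrences of each distinct code
% For this example: codes = {'AAA','AAG','AAT','ACG','AGC'}, counts = [1;1;2;1;1]
```

If the codes come as one blank-separated string instead, they would first have to be split into a cell array (e.g. with strsplit or regexp) before this applies.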
