understanding the bioinformatic tool

2 views (last 30 days)
Dear Matlab Team, I am really enjoying Matlab and your internet help. I am corrently working with http://www.mathworks.com/help/pdf_doc/bioinfo/bioinfo_ug.pdf
In page 2-77 I used two lines I am not sure about: fow_idx = find(~bitget(getFlag(bm1_filtered),5)); rev_idx = find(bitget(getFlag(bm1_filtered),5));
it says there: "get the indices for the forward and the reverse reads in each pair. This information is captured in the fifth bit of the flag field"
in the end of this page an example is shown (and I got the same): SRR054715.sra.6849385 163 20 60 40M AACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAA SRR054715.sra.6849385 83 229 60 40M CCTATTTCTTGTGGTTTTCTTTCCTTCACTTAGCTATGG SRR054715.sra.6992346 99 20 60 40M AACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAA SRR054715.sra.6992346 147 239 60 40M GTGGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTAT
when looking at the flags: the forward index include: 163 and 99 and the reverse: 83 and 147.
using: http://picard.sourceforge.net/explain-flags.html If I understand correctly the forward should include: 99 and 83 that are the firsts in the pair and not 163?
I think that the correct bit for the forward should be 7 or 8 and not 5?
Maybe I don't understand something, I will thank you to correct my way of thinking. Best, yishai

Accepted Answer

Joe Myint
Joe Myint on 27 Mar 2013
Edited: Joe Myint on 27 Mar 2013
Hello Yishai:
The fifth bit of SAM flag indicates whether a read is mapped to reverse strand or forward strand. If the bit is 1, then the read is mapped to the reverse strand, and 0 if the read is mapped to the forward strand.
Getting back to the two commands that you weren’t sure about:
|fow_idx = find(~bitget(getFlag(bm1_filtered),5));|
This command returns indices of all reads that are mapped to the forward strand.
|rev_idx = find(bitget(getFlag(bm1_filtered),5));|
This command returns indices all reads that are mapped to the reverse strand.
In this pair-end ChIP-Seq data, the firsts in pairs are not necessarily mapped to the forward strand only. In fact, the reads (and mates) can be mapped to both forward and reverse strands. In our example, we have found the following pairs:
Flag 99’s are paired to 147’s and vice versa
The read with flag 99 is mapped to the forward strand, and the mate is mapped to the reverse strand. The read appears in the first input file (first in pair) which belongs to the 5’ reads. The read with flag 147 is mapped to the reverse strand, and the mate is mapped to the forward strand. The read appears in the second input file (second in pair) which belongs to the 3’ reads.
Flag 83’s are paired to 163’s and vice versa
The read with flag 83 is mapped to the reverse strand, and the mate is mapped to the forward strand. The read appears in the first input file (first in pair) which belongs to the 5’ reads. The read with flag 163 is mapped to the forward strand, and the mate is mapped to the reverse strand. The read appears in the second input file (second in pair) which belongs to the 3’ reads.
Hope it helps,
Joe
  1 Comment
yishaiy
yishaiy on 3 Apr 2013
Hello Joe, Thanks for your answer. I still want to be sure that I understood: to show the pairs arranged by the 5' / 3' the right arrangement is:
SRR054715.sra.6849385 83 229 60 40M CCTATTTCTTGTGGTTTTCTTTCCTTCACTTAGCTATGG SRR054715.sra.6849385 163 20 60 40M AACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAA SRR054715.sra.6992346 99 20 60 40M AACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAA SRR054715.sra.6992346 147 239 60 40M GTGGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTAT
I should use: fow_idx = find(bitget(getFlag(bm1_filtered),*7*));
that will give the indices of the reads that were mapped first.
and the current script gives the pairs arranged by their strand with no connection to 5' or 3', like this: SRR054715.sra.6849385 163 20 60 40M AACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAA SRR054715.sra.6849385 83 229 60 40M CCTATTTCTTGTGGTTTTCTTTCCTTCACTTAGCTATGG SRR054715.sra.6992346 99 20 60 40M AACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAA SRR054715.sra.6992346 147 239 60 40M GTGGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTAT
Is it correct? thanks again, Yishai

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!