How to obtain InsertSize from a BioMap object?

Is there a way to obtain the InsertSize field (available via bamread) when opening a BAM file as a BioMap object? Opening BAM files as BioMap objects is much faster than using the bamread function, but I can't find any way to obtain the InsertSize (TLEN in the corresponding SAM file).

Answers (2)

If you use something like:
BAMStruct = bamread(File,RefSeq,Range)
you get a structure array with one element per read. From this you can obtain the insert sizes of all reads with something like:
x = [BAMStruct.InsertSize]
The examples in the bamread documentation do something similar, except that they extract the 'Position' field instead of 'InsertSize':
data_one = bamread('ex1.bam','seq1', [100 300]);
data_one(27).Position
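Because bamread returns a structure array, the per-read values can be gathered into one vector with square brackets. The sketch below uses a small hand-made structure array as a stand-in for the bamread output (the values are invented for illustration); with real data, `reads` would come from a bamread call like the one above. It also assumes the usual SAM convention that TLEN is positive for the leftmost read of a pair and negative for the rightmost one, so keeping the positive values gives one length per pair.

```matlab
% Stand-in for the output of bamread: a struct array with one element per read.
% The field names follow bamread; the values are made up for illustration.
reads = struct('QueryName',  {'r1', 'r1', 'r2', 'r2'}, ...
               'Position',   {100, 250, 400, 520}, ...
               'InsertSize', {185, -185, 155, -155});

% Collect the InsertSize of every read into one numeric vector.
insertSizes = [reads.InsertSize];

% TLEN is signed, so keep only the positive entries to count each pair once.
fragmentLengths = insertSizes(insertSizes > 0);
disp(fragmentLengths)
```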
Good luck!

3 Comments

Thanks! I know that bamread can get the InsertSize field from BAM files, but my question was how to obtain the same information using the much faster method of opening the BAM file as a BioMap object. bamread takes forever to open big BAM files. In the time bamread needs to open the file, I can open it as a BioMap object, match the headers of the forward/reverse paired ends and compute the fragment lengths. But that is not optimal either, because sorting millions of headers is not fast. What I would like to know is how to access the TLEN (InsertSize) field of SAM/BAM files using the faster BioMap objects, instead of the slow samread/bamread functions.
Can you provide a .bam file and your code? I'll give it a try!
Razvan on 31 Jan 2015
Edited: Razvan on 7 Feb 2015
Thanks for spending your time on this! Here is a BAM file and some Matlab code that illustrates my problem: https://app.box.com/s/e3sjphe802z616gzwzelse0o89py6vlm
If you run my script from that folder, you'll see the two ways I know of to open a BAM file and compute the fragment lengths. Using bamread, the TLEN field of the SAM/BAM file is easily accessible. Using BioMap objects, the TLEN/InsertSize information is no longer available, and one needs to match the headers of the reads in order to compute the total fragment length. The problem is that bamread is very slow in comparison with BioMap (3 s vs. 0.2 s), and sorting the headers of millions of reads is also pretty slow. It would be much better if BioMap simply loaded the TLEN field of the SAM/BAM file. This information is stored in the SAM/BAM file but is simply disregarded by BioMap (as far as I know). Is there any chance of getting this TLEN info using the faster BioMap loading function?
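For reference, the header-matching workaround described above can be sketched with unique and accumarray. This is only a sketch under assumptions: the `headers`, `starts` and `stops` values are invented stand-ins, both mates of a pair are assumed to carry exactly the same header and to map to the same reference, and the fragment length is taken as the span from the leftmost start to the rightmost stop.

```matlab
% Invented stand-in data: one entry per mapped read.
headers = {'frag1'; 'frag2'; 'frag1'; 'frag2'; 'frag3'};
starts  = [100; 400; 250; 520; 900];
stops   = [135; 435; 284; 554; 935];

% Map each header to a group index (mates share the same index).
[uniqueHeaders, ~, groupIdx] = unique(headers);

% Leftmost start, rightmost stop and number of reads for each group.
fragStart = accumarray(groupIdx, starts, [], @min);
fragStop  = accumarray(groupIdx, stops,  [], @max);
nReads    = accumarray(groupIdx, 1);

% The fragment length is only defined when both mates are present.
fragLength = fragStop - fragStart + 1;
fragLength(nReads ~= 2) = NaN;
disp(fragLength.')
```

With a BioMap object, the three inputs would presumably come from getHeader, getStart and getStop, converted to double so that NaN can be stored. Whether this is fast enough for millions of reads would need to be measured.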


Asked: on 26 Jan 2015
Edited: on 7 Feb 2015
