File Exchange

Find an ungapped pattern window from a set of protein sequences

version 1.1 (2.94 MB) by

This program finds an ungapped pattern window of a specified width in a set of protein sequences.

This program is a bioinformatics tool developed to help biologists find patterns in a set of protein sequences. The method is the first to fully exploit the advantages of Dirichlet mixture models. It starts from a random pattern and iteratively improves the Bayesian log-odds ratio score as the pattern is updated. When the score can no longer be significantly improved, the algorithm terminates and returns a pattern window of pre-specified length. The resulting pattern can serve as a starting point for a later, refined alignment that introduces gaps. We are developing a more advanced version that can introduce gaps into the pattern. We believe the current ungapped version is already very helpful for identifying conserved regions of protein sequences, and that it can save manual work in pattern discovery.
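The loop described above (start from random window positions, update the pattern, stop when the score no longer improves) can be illustrated with a small Python sketch. This is not the program's C implementation: it substitutes a flat pseudocount for the Dirichlet mixture prior and a greedy position update for whatever update rule the program actually uses. The names `window_score` and `find_window` and all parameter defaults are hypothetical.

```python
import math
import random

ALPHABET = "ARNDCQEGHILKMFPSTWYV"  # letter order quoted in the author's comments


def window_score(seqs, starts, width, background, pseudocount=1.0):
    """Log-odds score of the ungapped windows that begin at `starts`.

    Assumption: a flat pseudocount stands in for the Dirichlet mixture prior.
    """
    score = 0.0
    for col in range(width):
        letters = [s[p + col] for s, p in zip(seqs, starts)]
        total = len(letters) + pseudocount * len(ALPHABET)
        for a in set(letters):
            n = letters.count(a)
            q = (n + pseudocount) / total  # smoothed column frequency
            score += n * math.log(q / background[a])
    return score


def find_window(seqs, width, background, max_sweeps=50, tol=1e-6, seed=0):
    """Greedy refinement: random start, then move one window at a time."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    best = window_score(seqs, starts, width, background)
    for _ in range(max_sweeps):
        previous = best
        for i, s in enumerate(seqs):
            for p in range(len(s) - width + 1):
                trial = starts[:i] + [p] + starts[i + 1:]
                sc = window_score(seqs, trial, width, background)
                if sc > best:  # keep only moves that raise the score
                    best, starts = sc, trial
        if best - previous <= tol:  # no significant improvement: stop
            break
    return starts, best
```

With a uniform background, `find_window(seqs, 3, {a: 1 / 20 for a in ALPHABET})` returns one start position per sequence together with the final score. A greedy update like this can stop at a local optimum, so rerunning with several values of `seed` is a reasonable precaution.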

To use the C program, first compile it with mex on Linux/Unix, then run the demo script gibbs_script_4_1.m. You can modify the demo script to suit your needs.

Comments and Ratings (20)

Tammy Tatley

It's very useful for my research at Fox Chase Cancer Center! I hope more people in this field will learn about it and use it in their research.

Ying

Prandtl

Ying

Ying

Very useful code. Thanks a lot!

Ying

Daisy

Quan Zhang

I am happy to find this software. It is very useful.

Jerry

Great. I have been looking for such a tool for a long time. Thank you for sharing, Xugang.

cathy

Wonderful! I have been looking for something like this for a long time. Thank you. It's very useful.

Prandtl

Looks very nice! The demo looks beautiful. - P.L.

Xugang Ye

Dear Professor Prandtl,

I used a 20-component Dirichlet mixture prior provided by UCSC. Here is the website:

http://compbio.soe.ucsc.edu/dirichlets/index.html

The default prior in this program is called "recode4.20comp". Note that the order of the amino acid letters is different: they use "ACDEFGHIKLMNPQRSTVWY", but I use "ARNDCQEGHILKMFPSTWYV". Make sure the prior and the order of letters are consistent.
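As an illustration only (this helper is not part of the program), a per-letter parameter vector listed in the UCSC letter order can be rearranged into the order used here with a short Python sketch. The function name `reorder_component` is hypothetical, and the sketch assumes each mixture component supplies exactly one value per amino acid, 20 in total.

```python
UCSC_ORDER = "ACDEFGHIKLMNPQRSTVWY"     # order quoted for the UCSC prior
PROGRAM_ORDER = "ARNDCQEGHILKMFPSTWYV"  # order quoted for this program


def reorder_component(values, src=UCSC_ORDER, dst=PROGRAM_ORDER):
    """Return `values` (given in `src` letter order) rearranged to `dst` order."""
    if sorted(src) != sorted(dst):
        raise ValueError("source and target alphabets must contain the same letters")
    if len(values) != len(src):
        raise ValueError("expected one value per amino acid letter")
    by_letter = dict(zip(src, values))  # letter -> parameter value
    return [by_letter[a] for a in dst]
```

Applying the function to every component of a prior before using it with a different letter order keeps the prior and the sequence encoding consistent.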

Xugang

Prandtl

Hello,

I am a faculty member at the University of Sheffield, U.K. I find your program very interesting. I have a question about the prior you chose. Can I choose different priors?

- P.L.

Fantastic! It is exactly what I am looking for! Thank you!

Judy

JHMI (Johns Hopkins Medical Institutions)

Xugang Ye

Hi, Judy,

Yes. By the way, could you let me know which lab you are working for? Thanks.

Xugang

Judy

Hi, Xugang,

Thanks. Should I put my sequence data, in FASTA format, into the subfolder called "data"?

Judy

Xugang Ye

Hi, Judy,

Thanks for your interest in using the code. First, upload the folder to your server if you have not done so, then enter the sub-folder "codes". The .c file you mentioned is find_patternwindow_v4_1.c, a computing function written in C. Rather than compiling it directly with gcc, you need to compile it with mex so that the function can be called from your MATLAB scripts.
Type "mex -setup"; you may be given several options. On Linux/Unix, which I recommend, choose the first option, which uses gcc. When you are asked whether to overwrite the file mexopts.sh, answer yes. Then simply type

mex find_patternwindow_v4_1.c

You will find that a compiled MEX file

find_patternwindow_v4_1.mexa64

has been created. You can then call find_patternwindow_v4_1() as a MATLAB function.

Feel free to ask any questions.

Xugang

Judy

Hello, Xugang,

I am trying to use your code. How do I compile the .c file? Thanks.

Judy

Xugang Ye

Fantastic work! I am a researcher at the National Institutes of Health, Bethesda, MD, and I find this program very useful for my research in sequence-based domain detection.

Updates

1.1

Corrected a typo (gapps -> gaps) in the description

MATLAB Release
MATLAB 7.8 (R2009a)

pattern_finding/codes/