Code covered by the BSD License  

Find an ungapped pattern window from a set of protein sequences






05 Dec 2011 (Updated )

This program finds an ungapped pattern window of a specified width in a set of protein sequences.


File Information

This program is a bioinformatics tool developed to help biologists find patterns in a set of protein sequences. The method is the first one that fully utilizes the advantages of Dirichlet mixture models. It starts from a random pattern and iteratively improves the Bayesian log-odds ratio score as the pattern is updated. When the score can no longer be significantly improved, the algorithm terminates and returns a pattern window of pre-specified length. The resulting pattern can be used as a starting point for a later refined alignment that introduces gaps. We are developing a more advanced version that can introduce gaps into the pattern. We believe the current ungapped version is already very helpful for identifying conserved regions of protein sequences, and it can save manual work in pattern discovery.
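The iterative search described above can be illustrated with a small Python sketch. It is not the author's implementation, and it makes several simplifying assumptions: a single symmetric Dirichlet prior (pseudocount `alpha`) stands in for the 20-component Dirichlet mixture, the background distribution is uniform, and each sequence's window is moved greedily to a better-scoring position rather than sampled. The names `column_log_odds`, `window_score` and `find_window` are hypothetical and do not appear in the package.

```python
import math
import random

# Amino acid letter order quoted by the author in the comments on this page.
ALPHABET = "ARNDCQEGHILKMFPSTWYV"


def column_log_odds(counts, alpha=0.5):
    """Log-odds of one column: Dirichlet-multinomial marginal likelihood
    (single symmetric Dirichlet, an assumption) versus a uniform background."""
    n = sum(counts)
    a_total = alpha * len(counts)
    log_marginal = math.lgamma(a_total) - math.lgamma(a_total + n)
    for c in counts:
        log_marginal += math.lgamma(alpha + c) - math.lgamma(alpha)
    log_background = n * math.log(1.0 / len(counts))
    return log_marginal - log_background


def window_score(seqs, starts, width, alpha=0.5):
    """Sum of column log-odds over an ungapped window of the given width.
    Letters outside ALPHABET (e.g. 'X') raise ValueError in this sketch."""
    total = 0.0
    for j in range(width):
        counts = [0] * len(ALPHABET)
        for s, p in zip(seqs, starts):
            counts[ALPHABET.index(s[p + j])] += 1
        total += column_log_odds(counts, alpha)
    return total


def find_window(seqs, width, tol=1e-6, seed=0, max_sweeps=100):
    """Start from random window positions and accept any single-sequence move
    that raises the score; stop when a full sweep improves it by at most tol."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    best = window_score(seqs, starts, width)
    for _ in range(max_sweeps):
        previous = best
        for i, s in enumerate(seqs):
            for p in range(len(s) - width + 1):
                trial = starts[:i] + [p] + starts[i + 1:]
                score = window_score(seqs, trial, width)
                if score > best:
                    best, starts = score, trial
        if best - previous <= tol:
            break
    return starts, best


if __name__ == "__main__":
    demo = ["GGAWCHKLLP", "TTWCHKAAGS", "PAWCHKVVTD", "SSGGWCHKEE"]
    positions, score = find_window(demo, width=4)
    print(positions, round(score, 3))
```

A greedy search like this can stop at a local optimum, so in practice it would be rerun from several random starts (different `seed` values), keeping the best-scoring window.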

To use the C program, first compile it with mex on linux/unix, then run the demo script gibbs_script_4_1.m. You can modify the demo script to suit your needs.

Required Products: Bioinformatics Toolbox, Statistics Toolbox
MATLAB release: MATLAB 7.8 (R2009a)
Other requirements: mex compiler in linux/unix; to have a better view of the results, run win32 and then run matlab
Comments and Ratings (20)
08 Jan 2013 Tammy Tatley

It's very useful for my research at Fox Chase Cancer Center! I hope more people in this field will know this to help their research.

16 Dec 2011 Ying  
16 Dec 2011 Prandtl  
15 Dec 2011 Ying  
15 Dec 2011 Ying

Very useful code. Thanks a lot!

15 Dec 2011 Ying  
14 Dec 2011 Daisy  
08 Dec 2011 Quan Zhang

I am happy to find this software. It is very useful.

08 Dec 2011 Jerry

Great. I have been looking for such a tool for a long time. Thank you for sharing, Xugang.

06 Dec 2011 cathy

Wonderful! I have been looking for this for a long time. Thank you. It's very useful.

06 Dec 2011 Prandtl

Looks very nice! The demo looks beautiful. - P.L.

06 Dec 2011 Xugang Ye

Dear Professor Prandtl,

I used a 20-component Dirichlet mixture prior that is provided by UCSC, here is the website:

The default prior in this program is called "recode4.20comp". Note that the order of the amino acid letters differs: they use "ACDEFGHIKLMNPQRSTVWY", but I use "ARNDCQEGHILKMFPSTWYV". Make sure the prior and the order of letters are consistent.
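As an illustration of the consistency check mentioned above, here is a minimal Python sketch that reorders the 20 parameters of one prior component from the UCSC letter order to the order used by this program. The two letter strings are taken from this comment. The function name `reorder_component` is hypothetical, and reading the prior file itself is not shown because its layout is not described here.

```python
UCSC_ORDER = "ACDEFGHIKLMNPQRSTVWY"     # order used in the UCSC prior, per the comment
PROGRAM_ORDER = "ARNDCQEGHILKMFPSTWYV"  # order used by this program, per the comment


def reorder_component(alphas, source=UCSC_ORDER, target=PROGRAM_ORDER):
    """Return the per-letter values of one component rearranged from the
    source letter order to the target letter order."""
    if sorted(source) != sorted(target):
        raise ValueError("source and target must contain the same letters")
    if len(alphas) != len(source):
        raise ValueError("expected one value per letter")
    position = {letter: i for i, letter in enumerate(source)}
    return [alphas[position[letter]] for letter in target]


if __name__ == "__main__":
    # Use each value's UCSC index as a stand-in for a real parameter.
    print(reorder_component(list(range(20))))
```

The same permutation would be applied to every component of the mixture, since all components share one letter order.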


06 Dec 2011 Prandtl


I am a faculty member at the University of Sheffield, U.K. I find your program very interesting. I have a question about the prior you chose. Can I choose different priors?

- P.L.

05 Dec 2011 Vicky Johnson

Fantastic! It is exactly what I am looking for! Thank you!

05 Dec 2011 Judy

JHMI (Johns Hopkins Medical Institutions)

05 Dec 2011 Xugang Ye

Hi, Judy,

Yes. By the way, could you let me know which lab you work for? Thanks.


05 Dec 2011 Judy

Hi, Xugang,

Thanks. Should I put my sequence data in fasta format into the subfolder called "data"?


05 Dec 2011 Xugang Ye

Hi, Judy,

Thanks for your interest in using the code. First, upload the folder to your server if you have not done so, then enter the sub-folder codes. The .c file you mentioned is find_patternwindow_v4_1.c, a computing function written in C. Rather than compiling it directly with the usual gcc compiler, you need to compile it with mex so that the function can be called from your MATLAB scripts.
Type "mex -setup"; you may be given several options. On linux/unix, which I recommend, just choose the first option, which uses the gcc-mex compiler. When you are asked whether to overwrite the file, answer yes. Then simply type

mex find_patternwindow_v4_1.c

you will find that an executable file


is created. You can then call the MATLAB function find_patternwindow_v4_1() from your scripts.

Feel free to ask any questions.


05 Dec 2011 Judy

Hello, Xugang,

I am trying to use your code. How do I compile the .c file? Thanks.


05 Dec 2011 Xugang Ye

Fantastic work! I am a researcher at the National Institutes of Health, Bethesda, MD. I find this program very useful for my research in sequence-based domain detection.

05 Dec 2011

Corrected a typo (gapps -> gaps) in the description
