The Bioinformatics Toolbox™ provides access to many Web-based databases and other online data sources. It lets you copy data into the MATLAB® Workspace and read and write files in standard bioinformatics formats. It also reads many common genome file formats, so you do not have to write and maintain your own file readers.
Web-based databases — You can directly access public databases on the Web and copy sequence and gene expression information into the MATLAB environment.
The sequence databases currently supported are GenBank® (getgenbank), GenPept (getgenpept), European Molecular Biology Laboratory (EMBL) (getembl), and Protein Data Bank (PDB) (getpdb). You can also access data from the NCBI Gene Expression Omnibus (GEO) Web site by using a single function (getgeodata).
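As a minimal sketch of the Web database access described above, the following retrieves one GenBank record. It assumes the Bioinformatics Toolbox is installed and a network connection is available. The accession number and the output file name are illustrative choices, not values taken from this section.

```matlab
% Sketch only: requires the Bioinformatics Toolbox and network access.
accession = 'NM_000520';   % example accession chosen for illustration

% Retrieve the full record as a structure and save the raw text locally.
% The file name is an arbitrary example.
record = getgenbank(accession, 'ToFile', 'nm_000520.txt');
fprintf('%s: %d bases\n', record.LocusName, length(record.Sequence));

% Retrieve only the sequence as a character vector.
seq = getgenbank(accession, 'SequenceOnly', true);
```

The other retrieval functions named above are called in a similar way, with an accession or identifier as the first argument.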
Gene Ontology database —
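Load the database from the Web into a gene ontology object (geneont). Select sections of the ontology with methods for the geneont object (geneont.getancestors, geneont.getdescendants, geneont.getmatrix, geneont.getrelatives), and manipulate data with utility functions (goannotread, num2goid).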
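The Gene Ontology workflow might look like the sketch below. It assumes network access, and the numeric GO identifier is an arbitrary example. Loading the live database can take some time.

```matlab
% Sketch only: requires the Bioinformatics Toolbox and network access.
GO = geneont('LIVE', true);          % load the current ontology from the Web

termId = 46680;                      % example GO identifier, chosen for illustration
ancestorIds = getancestors(GO, termId);

% Index the object with term IDs to build a smaller ontology.
subGO = GO(ancestorIds);

% Convert numeric IDs to 'GO:nnnnnnn' strings and list the selected terms.
goLabels = num2goid(ancestorIds);
terms = subGO.terms;
for k = 1:numel(terms)
    fprintf('%d  %s\n', terms(k).id, terms(k).name);
end
```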
Reading data formats — The toolbox provides a number of functions for reading data from common bioinformatic file formats.
Multiply aligned sequences: ClustalW and GCG formats
Gene expression data from microarrays: Gene Expression Omnibus (GEO) data (geosoftread), GenePix® data in GPR and GAL files (gprread, galread), SPOT data (sptread), Affymetrix® GeneChip® data (affyread), and ImaGene® results files (imageneread)
Hidden Markov model profiles: PFAM-HMM file (pfamhmmread)
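A short sketch of how the file readers listed above might be used. Both file names are hypothetical placeholders, so substitute files of your own. The calls are guarded with isfile (available in R2017b and later) so that the script does nothing if the files are absent.

```matlab
% Sketch only: file names below are hypothetical placeholders.
gprFile = 'my_array.gpr';       % a GenePix results file
alnFile = 'my_alignment.aln';   % a ClustalW multiple alignment file

if isfile(gprFile)
    gpr = gprread(gprFile);     % returns a structure describing the array
    disp(fieldnames(gpr));      % inspect which fields were read
end

if isfile(alnFile)
    aln = multialignread(alnFile);   % structure array, one element per sequence
    fprintf('%d aligned sequences\n', numel(aln));
end
```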
Writing data formats —
The functions for getting data from the Web include the option to save the data to a file. In addition, a separate function writes sequence data to a file in FASTA format (fastawrite).
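To illustrate writing in FASTA format, the sketch below writes a short sequence to a temporary file and reads it back. The header text and the sequence are made-up example values.

```matlab
% Sketch only: requires the Bioinformatics Toolbox.
fastaFile = [tempname '.fasta'];                  % temporary file path
header = 'demo_seq hypothetical example record';  % made-up header
seq = 'ATGGCGTACGCTTGA';                          % made-up nucleotide sequence

fastawrite(fastaFile, header, seq);   % write one FASTA record
rec = fastaread(fastaFile);           % read it back as a structure
fprintf('%s (%d bases)\n', rec.Header, length(rec.Sequence));

delete(fastaFile);                    % clean up the temporary file
```

Note that fastawrite appends to a file that already exists, which is why the sketch uses a fresh temporary file name.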
The MATLAB environment has built-in support for other industry-standard file formats including Microsoft® Excel® and comma-separated-value (CSV) files. Additional functions perform ASCII and low-level binary I/O, allowing you to develop custom functions for working with any data format.
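The built-in I/O mentioned above can be sketched with base MATLAB functions alone. This example assumes R2019a or later for writematrix and readmatrix (earlier releases offer csvwrite and csvread). The matrix and file names are arbitrary.

```matlab
M = magic(4);                          % arbitrary example data

% Comma-separated-value round trip.
csvFile = [tempname '.csv'];
writematrix(M, csvFile);
fromCsv = readmatrix(csvFile);

% Low-level binary round trip.
binFile = [tempname '.bin'];
fid = fopen(binFile, 'w');
fwrite(fid, M, 'double');              % values are written in column order
fclose(fid);

fid = fopen(binFile, 'r');
fromBin = fread(fid, size(M), 'double');   % read back with the same shape
fclose(fid);

delete(csvFile);
delete(binFile);
```

A custom reader for another format would typically follow the same fopen, fread, fclose pattern, with the layout of the file determining the sizes and precisions passed to fread.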