fastaread - Read data from FASTA file

Syntax

FASTAData = fastaread(File)
[Header, Sequence] = fastaread(File)

... = fastaread(File, ...'IgnoreGaps', IgnoreGapsValue, ...)
... = fastaread(File, ...'Blockread', BlockreadValue, ...)

Arguments

File
    FASTA-formatted file (ASCII text file). Enter a file name, a path and file name, or a URL pointing to a file. File can also be a MATLAB® character array that contains the text for a file name.
IgnoreGapsValue
    Property to control whether gap symbols are removed. Enter either true or false (default).
BlockreadValue
    Property to control reading a single entry or a block of entries from a file containing multiple sequences. Enter a scalar N to read the Nth entry in the file. Enter a 1-by-2 vector [M1, M2] to read the block of entries starting at entry M1 and ending at entry M2. To read all remaining entries in the file starting at entry M1, enter a positive value for M1 and enter Inf for M2.

Return Values

FASTAData
    MATLAB structure with the fields Header and Sequence.

Description

fastaread reads data from a FASTA-formatted file into a MATLAB structure with the following fields.

Field       Description
Header      Header information
Sequence    Sequence, stored as a string of letters

A FASTA-formatted file begins with a right angle bracket (>) followed by a single-line description. The description is followed by the sequence, written as a series of lines with fewer than 80 characters each. In a file with multiple entries, each entry starts with its own > description line. Sequences are expected to use the standard IUB/IUPAC amino acid and nucleotide letter codes.

For a list of codes, see aminolookup and baselookup.
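To see this layout in practice, the following sketch writes a one-entry FASTA file whose sequence is wrapped across two lines and then reads it back. The header text, sequence, and temporary file are made-up illustration data, and the sketch assumes Bioinformatics Toolbox is installed. fastaread should return the header without the leading > and the wrapped lines joined into a single sequence.

```matlab
% Illustration data (hypothetical): one entry, sequence wrapped over two lines.
fname = [tempname '.fa'];            % temporary file path
fid = fopen(fname, 'w');
fprintf(fid, '>demo1 made-up nucleotide entry\n');
fprintf(fid, 'ACGTACGTAC\n');        % first sequence line
fprintf(fid, 'GGCCTTAA\n');          % sequence continues on the next line
fclose(fid);

entry = fastaread(fname);            % structure with Header and Sequence fields
delete(fname);                       % remove the temporary file

disp(entry.Header)     % header text without the leading '>'
disp(entry.Sequence)   % both sequence lines joined into one character vector
```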

FASTAData = fastaread(File) reads a FASTA-formatted file and returns the data in a structure. FASTAData.Header contains the header information, and FASTAData.Sequence contains the sequence stored as a string of letters.
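When a file holds more than one entry, the single output is, as we understand it, a structure array with one element per entry. The sketch below, using a made-up two-entry file (assuming Bioinformatics Toolbox is installed), indexes that array and reports each header together with the length of its sequence.

```matlab
% Hypothetical two-entry file used only for illustration.
fname = [tempname '.fa'];
fid = fopen(fname, 'w');
fprintf(fid, '>first entry\nACGTTGCA\n>second entry\nGGATCC\n');
fclose(fid);

data = fastaread(fname);   % one structure element per entry
delete(fname);             % remove the temporary file

for k = 1:numel(data)
    % Print each header and the number of letters in its sequence.
    fprintf('%s: %d letters\n', data(k).Header, numel(data(k).Sequence));
end
```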

[Header, Sequence] = fastaread(File) reads data from a file into separate variables. If the file contains more than one sequence, Header and Sequence are cell arrays of header and sequence information.
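As a minimal sketch of the two-output form, the code below reads a made-up three-entry file (assuming Bioinformatics Toolbox is installed) and pairs each header with its sequence through brace indexing into the returned cell arrays.

```matlab
% Hypothetical three-entry file used only for illustration.
fname = [tempname '.fa'];
fid = fopen(fname, 'w');
fprintf(fid, '>a\nAAAA\n>b\nCCCC\n>c\nGGGG\n');
fclose(fid);

[Header, Sequence] = fastaread(fname);   % cell arrays, one cell per entry
delete(fname);                           % remove the temporary file

% Pair each header with its sequence using brace indexing.
for k = 1:numel(Header)
    fprintf('%s -> %s\n', Header{k}, Sequence{k});
end
```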

... = fastaread(File, ...'PropertyName', PropertyValue, ...) calls fastaread with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. The property name/value pairs can be in any format supported by the function set (for example, name-value string pairs, structures, and name-value cell array pairs). These property name/property value pairs are as follows:


... = fastaread(File, ...'IgnoreGaps', IgnoreGapsValue, ...), when IgnoreGapsValue is true, removes any gap symbol ('-' or '.') from the sequences. Default is false.

... = fastaread(File, ...'Blockread', BlockreadValue, ...) lets you read in a single entry or block of entries from a file containing multiple sequences. If BlockreadValue is a scalar N, then fastaread reads the Nth entry in the file. If BlockreadValue is a 1-by-2 vector [M1, M2], then fastaread reads the block of entries starting at entry M1 and ending at entry M2. To read all remaining entries in the file starting at entry M1, enter a positive value for M1 and enter Inf for M2.
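The sketch below exercises both properties on a made-up four-entry file. The headers, sequences, and temporary file are illustration data, and Bioinformatics Toolbox is assumed to be installed. The last call passes the property names in lowercase and in a different order, which the case-insensitive, any-order rule for property names should allow.

```matlab
% Hypothetical four-entry file; entry e2 contains gap symbols.
fname = [tempname '.fa'];
fid = fopen(fname, 'w');
fprintf(fid, '>e1\nAAAA\n>e2\nAC--GT..A\n>e3\nGGGG\n>e4\nTTTT\n');
fclose(fid);

% 'Blockread' with a scalar N: only the third entry.
third = fastaread(fname, 'Blockread', 3);

% 'Blockread' with [M1 M2]: entries 2 through 3.
middle = fastaread(fname, 'Blockread', [2 3]);

% 'Blockread' with [M1 Inf]: entry 2 through the last entry.
tail = fastaread(fname, 'Blockread', [2 Inf]);

% Default IgnoreGaps (false) keeps '-' and '.' in the sequence.
gapped = fastaread(fname, 'Blockread', 2);

% Lowercase names in a different order; gap symbols are removed.
ungapped = fastaread(fname, 'ignoregaps', true, 'blockread', 2);

delete(fname);   % remove the temporary file
```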

Examples

Read the sequence for the human p53 tumor suppressor gene.

p53nt = fastaread('p53nt.txt')

Read the sequence for the human p53 tumor protein.

p53aa = fastaread('p53aa.txt')

Read the human mitochondrion genome in FASTA format.

entrezSite = 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?'
textOptions = '&txt=on&view=fasta'
genbankID = '&list_uids=NC_001807'
mitochondrion = fastaread([entrezSite textOptions genbankID])

See Also

Bioinformatics Toolbox™ functions: emblread, fastawrite, genbankread, genpeptread, multialignread, seqprofile, seqtool

© 1984-2008 The MathWorks, Inc.