
fastaread - Read data from FASTA file

Syntax

FASTAData = fastaread(File)
[Header, Sequence] = fastaread(File)

... = fastaread(File, ...'IgnoreGaps', IgnoreGapsValue, ...)
... = fastaread(File, ...'Blockread', BlockreadValue, ...)
... = fastaread(File, ...'TrimHeaders', TrimHeadersValue, ...)

Input Arguments

File

Either of the following:

  • String specifying a file name, a path and file name, or a URL pointing to a file. The referenced file is a FASTA-formatted file (ASCII text file). If you specify only a file name, that file must be on the MATLAB search path or in the MATLAB Current Folder.

  • MATLAB character array that contains the text of a FASTA-formatted file.
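As a sketch of the two forms of File, the following passes the same FASTA text once as a character array and once as a file path. It assumes the Bioinformatics Toolbox is installed. The header, the sequence, and the temporary file (created with tempname) are illustrative only.

```matlab
% Illustrative FASTA text (one entry, sequence wrapped over two lines).
fastaText = sprintf('>seq1 example entry\nACGTACGT\nTTGCA\n');

% Form 1: pass the FASTA text directly as a character array.
fromText = fastaread(fastaText);

% Form 2: write the same text to a temporary file and pass its path.
fname = [tempname '.fa'];
fid = fopen(fname, 'w');
fprintf(fid, '%s', fastaText);
fclose(fid);
fromFile = fastaread(fname);
delete(fname);   % clean up the temporary file

% Display the parsed entry and compare the two results.
fprintf('Header:   %s\n', fromFile.Header);
fprintf('Sequence: %s\n', fromFile.Sequence);
fprintf('Same result from text and file: %d\n', isequal(fromText, fromFile));
```

Reading from a file is the usual case; passing text is convenient when the FASTA data is already in memory.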

IgnoreGapsValue

Controls the removal of gap symbols. Choices are true or false (default).

BlockreadValue

Scalar or vector that controls the reading of a single sequence entry or block of sequence entries from a FASTA-formatted file containing multiple sequences. Enter a scalar N to read the Nth entry in the file. Enter a 1-by-2 vector [M1, M2] to read the block of entries starting at the M1 entry and ending at the M2 entry. To read all remaining entries in the file starting at the M1 entry, enter a positive value for M1 and enter Inf for M2.

TrimHeadersValue

Specifies whether to trim the header after the first white space character. White space characters include a space (char(32)) and a tab (char(9)). Choices are true or false (default).

Output Arguments

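FASTAData

MATLAB structure with the fields Header and Sequence. If the file contains multiple sequences, FASTAData is a structure array with one element per entry.

Header

Header information, returned as a separate variable. If the file contains multiple sequences, Header is a cell array.

Sequence

Sequence information, returned as a separate variable. If the file contains multiple sequences, Sequence is a cell array.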

Description

fastaread reads data from a FASTA-formatted file into a MATLAB structure with the following fields.

Field       Description
Header      Header information.
Sequence    Single-letter code representation of a nucleotide or amino acid sequence.

Each entry in a FASTA-formatted file begins with a right angle bracket (>) followed by a single-line description. The sequence follows this description as a series of lines, each with fewer than 80 characters. Sequences must use the standard IUB/IUPAC amino acid and nucleotide letter codes.

For a list of codes, see aminolookup and baselookup.
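To illustrate this layout, the sketch below builds one FASTA entry whose sequence is wrapped into 60-character lines and reads it back. The sequence, the header, and the 60-character line length are arbitrary choices for illustration, and the sketch assumes (as the description of FASTAData.Sequence suggests) that fastaread joins the wrapped lines into a single string. It requires the Bioinformatics Toolbox.

```matlab
% Hypothetical 150-letter nucleotide sequence, for illustration only.
seq = repmat('ACGTTGCA', 1, 19);   % 152 letters
seq = seq(1:150);

% Build one entry: '>' plus a one-line description, then the sequence
% split into lines of 60 letters (fewer than 80 per line).
lineLen = 60;
entry = sprintf('>demo1 hypothetical sequence\n');
for k = 1:lineLen:numel(seq)
    entry = [entry, seq(k:min(k + lineLen - 1, end)), char(10)]; %#ok<AGROW>
end
disp(entry)

% Reading the entry back joins the wrapped lines into one Sequence string.
s = fastaread(entry);
fprintf('Recovered %d letters; matches original: %d\n', ...
        numel(s.Sequence), strcmp(s.Sequence, seq));
```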

FASTAData = fastaread(File) reads a FASTA-formatted file and returns the data in a structure. FASTAData.Header is the header information, while FASTAData.Sequence is the sequence stored as a string of letters.

[Header, Sequence] = fastaread(File) reads data from a file into separate variables. If the file contains multiple sequences, then Header and Sequence are cell arrays of header and sequence information.
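A minimal sketch of the two calling forms on a hypothetical two-entry FASTA file, passed here as text in a character array. It assumes the Bioinformatics Toolbox is installed; the headers and sequences are made up.

```matlab
% Hypothetical FASTA text with two entries.
txt = sprintf('>seqA first entry\nACGT\n>seqB second entry\nGGCC\n');

% One output: a structure array with one element per entry.
S = fastaread(txt);
fprintf('Number of entries: %d\n', numel(S));

% Two outputs: headers and sequences as separate cell arrays.
[Header, Sequence] = fastaread(txt);
for k = 1:numel(Header)
    fprintf('%s -> %s\n', Header{k}, Sequence{k});
end
```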

... = fastaread(File, ...'PropertyName', PropertyValue, ...) calls fastaread with optional properties specified as property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. The property name/value pairs can be in any format supported by the MATLAB set function (for example, name-value string pairs, structures, and name-value cell array pairs). The available property name/property value pairs are as follows:


... = fastaread(File, ...'IgnoreGaps', IgnoreGapsValue, ...), when IgnoreGapsValue is true, removes any gap symbol ('-' or '.') from the sequences. Default is false.
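A short sketch of the effect of IgnoreGaps, using a made-up aligned fragment that contains both gap symbols. It assumes the Bioinformatics Toolbox is installed.

```matlab
% Made-up aligned fragment containing both '-' and '.' gap symbols.
txt = sprintf('>aln1 aligned fragment\nMK--LV..AG\n');

withGaps    = fastaread(txt);                      % default: gaps kept
withoutGaps = fastaread(txt, 'IgnoreGaps', true);  % '-' and '.' removed

fprintf('Default:         %s\n', withGaps.Sequence);
fprintf('IgnoreGaps true: %s\n', withoutGaps.Sequence);
```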

... = fastaread(File, ...'Blockread', BlockreadValue, ...) lets you read in a single sequence entry or block of sequence entries from a file containing multiple sequences. If BlockreadValue is a scalar N, then fastaread reads the Nth entry in the file. If BlockreadValue is a 1-by-2 vector [M1, M2], then fastaread reads the block of entries starting at the M1 entry and ending at the M2 entry. To read all remaining entries in the file starting at the M1 entry, enter a positive value for M1 and enter Inf for M2.
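The following sketch writes a hypothetical five-entry FASTA file to a temporary location and reads a single entry and two blocks from it. A file is used rather than a character array because this sketch does not assume that Blockread is supported for text input. It assumes the Bioinformatics Toolbox is installed.

```matlab
% Write a hypothetical five-entry FASTA file to a temporary location.
fname = [tempname '.fa'];
fid = fopen(fname, 'w');
for k = 1:5
    % Entry k: header 'entryk', sequence 'ACGT' followed by k extra A's.
    fprintf(fid, '>entry%d\nACGT%s\n', k, repmat('A', 1, k));
end
fclose(fid);

third  = fastaread(fname, 'Blockread', 3);        % scalar N: entry 3 only
middle = fastaread(fname, 'Blockread', [2 4]);    % [M1, M2]: entries 2 to 4
rest   = fastaread(fname, 'Blockread', [4 Inf]);  % [M1, Inf]: entry 4 to end
delete(fname);                                    % clean up

fprintf('%s: %s\n', third.Header, third.Sequence);
fprintf('Block [2 4] has %d entries; block [4 Inf] has %d\n', ...
        numel(middle), numel(rest));
```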

... = fastaread(File, ...'TrimHeaders', TrimHeadersValue, ...), when TrimHeadersValue is true, trims each header after the first white space character (a space or a tab). Default is false.
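A sketch of TrimHeaders on a made-up header in which an identifier is followed by a space and a description. The property name is written in lowercase to show that property names are case insensitive. It assumes the Bioinformatics Toolbox is installed.

```matlab
% Made-up header: an identifier, then a space, then a description.
txt = sprintf('>demo7|example description of the entry\nMEEPQ\n');

fullHeader = fastaread(txt);
% Property names are case insensitive, so 'trimheaders' also works.
trimmed = fastaread(txt, 'trimheaders', true);

fprintf('Full header:    %s\n', fullHeader.Header);
fprintf('Trimmed header: %s\n', trimmed.Header);
```

Trimming is useful when only the identifier at the start of each header is needed, for example as a key for later lookups.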

Examples

Read the nucleotide sequence for the human p53 gene:

p53nt = fastaread('p53nt.txt')

Read the amino acid sequence for the human p53 protein:

p53aa = fastaread('p53aa.txt')

Read a block of entries from a FASTA file:

% Read entries 5 through 10 into a structure array,
% removing gap symbols from the sequences
pf2_5_10 = fastaread('pf00002.fa', 'blockread', [5 10], ...
                     'ignoregaps',true)
pf2_5_10 = 

6x1 struct array with fields:
    Header
    Sequence

See Also

aminolookup | baselookup | BioIndexedFile | emblread | fastainfo | fastawrite | fastqinfo | fastqread | fastqwrite | genbankread | genpeptread | multialignread | saminfo | samread | seqprofile | seqtool | sffinfo | sffread

  


© 1984-2012 The MathWorks, Inc.