blastncbi

Create remote NCBI BLAST report request ID or link to NCBI BLAST report

Syntax

blastncbi(Seq, Program)
RID = blastncbi(Seq, Program)
[RID, RTOE] = blastncbi(Seq, Program)

... blastncbi(Seq, Program, ...'Database', DatabaseValue, ...)
... blastncbi(Seq, Program, ...'Descriptions', DescriptionsValue, ...)
... blastncbi(Seq, Program, ...'Alignments', AlignmentsValue, ...)
... blastncbi(Seq, Program, ...'Filter', FilterValue, ...)
... blastncbi(Seq, Program, ...'Expect', ExpectValue, ...)
... blastncbi(Seq, Program, ...'Word', WordValue, ...)
... blastncbi(Seq, Program, ...'Matrix', MatrixValue, ...)
... blastncbi(Seq, Program, ...'GapOpen', GapOpenValue, ...)
... blastncbi(Seq, Program, ...'ExtendGap', ExtendGapValue, ...)
... blastncbi(Seq, Program, ...'Inclusion', InclusionValue, ...)
... blastncbi(Seq, Program, ...'Pct', PctValue, ...)

Arguments

Seq

Nucleotide or amino acid sequence specified by any of the following:

  • GenBank®, GenPept, or RefSeq accession number

  • GI sequence identifier

  • FASTA file

  • URL pointing to a sequence file

  • String

  • Character array

  • MATLAB® structure containing a Sequence field

Program

String specifying a BLAST program. Choices are:

  • 'blastn' — Search nucleotide query versus nucleotide database.

  • 'blastp' — Search protein query versus protein database.

  • 'blastx' — Search translated nucleotide query versus protein database.

  • 'megablast' — Quickly search for highly similar nucleotide sequences.

  • 'psiblast' — Search protein query using position-specific iterated BLAST.

  • 'tblastn' — Search protein query versus translated nucleotide database.

  • 'tblastx' — Search translated nucleotide query versus translated nucleotide database.

DatabaseValue

String specifying a database. Compatible databases depend on the type of sequence specified by Seq, and the program specified by Program.

Choices for nucleotide sequences are:

  • 'nr' (default)

  • 'refseq_rna'

  • 'refseq_genomic'

  • 'est'

  • 'est_human'

  • 'est_mouse'

  • 'est_others'

  • 'gss'

  • 'htgs'

  • 'pat'

  • 'pdb'

  • 'month'

  • 'alu_repeats'

  • 'dbsts'

  • 'chromosome'

  • 'wgs'

  • 'env_nt'

Choices for amino acid sequences are:

  • 'nr' (default)

  • 'refseq_protein'

  • 'swissprot'

  • 'pat'

  • 'month'

  • 'pdb'

  • 'env_nr'

DescriptionsValue

Value specifying the number of short descriptions to include in the report. Default is 100, unless Program = 'psiblast', in which case the default is 500.

AlignmentsValue

Value specifying the number of sequences for which high-scoring segment pairs (HSPs) are reported. Default is 100, unless Program = 'psiblast', in which case the default is 500.

FilterValue

String specifying a filter. Possible choices are:

  • 'L' (default) — Low complexity

  • 'R' — Human repeats

  • 'm' — Mask for lookup table

  • 'lcase' — Turn on the lowercase mask

Choices vary depending on the selected Program. For more information, see the table Choices for Optional Properties by BLAST Program.

ExpectValue

Value specifying the statistical significance threshold for matches against database sequences. Choices are any real number. Default is 10.

WordValue

Value specifying a word length for the query sequence.

Choices for amino acid sequences are:

  • 2

  • 3 (default)

Choices for nucleotide sequences are:

  • 7

  • 11 (default)

  • 15

Choices when Program = 'megablast' are:

  • 11

  • 12

  • 16

  • 20

  • 24

  • 28 (default)

  • 32

  • 48

  • 64

MatrixValue

String specifying the substitution matrix for amino acid sequences only. The matrix assigns the score for a possible alignment of any two amino acid residues. Choices are:

  • 'PAM30'

  • 'PAM70'

  • 'BLOSUM45'

  • 'BLOSUM62' (default)

  • 'BLOSUM80'

GapOpenValue

Either of the following:

  • Integer that specifies the penalty for opening a gap in the alignment of amino acid sequences.

  • Vector containing two integers: the first is the penalty for opening a gap, and the second is the penalty for extending the gap.

Choices and default depend on the substitution matrix specified by the 'Matrix' property. For more information, see the table Choices for the GapOpen Property by Matrix.

ExtendGapValue

Integer that specifies the penalty for extending a gap in the alignment of amino acid sequences. Choices and default depend on the substitution matrix specified by the 'Matrix' property. For more information, see the table Choices for the GapOpen Property by Matrix.

InclusionValue

Value specifying the statistical significance threshold for including a sequence in the Position-Specific Score Matrix (PSSM) created by PSI-BLAST for the subsequent iteration. Default is 0.005.

    Note   Specify an InclusionValue only when Program = 'psiblast'.

PctValue

Value specifying the percent identity and the corresponding match and mismatch score for matching existing sequences in a public database. Choices are:

  • None

  • 99 (default): 99, 1, -3

  • 98: 98, 1, -3

  • 95: 95, 1, -3

  • 90: 90, 1, -2

  • 85: 85, 1, -2

  • 80: 80, 2, -3

  • 75: 75, 4, -5

  • 60: 60, 1, -1

    Note   Specify a PctValue only when Program = 'megablast'.

Return Values

RID

Request ID for the NCBI BLAST report.

RTOE

Request Time Of Execution, which is an estimate of the time (in minutes) until completion.

    Tip   Use this time estimate with the 'WaitTime' property when using the getblast function.

Description

The Basic Local Alignment Search Tool (BLAST) offers a fast and powerful comparative analysis of protein and nucleotide sequences against known sequences in online databases.

blastncbi(Seq, Program) sends a BLAST request to NCBI for Seq, a nucleotide or amino acid sequence, using Program, a specified BLAST program, and then displays a link to the NCBI BLAST report in the Command Window. For help in selecting an appropriate BLAST program, visit:

http://www.ncbi.nlm.nih.gov/BLAST/producttable.shtml

RID = blastncbi(Seq, Program) returns RID, the Request ID for the report.

[RID, RTOE] = blastncbi(Seq, Program) returns both RID, the Request ID for the NCBI BLAST report, and RTOE, the Request Time Of Execution, which is an estimate of the time (in minutes) until completion.
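
As a minimal sketch of how the two outputs might be used together, the following requests both values and hands the estimate to getblast through its 'WaitTime' property, as suggested in the RTOE tip. The accession number is the one used in the Examples section, the variable name report is illustrative, and the calls need a network connection.

```matlab
% Submit a protein search and capture the request ID and the time estimate.
[RID, RTOE] = blastncbi('AAA59174', 'blastp', 'expect', 1e-10);

% Pass the estimate to getblast as its wait time, then parse the report
% into a MATLAB structure.
report = getblast(RID, 'WaitTime', RTOE);
```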

... blastncbi(..., 'PropertyName', PropertyValue,...) calls blastncbi with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are explained below. Additional information on these optional properties can be found at:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastcgihelp_new.html
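
To illustrate the name/value convention described above, the sketch below issues the same search twice, once with lowercase property names and once with capitalized names in a different order. The two calls are expected to request the same search; because each call submits a new request, the returned request IDs will normally differ. The variable names are illustrative.

```matlab
% Property names in lowercase, 'expect' first.
RID1 = blastncbi('AAA59174', 'blastp', 'expect', 1e-10, 'matrix', 'PAM70');

% Same properties, capitalized and in the opposite order.
RID2 = blastncbi('AAA59174', 'blastp', 'Matrix', 'PAM70', 'Expect', 1e-10);
```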


... blastncbi(Seq, Program, ...'Database', DatabaseValue, ...) specifies a database for the alignment search. For help in selecting an appropriate database, visit:

http://www.ncbi.nlm.nih.gov/BLAST/producttable.shtml

... blastncbi(Seq, Program, ...'Descriptions', DescriptionsValue, ...) specifies the number of short descriptions to include in the report, when you do not specify return values.

... blastncbi(Seq, Program, ...'Alignments', AlignmentsValue, ...) specifies the number of sequences for which high-scoring segment pairs (HSPs) are reported, when you do not specify return values.

... blastncbi(Seq, Program, ...'Filter', FilterValue, ...) specifies the filter to apply to the query sequence.

... blastncbi(Seq, Program, ...'Expect', ExpectValue, ...) specifies a statistical significance threshold for matches against database sequences. Choices are any real number. Default is 10. You can learn more about the statistics of local sequence comparison at:

http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html#head2

... blastncbi(Seq, Program, ...'Word', WordValue, ...) specifies a word size for the query sequence.

... blastncbi(Seq, Program, ...'Matrix', MatrixValue, ...) specifies the substitution matrix for amino acid sequences only. This matrix assigns the score for a possible alignment of two amino acid residues.

... blastncbi(Seq, Program, ...'GapOpen', GapOpenValue, ...) specifies the penalty for opening a gap in the alignment of amino acid sequences.

Choices and default depend on the substitution matrix specified by the 'Matrix' property. For more information, see the table Choices for the GapOpen Property by Matrix.

For more information about allowed gap penalties for various matrices, see:

http://www.ncbi.nlm.nih.gov/blast/html/sub_matrix.html

... blastncbi(Seq, Program, ...'ExtendGap', ExtendGapValue, ...) specifies the penalty for extending a gap greater than one space in the alignment of amino acid sequences. Choices and default depend on the substitution matrix specified by the 'Matrix' property. For more information, see the table Choices for the GapOpen Property by Matrix.
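
The gap penalties can be given either as two separate integers or as one two-element vector passed to 'GapOpen'. The sketch below shows both forms for the 'PAM70' matrix, using its default pair [10 1]; it assumes the two forms request the same penalties, and the variable names are illustrative.

```matlab
% Separate integers: 'GapOpen' for opening a gap, 'ExtendGap' for extending it.
RID1 = blastncbi('AAA59174', 'blastp', 'Matrix', 'PAM70', ...
                 'GapOpen', 10, 'ExtendGap', 1);

% Two-element vector: [opening penalty, extension penalty].
RID2 = blastncbi('AAA59174', 'blastp', 'Matrix', 'PAM70', ...
                 'GapOpen', [10 1]);
```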

... blastncbi(Seq, Program, ...'Inclusion', InclusionValue, ...) specifies the statistical significance threshold for including a sequence in the Position-Specific Score Matrix (PSSM) created by PSI-BLAST for the subsequent iteration. Default is 0.005.
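
For example, a PSI-BLAST request with a stricter inclusion threshold than the default might look like the following sketch. The value 0.001 is an arbitrary illustration, not a recommendation.

```matlab
% 'Inclusion' applies only when Program is 'psiblast'.
RID = blastncbi('AAA59174', 'psiblast', 'Inclusion', 0.001);
```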

... blastncbi(Seq, Program, ...'Pct', PctValue, ...) specifies the percent identity and the corresponding match and mismatch score for matching existing sequences in a public database. Default is 99.
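
A MegaBLAST request that sets 'Pct' together with a word size could be sketched as follows. The query is supplied as a structure with a Sequence field; the nucleotide string is a placeholder chosen only for illustration, so substitute your own sequence.

```matlab
% Placeholder nucleotide query (illustrative only; replace with real data).
query.Sequence = ['ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTG' ...
                  'CTGGCCCTCTGGGGACCTGACCCAGCCGCAGCC'];

% 'Pct' applies only when Program is 'megablast'; 95 and 32 are taken
% from the documented lists of choices.
RID = blastncbi(query, 'megablast', 'Pct', 95, 'Word', 32);
```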

Choices for Optional Properties by BLAST Program

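When BLAST program is...    Then choices for the following properties are...

'blastn'

  • Database: 'nr' (default), 'est', 'est_human', 'est_mouse', 'est_others', 'gss', 'htgs', 'pat', 'pdb', 'month', 'alu_repeats', 'dbsts', 'chromosome', 'wgs', 'refseq_rna', 'refseq_genomic', 'env_nt'

  • Filter: 'L' (default), 'R', 'm', 'lcase'

  • Word: 7, 11 (default), 15

'megablast'

  • Database: same as 'blastn'

  • Filter: 'L'

  • Word: 11, 12, 16, 20, 24, 28 (default), 32, 48, 64

  • Pct: None, 99 (default), 98, 95, 90, 85, 80, 75, 60

'tblastn'

  • Database: same as 'blastn'

  • Filter: 'L' (default), 'm', 'lcase'

  • Word: 2, 3 (default)

  • Matrix: 'PAM30', 'PAM70', 'BLOSUM45', 'BLOSUM62' (default), 'BLOSUM80'

  • GapOpen: See the next table.

'tblastx'

  • Database: same as 'blastn'

  • Filter: 'L' (default), 'R', 'm', 'lcase'

  • Word, Matrix, GapOpen: same as 'tblastn'

'blastp'

  • Database: 'nr' (default), 'swissprot', 'pat', 'pdb', 'month', 'refseq_protein', 'env_nr'

  • Filter: 'L' (default), 'm', 'lcase'

  • Word, Matrix, GapOpen: same as 'tblastn'

'blastx' and 'psiblast'

  • Database, Filter: same as 'blastp'

  • Word, Matrix, GapOpen: same as 'tblastn'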

Choices for the GapOpen Property by Matrix

When Substitution Matrix is...    Then choices for GapOpen are...

'PAM30'                  [7 2], [6 2], [5 2], [10 1], [9 1] (default), [8 1]

'PAM70' or 'BLOSUM80'    [8 2], [7 2], [6 2], [11 1], [10 1] (default), [9 1]

'BLOSUM45'               [13 3], [12 3], [11 3], [10 3], [15 2] (default), [14 2], [13 2], [12 2], [19 1], [18 1], [17 1], [16 1]

'BLOSUM62'               [9 2], [8 2], [7 2], [12 1], [11 1] (default), [10 1]
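
One way to guard against an unsupported combination is to check the gap pair against the table before submitting a request. The sketch below is only an illustration: the variable allowed and the helper isAllowedGap are hypothetical names, not part of the toolbox, and the 'BLOSUM80' entry assumes that matrix shares the 'PAM70' choices.

```matlab
% Allowed [GapOpen ExtendGap] pairs, copied from the table above.
% Assumption: 'BLOSUM80' shares the 'PAM70' choices.
allowed = struct( ...
    'PAM30',    [7 2; 6 2; 5 2; 10 1; 9 1; 8 1], ...
    'PAM70',    [8 2; 7 2; 6 2; 11 1; 10 1; 9 1], ...
    'BLOSUM80', [8 2; 7 2; 6 2; 11 1; 10 1; 9 1], ...
    'BLOSUM45', [13 3; 12 3; 11 3; 10 3; 15 2; 14 2; 13 2; 12 2; ...
                 19 1; 18 1; 17 1; 16 1], ...
    'BLOSUM62', [9 2; 8 2; 7 2; 12 1; 11 1; 10 1]);

% Hypothetical helper: true when the pair is listed for the given matrix.
isAllowedGap = @(matrixName, pair) ...
    ismember(pair, allowed.(matrixName), 'rows');

matrixName = 'BLOSUM62';
gapPair = [11 1];

if isAllowedGap(matrixName, gapPair)
    % Submit the request only for a listed combination.
    RID = blastncbi('AAA59174', 'blastp', ...
                    'Matrix', matrixName, 'GapOpen', gapPair);
else
    error('Gap penalties [%d %d] are not listed for %s.', ...
          gapPair(1), gapPair(2), matrixName);
end
```

Keeping the pairs in one structure means a new matrix can be supported by adding a single field.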

Examples

% Get a sequence from the Protein Data Bank and create
% a MATLAB structure.
S = getpdb('1CIV')

% Use the structure as input for a BLAST search with an
% expectation of 1e-10.
blastncbi(S,'blastp','expect',1e-10)

% Click the URL link (Link to NCBI BLAST Request) to go
% directly to the NCBI request.

% You can also try a search directly with an accession 
% number and an alternative scoring matrix.
RID = blastncbi('AAA59174','blastp','matrix','PAM70',...
                             'expect',1e-10)

% The results based on the RID are at
% http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi

% or pass the RID to GETBLAST to parse the report and
% load it into a MATLAB structure.
Struct = getblast(RID)

References

[1] Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410.

[2] Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.

See Also

Bioinformatics Toolbox™ functions: blastformat, blastlocal, blastread, blastreadlocal, getblast

  

