getgenpept

Retrieve sequence information from GenPept database

Syntax

Data = getgenpept(AccessionNumber)
getgenpept(AccessionNumber)
Data = getgenpept(..., 'PartialSeq', PartialSeqValue, ...)
Data = getgenpept(..., 'ToFile', ToFileValue, ...)
Data = getgenpept(..., 'FileFormat', FileFormatValue, ...)
Data = getgenpept(..., 'SequenceOnly', SequenceOnlyValue, ...)

Arguments

AccessionNumber Character vector specifying a unique alphanumeric identifier for a sequence record.
PartialSeqValue Two-element array of integers, [StartAA, EndAA], specifying the start and end positions of the subsequence to retrieve. StartAA is an integer between 1 and EndAA; EndAA is an integer between StartAA and the length of the sequence.
ToFileValue Character vector specifying either a file name or a path and file name for saving the GenPept data. If you specify only a file name, the file is saved to the MATLAB® Current Folder.
FileFormatValue Character vector specifying the format for the sequence information. Choices are:
  • 'GenPept' (default when SequenceOnlyValue is false)
  • 'FASTA' (default when SequenceOnlyValue is true). With 'FASTA', Data contains only two fields, Header and Sequence.
SequenceOnlyValue Controls the return of only the sequence as a character array. Choices are true or false (default).

Description

getgenpept retrieves protein (amino acid) sequence information from the GenPept database, which is a translation of the nucleotide sequences in the GenBank® database and is maintained by the National Center for Biotechnology Information (NCBI).

Note

NCBI has changed the name of their protein search engine from GenPept to Entrez Protein. However, the function names in the Bioinformatics Toolbox™ software (getgenpept and genpeptread) are unchanged, reflecting the GenPept report format, which is still in use. For more information on GenPept data, visit https://www.ncbi.nlm.nih.gov/home/about/policies.shtml.

Data = getgenpept(AccessionNumber) searches for the accession number in the GenPept database and returns Data, a MATLAB structure containing information for the sequence.

Tip

If an error occurs while retrieving the GenPept-formatted information, try rerunning the query. Errors can occur due to Internet connectivity issues that are unrelated to the GenPept record.
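
The retry advice in this tip can be automated with a short loop. The following is a minimal sketch, not part of the toolbox: the attempt limit (maxAttempts) and the pause length (pauseSeconds) are arbitrary assumptions, and the accession number is the one used in the examples on this page.

```matlab
% Hypothetical retry loop around getgenpept (limits chosen for illustration).
accession    = 'AAA59174';  % accession number from the examples on this page
maxAttempts  = 3;           % assumed retry limit
pauseSeconds = 2;           % assumed wait between attempts

Data = [];
for attempt = 1:maxAttempts
    try
        Data = getgenpept(accession);
        break               % success: stop retrying
    catch err
        fprintf('Attempt %d failed: %s\n', attempt, err.message);
        if attempt == maxAttempts
            rethrow(err)    % give up and report the last error
        end
        pause(pauseSeconds) % wait before the next attempt
    end
end
```

If every attempt fails, the last error is rethrown, so a persistent problem (for example, an invalid accession number) is still reported rather than hidden.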

getgenpept(AccessionNumber) displays information in the MATLAB Command Window without returning data to a variable. The displayed information consists only of hyperlinks to the URLs used to search for and retrieve the data.

getgenpept(..., 'PropertyName', PropertyValue, ...) calls getgenpept with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

Data = getgenpept(..., 'PartialSeq', PartialSeqValue, ...) returns the specified subsequence in the Sequence field of the MATLAB structure. PartialSeqValue is a two-element array of integers containing the start and end positions of the subsequence [StartAA, EndAA]. StartAA is an integer between 1 and EndAA; EndAA is an integer between StartAA and the length of the sequence.
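
The constraints on [StartAA, EndAA] can be checked before a query is sent. The sketch below is illustrative only: the variable names are hypothetical, the range is the one used in the partial-sequence example on this page, and the sequence length (1382) is the LocusSequenceLength value reported for AAA59174 there.

```matlab
% Check a candidate PartialSeq range against the rules stated above.
range  = [234, 281];   % [StartAA, EndAA], taken from the examples on this page
seqLen = 1382;         % sequence length reported for AAA59174 on this page

isValid = numel(range) == 2 ...
    && all(range == fix(range)) ...   % both positions are integers
    && range(1) >= 1 ...              % StartAA is at least 1
    && range(1) <= range(2) ...       % StartAA does not exceed EndAA
    && range(2) <= seqLen;            % EndAA is within the sequence

% Number of residues a valid inclusive range selects.
expectedLength = range(2) - range(1) + 1;
```

A range that passes this check can be supplied as the 'PartialSeq' value; for [234, 281] the inclusive range covers 48 positions.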

Data = getgenpept(..., 'ToFile', ToFileValue, ...) saves the data returned from the GenPept database to a file. ToFileValue is a character vector specifying either a file name or a path and file name for saving the GenPept data. If you specify only a file name, the file is saved to the MATLAB Current Folder.

Tip

You can read a GenPept-formatted file back into MATLAB using the genpeptread function.
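
As a sketch of that round trip, the following saves a record with 'ToFile' and reads it back with genpeptread. The file name and the use of tempdir are hypothetical choices for illustration; any existing copy of the file is deleted first so that the new data is not appended to old data.

```matlab
% Save a GenPept record to a file, then read it back with genpeptread.
% The file name and location are hypothetical choices for this sketch.
outFile = fullfile(tempdir, 'insulin_receptor_example.txt');
if exist(outFile, 'file')
    delete(outFile);   % start from a fresh file so nothing is appended
end

Saved  = getgenpept('AAA59174', 'ToFile', outFile);
Reread = genpeptread(outFile);

% The sequence read from disk is expected to match the one returned directly.
sameSequence = strcmpi(Saved.Sequence, Reread.Sequence);
```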

Tip

To append GenPept data to an existing file, specify that file name, and the data will be added to the end of the file.

If you are using getgenpept in a script, you can disable the append warning message by entering the following command lines before the getgenpept command:

warnState = warning; % Save the current warning state
warning('off','Bioinfo:getncbidata:AppendToFile');

Then enter the following command line after the getgenpept command:

warning(warnState) % Reset warning state to previous settings
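
Put together, a script that appends to a file might look like the sketch below. It is one possible arrangement, not the documented procedure: the file name is hypothetical, the same record is fetched twice purely to trigger an append, and onCleanup is used here as an assumption-level convenience so that the warning state is restored even if getgenpept throws an error.

```matlab
% Append GenPept data to one file while silencing the append warning.
outFile = fullfile(tempdir, 'genpept_append_example.txt');  % hypothetical file

warnState = warning;                                  % save current warning state
restoreWarnings = onCleanup(@() warning(warnState));  % restore on clear or error
warning('off', 'Bioinfo:getncbidata:AppendToFile');

First  = getgenpept('AAA59174', 'ToFile', outFile);   % creates the file or appends
Second = getgenpept('AAA59174', 'ToFile', outFile);   % appends to the same file

clear restoreWarnings   % runs the cleanup, restoring the saved warning state
```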

Data = getgenpept(..., 'FileFormat', FileFormatValue, ...) returns the sequence in the specified format. Choices are 'GenPept' or 'FASTA'. When FileFormatValue is 'FASTA', Data contains only two fields, Header and Sequence. 'GenPept' is the default when SequenceOnlyValue is false. 'FASTA' is the default when SequenceOnlyValue is true.
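
For instance, a FASTA-format request can be sketched as follows, using the accession number from the examples on this page. The variable names are illustrative.

```matlab
% Request the record in FASTA format instead of the default GenPept format.
Fasta = getgenpept('AAA59174', 'FileFormat', 'FASTA');

% Per the description above, only Header and Sequence are expected here.
fastaFields = fieldnames(Fasta);
disp(fastaFields)
```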

Data = getgenpept(..., 'SequenceOnly', SequenceOnlyValue, ...) controls whether Data contains only the sequence, returned as a character array. Choices are true or false (default).

Note

If you use the 'SequenceOnly' and 'ToFile' properties together, the output is always a FASTA-formatted file.
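
The combination can be sketched as shown below. The file name is a hypothetical choice, and fastaread (the toolbox function for reading FASTA files) is used here only to confirm that the saved file can be read as FASTA.

```matlab
% Retrieve only the sequence and save it; the saved file is FASTA-formatted.
outFile = fullfile(tempdir, 'insulin_receptor_seq_example.fasta');  % hypothetical
if exist(outFile, 'file')
    delete(outFile);   % avoid appending to an earlier copy
end

seq = getgenpept('AAA59174', 'SequenceOnly', true, 'ToFile', outFile);

% Read the file back as FASTA: a header line plus the sequence.
[header, seqFromFile] = fastaread(outFile);
```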

Examples

Example 37. Retrieving a Peptide Sequence

To retrieve the sequence for the human insulin receptor and store it in a structure, Seq, type the following in the MATLAB Command Window:

Seq = getgenpept('AAA59174')

Seq = 

                LocusName: 'AAA59174'
      LocusSequenceLength: '1382'
     LocusNumberofStrands: ''
            LocusTopology: 'linear'
        LocusMoleculeType: ''
     LocusGenBankDivision: 'PRI'
    LocusModificationDate: '06-JAN-1995'
               Definition: 'insulin receptor precursor.'
                Accession: 'AAA59174'
                  Version: 'AAA59174.1'
                       GI: '307070'
                  Project: []
                 DBSource: 'locus HUMINSR accession M10051.1'
                 Keywords: ''
                   Source: 'Homo sapiens (human)'
           SourceOrganism: [4x65 char]
                Reference: {[1x1 struct]}
                  Comment: [14x67 char]
                 Features: [40x64 char]
                 Sequence: [1x1382 char]
                SearchURL: [1x104 char]
              RetrieveURL: [1x92 char]

Example 38. Retrieving a Partial Peptide Sequence

By looking at the Features field of the structure, you can determine that the furin-like repeats domain spans positions 234 through 281. To retrieve only the furin-like repeats domain from the sequence for the human insulin receptor and store it in a structure, Fur, type the following in the MATLAB Command Window:

Fur = getgenpept('AAA59174','PARTIALSEQ',[234,281]);

Introduced before R2006a
