getpdb

Retrieve protein structure data from the Protein Data Bank (PDB) database

Syntax

PDBStruct = getpdb(PDBid)

PDBStruct = getpdb(PDBid, ...'ToFile', ToFileValue, ...)
PDBStruct = getpdb(PDBid, ...'SequenceOnly', SequenceOnlyValue, ...)

Input Arguments

PDBid

String specifying a unique identifier for a protein structure record in the PDB database.

    Note:   Each structure in the PDB database is represented by a four-character alphanumeric identifier. For example, 4hhb is the identifier for hemoglobin.

ToFileValue

String specifying a file name, or a path and file name, for saving the PDB-formatted data. If you specify only a file name, the file is saved in the MATLAB® Current Folder.

    Tip   After you save the protein structure record to a local PDB-formatted file, you can use the pdbread function to read the file into MATLAB offline, or use the molviewer function to display and manipulate a 3-D image of the structure.
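A minimal sketch of this save-then-read workflow follows, assuming network access for the first call. The file name hemoglobin.pdb is an illustrative choice, not a required name.

```matlab
% Download the hemoglobin record (4hhb) once and save a local
% PDB-formatted copy in the Current Folder (illustrative file name).
getpdb('4hhb', 'ToFile', 'hemoglobin.pdb');

% Later, read the saved file back into a MATLAB structure
% without contacting the PDB database.
localStruct = pdbread('hemoglobin.pdb');
```

Once the file exists, calling molviewer('hemoglobin.pdb') opens it in the viewer mentioned in the tip.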

SequenceOnlyValue

Logical value that controls whether getpdb returns only the protein sequence. Choices are true or false (default).

If there is one sequence, it is returned as a character array. If there are multiple sequences, they are returned as a cell array.
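Because the return type depends on the number of sequences, code that consumes the SequenceOnly output may want to normalize it first. The sketch below does this with cellstr, which wraps a character array in a cell and leaves a cell array of character vectors unchanged. The identifier 4hhb comes from the note above; the loop and its printed format are illustrative only.

```matlab
% Request only the sequence data for hemoglobin (4hhb).
seqs = getpdb('4hhb', 'SequenceOnly', true);

% One sequence arrives as a character array, several as a cell array;
% cellstr converts either form to a cell array of character vectors.
seqs = cellstr(seqs);

% Report the length of each sequence.
for k = 1:numel(seqs)
    fprintf('Sequence %d: %d residues\n', k, length(seqs{k}));
end
```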

Output Arguments

PDBStruct

MATLAB structure containing a field for each PDB record.

Description

getpdb retrieves protein structure data from the Protein Data Bank (PDB) database, an archive of experimentally determined 3-D biological macromolecular structure data. For more information about the PDB format, see:

http://www.wwpdb.org/documentation/format23/v2.3.html

PDBStruct = getpdb(PDBid) searches the PDB database for the protein structure record specified by the identifier PDBid and returns the MATLAB structure PDBStruct, which contains a field for each PDB record. The following table summarizes the possible PDB records and the corresponding fields in the MATLAB structure PDBStruct:

PDB Database Record              Field in the MATLAB Structure
HEADER                           Header
OBSLTE                           Obsolete
TITLE                            Title
CAVEAT                           Caveat
COMPND                           Compound
SOURCE                           Source
KEYWDS                           Keywords
EXPDTA                           ExperimentData
AUTHOR                           Authors
REVDAT                           RevisionDate
SPRSDE                           Superseded
JRNL                             Journal
REMARK 1                         Remark1
REMARK n (n = 2 through 999)     Remarkn (n = 2 through 999)
DBREF                            DBReferences
SEQADV                           SequenceConflicts
SEQRES                           Sequence
FTNOTE                           Footnote
MODRES                           ModifiedResidues
HET                              Heterogen
HETNAM                           HeterogenName
HETSYN                           HeterogenSynonym
FORMUL                           Formula
HELIX                            Helix
SHEET                            Sheet
TURN                             Turn
SSBOND                           SSBond
LINK                             Link
HYDBND                           HydrogenBond
SLTBRG                           SaltBridge
CISPEP                           CISPeptides
SITE                             Site
CRYST1                           Cryst1
ORIGXn                           OriginX
SCALEn                           Scale
MTRIXn                           Matrix
TVECT                            TranslationVector
MODEL                            Model
ATOM                             Atom
SIGATM                           AtomSD
ANISOU                           AnisotropicTemp
SIGUIJ                           AnisotropicTempSD
TER                              Terminal
HETATM                           HeterogenAtom
CONECT                           Connectivity
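To see which of these fields a particular entry actually contains, you can list them with fieldnames. The sketch below uses 5CYT, the identifier from the Examples section. Guarding an optional field with isfield is a defensive assumption on our part, since not every PDB entry contains every record type.

```matlab
% Retrieve the cytochrome c record used in the Examples section.
pdbstruct = getpdb('5CYT');

% List the top-level fields present for this entry.
disp(fieldnames(pdbstruct));

% Check for an optional record field before using it.
if isfield(pdbstruct, 'Title')
    disp(pdbstruct.Title);
end
```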

PDBStruct = getpdb(PDBid, ...'PropertyName', PropertyValue, ...) calls getpdb with optional property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. The property name/property value pairs are as follows:

PDBStruct = getpdb(PDBid, ...'ToFile', ToFileValue, ...)
saves the data returned from the database to a PDB-formatted file, ToFileValue.

PDBStruct = getpdb(PDBid, ...'SequenceOnly', SequenceOnlyValue, ...) controls the return of the protein sequence only. Choices are true or false (default). If there is one sequence, it is returned as a character array. If there are multiple sequences, they are returned as a cell array.

The Sequence Field

The Sequence field is also a structure containing sequence information in the following subfields:

  • NumOfResidues

  • ChainID

  • ResidueNames — Contains the three-letter codes for the sequence residues.

  • Sequence — Contains the single-letter codes for the sequence residues.

    Note:   If the sequence has modified residues, then the ResidueNames subfield might not correspond to the standard three-letter amino acid codes. In this case, the Sequence subfield will contain the modified residue code in the position corresponding to the modified residue. The modified residue code is provided in the ModifiedResidues field.
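The following sketch prints the chain identifier and residue count for each element of the Sequence field. It assumes that Sequence holds one structure per chain, as the ChainID subfield suggests; 4hhb is the hemoglobin identifier mentioned earlier.

```matlab
pdbstruct = getpdb('4hhb');

% Assumption: Sequence is a structure array with one element per chain.
for k = 1:numel(pdbstruct.Sequence)
    s = pdbstruct.Sequence(k);
    fprintf('Chain %s: %d residues\n', s.ChainID, s.NumOfResidues);
end
```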

The Model Field

The Model field is also a structure or an array of structures containing coordinate information. If the MATLAB structure contains one model, the Model field is a structure containing coordinate information for that model. If the MATLAB structure contains multiple models, the Model field is an array of structures containing coordinate information for each model. The Model field contains the following subfields:

  • Atom

  • AtomSD

  • AnisotropicTemp

  • AnisotropicTempSD

  • Terminal

  • HeterogenAtom
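Because Model can be either a single structure or an array of structures, indexing it as an array handles both cases with the same code. The sketch below is illustrative and again uses the 5CYT entry from the Examples section.

```matlab
pdbstruct = getpdb('5CYT');

% numel works whether Model is one structure or an array of structures.
nModels = numel(pdbstruct.Model);
fprintf('Number of models: %d\n', nModels);

% Work with the ATOM records of the first model.
atoms = pdbstruct.Model(1).Atom;
fprintf('Atoms in model 1: %d\n', numel(atoms));
```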

The Atom Field

The Atom field is also an array of structures containing the following subfields:

  • AtomSerNo

  • AtomName

  • altLoc

  • resName

  • chainID

  • resSeq

  • iCode

  • X

  • Y

  • Z

  • occupancy

  • tempFactor

  • segID

  • element

  • charge

  • AtomNameStruct — Contains three subfields: chemSymbol, remoteInd, and branch.
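The X, Y, and Z subfields can be gathered into a coordinate matrix using comma-separated list expansion. The sketch below, which assumes the 5CYT entry and uses only the first model, collects all atom coordinates, selects the alpha carbons by AtomName ('CA'), and computes their unweighted geometric center. The strtrim call is only a precaution in case atom names are stored with padding.

```matlab
pdbstruct = getpdb('5CYT');
atoms = pdbstruct.Model(1).Atom;

% Gather the X, Y, and Z subfields of every atom into an N-by-3 matrix.
xyz = [[atoms.X]', [atoms.Y]', [atoms.Z]'];

% Select alpha-carbon atoms by name.
isCA = strcmp(strtrim({atoms.AtomName}), 'CA');
caXYZ = xyz(isCA, :);

% Unweighted mean of the alpha-carbon coordinates (a 1-by-3 vector).
center = mean(caXYZ, 1);
```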

Examples

Retrieve the structure information for the electron transport (heme) protein that has a PDB identifier of 5CYT, read the information into a MATLAB structure pdbstruct, and save the information to a PDB-formatted file electron_transport.pdb in the MATLAB Current Folder.

pdbstruct = getpdb('5CYT', 'ToFile', 'electron_transport.pdb')
