
Connecting to the KEGG API Web Service

This example shows how to access the KEGG [1,2] database from within MATLAB using the REST-style KEGG API, and how to retrieve information about the human fatty acid degradation pathway and its components.

This example was developed on October 21, 2015 (KEGG Release 76.0). Because data in public repositories is frequently curated and updated, the results shown in this example may differ from the results you get when you use up-to-date data.

The KEGG database [1,2] is developed by Kanehisa Laboratories (http://www.kanehisa.jp/), and its use is subject to various subscription or license terms and charges depending on your use of the data. For details, see http://www.kegg.jp/kegg/download/.

For more details about the KEGG API, see http://www.kegg.jp/kegg/rest/keggapi.html.

Displaying the current statistics of the KEGG PATHWAY database

The info operation displays the current statistics of a given database. For example, you can display the release version of the current Pathway database and the number of entries it contains.

% Define the base URL, the |info| operation, and the database to consider.
base = 'http://rest.kegg.jp/';
operation = 'info/';
database = 'pathway';

% Retrieve the current stats of the Pathway database.
pathwayDbInfo = urlread(strcat(base,operation,database))
pathwayDbInfo =

pathway          KEGG Pathway Database
path             Release 76.0+/10-22, Oct 15
                 Kanehisa Laboratories
                 414,780 entries
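Every request in this example is the base URL followed by an operation and its arguments, separated by slashes. As a minimal sketch, a small anonymous function can assemble these URLs so that the trailing slashes in the operation and database variables do not have to be managed by hand. The name keggUrl is hypothetical; it is not part of MATLAB or the KEGG API.

```matlab
% Hypothetical helper (not a MATLAB or KEGG function): joins an operation
% and its arguments with '/' and appends them to the REST base URL.
base = 'http://rest.kegg.jp/';
keggUrl = @(varargin) [base strjoin(varargin, '/')];

% Same URL as strcat(base,'info/','pathway') above
infoUrl = keggUrl('info', 'pathway')

% A conv request built the same way
convUrl = keggUrl('conv', 'genes', 'ncbi-geneid:14751')
```

With this helper, urlread(keggUrl('info','pathway')) requests the same URL as the call above.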


Converting KEGG identifiers to/from outside identifiers

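You can use the conv operation to convert identifiers from outside databases to KEGG identifiers. For example, convert the NCBI Gene ID 14751 to the corresponding KEGG gene identifier.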

operation = 'conv/';
database = 'genes/';
dbentry = 'ncbi-geneid:14751';
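% The response is tab-delimited: "<query ID><TAB><KEGG ID>". The lookbehind,
% built from dbentry with the dynamic expression (??@dbentry), skips the
% query ID and captures the KEGG identifier that follows it.
conv_result = urlread(strcat(base,operation,database,dbentry));
kegg_id = regexpi(conv_result,'(?<=(??@dbentry)\s+)\w+\W+\w*','match')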
kegg_id = 

    'mmu:14751'
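Because the conv response has one tab-delimited line per mapping, you can also read the KEGG identifier by splitting each line on the tab instead of using a lookbehind. The sketch below parses a response typed in from the output above, so it runs offline. It also builds the URL for the reverse conversion, from the KEGG gene identifier back to the NCBI Gene ID, assuming that KEGG accepts ncbi-geneid as the target database of conv.

```matlab
% Response text copied from the output above (typed in, not downloaded)
response = sprintf('ncbi-geneid:14751\tmmu:14751\n');

% One cell per line, then split the first line on the tab character
respLines = regexp(response, '[^\n]+', 'match');
convFields = strsplit(respLines{1}, char(9));   % char(9) is a tab
keggGeneId = convFields{2}

% Reverse direction: target database first, then the KEGG identifier
% (assumes 'ncbi-geneid' is accepted as a conv target database)
reverseUrl = ['http://rest.kegg.jp/' 'conv/' 'ncbi-geneid/' keggGeneId]
```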

Retrieving the list of organisms from the KEGG taxonomic classification database

You can use the list operation to retrieve the list of organisms from the KEGG database with taxonomic classification.

operation = 'list/';
database = 'organism';
organisms = urlread(strcat(base,operation,database));
organisms = regexpi(organisms,'[^\n]+','match')'; % convert to cellstr

organisms is a cell array of character vectors, one for each organism in the KEGG taxonomic classification database. Find the entry that contains the string Homo sapiens, and notice that the KEGG organism code for Homo sapiens is hsa.

hsa_idx = find(~cellfun(@isempty,regexpi(organisms,'Homo sapiens')));
organisms(hsa_idx)
ans = 

    'T01001	hsa	Homo sapiens (human)	Eukaryotes;Animals;Vertebrates;Mam...'
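Each line of the organism list is tab-delimited into the T number, the organism code, the species name, and the taxonomic lineage. When you already know the organism code, matching that column exactly is a more targeted lookup than a free-text search, which in general can match more than one line. In the sketch below, the single-line excerpt is the hsa entry shown above with its lineage shortened; the variable names are illustrative.

```matlab
% Excerpt of the hsa entry shown above (lineage shortened)
organisms_excerpt = {sprintf('T01001\thsa\tHomo sapiens (human)\tEukaryotes;Animals;Vertebrates')};

% Split every line into its tab-delimited columns
orgFields = cellfun(@(s) strsplit(s, char(9)), organisms_excerpt, 'UniformOutput', false);

% Column 2 holds the KEGG organism code
codes = cellfun(@(f) f{2}, orgFields, 'UniformOutput', false);
idx = find(strcmp(codes, 'hsa'), 1);

% Column 3 holds the species name
species = orgFields{idx}{3}
```

The same code runs unchanged on the full organisms cell array.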

Retrieving the list of pathways for Homo sapiens in the KEGG PATHWAY database

You can use the list operation with the organism code hsa to retrieve the pathway entry identifiers and their definitions for Homo sapiens.

operation = 'list/';
database = 'pathway/';
organismCode = 'hsa';
pathway_list = urlread(strcat(base,operation,database,organismCode));
pathway_list = regexpi(pathway_list,'[^\n]+','match')'; % convert to cellstr
num_pathways = numel(pathway_list) % total number of pathways
num_pathways =

   299

Retrieving the lists of genes, compounds, enzymes, and reactions

You can extract the lists of genes, compounds, enzymes, and reactions for the fatty acid degradation pathway by first retrieving the complete record for this pathway with the get operation. Then you can link to other databases with the link operation.

fadp_idx = find(~cellfun(@isempty,regexpi(pathway_list,'Fatty acid degradation')));
pathway_list(fadp_idx)
fadp_id = regexpi(pathway_list(fadp_idx),'(?<=path:)\w+','match');
fadp_id{1}
ans = 

    'path:hsa00071	Fatty acid degradation - Homo sapiens (human)'


ans = 

    'hsa00071'
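Each line of the pathway list has the form path:<id>, a tab, and the pathway name. A single regular expression with two tokens returns both columns at once, which avoids a second pass to strip the path: prefix. The sketch below uses the line found above; removing the organism suffix is an illustrative extra step.

```matlab
% Line copied from the pathway list output above
listLine = sprintf('path:hsa00071\tFatty acid degradation - Homo sapiens (human)');

% Token 1: identifier after 'path:'; token 2: everything after the tab
tok = regexp(listLine, '^path:(\w+)\t(.*)$', 'tokens', 'once');
pathwayId = tok{1}
pathwayName = tok{2}

% Drop the organism suffix to obtain the generic pathway name
shortName = regexprep(pathwayName, '\s+-\s+Homo sapiens \(human\)$', '')
```

Passing the whole pathway_list cell array to regexp with the same pattern and the 'once' option returns one token pair per line.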

Get the complete record of the fatty acid degradation pathway.

operation = 'get/';
fadp_record = urlread(char(strcat(base,operation,fadp_id{1})))
fadp_record =

ENTRY       hsa00071                    Pathway
NAME        Fatty acid degradation - Homo sapiens (human)
CLASS       Metabolism; Lipid metabolism
PATHWAY_MAP hsa00071  Fatty acid degradation
MODULE      hsa_M00086  beta-Oxidation, acyl-CoA synthesis [PATH:hsa00071]
            hsa_M00087  beta-Oxidation [PATH:hsa00071]
DISEASE     H00162  Sjogren-Larsson syndrome
            H00178  Glutaric acidemia
            H00407  Peroxisomal beta-oxidation enzyme deficiency
            H00525  Disorders of fatty-acid oxidation
            H01267  Familial hyperinsulinemic hypoglycemia (HHF)
            H01352  Mitochondrial trifunctional protein (TFP) deficiency
            H01364  3-Hydroxyacyl-CoA dehydrogenase deficiency
            H01400  Secondary hyperammonemia
DRUG        D00123  Cyanamide (JP16)
            D00131  Disulfiram (JP16/USP/INN)
            D00707  Fomepizole (JAN/USAN/INN)
            D05292  Oxfenicine (USAN/INN)
ORGANISM    Homo sapiens (human) [GN:hsa]
GENE        39  ACAT2; acetyl-CoA acetyltransferase 2 [KO:K00626] [EC:2.3.1.9]
            38  ACAT1; acetyl-CoA acetyltransferase 1 [KO:K00626] [EC:2.3.1.9]
            30  ACAA1; acetyl-CoA acyltransferase 1 [KO:K07513] [EC:2.3.1.16]
            10449  ACAA2; acetyl-CoA acyltransferase 2 [KO:K07508] [EC:2.3.1.16]
            3032  HADHB; hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), beta subunit [KO:K07509] [EC:2.3.1.16]
            3033  HADH; hydroxyacyl-CoA dehydrogenase [KO:K00022] [EC:1.1.1.35]
            3030  HADHA; hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), alpha subunit [KO:K07515] [EC:1.1.1.211 4.2.1.17]
            1962  EHHADH; enoyl-CoA, hydratase/3-hydroxyacyl CoA dehydrogenase [KO:K07514] [EC:5.3.3.8 1.1.1.35 4.2.1.17]
            1892  ECHS1; enoyl CoA hydratase, short chain, 1, mitochondrial [KO:K07511] [EC:4.2.1.17]
            8310  ACOX3; acyl-CoA oxidase 3, pristanoyl [KO:K00232] [EC:1.3.3.6]
            51  ACOX1; acyl-CoA oxidase 1, palmitoyl [KO:K00232] [EC:1.3.3.6]
            35  ACADS; acyl-CoA dehydrogenase, C-2 to C-3 short chain [KO:K00248] [EC:1.3.8.1]
            34  ACADM; acyl-CoA dehydrogenase, C-4 to C-12 straight chain [KO:K00249] [EC:1.3.8.7]
            33  ACADL; acyl-CoA dehydrogenase, long chain [KO:K00255] [EC:1.3.8.8]
            36  ACADSB; acyl-CoA dehydrogenase, short/branched chain [KO:K09478] [EC:1.3.99.12]
            37  ACADVL; acyl-CoA dehydrogenase, very long chain [KO:K09479] [EC:1.3.8.9]
            2639  GCDH; glutaryl-CoA dehydrogenase [KO:K00252] [EC:1.3.8.6]
            23305  ACSL6; acyl-CoA synthetase long-chain family member 6 [KO:K01897] [EC:6.2.1.3]
            2182  ACSL4; acyl-CoA synthetase long-chain family member 4 [KO:K01897] [EC:6.2.1.3]
            2180  ACSL1; acyl-CoA synthetase long-chain family member 1 [KO:K01897] [EC:6.2.1.3]
            51703  ACSL5; acyl-CoA synthetase long-chain family member 5 [KO:K01897] [EC:6.2.1.3]
            2181  ACSL3; acyl-CoA synthetase long-chain family member 3 [KO:K01897] [EC:6.2.1.3]
            23205  ACSBG1; acyl-CoA synthetase bubblegum family member 1 [KO:K15013] [EC:6.2.1.3]
            81616  ACSBG2; acyl-CoA synthetase bubblegum family member 2 [KO:K15013] [EC:6.2.1.3]
            1374  CPT1A; carnitine palmitoyltransferase 1A (liver) [KO:K08765] [EC:2.3.1.21]
            1375  CPT1B; carnitine palmitoyltransferase 1B (muscle) [KO:K19523] [EC:2.3.1.21]
            126129  CPT1C; carnitine palmitoyltransferase 1C [KO:K19524] [EC:2.3.1.21]
            1376  CPT2; carnitine palmitoyltransferase 2 [KO:K08766] [EC:2.3.1.21]
            1632  ECI1; enoyl-CoA delta isomerase 1 [KO:K13238] [EC:5.3.3.8]
            10455  ECI2; enoyl-CoA delta isomerase 2 [KO:K13239] [EC:5.3.3.8]
            1579  CYP4A11; cytochrome P450, family 4, subfamily A, polypeptide 11 [KO:K17687] [EC:1.14.15.3]
            284541  CYP4A22; cytochrome P450, family 4, subfamily A, polypeptide 22 [KO:K17688] [EC:1.14.15.3]
            124  ADH1A; alcohol dehydrogenase 1A (class I), alpha polypeptide [KO:K13951] [EC:1.1.1.1]
            125  ADH1B; alcohol dehydrogenase 1B (class I), beta polypeptide [KO:K13951] [EC:1.1.1.1]
            126  ADH1C; alcohol dehydrogenase 1C (class I), gamma polypeptide [KO:K13951] [EC:1.1.1.1]
            131  ADH7; alcohol dehydrogenase 7 (class IV), mu or sigma polypeptide [KO:K13951] [EC:1.1.1.1]
            127  ADH4; alcohol dehydrogenase 4 (class II), pi polypeptide [KO:K13980] [EC:1.1.1.1]
            128  ADH5; alcohol dehydrogenase 5 (class III), chi polypeptide [KO:K00121] [EC:1.1.1.1 1.1.1.284]
            130  ADH6; alcohol dehydrogenase 6 (class V) [KO:K13952] [EC:1.1.1.1]
            217  ALDH2; aldehyde dehydrogenase 2 family (mitochondrial) [KO:K00128] [EC:1.2.1.3]
            224  ALDH3A2; aldehyde dehydrogenase 3 family, member A2 [KO:K00128] [EC:1.2.1.3]
            219  ALDH1B1; aldehyde dehydrogenase 1 family, member B1 [KO:K00128] [EC:1.2.1.3]
            501  ALDH7A1; aldehyde dehydrogenase 7 family, member A1 [KO:K14085] [EC:1.2.1.3 1.2.1.8 1.2.1.31]
            223  ALDH9A1; aldehyde dehydrogenase 9 family, member A1 [KO:K00149] [EC:1.2.1.3 1.2.1.47]
COMPOUND    C00010  CoA
            C00024  Acetyl-CoA
            C00071  Aldehyde
            C00136  Butanoyl-CoA
            C00154  Palmitoyl-CoA
            C00162  Fatty acid
            C00226  Primary alcohol
            C00229  Acyl-carrier protein
            C00249  Hexadecanoic acid
            C00332  Acetoacetyl-CoA
            C00340  Reduced rubredoxin
            C00435  Oxidized rubredoxin
            C00489  Glutarate
            C00517  Hexadecanal
            C00527  Glutaryl-CoA
            C00638  Long-chain fatty acid
            C00823  1-Hexadecanol
            C00877  Crotonoyl-CoA
            C01144  (S)-3-Hydroxybutanoyl-CoA
            C01371  Alkane
            C01832  Lauroyl-CoA
            C01944  Octanoyl-CoA
            C02593  Tetradecanoyl-CoA
            C02990  L-Palmitoylcarnitine
            C03221  2-trans-Dodecenoyl-CoA
            C03547  omega-Hydroxy fatty acid
            C03561  (R)-3-Hydroxybutanoyl-CoA
            C05102  alpha-Hydroxy fatty acid
            C05258  (S)-3-Hydroxyhexadecanoyl-CoA
            C05259  3-Oxopalmitoyl-CoA
            C05260  (S)-3-Hydroxytetradecanoyl-CoA
            C05261  3-Oxotetradecanoyl-CoA
            C05262  (S)-3-Hydroxydodecanoyl-CoA
            C05263  3-Oxododecanoyl-CoA
            C05264  (S)-Hydroxydecanoyl-CoA
            C05265  3-Oxodecanoyl-CoA
            C05266  (S)-3-Hydroxyoctanoyl-CoA
            C05267  3-Oxooctanoyl-CoA
            C05268  (S)-Hydroxyhexanoyl-CoA
            C05269  3-Oxohexanoyl-CoA
            C05270  Hexanoyl-CoA
            C05271  trans-Hex-2-enoyl-CoA
            C05272  trans-Hexadec-2-enoyl-CoA
            C05273  trans-Tetradec-2-enoyl-CoA
            C05274  Decanoyl-CoA
            C05275  trans-Dec-2-enoyl-CoA
            C05276  trans-Oct-2-enoyl-CoA
            C05279  trans,cis-Lauro-2,6-dienoyl-CoA
            C05280  cis,cis-3,6-Dodecadienoyl-CoA
            C20683  Long-chain acyl-[acyl-carrier protein]
REFERENCE   PMID:869535
  AUTHORS   Parekh VR, Traxler RW, Sobek JM
  TITLE     N-Alkane oxidation enzymes of a pseudomonad.
  JOURNAL   Appl Environ Microbiol 33:881-4 (1977)
KO_PATHWAY  ko00071
///
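The record returned by get is a flat file in which each field name occupies the first 12 columns and continuation lines begin with blanks. As a sketch, you can pull the gene identifiers and symbols out of the GENE section as shown below. The excerpt copies three GENE lines and the first COMPOUND line from the record above, and the parsing assumes that another field follows the GENE block, as COMPOUND does here.

```matlab
% Excerpt of the record above: three GENE lines and the next field
record = sprintf(['GENE        39  ACAT2; acetyl-CoA acetyltransferase 2 [KO:K00626] [EC:2.3.1.9]\n' ...
                  '            38  ACAT1; acetyl-CoA acetyltransferase 1 [KO:K00626] [EC:2.3.1.9]\n' ...
                  '            30  ACAA1; acetyl-CoA acyltransferase 1 [KO:K07513] [EC:2.3.1.16]\n' ...
                  'COMPOUND    C00010  CoA\n']);
recLines = regexp(record, '[^\n]+', 'match');

% The GENE block starts at the line beginning with 'GENE' and ends just
% before the next line that does not start with a blank (a new field name)
startIdx = find(strncmp(recLines, 'GENE', 4), 1);
nextField = find(~strncmp(recLines(startIdx+1:end), ' ', 1), 1);
geneLines = recLines(startIdx:startIdx+nextField-1);

% Each entry reads "<gene ID>  <symbol>; <description> ..."
tok = regexp(geneLines, '^(?:GENE)?\s+(\d+)\s+([^;]+);', 'tokens', 'once');
geneIds = cellfun(@(t) t{1}, tok, 'UniformOutput', false)
symbols = cellfun(@(t) t{2}, tok, 'UniformOutput', false)
```

Applied to fadp_record, this approach covers the 44 GENE lines shown above, the same count that the link operation returns later in this example.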


Retrieve the KO_PATHWAY identifier from the record, and then use the link operation to list all pathway entries linked to it.

ko_id = regexpi(fadp_record,'(?<=KO\w+PATHWAY\s+)\w*','match')
operation = 'link/';
database = 'pathway/';
allPathwayIDs = urlread(strcat(base,operation,database,ko_id{1}))
ko_id = 

    'ko00071'


allPathwayIDs =

path:ko00071	path:map00071


Extract the map identifier (map_id) for later use.

map_id = regexpi(allPathwayIDs,'(?<=\w+\W+\w+\s+path:)(?=map)\w*', 'match')
map_id = 

    'map00071'
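Because the only map entry in this response starts with the prefix map, a shorter pattern without lookarounds is enough to extract it. The sketch below runs on the response text shown above. With the 'once' option, regexp returns a character vector rather than a cell, so you would use the result directly instead of map_id{1}.

```matlab
% Link response copied from the output above
linkResponse = sprintf('path:ko00071\tpath:map00071\n');

% Match the first identifier that starts with 'map' followed by digits
map_id_simple = regexp(linkResponse, 'map\d+', 'match', 'once')
```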

Retrieve the list of genes involved in the pathway.

operation = 'link/';
database = 'genes/';
fadp_genes = urlread(char(strcat(base,operation,database,fadp_id{1})));
fadp_genes = regexpi(fadp_genes, '[^\n]+','match'); % convert to cellstr
num_genes = numel(fadp_genes)
num_genes =

    44

Retrieve the list of compounds involved in the pathway using the map_id.

operation = 'link/';
database = 'cpd/';
fadp_cpds = urlread(strcat(base,operation,database,map_id{1}));
fadp_cpds = regexpi(fadp_cpds,'(?<=\w+\:\w*\s+)cpd:\w*','match');
num_cpds = numel(fadp_cpds)
num_cpds =

    50

Get the list of enzymes using the map_id of the pathway.

operation = 'link/';
database = 'enzyme/';
fadp_enzymes = urlread(strcat(base,operation,database,map_id{1}));
fadp_enzymes = regexpi(fadp_enzymes, '[^\n]+','match');
num_enzymes = numel(fadp_enzymes)
num_enzymes =

    30

Get the list of reactions using the map_id of the pathway.

operation = 'link/';
database = 'rn/';
fadp_reactions = urlread(strcat(base,operation,database,map_id{1}));
fadp_reactions = regexpi(fadp_reactions,'[^\n]+','match');
num_reactions = numel(fadp_reactions)
num_reactions =

    47
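Each link response has two tab-separated columns, the query entry and the linked entry, which is the layout the lookbehind in the compound step relies on. A single hypothetical helper, linkTargets, can therefore return the second column for the gene, compound, enzyme, and reaction lists alike. Matching \S+ rather than \w* keeps entries that contain dots, such as EC numbers, intact. The sample below uses two compounds from the COMPOUND section of the record.

```matlab
% Hypothetical helper: second (linked-entry) column of a link response
linkTargets = @(txt) regexp(txt, '(?<=\t)\S+', 'match');

% Sample response in the two-column layout (typed in, not downloaded)
sample = sprintf('path:map00071\tcpd:C00010\npath:map00071\tcpd:C00024\n');
cpds = linkTargets(sample)

% Drop the database prefix when only the bare entry IDs are needed
bareIds = regexprep(cpds, '^\w+:', '')
```

For a downloaded list, numel(linkTargets(urlread(url))) gives the same kind of count as the num_genes, num_cpds, num_enzymes, and num_reactions values above.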

Coloring Pathways

In KEGG pathway maps, a gene or enzyme is represented by a rectangle, and a compound is shown as a circle. In this example, the fatty acid degradation pathway map returned by KEGG already has the human-specific enzymes colored green.

base_pathway_map = 'http://www.kegg.jp/pathway/';
web(char(strcat(base_pathway_map,fadp_id{1})),'-browser')

You can color more components in the pathway. These additional components are highlighted in red by default. For example, you can color the first five compounds from the compound list.

additional_components = fadp_cpds(1:5)';
final_url = char(strcat(base_pathway_map,fadp_id{1}));
for i = 1:size(additional_components,1)
    final_url = strcat(final_url,'+',additional_components{i});
end
web(final_url,'-browser')
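The loop above appends each ID with a leading +. The same URL can be built in one expression with strjoin, which is convenient when the list of components is long. The two compound IDs below are taken from the COMPOUND section of the record and stand in for additional_components; strjoin receives a row cell array, so the column is transposed.

```matlab
base_pathway_map = 'http://www.kegg.jp/pathway/';
pathwayId = 'hsa00071';                        % char(fadp_id{1}) above
components = {'cpd:C00010'; 'cpd:C00024'};     % column, like additional_components

% Pathway ID, then every component ID, all separated by '+'
final_url_joined = [base_pathway_map pathwayId '+' strjoin(components', '+')]
```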

Custom-coloring Pathways

You can also add custom colors to selected components of the pathway. To custom-color an object, specify its ID followed by a tab, the background color, a comma, and the foreground color. In the URL, the tab is written as %09, the percent-encoded form of ASCII character 9.

base_custom_color_map = 'http://www.kegg.jp/kegg-bin/show_pathway?';
final_url_custom_color_map = char(strcat(base_custom_color_map,fadp_id{1},'/'));
fgcolor = {'red','blue','green','magenta','yellow'}';
bgcolor = {'blue','magenta','cyan','red','blue'}';
for i = 1:size(additional_components,1)
    final_url_custom_color_map = strcat(final_url_custom_color_map,...
                                        additional_components{i},'%09',...
                                        bgcolor{i},',',fgcolor{i},'/');
end
web(final_url_custom_color_map,'-browser')
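Each custom-colored object in the URL is <id>%09<background>,<foreground>. As a sketch, you can format every object with sprintf inside cellfun and then join the pieces with /. In the sprintf format, %% produces a literal percent sign, so %%09 yields the encoded tab. The two IDs and color pairs mirror the first two entries used above.

```matlab
base_custom_color_map = 'http://www.kegg.jp/kegg-bin/show_pathway?';
components = {'cpd:C00010'; 'cpd:C00024'};
bgDemo = {'blue'; 'magenta'};    % background colors
fgDemo = {'red'; 'blue'};        % foreground colors

% One "<id>%09<bg>,<fg>" piece per component
specs = cellfun(@(id, bg, fg) sprintf('%s%%09%s,%s', id, bg, fg), ...
                components, bgDemo, fgDemo, 'UniformOutput', false);

% Pathway ID, then the pieces separated by '/', with a trailing '/'
url_custom = [base_custom_color_map 'hsa00071/' strjoin(specs', '/') '/']
```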

You can also apply a single custom color to all selected components. Use the map_id instead of the hsa pathway ID so that KEGG does not automatically highlight the human gene products.

dcolor = 'cyan';
final_url_one_color = strcat(base_custom_color_map,map_id{1},'/default%3d',dcolor,'/');
for i = 1:size(additional_components,1)
    final_url_one_color = strcat(final_url_one_color,additional_components{i},'/');
end
web(final_url_one_color,'-browser')
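In this URL, %3d is the percent-encoded equals sign, so the segment after the map ID reads default=cyan once decoded, and the component IDs follow as separate path segments. The sketch below builds the same URL with strjoin, using two compound IDs from the record in place of additional_components.

```matlab
dcolor = 'cyan';
components = {'cpd:C00010'; 'cpd:C00024'};

% map ID, then default%3d<color>, then each component, each followed by '/'
url_one_color = ['http://www.kegg.jp/kegg-bin/show_pathway?map00071/default%3d' ...
                 dcolor '/' strjoin(components', '/') '/']
```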

Displaying a static map in a figure

You can also display the static pathway map in a figure. Because imread returns the map as an indexed image, set the colormap of the figure to cmap. If you have Image Processing Toolbox™, you can instead use imshow(x,cmap) to display the pathway map.

operation = 'get/';
static_url = char(strcat(base,operation,fadp_id{1},'/image'));
[x,cmap] = imread(static_url);
hfig = figure('Colormap', cmap);
hax = axes('Parent', hfig);
himg = image(x, 'Parent', hax);
hax.Visible = 'off';
scaleimagefigure(hfig, hax, himg);
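The figure colormap matters only because the map is an indexed image. Converting it to a truecolor image with ind2rgb removes that dependency, so the map displays correctly in any figure or axes. In the sketch below, a tiny synthetic indexed image stands in for the downloaded map so that the conversion can be checked offline; for uint8 indices, ind2rgb treats 0 as the first row of the colormap.

```matlab
% Synthetic 2-by-2 indexed image (stand-in for x) and its colormap
xDemo = uint8([0 1; 1 0]);        % uint8 indices are 0-based
cmapDemo = [1 1 1; 0 0 0];        % row 1 = white, row 2 = black

% M-by-N-by-3 truecolor array with values in [0,1]
rgb = ind2rgb(xDemo, cmapDemo);
size(rgb)
```

With the downloaded map, image(ind2rgb(x,cmap)) displays it without changing the figure colormap.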

References

[1] Kanehisa, M., Goto, S., Sato, Y., Furumichi, M., and Tanabe, M. "KEGG for integration and interpretation of large-scale molecular datasets", Nucleic Acids Research, 40:D109-14, 2012.

[2] Kanehisa, M. and Goto, S. "KEGG: Kyoto Encyclopedia of Genes and Genomes", Nucleic Acids Research, 28(1):27-30, 2000.
