The KEGG pathway database is a collection of manually drawn pathway maps representing current knowledge on molecular interaction and reaction networks, accompanied by KGML (KEGG Markup Language, an XML format) files for automatic computational analysis and modeling of metabolic and signaling networks. In a KGML file, the pathway is represented as a graph object with entry elements (gene products, compounds, pathways) as its nodes and relations between elements as its edges. However, in most cases the static pathway images and the accompanying KGML files do not fully correspond, so the information in a KGML file must be preprocessed before it can be used in automatic analysis. Several KGML parsers have been developed recently, either standalone or integrated into different bioinformatics packages.
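To make the graph view of a KGML file concrete, here is a minimal Python sketch (not part of KEGGParser, which is written in Matlab). It reads `entry` elements as nodes and `relation` elements as edges. The embedded XML is a hand-written toy fragment, not a real KEGG file, and the function name `parse_kgml` is hypothetical. Real KGML files carry more structure (graphics, group components, reactions) that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hand-written toy fragment in KGML style (illustrative only, not real KEGG data).
KGML_SAMPLE = """<pathway name="path:hsa00000" org="hsa" number="00000" title="Toy pathway">
  <entry id="1" name="hsa:1111" type="gene"/>
  <entry id="2" name="hsa:2222" type="gene"/>
  <entry id="3" name="cpd:C00001" type="compound"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
  </relation>
  <relation entry1="2" entry2="3" type="PCrel"/>
</pathway>"""


def parse_kgml(text):
    """Return (nodes, edges) from a KGML string (hypothetical helper).

    nodes: dict mapping entry id -> {"name": ..., "type": ...}
    edges: list of (entry1, entry2, relation type, [subtype names])
    """
    root = ET.fromstring(text)
    nodes = {
        entry.get("id"): {"name": entry.get("name"), "type": entry.get("type")}
        for entry in root.findall("entry")
    }
    edges = []
    for rel in root.findall("relation"):
        # A relation may have no subtype at all; keep it with an empty list.
        subtypes = [s.get("name") for s in rel.findall("subtype")]
        edges.append((rel.get("entry1"), rel.get("entry2"), rel.get("type"), subtypes))
    return nodes, edges


nodes, edges = parse_kgml(KGML_SAMPLE)
print(len(nodes), "nodes,", len(edges), "edges")
```

The second relation in the toy fragment has no subtype on purpose, since such relations are valid input and a parser has to keep them rather than drop them.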
We introduce KEGGParser (pathway parser/editor), which is based on the Matlab biograph class. Its main features are:
1. Retrieval, parsing and automatic correction for protein-compound-protein interactions, group nodes and binding directions;
2. Pathway graph editing (edge and node manipulations) based on "Graph manipulation" (http://www.mathworks.com/matlabcentral/fileexchange/37475-graph-manipulation);
3. Analysis of parsed pathways can be performed using Matlab built-in graph-based calculations.
See more details in Arakelyan A, Nersisyan L. KEGGParser: parsing and editing KEGG pathway maps in Matlab. Bioinformatics. 2013 Feb 15;29(4):518-9. doi: 10.1093/bioinformatics/bts730.
I have little knowledge of bioinformatics and have been looking for tools to edit pathways. This tool helps me out. I still have a problem exporting XML after editing the pathways. By the way, if a pathway is not from Homo sapiens (hsa), the corresponding static image cannot be loaded. This might be due to the URL used in the load_image.m file.
Dear Lee, thanks for your comments and suggestions. I'll have a look at the code to find the problem. I will also add an adjacency matrix. I'd like to know your opinion on the matrix format. It can be an n-by-n matrix (n being the number of vertices) with 1 indicating an edge and 0 otherwise, or an N-by-2 matrix (N being the number of edges) in which the first element of each row defines the source and the second element defines the sink.
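For readers comparing the two formats mentioned above, here is a minimal Python sketch of converting between them. It is an illustration only, not KEGGParser code. The function names are hypothetical, and vertices are numbered from 0 here, whereas Matlab indices start at 1.

```python
def edge_list_to_adjacency(edges, n):
    """Build an n-by-n 0/1 matrix from (source, sink) pairs (hypothetical helper)."""
    adj = [[0] * n for _ in range(n)]
    for source, sink in edges:
        adj[source][sink] = 1  # directed edge: row = source, column = sink
    return adj


def adjacency_to_edge_list(adj):
    """Return the (source, sink) pairs for every nonzero entry, row by row."""
    return [
        (i, j)
        for i, row in enumerate(adj)
        for j, value in enumerate(row)
        if value
    ]


edges = [(0, 1), (1, 2)]              # N-by-2 form: one row per edge
adj = edge_list_to_adjacency(edges, 3)
print(adj)                            # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(adjacency_to_edge_list(adj))    # [(0, 1), (1, 2)]
```

The n-by-n form suits matrix-based tools, while the N-by-2 form is more compact for sparse pathways. Neither form records the interaction type, so that would need a separate structure.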
There may be a problem with the code - when I parse hsa04010.txt, it seems that ELK4 should be pointing to ELK1. When I check the biograph matrix, it seems that the opposite is recorded. Other binding/association interactions seem to be pointing in the correct direction, but not this one.
I rarely comment on submissions - but this toolbox is so helpful that I have to. It really saved my students a great deal of time and will get a lot of future use. The only extension I would suggest is the capability to extract an adjacency matrix indicating the topology from the graph structure. Many tools expect this type of input - I realize this is non-trivial, though, for complex interactions.
Your comments on functionality and suggestions for improvement are very important to me. Feedback is highly appreciated.
1. Minor bugs are fixed.
2. parse_KEGG_xml.m updated to allow creation of graphs without edges.
Fixed a problem with parsing relations without subtypes. Added automatic correction for protein-compound-protein interactions, group nodes and binding directions.
I found a bug that resulted in the wrong assignment of interaction types. It is fixed. A Scilab version of KEGGParser is also included. It works with Scilab 5.4 + metanet.