How to repeatedly extract information from a text file and store it in a table?
Hello, I have an SDF file containing structure information for over 150K substances, and I'm interested in extracting some information from it. I had to convert it to .xls so that MATLAB could read it.
You can see an example of this file below. The original text file is available as ONTHOLOGY.XLS.
Each structure starts with a code (in this example: Q2785366) and ends with a line containing $$$$.
Q2785366-1 %name of structure
% A set of numbers that represent a .mol file. I'm not interested in them.
"> InChIKey" InChIKey=FPRJHXLFTAYEJH-UHFFFAOYSA-N
"> SMILES" COC1=C2C(OC)=C3C=COC3=NC2=C(OCC(O)C(C)(C)O)C=C1
> Kingdom Organic compounds
"> Superclass" Organoheterocyclic compounds
"> Class" Quinolines and derivatives
"> Subclass" Furanoquinolines
"> Parent" Furanoquinolines
"> Parents" Furopyridines
"> Framework" Aromatic heteropolycyclic compounds
"> Substituents" Furanoquinoline
"> description" This compound belongs to the class of organic compounds known as furanoquinolines. These are compounds containing a furan ring fused to a quinoline.
"> Ancestors" 1,2-diols
I'd like to organize this data into a new table, with one row per substance and one column per feature. The features I'm interested in are listed below.
Column 1: InChIKey (using only information after "=")
Column 2: SMILES (Only Code)
Column 3: Kingdom (Only text)
Column 4: Superclass (Only text)
Column 5: Class (Only text)
Column 6: Subclass (Only text)
Column 7: Framework (Only text)
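
To make the target layout concrete, here is a rough sketch of the extraction in Python rather than MATLAB. It assumes each field sits on a single line in the form shown in the example above (an optionally quoted "> Name" tag followed by its value) and that records are separated by $$$$. The names parse_records and write_table are hypothetical helpers made up for this illustration, not part of any library.

```python
import csv
import re

# The seven columns requested, in output order.
FIELDS = ["InChIKey", "SMILES", "Kingdom", "Superclass",
          "Class", "Subclass", "Framework"]

# Assumed line shape: optional quote, ">", field name, optional quote, value.
FIELD_LINE = re.compile(r'^"?>\s*(\w+)"?\s+(.*)$')


def parse_records(text):
    """Return one row (list of 7 strings) per record delimited by $$$$."""
    rows = []
    for block in text.split("$$$$"):
        found = {}
        for line in block.splitlines():
            match = FIELD_LINE.match(line.strip())
            if not match or match.group(1) not in FIELDS:
                continue  # mol-block numbers and unwanted fields are skipped
            name, value = match.group(1), match.group(2).strip()
            if name == "InChIKey":
                # Keep only the part after the "=" sign.
                value = value.split("=", 1)[-1]
            found.setdefault(name, value)  # keep the first occurrence
        if found:
            # Missing fields become empty cells so columns stay aligned.
            rows.append([found.get(field, "") for field in FIELDS])
    return rows


def write_table(rows, path):
    """Write the header and the rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(FIELDS)
        writer.writerows(rows)
```

Calling parse_records on the full file contents and passing the result to write_table would give a CSV with the seven columns above, one row per substance. If the original SDF keeps each value on the line after its tag, the matching step would need to be adapted.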