MATLAB Answers

How can I repeatedly extract information from a text file and store it in a specific table?

Alan Cesar Pilon Miro on 25 Apr 2018
Answered: Sarah Palfreyman on 30 Apr 2018

Hello, I have an SDF file containing structure information for over 150K substances, and I'm interested in extracting some information from it. I had to convert it to .xls so that MATLAB could read it.

An example from this file is shown below. The original text file is available as ONTHOLOGY.XLS.

Each structure starts with a code (in this example, Q2785366) and ends with $$$$.
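Because every record ends with a `$$$$` line, one option is to read the original SDF as plain text and split it on that delimiter, which would avoid the .xls conversion. A minimal sketch of that step in Python (used only to illustrate the logic), assuming each record is terminated by a line containing nothing but `$$$$` and that the structure name is the first token of the record's first non-blank line; `split_records` and `record_name` are hypothetical helper names:

```python
def split_records(text):
    """Split SDF-style text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a final record that has content but no closing '$$$$'
    if any(line.strip() for line in current):
        records.append("\n".join(current))
    return records


def record_name(record):
    """Return the first token of the first non-blank line, e.g. 'Q2785366-1'."""
    for line in record.splitlines():
        if line.strip():
            return line.split()[0]
    return ""


if __name__ == "__main__":
    # Toy input; the second ID is a placeholder, not real data
    sample = "Q2785366-1\n  0.0 0.0\n$$$$\nQ0000001-1\n  1.0 1.0\n$$$$\n"
    for rec in split_records(sample):
        print(record_name(rec))
```

For the full file, the text would come from reading the original SDF (for example with `open(path).read()`, where `path` is whatever the file is called).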

Q2785366-1 %name of structure

% A set of numbers that represents a .mol file. I'm not interested in them.

%Attributes

"> InChIKey" InChIKey=FPRJHXLFTAYEJH-UHFFFAOYSA-N

"> SMILES" COC1=C2C(OC)=C3C=COC3=NC2=C(OCC(O)C(C)(C)O)C=C1

> Kingdom Organic compounds

"> Superclass" Organoheterocyclic compounds

"> Class" Quinolines and derivatives

"> Subclass" Furanoquinolines

"> Nodes"

"> Parent" Furanoquinolines

"> Parents" Furopyridines

"> Framework" Aromatic heteropolycyclic compounds

"> Substituents" Furanoquinoline

"> description" This compound belongs to the class of organic compounds known as furanoquinolines. These are compounds containing a furan ring fused to a quinoline.

"> Ancestors" 1,2-diols

> Descriptors

$$$$
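In the example above, each attribute sits on a single line as `"> Name" value`, or as `> Name value` without quotes. A minimal Python sketch that turns one record into a name-to-value mapping, assuming exactly that layout and single-word field names (`parse_fields` is a hypothetical name). In a standard SDF file the field header is normally written `> <Name>` with the value on the following line, so the parser would need adjusting if the raw file looks like that.

```python
def parse_fields(record):
    """Map attribute names to values for one record.

    Assumes the layout shown in the question: '"> Name" value' or
    '> Name value' on a single line. Other lines (structure name,
    .mol block, comments) are ignored.
    """
    fields = {}
    for raw in record.splitlines():
        line = raw.strip()
        if line.startswith('">'):
            # Quoted form: the name ends at the next double quote
            end = line.find('"', 2)
            if end == -1:
                continue
            name = line[2:end].strip()
            value = line[end + 1:].strip()
        elif line.startswith(">"):
            # Unquoted form: assume the name is the first word after '>'
            parts = line[1:].split(None, 1)
            if not parts:
                continue
            name = parts[0]
            value = parts[1].strip() if len(parts) > 1 else ""
        else:
            continue
        fields[name] = value
    return fields


if __name__ == "__main__":
    record = "\n".join([
        "Q2785366-1",
        '"> InChIKey" InChIKey=FPRJHXLFTAYEJH-UHFFFAOYSA-N',
        "> Kingdom Organic compounds",
        '"> Nodes"',
    ])
    print(parse_fields(record))
```

Attributes with no value, such as `"> Nodes"` or `> Descriptors`, end up mapped to an empty string.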

I'd like to organize this data in a new table, with one row per substance and one column per feature. The features I'm interested in are:

Column 1: InChIKey (only the part after "=")

Column 2: SMILES (code only)

Column 3: Kingdom (text only)

Column 4: Superclass (text only)

Column 5: Class (text only)

Column 6: Subclass (text only)

Column 7: Framework (text only)
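Given each substance's attributes as a name-to-value mapping, building the seven-column table comes down to picking those keys in order and keeping only the part of the InChIKey after `=`. A minimal Python sketch, assuming that mapping as input and CSV as the output format (both are assumptions; `to_row` and `write_table` are hypothetical names):

```python
import csv
import io

# Columns requested in the question, in order
COLUMNS = ["InChIKey", "SMILES", "Kingdom", "Superclass",
           "Class", "Subclass", "Framework"]


def to_row(fields):
    """Pick the requested columns from one substance's name -> value mapping."""
    row = [fields.get(name, "") for name in COLUMNS]  # missing -> empty cell
    # Keep only the part after "=" (e.g. "InChIKey=FPRJ..." -> "FPRJ...")
    if "=" in row[0]:
        row[0] = row[0].split("=", 1)[1]
    return row


def write_table(substances, out):
    """Write a header plus one CSV row per substance to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for fields in substances:
        writer.writerow(to_row(fields))


if __name__ == "__main__":
    example = {
        "InChIKey": "InChIKey=FPRJHXLFTAYEJH-UHFFFAOYSA-N",
        "SMILES": "COC1=C2C(OC)=C3C=COC3=NC2=C(OCC(O)C(C)(C)O)C=C1",
        "Kingdom": "Organic compounds",
        "Superclass": "Organoheterocyclic compounds",
        "Class": "Quinolines and derivatives",
        "Subclass": "Furanoquinolines",
        "Framework": "Aromatic heteropolycyclic compounds",
    }
    buf = io.StringIO()
    write_table([example], buf)
    print(buf.getvalue())
```

Only the InChIKey column is split on `=`, so the `=` characters inside SMILES codes are left intact. To write a real file, pass a file opened with `open(name, "w", newline="")`; a CSV produced this way can be loaded into MATLAB with `readtable`.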

Thank you.
