This script reads a list of SMILES (Simplified Molecular Input Line Entry System) strings from a raw SMILES file, converts them into chemical structures using RDKit, and displays the resulting structures as images directly within a Jupyter notebook. Each structure is labeled with its corresponding entry number for easy identification.
Tutorial:
1. Generate a CSV file. For example, for a specific protein target, you can download a CSV file from the ChEMBL database.
2. Open this CSV file in Excel or Numbers (on a Mac) and export it as CSV again, since the downloaded file may have minor formatting issues that this code does not recognise.
3. Upload the new CSV file to the notebook via the Colab badge link. The code is then ready to process the SMILES strings in the CSV file you provided.
4. Read the file, changing the file name to match your own:
    df = pd.read_csv('11.csv')
    df
5. Change the column number to match where the SMILES strings are in your CSV file. awk numbers fields starting from 1, so if they are stored in the 8th column, the command should be:
    !awk -F "\"*,\"*" '{print $8}' 11.csv > smile.smi
6. (Delete rows that are not SMILES strings, and export the remaining SMILES strings to a new csv file.)
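
The column-extraction and clean-up steps above can also be sketched in plain Python, as an alternative to the awk command. This is a minimal sketch, not the notebook's own code: the helper name `extract_column` and its parameters are hypothetical, and it assumes a comma-separated file with a single header row. Unlike splitting on commas, Python's `csv` module keeps quoted fields that contain commas intact.

```python
import csv


def extract_column(csv_path, column_index, out_path, skip_header=True):
    """Write one column of a CSV file to out_path, one value per line.

    column_index is 1-based, matching awk's $N field numbering.
    Returns the number of values written.
    """
    written = 0
    with open(csv_path, newline="") as src, open(out_path, "w") as dst:
        reader = csv.reader(src)
        if skip_header:
            next(reader, None)  # drop the header row, if there is one
        for row in reader:
            if len(row) < column_index:
                continue  # row is too short to contain the column
            value = row[column_index - 1].strip()
            if value:  # skip empty cells
                dst.write(value + "\n")
                written += 1
    return written
```

For the file used above, the call would be `extract_column('11.csv', 8, 'smile.smi')`. This only removes the header and empty cells; it does not check that each remaining string is chemically valid SMILES.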
Display the molecules as images inside Jupyter Notebook
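
The drawing code itself is not shown in this tutorial, so the following is only a sketch of how the display step might look, assuming the SMILES strings are in `smile.smi` with one string per line. The helper names `read_entries` and `draw_grid` are hypothetical; `Chem.MolFromSmiles` and `Draw.MolsToGridImage` are RDKit functions. Strings that RDKit cannot parse are skipped, and each drawn structure keeps the line number it had in the file as its label.

```python
def read_entries(smi_path):
    """Return (entry_number, smiles) pairs, numbering lines from 1 and skipping blanks."""
    entries = []
    with open(smi_path) as fh:
        for number, line in enumerate(fh, start=1):
            smiles = line.strip()
            if smiles:
                entries.append((number, smiles))
    return entries


def draw_grid(entries, mols_per_row=4):
    """Draw the parseable entries as a grid, labelling each with its entry number."""
    # RDKit is imported here so that read_entries can be used without it installed.
    from rdkit import Chem
    from rdkit.Chem import Draw

    mols, legends = [], []
    for number, smiles in entries:
        mol = Chem.MolFromSmiles(smiles)  # returns None if the string cannot be parsed
        if mol is not None:
            mols.append(mol)
            legends.append(str(number))
    return Draw.MolsToGridImage(
        mols, molsPerRow=mols_per_row, subImgSize=(200, 200), legends=legends
    )
```

In a notebook cell, ending the cell with `draw_grid(read_entries('smile.smi'))` should render the grid inline, since the notebook displays the value of the last expression.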