Systemic Evolutionary Chemical Space Exploration for Drug Discovery

Overview



SECSE: Systemic Evolutionary Chemical Space Explorer


Chemical space exploration is a major task of the hit-finding process in the pursuit of novel chemical entities. Compared with other screening technologies, computational de novo design has become a popular approach to overcoming the limitations of current chemical libraries. Here, we report a de novo design platform named systemic evolutionary chemical space explorer (SECSE). The platform was conceptually inspired by fragment-based drug design and miniaturizes a “lego-building” process within the pocket of a given target. Generating virtual hits thereby becomes a computational search problem. To enhance search and optimization, human intelligence and deep learning are integrated. SECSE has the potential to find novel and diverse small molecules that are attractive starting points for further validation.

Tutorials and Usage


  1. Set Environment Variables
    export SECSE=path/to/SECSE
    If you use AutoDock Vina for docking (download it separately):
    export VINA=path/to/AutoDockVINA
    If you use Glide for docking (additional installation & license required):
    export SCHRODINGER=path/to/SCHRODINGER
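Before launching a run, it can help to confirm that the variables from this step are actually set. The following is a minimal sketch, not part of SECSE: the helper name `missing_env_vars` and the mapping `REQUIRED_VARS` are hypothetical, and it assumes that only the variable for the chosen docking program is needed alongside `SECSE`.

```python
import os

# Variable names come from the setup step; the mapping itself is an
# illustrative assumption, not something SECSE defines.
REQUIRED_VARS = {
    "vina": ["SECSE", "VINA"],
    "glide": ["SECSE", "SCHRODINGER"],
}


def missing_env_vars(docking_program, environ=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    required = REQUIRED_VARS[docking_program.lower()]
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"SECSE": "path/to/SECSE"}
print(missing_env_vars("vina", example_env))  # ['VINA']
```

Calling `missing_env_vars("vina")` without a second argument checks the real process environment.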

  2. Give execution permissions to the SECSE directory
    chmod -R +x path/to/SECSE

  3. Input fragments: a tab-separated .smi file without a header. See demo here.
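A quick format check on the fragment file can catch space-separated or headered input early. This sketch assumes each line holds exactly two tab-separated fields, a SMILES string followed by an identifier; that column layout is an assumption, as are the helper name `parse_fragments` and the example molecules.

```python
def parse_fragments(text):
    """Parse header-less, tab-separated lines into (smiles, frag_id) pairs.

    Assumes two columns per line: SMILES, then an identifier.
    """
    fragments = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated fields, got {len(fields)}"
            )
        smiles, frag_id = (field.strip() for field in fields)
        if not smiles or not frag_id:
            raise ValueError(f"line {line_no}: empty SMILES or identifier")
        fragments.append((smiles, frag_id))
    return fragments


# Hypothetical two-line input; not taken from the SECSE demo file.
demo_text = "c1ccccc1\tfrag_1\nCCO\tfrag_2\n"
print(parse_fragments(demo_text))
```

The check is purely structural; it does not validate that the SMILES strings are chemically meaningful.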

  4. Parameters in config file:
    [DEFAULT]

    • workdir, working directory; created if it does not exist, otherwise overwritten, type=str
    • fragments, file path to seed fragments, smi format, type=str
    • num_gen, number of generations, type=int
    • num_per_gen, number of molecules generated in each generation, type=int
    • seed_per_gen, number of seed molecules selected per generation, default=1000, type=int
    • start_gen, number of the starting generation, default=0, type=int
    • docking_program, name of the docking program, AutoDock Vina (input vina) or Glide (input glide), default=vina, type=str

    [docking]

    • target, protein PDBQT file if using AutoDock Vina; grid file if using Glide, type=str
    • RMSD, docking pose RMSD cutoff between children and parent, default=2, type=float
    • delta_score, cutoff for the decrease in docking score between children and parent, default=-1.0, type=float
    • score_cutoff, default=-9, type=float

    Parameters when docking by AutoDock Vina:

    • x, Docking box x, type=float
    • y, Docking box y, type=float
    • z, Docking box z, type=float
    • box_size_x, Docking box size x, default=20, type=float
    • box_size_y, Docking box size y, default=20, type=float
    • box_size_z, Docking box size z, default=20, type=float
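To sanity-check a docking box, the corners can be derived from its center and edge lengths. The sketch below assumes that x, y, z give the box center and box_size_* the full edge lengths, as with AutoDock Vina's center and size options; the function names and the center coordinates are illustrative only.

```python
def box_bounds(center, size):
    """Return the (lower, upper) corners of an axis-aligned box.

    center: (x, y, z) of the box center
    size:   full edge lengths along x, y, z
    """
    lower = tuple(c - s / 2.0 for c, s in zip(center, size))
    upper = tuple(c + s / 2.0 for c, s in zip(center, size))
    return lower, upper


def in_box(point, center, size):
    """True if the point lies inside the box or on its boundary."""
    lower, upper = box_bounds(center, size)
    return all(lo <= p <= hi for p, lo, hi in zip(point, lower, upper))


center = (20.9, -10.4, 3.0)  # illustrative center, not a recommendation
size = (20.0, 20.0, 20.0)    # the documented default edge lengths
print(box_bounds(center, size))
print(in_box((25.0, -5.0, 0.0), center, size))
```

A check like this is handy for confirming that a reference ligand's atoms fall inside the box before starting a long run.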

    [deep learning]

    • mode, mode of deep learning modeling, 0: not used, 1: modeling per generation, 2: modeling overall after all generations, default=0, type=int
    • dl_per_gen, top N predicted molecules for docking, default=100, type=int
    • dl_score_cutoff, default=-9, type=float
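The role of dl_per_gen can be pictured as a top-N selection over predicted scores. This is a rough sketch rather than SECSE's implementation: it assumes predictions arrive as (molecule id, predicted score) pairs and that a lower (more negative) predicted score is better, as with docking scores. The name `select_top_predicted` is hypothetical, and dl_score_cutoff is left out because its exact meaning is not described above.

```python
import heapq


def select_top_predicted(predictions, dl_per_gen=100):
    """Pick the dl_per_gen molecules with the lowest predicted scores.

    predictions: iterable of (mol_id, predicted_score) pairs.
    Assumes lower scores are better.
    """
    return heapq.nsmallest(dl_per_gen, predictions, key=lambda item: item[1])


# Hypothetical predictions for three molecules.
predictions = [("mol_a", -7.0), ("mol_b", -9.5), ("mol_c", -8.2)]
print(select_top_predicted(predictions, dl_per_gen=2))
# [('mol_b', -9.5), ('mol_c', -8.2)]
```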

    [properties]

    • MW, molecular weight cutoff, default=450, type=int
    • logP_lower, minimum logP, default=0.5, type=float
    • logP_upper, maximum logP, default=7, type=float
    • chiral_center, maximum number of chiral centers, default=3, type=int
    • heteroatom_ratio, maximum heteroatom ratio, default=0.35, type=float
    • rotatable_bound_num, maximum number of rotatable bonds, default=5, type=int
    • rigid_body_num, default=2, type=int
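The property cutoffs amount to a simple pass/fail filter. The sketch below assumes the properties have already been computed for each molecule (for example with a cheminformatics toolkit) and that every bound is inclusive; both are assumptions. `PropertyCutoffs` and `passes_properties` are hypothetical names, and rigid_body_num is omitted because no description is given for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyCutoffs:
    """Defaults copied from the [properties] parameter list."""
    MW: int = 450
    logP_lower: float = 0.5
    logP_upper: float = 7.0
    chiral_center: int = 3
    heteroatom_ratio: float = 0.35
    rotatable_bound_num: int = 5


def passes_properties(props, cutoffs=PropertyCutoffs()):
    """Check precomputed properties against the cutoffs (bounds inclusive)."""
    return (
        props["MW"] <= cutoffs.MW
        and cutoffs.logP_lower <= props["logP"] <= cutoffs.logP_upper
        and props["chiral_center"] <= cutoffs.chiral_center
        and props["heteroatom_ratio"] <= cutoffs.heteroatom_ratio
        and props["rotatable_bound_num"] <= cutoffs.rotatable_bound_num
    )


# Hypothetical property values for one molecule.
molecule = {
    "MW": 320.4,
    "logP": 2.1,
    "chiral_center": 1,
    "heteroatom_ratio": 0.25,
    "rotatable_bound_num": 4,
}
print(passes_properties(molecule))  # True
```

Passing a custom `PropertyCutoffs(MW=500)` loosens a single bound while keeping the other defaults.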

    Config file of a demo case: phgdh_demo_vina.ini
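The parameters above can be assembled into an INI file with Python's standard `configparser`. This is a sketch under assumptions: the section and key names follow the list above, but all paths and the values for num_gen, num_per_gen, and the box center are placeholders, and whether SECSE accepts exactly this file has not been verified here. `optionxform = str` is set so that keys such as RMSD and MW keep their case.

```python
import configparser
import io

config = configparser.ConfigParser()
config.optionxform = str  # preserve key case (RMSD, MW, logP_lower, ...)

config["DEFAULT"] = {
    "workdir": "path/to/workdir",          # placeholder path
    "fragments": "path/to/fragments.smi",  # placeholder path
    "num_gen": "3",                        # placeholder value
    "num_per_gen": "500",                  # placeholder value
    "seed_per_gen": "1000",
    "start_gen": "0",
    "docking_program": "vina",
}
config["docking"] = {
    "target": "path/to/target.pdbqt",      # placeholder path
    "RMSD": "2",
    "delta_score": "-1.0",
    "score_cutoff": "-9",
    "x": "0.0",                            # placeholder box center
    "y": "0.0",
    "z": "0.0",
    "box_size_x": "20",
    "box_size_y": "20",
    "box_size_z": "20",
}
config["deep learning"] = {
    "mode": "0",
    "dl_per_gen": "100",
    "dl_score_cutoff": "-9",
}
config["properties"] = {
    "MW": "450",
    "logP_lower": "0.5",
    "logP_upper": "7",
    "chiral_center": "3",
    "heteroatom_ratio": "0.35",
    "rotatable_bound_num": "5",
    "rigid_body_num": "2",
}

# Serialize to text, then read it back with typed getters.
buffer = io.StringIO()
config.write(buffer)
ini_text = buffer.getvalue()

loaded = configparser.ConfigParser()
loaded.optionxform = str
loaded.read_string(ini_text)
num_gen = loaded.getint("DEFAULT", "num_gen")
rmsd_cutoff = loaded.getfloat("docking", "RMSD")
print(ini_text)
```

Note that `configparser` treats `[DEFAULT]` specially: its keys are visible from every other section, so `loaded["docking"]["workdir"]` also resolves.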

  5. Run SECSE
    python $SECSE/run_secse.py --config path/to/config

  6. Output files

    • merged_docked_best_timestamp_with_grow_path.csv: selected molecules and growing path
    • selected.sdf: 3D conformers of all selected molecules
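After a run, the result table can be located and loaded with the standard library. This sketch assumes that "timestamp" in the file name is a placeholder replaced by an actual timestamp, that the file sits somewhere under the working directory, and that it is a comma-separated file with a header row; the helper names are hypothetical and the column names are not known here.

```python
import csv
import glob
import os


def find_grow_path_csvs(workdir):
    """Find result tables under workdir, whatever the timestamp part is."""
    pattern = os.path.join(
        workdir, "**", "merged_docked_best_*_with_grow_path.csv"
    )
    return sorted(glob.glob(pattern, recursive=True))


def read_rows(csv_path):
    """Read a result table into a list of dicts keyed by its header row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, `read_rows(find_grow_path_csvs("path/to/workdir")[-1])` would load the last match in sorted order, provided at least one file was found.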

Dependencies


GNU Parallel installation

numpy~=1.20.3, pandas~=1.3.3, pandarallel~=1.5.2, tqdm~=4.62.2, biopandas~=0.2.9, openbabel~=3.1.1, rdkit~=2021.03.5, chemprop~=1.3.1, torch~=1.9.0+cu111

Citation


Lu, C.; Liu, S.; Shi, W.; Yu, J.; Zhou, Z.; Zhang, X.; Lu, X.; Cai, F.; Xia, N.; Wang, Y. Systemic Evolutionary Chemical Space Exploration For Drug Discovery. ChemRxiv 2021. This content is a preprint and has not been peer-reviewed.

License


SECSE is released under the Apache License, Version 2.0.


Comments
  • Problem running demo

    Hi!

    When I try to run the demo with the command below:

    python $SECSE/run_secse.py --config demo/phgdh_demo_vina.ini

    it raises pandas.errors.EmptyDataError: No columns to parse from file. What should I do to solve it? Thank you!

    Here is the output

    **************************************************************************************** 
          ____    _____    ____   ____    _____ 
         / ___|  | ____|  / ___| / ___|  | ____|
         \___ \  |  _|   | |     \___ \  |  _|  
          ___) | | |___  | |___   ___) | | |___ 
         |____/  |_____|  \____| |____/  |_____|
    /home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/core/generic.py:2882: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
     method=method,
    Table 'G-001' already exists.
    
    ******************************************************************
    Input fragment file: /home/bruce/Work/CADD/SECSE/code/demo/demo_1020.smi
    Target grid file: /home/bruce/Work/CADD/SECSE/code/demo/PHGDH_6RJ3_for_vina.pdbqt
    Workdir: /home/bruce/Work/CADD/SECSE/code/res/
    
    
    ************************************************** 
    Generation  0 ...
    Step 1: Docking with Autodock Vina ...
    /home/bruce/Work/CADD/SECSE/code/secse/evaluate/ligprep_vina_parallel.sh /home/bruce/Work/CADD/SECSE/code/res/generation_0 /home/bruce/Work/CADD/SECSE/code/demo/demo_1020.smi /home/bruce/Work/CADD/SECSE/code/demo/PHGDH_6RJ3_for_vina.pdbqt 20.9 -10.4 3.0 20.0 20.0 25.0 10
    find /home/bruce/Work/CADD/SECSE/code/res/generation_0/sdf_files -name "*sdf" | xargs -n 100 cat > /home/bruce/Work/CADD/SECSE/code/res/generation_0/docking_outputs_with_score.sdf
    Docking time cost: 0.12 min.
    Step 2: Ranking docked molecules...
    9 cmpds after evaluate
    The evaluate score cutoff is: -9.0
    9 final seeds.
    
    ************************************************** 
    Generation  1 ...
    Step 1: Mutation
    No rule class:  B-001
    No rule class:  G-003
    No rule class:  G-004
    No rule class:  G-005
    No rule class:  G-006
    No rule class:  G-007
    No rule class:  M-001
    No rule class:  M-002
    No rule class:  M-003
    No rule class:  M-004
    No rule class:  M-005
    No rule class:  M-006
    No rule class:  M-007
    No rule class:  M-008
    No rule class:  M-009
    No rule class:  M-010
    No rule class: G-002
    Step 2: Filtering all mutated mols
    sh /home/bruce/Work/CADD/SECSE/code/secse/growing/filter_parallel.sh /home/bruce/Work/CADD/SECSE/code/res/generation_1 1 demo/phgdh_demo_vina.ini 10
    Filter runtime: 0.00 min.
    Traceback (most recent call last):
     File "/home/bruce/Work/CADD/SECSE/code/secse/run_secse.py", line 80, in <module>
       main()
     File "/home/bruce/Work/CADD/SECSE/code/secse/run_secse.py", line 65, in main
       workflow.grow()
     File "/home/bruce/Work/CADD/SECSE/code/secse/grow_processes.py", line 208, in grow
       self._filter_df = pd.read_csv(os.path.join(self.workdir_now, "filter.csv"), header=None)
     File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
       return func(*args, **kwargs)
     File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
       return _read(filepath_or_buffer, kwds)
     File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
       parser = TextFileReader(filepath_or_buffer, **kwds)
     File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
       self._engine = self._make_engine(self.engine)
     File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
       return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
     File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 69, in __init__
       self._reader = parsers.TextReader(self.handles.handle, **kwds)
     File "pandas/_libs/parsers.pyx", line 549, in pandas._libs.parsers.TextReader.__cinit__
    pandas.errors.EmptyDataError: No columns to parse from file
    
    opened by BW15061999 17
  • Question about running the demo code

    Hi authors,

    I have tried to run your demo code in README.md, but got some errors.

    Command

    python /home/xxx/workspace/off-SECSE/secse/run_secse.py --config ./config.ini
    

    Output

     **************************************************************************************** 
           ____    _____    ____   ____    _____ 
          / ___|  | ____|  / ___| / ___|  | ____|
          \___ \  |  _|   | |     \___ \  |  _|  
           ___) | | |___  | |___   ___) | | |___ 
          |____/  |_____|  \____| |____/  |_____|
    
    ******************************************************************
    Input fragment file: /home/xxx/workspace/off-SECSE/fy-run/demo001/ligand.smi
    Target grid file: /home/xxx/workspace/off-SECSE/fy-run/demo001/receptor.pdbqt
    Workdir: /home/xxx/workspace/off-SECSE/fy-run/demo001/
    
    Step 1: Docking with Autodock Vina ...
    /home/xxx/workspace/off-SECSE/secse/evaluate/ligprep_vina_parallel.sh /home/xxx/workspace/off-SECSE/fy-run/demo001/generation_0 /home/xxx/workspace/off-SECSE/fy-run/demo001/ligand.smi /home/t-yafan/workspace/off-SECSE/fy-run/demo001/receptor.pdbqt 20.9 -10.4 3.0 20.0 20.0 25.0 10
    find /home/xxx/workspace/off-SECSE/fy-run/demo001/generation_0/sdf_files -name "*sdf" | xargs -n 100 cat > /home/xxx/workspace/off-SECSE/fy-run/demo001/generation_0/docking_outputs_with_score.sdf
    Docking time cost: 0.11 min.
    Step 2: Ranking docked molecules...
    9 cmpds after evaluate
    The evaluate score cutoff is: -9.0
    9 final seeds.
    
     ************************************************** 
    Generation  1 ...
    Step 1: Mutation
    Traceback (most recent call last):
      File "/home/xxx/workspace/off-SECSE/secse/run_secse.py", line 70, in <module>
        main()
      File "/home/xxx/workspace/off-SECSE/secse/run_secse.py", line 55, in main
        workflow.grow()
      File "/home/xxx/workspace/off-SECSE/secse/grow_processes.py", line 159, in grow
        header = mutation_df(self.winner_df, self.workdir, self.cpu_num, self.gen)
      File "/home/xxx/workspace/off-SECSE/secse/growing/mutation/mutation.py", line 166, in mutation_df
        mutation = Mutation(5000, workdir)
      File "/home/xxx/workspace/off-SECSE/secse/growing/mutation/mutation.py", line 29, in __init__
        self.load_common_rules()
      File "/home/xxx/workspace/off-SECSE/secse/growing/mutation/mutation.py", line 50, in load_common_rules
        c.execute(sql)
    sqlite3.OperationalError: no such table: B-001
    

    It seems that the file secse/growing/mutation/rules_demo.db is missing in the repo. How can I fix it?

    Thanks!

    opened by fyabc 5
  • All dockings do not work because there's no gridding process.

    Hi, I was trying out the repo when I realised that neither AutoDock nor Glide is able to run because there is no gridding process, so no grid files are produced. >.<

    opened by yipy0005 3
Releases (v1.1.0)