MolRep: A Deep Representation Learning Library for Molecular Property Prediction

Related tags

Deep LearningMolRep
Overview

MolRep: A Deep Representation Learning Library for Molecular Property Prediction

Summary

MolRep is a Python package for fairly measuring algorithmic progress on chemical property datasets. It currently provides a complete re-evaluation of 16 state-of-the-art deep representation models over 16 benchmark property datsaets.

architecture

If you found this package useful, please cite biorxiv for now:


Install & Usage

We provide a script to install the environment. You will need the conda package manager, which can be installed from here.

To install the required packages, follow there instructions (tested on a linux terminal):

  1. clone the repository

    git clone https://github.com/Jh-SYSU/MolRep

  2. cd into the cloned directory

    cd MolRep

  3. run the install script

    source install.sh [<your_cuda_version>]

Where <your_cuda_version> is an optional argument that can be either cpu, cu92, cu100, cu101. If you do not provide a cuda version, the script will default to cpu. The script will create a virtual environment named MolRep, with all the required packages needed to run our code. Important: do NOT run this command using bash instead of source!

Data

Data could be download from Google_Driver

Current Dataset

Dataset Task Task type #Molecule Splits Metric Reference
QM7 1 Regression 7160 Stratified MAE Wu et al.
QM8 12 Regression 21786 Random MAE Wu et al.
QM9 12 Regression 133885 Random MAE Wu et al.
ESOL 1 Regression 1128 Random RMSE Wu et al.
FreeSolv 1 Regression 642 Random RMSE Wu et al.
Lipophilicity 1 Regression 4200 Random RMSE Wu et al.
BBBP 1 Classification 2039 Scaffold ROC-AUC Wu et al.
Tox21 12 Classification 7831 Random ROC-AUC Wu et al.
SIDER 27 Classification 1427 Random ROC-AUC Wu et al.
ClinTox 2 Classification 1478 Random ROC-AUC Wu et al.
Liver injury 1 Classification 2788 Random ROC-AUC Xu et al.
Mutagenesis 1 Classification 6511 Random ROC-AUC Hansen et al.
hERG 1 Classification 4813 Random ROC-AUC Li et al.
MUV 17 Classification 93087 Random PRC-AUC Wu et al.
HIV 1 Classification 41127 Random ROC-AUC Wu et al.
BACE 1 Classification 1513 Random ROC-AUC Wu et al.

Methods

Current Methods

Self-/unsupervised Models

Methods Descriptions Reference
Mol2Vec Mol2Vec is an unsupervised approach to learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Jaeger et al.
N-Gram graph N-gram graph is a simple unsupervised representation for molecules that first embeds the vertices in the molecule graph and then constructs a compact representation for the graph by assembling the ver-tex embeddings in short walks in the graph. Liu et al.
FP2Vec FP2Vec is a molecular featurizer that represents a chemical compound as a set of trainable embedding vectors and combine with CNN model. Jeon et al.
VAE VAE is a framework for training two neural networks (encoder and decoder) to learn a mapping from high-dimensional molecular representation into a lower-dimensional space. Kingma et al.

Sequence Models

Methods Descriptions Reference
BiLSTM BiLSTM is an artificial recurrent neural network (RNN) architecture to encoding sequences from compound SMILES strings. Hochreiter et al.
SALSTM SALSTM is a self-attention mechanism with improved BiLSTM for molecule representation. Zheng et al
Transformer Transformer is a network based solely on attention mechanisms and dispensing with recurrence and convolutions entirely to encodes compound SMILES strings. Vaswani et al.
MAT MAT is a molecule attention transformer utilized inter-atomic distances and the molecular graph structure to augment the attention mechanism. Maziarka et al.

Graph Models

Methods Descriptions Reference
DGCNN DGCNN is a deep graph convolutional neural network that proposes a graph convolution model with SortPooling layer which sorts graph vertices in a consistent order to learning the embedding of molec-ular graph. Zhang et al.
GraphSAGE GraphSAGE is a framework for inductive representation learning on molecular graphs that used to generate low-dimensional representations for atoms and performs sum, mean or max-pooling neigh-borhood aggregation to updates the atom representation and molecular representation. Hamilton et al.
GIN GIN is the Graph Isomorphism Network that builds upon the limitations of GraphSAGE to capture different graph structures with the Weisfeiler-Lehman graph isomorphism test. Xu et al.
ECC ECC is an Edge-Conditioned Convolution Network that learns a different parameter for each edge label (bond type) on the molecular graph, and neighbor aggregation is weighted according to specific edge parameters. Simonovsky et al.
DiffPool DiffPool combines a differentiable graph encoder with its an adaptive pooling mechanism that col-lapses nodes on the basis of a supervised criterion to learning the representation of molecular graphs. Ying et al.
MPNN MPNN is a message-passing graph neural network that learns the representation of compound molecular graph. It mainly focused on obtaining effective vertices (atoms) embedding Gilmer et al.
D-MPNN DMPNN is another message-passing graph neural network that messages associated with directed edges (bonds) rather than those with vertices. It can make use of the bond attributes. Yang et al.
CMPNN CMPNN is the graph neural network that improve the molecular graph embedding by strengthening the message interactions between edges (bonds) and nodes (atoms). Song et al.

Training

To train a model by K-fold, run 5-fold-training_example.ipynb.

Testing

To test a pretrained model, run testing-example.ipynb.

Results

Results on Classification Tasks.

Datasets BBBP Tox21 SIDER ClinTox MUV HIV BACE
Mol2Vec 0.9213±0.0052 0.8139±0.0081 0.6043±0.0061 0.8572±0.0054 0.1178±0.0032 0.8413±0.0047 0.8284±0.0023
N-Gram graph 0.9012±0.0385 0.8371±0.0421 0.6482±0.0437 0.8753±0.0077 0.1011±0.0000 0.8378±0.0034 0.8472±0.0057
FP2Vec 0.8076±0.0032 0.8578±0.0076 0.6678±0.0068 0.8834±0.0432 0.0856±0.0031 0.7894±0.0052 0.8129±0.0492
VAE 0.8378±0.0031 0.8315±0.0382 0.6493±0.0762 0.8674±0.0124 0.0794±0.0001 0.8109±0.0381 0.8368±0.0762
BiLSTM 0.8391±0.0032 0.8279±0.0098 0.6092±0.0303 0.8319±0.0120 0.0382±0.0000 0.7962±0.0098 0.8263±0.0031
SALSTM 0.8482±0.0329 0.8253±0.0031 0.6308±0.0036 0.8317±0.0003 0.0409±0.0000 0.8034±0.0128 0.8348±0.0019
Transformer 0.9610±0.0119 0.8129±0.0013 0.6017±0.0012 0.8572±0.0032 0.0716±0.0017 0.8372±0.0314 0.8407±0.0738
MAT 0.9620±0.0392 0.8393±0.0039 0.6276±0.0029 0.8777±0.0149 0.0913±0.0001 0.8653±0.0054 0.8519±0.0504
DGCNN 0.9311±0.0434 0.7992±0.0057 0.6007±0.0053 0.8302±0.0126 0.0438±0.0000 0.8297±0.0038 0.8361±0.0034
GraphSAGE 0.9630±0.0474 0.8166±0.0041 0.6403±0.0045 0.9116±0.0146 0.1145±0.0000 0.8705±0.0724 0.9316±0.0360
GIN 0.8746±0.0359 0.8178±0.0031 0.5904±0.0000 0.8842±0.0004 0.0832±0.0000 0.8015±0.0328 0.8275±0.0034
ECC 0.9620±0.0003 0.8677±0.0090 0.6750±0.0092 0.8862±0.0831 0.1308±0.0013 0.8733±0.0025 0.8419±0.0092
DiffPool 0.8732±0.0391 0.8012±0.0130 0.6087±0.0130 0.8345±0.0233 0.0934±0.0001 0.8452±0.0042 0.8592±0.0391
MPNN 0.9321±0.0312 0.8440±0.014 0.6313±0.0121 0.8414±0.0294 0.0572±0.0001 0.8032±0.0092 0.8493±0.0013
DMPNN 0.9562±0.0070 0.8429±0.0391 0.6378±0.0329 0.8692±0.0051 0.0867±0.0032 0.8137±0.0072 0.8678±0.0372
CMPNN 0.9854±0.0215 0.8593±0.0088 0.6581±0.0020 0.9169±0.0065 0.1435±0.0002 0.8687±0.0003 0.8932±0.0019

More results will be updated soon.

Owner
AI-Health @NSCC-gz
AI-Health @NSCC-gz
Save-restricted-v-3 - Save restricted content Bot For telegram

Save restricted content Bot Contact: Telegram A stable telegram bot to get restr

DEVANSH 11 Dec 21, 2022
Continuum Learning with GEM: Gradient Episodic Memory

Gradient Episodic Memory for Continual Learning Source code for the paper: @inproceedings{GradientEpisodicMemory, title={Gradient Episodic Memory

Facebook Research 360 Dec 27, 2022
Code for "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper and code updated from ICLR 2021)

Discovering Non-monotonic Autoregressive Orderings with Variational Inference Description This package contains the source code implementation of the

Xuanlin (Simon) Li 10 Dec 29, 2022
This repository contains a Ruby API for utilizing TensorFlow.

tensorflow.rb Description This repository contains a Ruby API for utilizing TensorFlow. Linux CPU Linux GPU PIP Mac OS CPU Not Configured Not Configur

somatic labs 825 Dec 26, 2022
Disturbing Target Values for Neural Network regularization: attacking the loss layer to prevent overfitting

Disturbing Target Values for Neural Network regularization: attacking the loss layer to prevent overfitting 1. Classification Task PyTorch implementat

Yongho Kim 0 Apr 24, 2022
Code for "Localization with Sampling-Argmax", NeurIPS 2021

Localization with Sampling-Argmax [Paper] [arXiv] [Project Page] Localization with Sampling-Argmax Jiefeng Li, Tong Chen, Ruiqi Shi, Yujing Lou, Yong-

JeffLi 71 Dec 17, 2022
Masked regression code - Masked Regression

Masked Regression MR - Python Implementation This repositery provides a python implementation of MR (Masked Regression). MR can efficiently synthesize

Arbish Akram 1 Dec 23, 2021
Must-read Papers on Physics-Informed Neural Networks.

PINNpapers Contributed by IDRL lab. Introduction Physics-Informed Neural Network (PINN) has achieved great success in scientific computing since 2017.

IDRL 330 Jan 07, 2023
Code base for the paper "Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation"

This repository contains code for the paper Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiati

8 Aug 28, 2022
Lacmus is a cross-platform application that helps to find people who are lost in the forest using computer vision and neural networks.

lacmus The program for searching through photos from the air of lost people in the forest using Retina Net neural nwtwork. The project is being develo

Lacmus Foundation 168 Dec 27, 2022
DGL-TreeSearch and the Gurobi-MWIS interface

Independent Set Benchmarking Suite This repository contains the code for our maximum independent set benchmarking suite as well as our implementations

Maximilian Böther 19 Nov 22, 2022
Toward Realistic Single-View 3D Object Reconstruction with Unsupervised Learning from Multiple Images (ICCV 2021)

Table of Content Introduction Getting Started Datasets Installation Experiments Training & Testing Pretrained models Texture fine-tuning Demo Toward R

VinAI Research 42 Dec 05, 2022
Realtime segmentation with ENet, the fast and accurate segmentation net.

Enet This is a realtime segmentation net with almost 22 fps on GTX1080 ti, and the model size is very small with only 28M. This repo contains the infe

JinTian 14 Aug 30, 2022
TensorFlow implementation of PHM (Parameterization of Hypercomplex Multiplication)

Parameterization of Hypercomplex Multiplications (PHM) This repository contains the TensorFlow implementation of PHM (Parameterization of Hypercomplex

Aston Zhang 9 Oct 26, 2022
Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting

QAConv Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting This PyTorch code is proposed in

Shengcai Liao 166 Dec 28, 2022
A PyTorch library and evaluation platform for end-to-end compression research

CompressAI CompressAI (compress-ay) is a PyTorch library and evaluation platform for end-to-end compression research. CompressAI currently provides: c

InterDigital 680 Jan 06, 2023
Qt-GUI implementation of the YOLOv5 algorithm (ver.6 and ver.5)

YOLOv5-GUI 🎉 YOLOv5算法(ver.6及ver.5)的Qt-GUI实现 🎉 Qt-GUI implementation of the YOLOv5 algorithm (ver.6 and ver.5). 基于YOLOv5的v5版本和v6版本及Javacr大佬的UI逻辑进行编写

EricFang 12 Dec 28, 2022
Small utility to demangle Nim symbols in callgrind files

nim_callgrind A small utility to demangle Nim symbols from callgrind files. Usage Run your (Nim) program with something like this: valgrind --tool=cal

kraptor 3 Feb 15, 2022
Image Recognition using Pytorch

PyTorch Project Template A simple and well designed structure is essential for any Deep Learning project, so after a lot practice and contributing in

Sarat Chinni 1 Nov 02, 2021
kullanışlı ve işinizi kolaylaştıracak bir araç

Hey merhaba! işte çok sorulan sorularının cevabı ve sorunlarının çözümü; Soru= İçinde var denilen birçok şeyi göremiyorum bunun sebebi nedir? Cevap= B

Sexettin 16 Dec 17, 2022