nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures.

Here you will find the scripts necessary to produce the scores described in our paper from fastq files obtained during the experiment.

Install Prerequisites

First install git:

sudo apt-get update
sudo apt-get install git-all

Then clone this repository

git clone https://github.com/jwill123/nextPARS.git

Now ensure that the necessary Python packages are installed and can be found via the $PYTHONPATH environment variable by running the script packages_for_nextPARS.sh in the nextPARS/conf directory.

cd nextPARS/conf
chmod 775 packages_for_nextPARS.sh
./packages_for_nextPARS.sh

Convert fastq to tab

To convert the fastq outputs of the nextPARS experiments into a format suitable for calculating scores, first map the reads in the fastq files to a reference using the program of your choice. Once you have a bam file, run PARSParser_0.67.b.jar. This program counts the number of reads beginning at each position (each read start indicates a cut site for the enzyme named in the file name) and writes the counts in .tab format, with the count values for each position separated by semi-colons.

Example usage:

java -jar PARSParser_0.67.b.jar -a bamFile -b bedFile -out outFile -q 20 -m 5

where the required arguments are:

  • -a gives the bam file of interest
  • -b is the bed file for the reference
  • -out is the name given to the output file in .tab format

The optional arguments are:

  • -q for minimum mapping quality for reads to be included [default = 0]
  • -m for minimum average counts per position for a given transcript [default = 5.0]
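
The counting step described above can be pictured with a short Python sketch. This is not the PARSParser implementation: the function names, the 0-based start positions, and the line layout (transcript name, a tab, then semi-colon-separated counts) are assumptions made for illustration only.

```python
from collections import Counter


def count_read_starts(read_starts, transcript_length):
    """Count how many reads begin at each position of one transcript.

    read_starts is a hypothetical list of 0-based start positions; a real
    pipeline would extract these from the mapped reads in the bam file.
    Starts outside the transcript are ignored.
    """
    tally = Counter(read_starts)
    return [tally.get(pos, 0) for pos in range(transcript_length)]


def format_tab_line(name, counts):
    # Assumed layout: name, tab, then counts separated by semi-colons.
    return name + "\t" + ";".join(str(c) for c in counts)


def parse_tab_line(line):
    # Inverse of format_tab_line; tolerates a trailing semi-colon.
    name, values = line.rstrip("\n").split("\t")
    return name, [float(v) for v in values.rstrip(";").split(";")]


counts = count_read_starts([0, 0, 2, 5, 5, 5], transcript_length=6)
line = format_tab_line("example_rna", counts)
print(line)
```

Here two reads start at position 0, one at position 2, and three at position 5, so those positions would be the candidate cut sites for the enzyme.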

Sample Data

Sample data files are found in the folder nextPARS/data, with the necessary fasta files in nextPARS/data/SEQS/PROBES and the reference structures obtained from PDB in nextPARS/data/STRUCTURES/REFERENCE_STRUCTURES. There are also 2 folders of sample output files from the PARSParser_0.67.b.jar program, found in nextPARS/data/PARSParser_outputs, that can be used as further examples of the nextPARS score calculations described below. NOTE: these are randomly generated sequences with random enzyme values, so they only illustrate how to use the scripts; good results should not be expected from them.

nextPARS Scores

To obtain the scores from nextPARS experiments, use the script get_combined_score.py. Sample data for the 5 PDB control structures can be found in the folder nextPARS/data/.

There are a number of different command line options in the script, many of which were experimental or exploratory and are not relevant here. The useful ones in this context are the following:

  • Use the -i option [REQUIRED] to indicate the molecule for which you want scores (all available data files will be included in the calculations -- molecule name must match that in the data file names)

  • Use the -inDir option to indicate the directory containing the .tab files with read counts for each V1 and S1 enzyme cuts

  • Use the -f option to indicate the path to the fasta file for the input molecule

  • Use the -s option to produce an output Structure Preference Profile (SPP) file. Values for each position are separated by semi-colons. Here 0 = paired position, 1 = unpaired position, and NA = position with a score too low to determine its configuration.

  • Use the -o option to output the calculated scores, again with values for each position separated by semi-colons.

  • Use the --nP_only option to output the calculated nextPARS scores before incorporating the RNN classifier, again with values for each position separated by semi-colons.

  • Use the option {-V nextPARS} to produce an output with the scores that is compatible with the structure visualization program VARNA [1].

  • Use the option {-V spp} to produce an output with the SPP values that is compatible with VARNA.

  • Use the -t option to change the threshold value for scores when determining SPP values [default = 0.8, or -0.8 for negative scores]

  • Use the -c option to change the percentile cap for raw values at the beginning of calculations [default = 95]

  • Use the -v option to print some statistics when a reference CT file is available (as with the example molecules, found in nextPARS/data/STRUCTURES/REFERENCE_STRUCTURES). If no reference is available, the script still prints the nextPARS scores and information about the enzyme .tab files included in the calculations.
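
As a rough illustration of what the -c percentile cap does to raw counts, here is a minimal Python sketch. It assumes a nearest-rank percentile and a cap applied to one list of counts at a time; get_combined_score.py may define the percentile differently or apply it at another granularity, and the function names are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def cap_counts(counts, pct=95):
    """Clip every count above the pct-th percentile down to that value."""
    if not counts:
        return []
    cap = percentile(counts, pct)
    return [min(c, cap) for c in counts]


# One extreme position (500) would otherwise dominate the profile.
# pct=90 is used here only because the toy list has just 10 values.
raw = [3, 5, 2, 4, 1, 6, 2, 3, 4, 500]
capped = cap_counts(raw, pct=90)
print(capped)
```

The point of capping is that a single position with an unusually high read count no longer overwhelms the remaining positions when the values are later scaled.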

Example usage:

# to produce an SPP file for the molecule TETp4p6
python get_combined_score.py -i TETp4p6 -s
# to produce a VARNA-compatible output with the nextPARS scores for one of the
# randomly generated example molecules
python get_combined_score.py -i test_37 -inDir nextPARS/data/PARSParser_outputs/test1 \
  -f nextPARS/data/PARSParser_outputs/test1/test1.fasta -V nextPARS
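
The relationship between scores, the -t threshold, and the SPP values can be sketched in Python as follows. This is an illustration, not the code in get_combined_score.py: it assumes that strongly positive scores indicate paired positions and strongly negative scores indicate unpaired ones, and that the threshold comparison is inclusive. The function name is hypothetical.

```python
def scores_to_spp(scores, threshold=0.8):
    """Turn per-position scores into semi-colon-separated SPP values."""
    spp = []
    for score in scores:
        if score >= threshold:
            spp.append("0")    # assumed: high positive score -> paired
        elif score <= -threshold:
            spp.append("1")    # assumed: high negative score -> unpaired
        else:
            spp.append("NA")   # score too low to determine configuration
    return ";".join(spp)


spp_line = scores_to_spp([0.95, 0.1, -0.9, 0.8, -0.3])
print(spp_line)
```

Lowering the threshold assigns a configuration to more positions at the cost of including weaker signals.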

RNN classifier (already incorporated into the nextPARS scores above)

To run the RNN classifier separately, using a different experimental score input (in .tab format), use the predict2.py script:

python predict2.py -f molecule.fasta -p scoreFile.tab -o output.tab

Where the command line options are as follows:

  • the -f option [REQUIRED] is the input fasta file
  • the -p option [REQUIRED] is the input Score tab file
  • the -o option [REQUIRED] is the final Score tab output file.
  • the -w1 option is the weight for the RNN score. [default = 0.5]
  • the -w2 option is the weight for the experimental data score. [default = 0.5]
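
One way to picture the two weights is as a per-position weighted sum of the RNN score and the experimental score. The sketch below assumes exactly that form; predict2.py may combine or rescale the scores differently, and the function name is hypothetical.

```python
def combine_scores(rnn_scores, experimental_scores, w1=0.5, w2=0.5):
    """Weighted per-position combination of two score tracks (assumed form)."""
    if len(rnn_scores) != len(experimental_scores):
        raise ValueError("both score lists must cover the same positions")
    return [w1 * r + w2 * e for r, e in zip(rnn_scores, experimental_scores)]


combined = combine_scores([1.0, -1.0, 0.5], [0.5, -0.5, -0.5])
print(";".join(str(v) for v in combined))
```

With the default weights of 0.5 each, the two sources contribute equally; raising -w2 relative to -w1 would give the experimental data more influence under this assumed formula.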

References:

  1. Darty,K., Denise,A. and Ponty,Y. (2009) VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinforma. Oxf. Engl., 25, 1974–197
Owner
Jesse Willis