Full Spectrum Bioinformatics - a free online text designed to introduce key topics in Bioinformatics using the Python programming language

Last update: Dec 28, 2022

Overview

Full Spectrum Bioinformatics is a free online text designed to introduce key topics in Bioinformatics using the Python programming language. The text is written in interactive Jupyter Notebooks, which allow you to try out and modify example code and analyses.

In addition to explanations of concepts, Full Spectrum Bioinformatics also includes Bioinformatics Vignettes written by readers of the text. Each vignette focuses on a particular core concept and shows how a reader has applied that concept to their research project.

If you are already familiar with GitHub and Jupyter Notebooks, you can download the entire project and run it interactively, or click the 'Open in Colab' links to open interactive versions of each section in Google Colab (you will need to 'Save as' your own copy in order to change code). You can also view a static version of each section using the nbviewer links. The direct GitHub links sometimes produce a GitHub error message. Reloading the page or using the nbviewer link usually avoids this issue.

Lead Author: Jesse Zaneveld¹
Vignette Authors: Nia Prabhu*¹, Aziz Bajouri*¹,², Ayomikun Akinrinade*¹,³

* Vignette authors contributed equally and are listed in chronological order of first contribution.
1 Division of Biological Sciences, School of STEM, University of Washington, Bothell, Washington, USA
2 Division of Computer and Software Systems, School of STEM, University of Washington, Bothell, Washington, USA
3 Division of Health Studies, School of Nursing and Health Studies, University of Washington, Bothell, Washington, USA

The text is currently in prototype status. Chapters with content you can preview are linked below:

Chapter 1. Foreword
Chapter 2. Introduction
- The Many Paths to Bioinformatics
- Speaking Each Other's Language
  - An Absurdly Brief Introduction to Biology
  - An Absurdly Brief Introduction to Computer Science
  - An Absurdly Brief Introduction to Statistics
Chapter 3. The Command Line
- Using the Command Line
- Exercise: Little Brother is Missing
Chapter 4. Exploring Python
- Warm-up Exercise: Spot the Difference
- Exploring Python
- A Tour of Python Data Types
- A Tour of Python Syntax (functions, conditions, iteration, classes)
Chapter 5. Project Design
- Using Literature Surveys to Ask Good Questions and Propose Testable Hypotheses
Chapter 6. Biological Sequences
- An introduction to Biological Sequences
- Representing and Manipulating Biological Sequences as Python Strings
- Analyzing Biological Sequences with For Loops and If Statements
- Reading and writing FASTA files using Python
- Bioinformatics Vignette (Aziz Bajouri): Using set objects to find circular RNAs involved in multiple diseases
- Exercise: Error Bingo
- Error Messages in Python
- Bioinformatics Vignette (Nia Prabhu): Using For Loops and Dictionaries to Compare Nucleotide Composition in Pandemic and Non-Pandemic Causing Influenza Strains
- Capstone: testing for depletion of CG dinucleotides in the human genome
Chapter 7. 'Omics
- An Introduction to 'Omics
- Working with Tabular 'Omic data in Python using Pandas
- Analyzing Microbiome Alpha Diversity in Python
- Analyzing Microbiome Beta Diversity in Python
- Simulating the Effect of Sequencing Depth on Diversity Estimates
Chapter 8. Visualization
- Graphs as a Visual Language
- Exercise: Anger Tufte
- Representing Correlation
- Representing Distribution
Chapter 9. Alignment and Phylogenetics
- 9a. Alignment
- Homology and Alignment
- Global Alignment with the Needleman-Wunsch algorithm
- Local Alignment with the Smith-Waterman algorithm
- BLAST and the k-mer trick
- Exercise: Duck vs. Yeast
- 9b. Phylogenetics
- Tree thinking
- Representing Phylogenetic Trees with Python Classes
- Generating Trees Using Birth-Death Models
- Working with Traits on Trees
- Maximum Parsimony Ancestral State Reconstruction
- Hidden State Prediction
- Phylogenetic Comparative Methods
Chapter 10. Simulation
- Simulating Biological Networks
- Simulating the Population Genetics of Natural Selection and Genetic Drift
- Simulating the Evolution of Social Behavior
Chapter 11. Statistics
- Linear Models - a Statistical Swiss Army Knife
- Monte Carlo simulation and the Fundamental Unity of Statistical Hypothesis Tests
- Statistical Distributions and Parametric Tests
- Rank Transformations
- Monte Carlo simulation of Effect Size, Sample Size, and Significance
- Dealing with Multiple Comparisons
- Exercise: Revising your writing about statistical results
- An Introduction to Maximum Likelihood optimization
- The Best Model of A Cat is a Cat - model complexity, overfitting, and the AIC
- An Introduction to Bayesian Approaches
Chapter 12. Multivariate Statistics and Machine Learning
- Unsupervised Classification: of ordination, clustering and fishtanks
- Supervised Classification: from lines to trees to forests.
- Bioinformatics Vignette (Ayomikun Akinrinade): Using K-Nearest Neighbors and Binary Decision Tree Algorithms to Predict Enzyme Function from Protein Sequences
Chapter 13. Presenting Research
- Presentations as Verbal Chess
Chapter 14. Polishing and Publishing
- Presenting Research
- From Data to Conclusion: building a research manuscript brick by brick
- Resistance is Futile: becoming a language Borg
- Exercise: generating a targeted title using templating
- The Inverted Pyramid: optimizing your text from a reader's perspective
Chapter 15. Careers that draw on Bioinformatics
- Fighting for an Inclusive Workplace
  - Examining Privilege and Identity
  - Making Your Science and Teaching Accessible and Inclusive
  - Campus and Local Activism
  - Improving University Policy
- Happiness Matters
- Radical Collaboration
- Cognitive Bias and Networking
- Open-source Science as Shield and Sword
- Applying for Grants
Appendices:
- Appendix A - Data Sources for Bioinformatics Projects
- Appendix B - Timesaving Starter Code
  - Template Script with Interface and Test Code
  - IUPAC codes in python
  - Standard Translation Tables in Python
- Appendix C - Contributing a Community Example
- Appendix D - Paper Formatting Kit
- Appendix E - Project Specifications
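
To give a flavor of the Chapter 6 material listed above (for loops, dictionaries, and the capstone on CG dinucleotide depletion), here is a minimal sketch. It is not taken from the text's notebooks. The function names and the simple observed/expected ratio are assumptions chosen for illustration, and the capstone itself may use a different statistic or test.

```python
def count_dinucleotides(sequence):
    """Count overlapping dinucleotides with a for loop and a dictionary."""
    sequence = sequence.upper()
    counts = {}
    for i in range(len(sequence) - 1):
        dinucleotide = sequence[i:i + 2]
        counts[dinucleotide] = counts.get(dinucleotide, 0) + 1
    return counts


def cg_observed_over_expected(sequence):
    """Return observed CG count divided by the count expected by chance.

    The expectation assumes C and G occur independently at their observed
    frequencies (an illustrative assumption). Returns None when the ratio
    is undefined (sequence too short, or no C or no G present).
    """
    sequence = sequence.upper()
    n = len(sequence)
    if n < 2:
        return None
    freq_c = sequence.count("C") / n
    freq_g = sequence.count("G") / n
    # There are n - 1 overlapping dinucleotide positions in the sequence
    expected = freq_c * freq_g * (n - 1)
    if expected == 0:
        return None
    observed = count_dinucleotides(sequence).get("CG", 0)
    return observed / expected


if __name__ == "__main__":
    toy_sequence = "ACGTTTGCAACG"  # made-up sequence, not real genomic data
    print(count_dinucleotides(toy_sequence))
    print(cg_observed_over_expected(toy_sequence))
```

A ratio well below 1 on a real genome would be consistent with CG depletion, though a proper analysis would also need a significance test.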

This project is being developed with support from NSF Integrative and Organismal Systems award .

Feedback

You can submit feedback about completed chapters at the following link

Comments

Missing reading response link on "Error Messages in Python"

There is no reading response link at the bottom of content/04_exploring_python/error_messages_in_python.ipynb. Additionally, on the reading response form, there is no entry for this reading.

opened by LucaOnline 1
Add discussion of HISAT2 & transcriptomics

HiSat2 https://anaconda.org/bioconda/hisat2

Salmon intro (another alternative that interoperates well with DESeq2) https://combine-lab.github.io/salmon/getting_started/

opened by zaneveld 0
Literature Synthesis section -- discuss cutting extra phrases that don't add meaning in literature

In addition we found a more recent study that showed that [research finding] (cite1;cite2). --> [research finding]

In a 2016 study it was shown that [finding] (cite1) --> [finding]

opened by zaneveld 0
More database links

Open resources on CUREs shared in the 2022 AACU Talks ("CUREing Cancer: How a Virtual Cancer Genomics CURE Made Research Accessible to Students During COVID" and "Expanding Access to Undergraduate Research Through BCEENET Cures Using Digitized Collections Data"), shared by Robin Angotti:

- https://www.cbioportal.org/ (Cancer research database)
- https://www.idigbio.org/ (Integrated digitized biocollections)
- https://www.gbif.org/ (biodiversity data)
- https://bceenetwork.org/cure-summaries/
- https://docs.google.com/document/d/1gC-sj3p8aUKgEDxVPJfq793Mm4n5niZm/edit (overview of databases for genes and genomics for cancer)

opened by zaneveld 0

Releases(release-2022.3.1)

release-2022.3.1(Mar 2, 2022)

What's Changed

The 2022.3.1 Release of Full Spectrum Bioinformatics greatly expands the scope and maturity of the text, including contributions from 3 undergraduate co-authors. This text has now been used to support multiple classes, and has 35 sections that are linked from the table of contents and ready for classroom use.

Here are some of the major changes:

The text has several new sections:
- An overview of Python syntax now shows how to recognize Python syntax before we dive into studying the details
- A first chapter on sequence alignment now covers Needleman-Wunsch alignment, both as worked by hand using a simple example and as an implementation in numpy
- The text now discusses linear models, with accompanying illustrations and figures
- An Error Bingo exercise now encourages students to intentionally trigger and learn from errors
- An extensive section has been added discussing common errors in Python, why they most commonly occur, and how to fix them
- 3 undergraduate contributors have added Bioinformatics Vignettes showing how to apply the principles in the text to biological problems:
  - Nia Prabhu (nucleotide composition)
  - Aziz Bajouri (set analysis)
  - Ayomikun Akinrinade (machine learning)
- A section has been added on revising writing about statistical results
- An initial draft section on visualizing correlation has been added, showing how a scatterplot can be revised to add linear regression results and 95% confidence intervals, and to better meet recommendations for data visualization
- The Data Sources page has been greatly updated, and now includes logos for linked resources

New Draft Sections:
- A draft section on student activism and fighting for an inclusive workplace has been added
- A draft section on network analysis has several in-progress code commits (not yet linked from the main table of contents)

Other changes:
- Full Spectrum Bioinformatics has now adopted a code of conduct
- Many minor fixes
- Exercises have been added to many sections that previously lacked them
- The exercise on calculating CG content in the human genome has been updated
- Several chapters have been updated to include Feedback links that were previously missing
- Unused Jupyter Book files have been removed
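
The release notes above mention a section on Needleman-Wunsch global alignment. As a rough illustration of that algorithm, here is a minimal sketch. It is not the implementation from the text: the text's version is described as using numpy, whereas this sketch uses nested Python lists so that it runs without extra dependencies. The function name and the scoring values (match = 1, mismatch = -1, gap = -1) are assumptions chosen for the example.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (score, aligned1, aligned2)."""
    n_rows, n_cols = len(seq1) + 1, len(seq2) + 1

    # Score matrix: first row and column hold cumulative gap penalties
    scores = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        scores[i][0] = gap * i
    for j in range(n_cols):
        scores[0][j] = gap * j

    def substitution(i, j):
        """Score for aligning seq1[i-1] with seq2[j-1]."""
        return match if seq1[i - 1] == seq2[j - 1] else mismatch

    # Fill each cell with the best of: diagonal (align), up or left (gap)
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            scores[i][j] = max(
                scores[i - 1][j - 1] + substitution(i, j),
                scores[i - 1][j] + gap,
                scores[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment
    aligned1, aligned2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and scores[i][j] == scores[i - 1][j - 1] + substitution(i, j):
            aligned1.append(seq1[i - 1])
            aligned2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and scores[i][j] == scores[i - 1][j] + gap:
            aligned1.append(seq1[i - 1])
            aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-")
            aligned2.append(seq2[j - 1])
            j -= 1

    return scores[-1][-1], "".join(reversed(aligned1)), "".join(reversed(aligned2))


if __name__ == "__main__":
    score, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(score)
    print(top)
    print(bottom)
```

When several alignments tie for the best score, this traceback prefers the diagonal move, then a gap in the second sequence. Other tie-breaking orders are equally valid.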

Full Changelog: https://github.com/zaneveld/full_spectrum_bioinformatics/compare/release-2020.12.1...release-2022.3.1
full_spectrum_bioinformatics_2022.3.0.zip(182.17 MB)
release-2020.12.1(Dec 8, 2020)

This is an initial development release of the Full Spectrum Bioinformatics online textbook. This is not a full release of the entire planned textbook, but rather an incremental development release of some content that is sufficiently developed that it has been used in classes.

Some current features include:
- A series of open-access Jupyter Notebooks discussing topics in Bioinformatics.
- Links to Google Colab to allow students to run notebooks in a browser without installing software
- An outline table of contents shows planned sections, with sections that are in beta status available as live links.
- This release includes 21 new sections, covering topics ranging from sequence analysis to how to revise one's writing about statistical results:

Foreword
The Command Line
- Using the Command Line
- Exercise: Little Brother is Missing
Exploring Python
- Exploring Python
- A Tour of Python Data Types
Project Design
- Using Literature Surveys to Ask Good Questions and Propose Testable Hypotheses
Biological Sequences
- An introduction to Biological Sequences
- Representing and Manipulating Biological Sequences as Python Strings
- Analyzing Biological Sequences with For Loops and If Statements
- Reading and writing FASTA files using Python
'Omics
- An Introduction to 'Omics
- Working with Tabular 'Omic data in Python using Pandas
Phylogenetic Trees
- Representing Phylogenetic Trees with Python Classes
- Generating Trees Using Birth-Death Models
Simulation
- Simulating the Population Genetics of Natural Selection and Genetic Drift
Statistics
- Rank Transformations
- Monte Carlo simulation of Effect Size, Sample Size, and Significance
- Dealing with Multiple Comparisons
- Exercise: Revising your writing about statistical results
Polishing and Publishing
- Presenting Research
Careers that draw on Bioinformatics
- Applying for Grants

NOTE: this is very similar to release-2020.12.0, other than minor edits to the readme but I need to re-release to trigger Zenodo to generate a DOI.
release-2020.12.0(Dec 7, 2020)

This is an initial development release of the Full Spectrum Bioinformatics online textbook. This is not a full release of the entire planned textbook, but rather an incremental development release of some content that is sufficiently developed that it has been used in classes.

Some current features include:
- A series of open-access Jupyter Notebooks discussing topics in Bioinformatics.
- Links to Google Colab to allow students to run notebooks in a browser without installing software
- An outline table of contents shows planned sections, with sections that are in beta status available as live links.
- This release includes 21 new sections, covering topics ranging from sequence analysis to how to revise one's writing about statistical results:

Foreword
The Command Line
- Using the Command Line
- Exercise: Little Brother is Missing
Exploring Python
- Exploring Python
- A Tour of Python Data Types
Project Design
- Using Literature Surveys to Ask Good Questions and Propose Testable Hypotheses
Biological Sequences
- An introduction to Biological Sequences
- Representing and Manipulating Biological Sequences as Python Strings
- Analyzing Biological Sequences with For Loops and If Statements
- Reading and writing FASTA files using Python
'Omics
- An Introduction to 'Omics
- Working with Tabular 'Omic data in Python using Pandas
Phylogenetic Trees
- Representing Phylogenetic Trees with Python Classes
- Generating Trees Using Birth-Death Models
Simulation
- Simulating the Population Genetics of Natural Selection and Genetic Drift
Statistics
- Rank Transformations
- Monte Carlo simulation of Effect Size, Sample Size, and Significance
- Dealing with Multiple Comparisons
- Exercise: Revising your writing about statistical results
Polishing and Publishing
- Presenting Research
Careers that draw on Bioinformatics
- Applying for Grants
full_spectrum_bioinformatics.zip(84.89 MB)

Owner

Jesse Zaneveld

