PCAfold is an open-source Python library for generating, analyzing and improving low-dimensional manifolds obtained via Principal Component Analysis (PCA).

Related tags

Data AnalysisPCAfold
Overview

License: MIT Documentation Status GitLab Binder

PCAfold is an open-source Python library for generating, analyzing and improving low-dimensional manifolds obtained via Principal Component Analysis (PCA). It incorporates a variety of data preprocessing tools (including data clustering and sampling), uses PCA as a dimensionality reduction technique and utilizes a novel approach to assess the quality of the obtained low-dimensional manifolds.

Citing PCAfold

PCAfold is published in the SoftwareX journal. If you use PCAfold in a scientific publication, you can cite the software as:

Zdybał, K., Armstrong, E., Parente, A. and Sutherland, J.C., 2020. PCAfold: Python software to generate, analyze and improve PCA-derived low-dimensional manifolds. SoftwareX, 12, p.100630.

or using BibTeX:

@article{pcafold2020,
title = "PCAfold: Python software to generate, analyze and improve PCA-derived low-dimensional manifolds",
journal = "SoftwareX",
volume = "12",
pages = "100630",
year = "2020",
issn = "2352-7110",
doi = "https://doi.org/10.1016/j.softx.2020.100630",
url = "http://www.sciencedirect.com/science/article/pii/S2352711020303435",
author = "Kamila Zdybał and Elizabeth Armstrong and Alessandro Parente and James C. Sutherland"
}

PCAfold documentation

PCAfold documentation contains a thorough user guide including equations, references and example code snippets. Numerous illustrative tutorials and demos are presented as well. The corresponding Jupyter notebooks can be found in the docs/tutorials directory.

Software architecture

A general overview for using PCAfold modules is presented in the diagram below:

Screenshot

Each module's functionalities can also be used as a standalone tool for performing a specific task and can easily combine with techniques outside of this software, such as K-Means algorithm or Artificial Neural Networks.

Installation

Dependencies

PCAfold requires python3.7 and the following packages:

  • Cython
  • matplotlib
  • numpy
  • scipy
  • termcolor

Build from source

Clone the PCAfold repository and move into the PCAfold directory created:

git clone http://gitlab.multiscale.utah.edu/common/PCAfold.git
cd PCAfold

Run the setup.py script as below to complete the installation:

python3.7 setup.py build_ext --inplace
python3.7 setup.py install

You are ready to import PCAfold!

Testing

To run regression tests from the base repo directory run:

python3.7 -m unittest discover

To switch verbose on, use the -v flag.

All tests should be passing. If any of the tests is failing and you can’t sort out why, please open an issue on GitLab.

Authors and contacts

Owner
Burn Research
Burn Research
Used for data processing in machine learning, and help us to construct ML model more easily from scratch

Used for data processing in machine learning, and help us to construct ML model more easily from scratch. Can be used in linear model, logistic regression model, and decision tree.

ShawnWang 0 Jul 05, 2022
Weather Image Recognition - Python weather application using series of data

Weather Image Recognition - Python weather application using series of data

Kushal Shingote 1 Feb 04, 2022
pipeline for migrating lichess data into postgresql

How Long Does It Take Ordinary People To "Get Good" At Chess? TL;DR: According to 5.5 years of data from 2.3 million players and 450 million games, mo

Joseph Wong 182 Nov 11, 2022
PySpark Structured Streaming ROS Kafka ApacheSpark Cassandra

PySpark-Structured-Streaming-ROS-Kafka-ApacheSpark-Cassandra The purpose of this project is to demonstrate a structured streaming pipeline with Apache

Zekeriyya Demirci 5 Nov 13, 2022
Desafio proposto pela IGTI em seu bootcamp de Cloud Data Engineer

Desafio Modulo 4 - Cloud Data Engineer Bootcamp - IGTI Objetivos Criar infraestrutura como código Utuilizando um cluster Kubernetes na Azure Ingestão

Otacilio Filho 4 Jan 23, 2022
NumPy aware dynamic Python compiler using LLVM

Numba A Just-In-Time Compiler for Numerical Functions in Python Numba is an open source, NumPy-aware optimizing compiler for Python sponsored by Anaco

Numba 8.2k Jan 07, 2023
Ejercicios Panda usando Pandas

Readme Below we add configuration details to locally test your application To co

1 Jan 22, 2022
Catalogue data - A Python Scripts to prepare catalogue data

catalogue_data Scripts to prepare catalogue data. Setup Clone this repo. Install

BigScience Workshop 3 Mar 03, 2022
Udacity-api-reporting-pipeline - Udacity api reporting pipeline

udacity-api-reporting-pipeline In this exercise, you'll use portions of each of

Fabio Barbazza 1 Feb 15, 2022
Data analysis and visualisation projects from a range of individual projects and applications

Python-Data-Analysis-and-Visualisation-Projects Data analysis and visualisation projects from a range of individual projects and applications. Python

Tom Ritman-Meer 1 Jan 25, 2022
Data Analysis for First Year Laboratory at Imperial College, London.

Data Analysis for First Year Laboratory at Imperial College, London. For personal reference only, and to reference in lab reports and lab books.

Martin He 0 Aug 29, 2022
Making the DAEN information accessible.

The purpose of this repository is to make the information on Australian COVID-19 adverse events accessible. The Therapeutics Goods Administration (TGA) keeps a database of adverse reactions to medica

10 May 10, 2022
Snakemake workflow for converting FASTQ files to self-contained CRAM files with maximum lossless compression.

Snakemake workflow: name A Snakemake workflow for description Usage The usage of this workflow is described in the Snakemake Workflow Catalog. If

Algorithms for reproducible bioinformatics (Koesterlab) 1 Dec 16, 2021
Time ranges with python

timeranges Time ranges. Read the Docs Installation pip timeranges is available on pip: pip install timeranges GitHub You can also install the latest v

Micael Jarniac 2 Sep 01, 2022
PATC: Introduction to Big Data Analytics. Practical Data Analytics for Solving Real World Problems

PATC: Introduction to Big Data Analytics. Practical Data Analytics for Solving Real World Problems

1 Feb 07, 2022
Very basic but functional Kakuro solver written in Python.

kakuro.py Very basic but functional Kakuro solver written in Python. It uses a reduction to exact set cover and Ali Assaf's elegant implementation of

Louis Abraham 4 Jan 15, 2022
MeSH2Matrix - A set of Python codes for the generation of biomedical ontologies from the MeSH keywords of the PubMed scholarly publications

A set of Python codes for the generation of biomedical ontologies from the MeSH keywords of the PubMed scholarly publications

SisonkeBiotik 6 Nov 30, 2022
CaterApp is a cross platform, remotely data sharing tool created for sharing files in a quick and secured manner.

CaterApp is a cross platform, remotely data sharing tool created for sharing files in a quick and secured manner. It is aimed to integrate this tool with several more features including providing a U

Ravi Prakash 3 Jun 27, 2021
First steps with Python in Life Sciences

First steps with Python in Life Sciences This course material is part of the "First Steps with Python in Life Science" three-day course of SIB-trainin

SIB Swiss Institute of Bioinformatics 22 Jan 08, 2023
Renato 214 Jan 02, 2023