A framework for feature exploration in Data Science

Related tags

ScienceBeehive
Overview

Beehive

A framework for feature exploration in Data Science

Background

What do we do when we finish one episode of feature exploration in a jupyter notebook?
We overwrite the notebook or clone the notebook for the next episode of feature exploration.

I say this is inefficient, thus this project.

Purpose

To help documentation for grouping all episode of feature exploration in a single notebook.

Project Elaboration

terms

episode: one set of feature trained on a model.
e.g.
episode_1 = (sepal_length, sepal_width) | (target_class)
episode_2 = (sepal_length, sepal_width, petal_length, petal_width) | (target_class)
episode_3 = (sepal_length_normalized, sepal_width_normalized, petal_length_normalized, petal_width_normalized) | (target_class)

get to know the classes

Combee

One Combee should represent one model and its features includes: (metrics, confusion matrix, feature importances)
(sklearn-based)

CombeKeras

One Combee should represent one model and its features includes: (metrics, confusion matrix)
(keras-based)
needed because Keras has different features and functions compared to sklearn

Vespiqueen

One Vespiqueen should represent one episode.
Vespiqueen memorizes the tranformation of data.
Vespiqueen governs all the Combees.
Can be used to pick which model is best in a Vespiqueen.

Beehive

Beehive governs the whole Vespiqueen.
Beehive is used to print all the metrics across Vespiqueen (or episodes)
Can also be used to pick new features from the last or best vespiqueen for the next episode.

How to Use

Owner
Steven IJ
Steven IJ
An open-source application for biological image analysis

CellProfiler is a free open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure

CellProfiler 734 Jan 08, 2023
Read-only mirror of https://gitlab.gnome.org/GNOME/pybliographer

Pybliographer Pybliographer provides a framework for working with bibliographic databases. This software is licensed under the GPLv2. For more informa

GNOME Github Mirror 15 May 07, 2022
Open Delmic Microscope Software

Odemis Odemis (Open Delmic Microscope Software) is the open-source microscopy software of Delmic B.V. Odemis is used for controlling microscopes of De

Delmic 32 Dec 14, 2022
Veusz scientific plotting application

Veusz 3.3.1 Veusz is a scientific plotting package. It is designed to produce publication-ready PDF or SVG output. Graphs are built-up by combining pl

Veusz 613 Dec 16, 2022
3D visualization of scientific data in Python

Mayavi: 3D visualization of scientific data in Python Mayavi docs: http://docs.enthought.com/mayavi/mayavi/ TVTK docs: http://docs.enthought.com/mayav

Enthought, Inc. 1.1k Jan 06, 2023
Float2Binary - A simple python class which finds the binary representation of a floating-point number.

Float2Binary A simple python class which finds the binary representation of a floating-point number. You can find a class in IEEE754.py file with the

Bora Canbula 3 Dec 14, 2021
A modular single-molecule analysis interface

MOSAIC: A modular single-molecule analysis interface MOSAIC is a single molecule analysis toolbox that automatically decodes multi-state nanopore data

National Institute of Standards and Technology 35 Dec 13, 2022
CS 506 - Computational Tools for Data Science

CS 506 - Computational Tools for Data Science Code, slides, and notes for Boston University CS506 Fall 2021 The Final Project Repository can be found

Lance Galletti 14 Mar 23, 2022
SCICO is a Python package for solving the inverse problems that arise in scientific imaging applications.

Scientific Computational Imaging COde (SCICO) SCICO is a Python package for solving the inverse problems that arise in scientific imaging applications

Los Alamos National Laboratory 37 Dec 21, 2022
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis. You write a high level configuration file specifying your in

Blue Collar Bioinformatics 915 Dec 29, 2022
🍊 :bar_chart: :bulb: Orange: Interactive data analysis

Orange Data Mining Orange is a data mining and visualization toolbox for novice and expert alike. To explore data with Orange, one requires no program

Bioinformatics Laboratory 3.9k Jan 05, 2023
CONCEPT (COsmological N-body CodE in PyThon) is a free and open-source simulation code for cosmological structure formation

CONCEPT (COsmological N-body CodE in PyThon) is a free and open-source simulation code for cosmological structure formation. The code should run on any Linux system, from massively parallel computer

Jeppe Dakin 62 Dec 08, 2022
Mathics is a general-purpose computer algebra system (CAS). It is an open-source alternative to Mathematica

Mathics is a general-purpose computer algebra system (CAS). It is an open-source alternative to Mathematica. It is free both as in "free beer" and as in "freedom".

Mathics 535 Jan 04, 2023
A mathematica expression evaluator with PokemonTypes

A simple mathematical expression evaluator that uses Pokemon types to replace symbols.

Arnav Jindal 2 Nov 14, 2021
Zipline, a Pythonic Algorithmic Trading Library

Zipline is a Pythonic algorithmic trading library. It is an event-driven system for backtesting. Zipline is currently used in production as the backte

Quantopian, Inc. 15.7k Jan 07, 2023
artisan: visual scope for coffee roasters

Artisan Visual scope for coffee roasters WARNING: pre-release builds may not work. Use at your own risk. Summary Artisan is a software that helps coff

Artisan – Visual Scope for Coffee Roasters 705 Jan 05, 2023
3D medical imaging reconstruction software

InVesalius InVesalius generates 3D medical imaging reconstructions based on a sequence of 2D DICOM files acquired with CT or MRI equipments. InVesaliu

443 Jan 01, 2023
Statsmodels: statistical modeling and econometrics in Python

About statsmodels statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics an

statsmodels 8.1k Dec 30, 2022
AnuGA for the simulation of the shallow water equation

ANUGA Contents ANUGA What is ANUGA? Installation Documentation and Help Mailing Lists Web sites Latest source code Bug reports Developer information L

Geoscience Australia 147 Dec 14, 2022
A framework for feature exploration in Data Science

Beehive A framework for feature exploration in Data Science Background What do we do when we finish one episode of feature exploration in a jupyter no

Steven IJ 1 Jan 03, 2022