Python & Julia port of codes in excellent R books

Overview

X4DS

This repo is a collection of

Python & Julia port of codes in the following excellent R books:

Python Stack Julia Stack
Language
Version
v3.9 v1.7
Data
Processing
  • Pandas
  • DataFrames
  • Visualization
  • Matplotlib
  • Seaborn
  • MakiE
  • AlgebraOfGraphics
  • Machine
    Learning
  • Scikit-Learn
  • MLJ
  • Probablistic
    Programming
  • PyMC
  • Turing
  • Code Styles

    2.1. Basics

    • prefer enumerate() over range(len())
    xs = range(3)
    
    # good
    for ind, x in enumerate(xs):
      print(f'{ind}: {x}')
    
    # bad
    for i in range(len(xs)):
      print(f'{i}: {xs[i]}')

    2.2. Matplotlib

    including seaborn

    • prefer Axes object over Figure object
    • use constrained_layout=True when draw subplots
    # good
    _, axes = plt.subplots(1, 2, constrained_layout=True)
    axes[0].plot(x1, y1)
    axes[1].hist(x2, y2)
    
    # bad
    plt.subplot(121)
    plt.plot(x1, y1)
    plt.subplot(122)
    plt.hist(x2, y2)
    • prefer axes.flatten() over plt.subplot() in cases where subplots' data is iterable
    • prefer zip() or enumerate() over range() for iterable objects
    # good
    _, ax = plt.subplots(2, 2, figsize=[12,8],constrained_layout=True)
    
    for ax, x, y in zip(axes.flatten(), xs, ys):
      ax.plot(x, y)
    
    # bad
    for i in range(4):
      ax = plt.subplot(2, 2, i+1)
      ax.plot(x[i], y[i])
    • prefer set() method over set_*() method
    # good
    ax.set(xlabel='x', ylabel='y')
    
    # bad
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    • Prefer despine() over ax.spines[*].set_visible()
    # good
    sns.despine()
    
    # bad
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)

    2.3. Pandas

    • prefer df['col'] over df.col
    # good
    movies['duration']
    
    # bad
    movies.duration
    • prefer df.query over df[] or df.loc[] in simple-selection
    # good
    movies.query('duration >= 200')
    
    # bad
    movies[movies['duration'] >= 200]
    movies.loc[movies['duration'] >= 200, :]
    • prefer df.loc and df.iloc over df[] in multiple-selection
    # good
    movies.loc[movies['duration'] >= 200, 'genre']
    movies.iloc[0:2, :]
    
    # bad
    movies[movies['duration'] >= 200].genre
    movies[0:2]

    LaTeX Styles

    Multiple lines

    Reduce the use of begin{array}...end{array}

    • equations: begin{aligned}...end{aligned}
    $$
    \begin{aligned}
    y_1 = x^2 + 2*x \\
    y_2 = x^3 + x
    \end{aligned}
    $$
    • equations with conditions: begin{cases}...end{cases}
    $$
    \begin{cases}
    y = x^2 + 2*x & x > 0 \\
    y = x^3 + x & x ≤ 0
    \end{cases}
    $$
    • matrix: begin{matrix}...end{matrix}
    $$
    \begin{vmatrix}
      a + a^′ & b + b^′ \\ c & d
      \end{vmatrix}= \begin{vmatrix}
      a & b \\ c & d
      \end{vmatrix} + \begin{vmatrix}
      a^′ & b^′ \\ c & d
    \end{vmatrix}
    $$

    Brackets

    • prefer \Bigg...\Bigg over \left...\right
    $$
    A\Bigg[v_1\ v_2\ \ v_r\Bigg]
    $$
    • prefer \underset{}{} over \underset{}
    $$
    \underset{θ}{\mathrm{argmax}}\ p(x_i|θ)
    $$

    Expressions

    • prefer ^{\top} over ^T for transpose

    $$ 𝐀^⊤ $$

    $$
    𝐀^{\top}
    $$
    • prefer \to over \rightarrow for limit

    $$ \lim_{n → ∞} $$

    $$
    \lim_{n\to \infty}
    $$
    • prefer underset{}{} over \limits_

    $$ \underset{w}{\rm argmin}\ (wx +b) $$

    $$
    \underset{w}{\rm argmin}\ (wx +b)
    $$

    Fonts

    • prefer \mathrm over \mathop or \operatorname
    $$
    θ_{\mathrm{MLE}}=\underset{θ}{\mathrm{argmax}}\ ∑_{i = 1}^{N}\log p(x_i|θ)
    $$

    ISLR

    References

    style <style> table { border-collapse: collapse; text-align: center; } </style>
    Owner
    Gitony
    Gitony
    A data visualization curriculum of interactive notebooks.

    A data visualization curriculum of interactive notebooks, using Vega-Lite and Altair. This repository contains a series of Python-based Jupyter notebooks.

    UW Interactive Data Lab 1.2k Dec 30, 2022
    Histogramming for analysis powered by boost-histogram

    Hist Hist is an analyst-friendly front-end for boost-histogram, designed for Python 3.7+ (3.6 users get version 2.4). See what's new. Installation You

    Scikit-HEP Project 97 Dec 25, 2022
    simple tool to paint axis x and y

    simple tool to paint axis x and y

    G705 1 Oct 21, 2021
    A filler visualizer built using python

    filler-visualizer 42 filler のログをビジュアライズしてスポーツさながら楽しむことができます! Usage (標準入力でvisualizer.pyに渡せばALL OK) 1. 既にあるログをビジュアライズする $ ./filler_vm -t 3 -p1 john_fill

    Takumi Hara 1 Nov 04, 2021
    Package managers visualization

    Software Galaxies This repository combines visualizations of major software package managers. All visualizations are available here: http://anvaka.git

    Andrei Kashcha 1.4k Dec 22, 2022
    Analysis and plotting for motor/prop/ESC characterization, thrust vs RPM and torque vs thrust

    esc_test This is a Python package used to plot and analyze data collected for the purpose of characterizing a particular propeller, motor, and ESC con

    Alex Spitzer 1 Dec 28, 2021
    Calendar heatmaps from Pandas time series data

    Note: See MarvinT/calmap for the maintained version of the project. That is also the version that gets published to PyPI and it has received several f

    Martijn Vermaat 195 Dec 22, 2022
    Library for exploring and validating machine learning data

    TensorFlow Data Validation TensorFlow Data Validation (TFDV) is a library for exploring and validating machine learning data. It is designed to be hig

    688 Jan 03, 2023
    A Python Library for Self Organizing Map (SOM)

    SOMPY A Python Library for Self Organizing Map (SOM) As much as possible, the structure of SOM is similar to somtoolbox in Matlab. It has the followin

    Vahid Moosavi 497 Dec 29, 2022
    A set of useful perceptually uniform colormaps for plotting scientific data

    Colorcet: Collection of perceptually uniform colormaps Build Status Coverage Latest dev release Latest release Docs What is it? Colorcet is a collecti

    HoloViz 590 Dec 31, 2022
    Simple Python interface for Graphviz

    Simple Python interface for Graphviz

    Sebastian Bank 1.3k Dec 26, 2022
    📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)

    📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)

    wq framework 1.2k Jan 01, 2023
    Fractals plotted on MatPlotLib in Python.

    About The Project Learning more about fractals through the process of visualization. Built With Matplotlib Numpy License This project is licensed unde

    Akeel Ather Medina 2 Aug 30, 2022
    Streamlit-template - A streamlit app template based on streamlit-option-menu

    streamlit-template A streamlit app template for geospatial applications based on

    Qiusheng Wu 41 Dec 10, 2022
    Create SVG drawings from vector geodata files (SHP, geojson, etc).

    SVGIS Create SVG drawings from vector geodata files (SHP, geojson, etc). SVGIS is great for: creating small multiples, combining lots of datasets in a

    Neil Freeman 78 Dec 09, 2022
    Splore - a simple graphical interface for scrolling through and exploring data sets of molecules

    Scroll through and exPLORE molecule sets The splore framework aims to offer a si

    3 Jun 18, 2022
    Interactive Data Visualization in the browser, from Python

    Bokeh is an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords hi

    Bokeh 17.1k Dec 31, 2022
    flask extension for integration with the awesome pydantic package

    Flask-Pydantic Flask extension for integration of the awesome pydantic package with Flask. Installation python3 -m pip install Flask-Pydantic Basics v

    249 Jan 06, 2023
    A simple, fast, extensible python library for data validation.

    Validr A simple, fast, extensible python library for data validation. Simple and readable schema 10X faster than jsonschema, 40X faster than schematic

    kk 209 Sep 19, 2022
    A Python toolbox for gaining geometric insights into high-dimensional data

    "To deal with hyper-planes in a 14 dimensional space, visualize a 3D space and say 'fourteen' very loudly. Everyone does it." - Geoff Hinton Overview

    Contextual Dynamics Laboratory 1.8k Dec 29, 2022