Code for the Higgs Boson Machine Learning Challenge organised by CERN & EPFL

Overview

A method to solve the Higgs boson challenge using Least Squares - Novae

This project is the Project 1 of EPFL CS-433 Machine Learning. The project is the same as the Higgs Boson Machine Learning Challenge posted on Kaggle. The dataset and the detailed description can also be found in the GitHub repository of the course.

Team name: Novae

Team members: Giacomo Orsi, Vittorio Rossi, Chun-Tso Tsai

About the Project

The task of this project is to train a model based on the provided train.csv to have the best prediction on the data given in test.csv or any other general case.

We built our model for the problem using regularized linear regression after applying some data cleaning and features engineering techniques. A report describing our approach and our results can be found in the file report.pdf. In the end, we obtained an accuracy of 0.836 and an F1 score of 0.751 on the test.csv dataset.

Instructions

  • The project runs under Python 3.8 and requires NumPy=1.19.
  • Please make sure to place train.csv and test.csv inside the data folder. Those files can be downloaded here.
  • Go to the script/ folder and execute run.py. A model will be trained with the given hyper-parameters and predictions for the test dataset will be outputed in the file out.csv.

Modules

implementations.py

Contains the implementations of different learning algorithms. Including

  • Least squares linear regression
    • least_squares: Direct computation from linear equations.
    • least_squares_GD: Gradient descent.
    • least_squares_SGD: Stochastic gradient descent.
    • ridge_regression: Regularized linear regression from direct computation.
  • Logistic regression
    • logistic_regression: Gradient descent
    • reg_logistic_regression: Gradient descent with regularization.

There are also some helper functions in this file to facilitate the above functions.

data_processing.py

Calls the following files to process the data.

  • data_cleaning.py: Contains functions used to
    1. Categorize data into subgroups.
    2. Replace missing values with the median.
    3. Standardize the features.
  • feature_engineering.py: Contains functions used to generate our interpretable features.

run.py

Generates the submission .csv file based on the data of test.csv stored in the folder data/. Our optimized model is also defined in this file.

Some helper Functions

  • models.py: Create the models for predicting the labels for new data points without true labels.
  • expansions.py: Contains a function to apply polynomial expansion to our features to add extra degrees of freedom for our models.
  • proj1_helpers.py: Contains functions which loads the .csv files as training or testing data, and create the .csv file for submission.
  • cross_validation.py: Contains a function to build the index for k-fold cross_validation.
  • disk_helper.py: Save/load the NumPy array to disk for further usage. Useful for saving hyper-parameters when trying a long training process.

Notebook

It is possible to use the Jupyter notebook project_notebook.ipynb located in the scripts folder to train the best hyper-parameters for the model. In the notebook it is possible to cross-validate a logistic and a least square regression model over given lambdas and degrees.

Owner
Giacomo Orsi
CS Student at EPFL. Previously at University of Bologna
Giacomo Orsi
Accelerated deep learning R&D

Accelerated deep learning R&D PyTorch framework for Deep Learning research and development. It focuses on reproducibility, rapid experimentation, and

Catalyst-Team 3.1k Jan 06, 2023
diablo2 resurrected loot filter

Only For Chinese and Traditional Chinese The filter only for Chinese and Traditional Chinese, i didn't change it for other language.Maybe you could mo

elmagnifico 249 Dec 04, 2022
Tutorial for the PERFECTING FACTORY 5.0 WITH EDGE-POWERED AI workshop

Workshop Advantech Jetson Nano This tutorial has been designed for the PERFECTING FACTORY 5.0 WITH EDGE-POWERED AI workshop in collaboration with Adva

Edge Impulse 18 Nov 22, 2022
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.

Learning Opinion Summarizers by Selecting Informative Reviews This repository contains the codebase and the dataset for the corresponding EMNLP 2021

Arthur Bražinskas 39 Jan 01, 2023
Pytorch cuda extension of grid_sample1d

Grid Sample 1d pytorch cuda extension of grid sample 1d. Since pytorch only supports grid sample 2d/3d, I extend the 1d version for efficiency. The fo

lyricpoem 24 Dec 03, 2022
Transport Mode detection - can detect the mode of transport with the help of features such as acceeration,jerk etc

title emoji colorFrom colorTo sdk app_file pinned Transport_Mode_Detector 🚀 purple yellow gradio app.py false Configuration title: string Display tit

Nishant Rajadhyaksha 3 Jan 16, 2022
MIMO-UNet - Official Pytorch Implementation

MIMO-UNet - Official Pytorch Implementation This repository provides the official PyTorch implementation of the following paper: Rethinking Coarse-to-

Sungjin Cho 248 Jan 02, 2023
Source code for "UniRE: A Unified Label Space for Entity Relation Extraction.", ACL2021.

UniRE Source code for "UniRE: A Unified Label Space for Entity Relation Extraction.", ACL2021. Requirements python: 3.7.6 pytorch: 1.8.1 transformers:

Wang Yijun 109 Nov 29, 2022
Iris prediction model is used to classify iris species created julia's DecisionTree, DataFrames, JLD2, PlotlyJS and Statistics packages.

Iris Species Predictor Iris prediction is used to classify iris species using their sepal length, sepal width, petal length and petal width created us

Siva Prakash 2 Jan 06, 2022
Official implementation for (Refine Myself by Teaching Myself : Feature Refinement via Self-Knowledge Distillation, CVPR-2021)

FRSKD Official implementation for Refine Myself by Teaching Myself : Feature Refinement via Self-Knowledge Distillation (CVPR-2021) Requirements Pytho

75 Dec 28, 2022
Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch

Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch

Pytorch Lightning 1.4k Jan 01, 2023
Official Implementation of SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations

Official Implementation of SimIPU SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations Since

Zhyever 37 Dec 01, 2022
This is 2nd term discrete maths project done by UCU students that uses backtracking to solve various problems.

Backtracking Project Sponsors This is a project made by UCU students: Olha Liuba - crossword solver implementation Hanna Yershova - sudoku solver impl

Dasha 4 Oct 17, 2021
KAPAO is an efficient multi-person human pose estimation model that detects keypoints and poses as objects and fuses the detections to predict human poses.

KAPAO (Keypoints and Poses as Objects) KAPAO is an efficient single-stage multi-person human pose estimation model that models keypoints and poses as

Will McNally 664 Dec 30, 2022
Face and other object detection using OpenCV and ML Yolo

Object-and-Face-Detection-Using-Yolo- Opencv and YOLO object and face detection is implemented. You only look once (YOLO) is a state-of-the-art, real-

Happy N. Monday 3 Feb 15, 2022
[ECE NTUA] 👁 Computer Vision - Lab Projects & Theoretical Problem Sets (2020-2021)

Computer Vision - NTUA (2020-2021) This repository hosts the lab projects and theoretical problem sets of the Computer Vision course held by ECE NTUA

Dimitris Dimos 6 Jul 21, 2022
《Towards High Fidelity Face Relighting with Realistic Shadows》(CVPR 2021)

Towards High Fidelity Face-Relighting with Realistic Shadows Andrew Hou, Ze Zhang, Michel Sarkis, Ning Bi, Yiying Tong, Xiaoming Liu. In CVPR, 2021. T

114 Dec 10, 2022
Source code for deep symbolic optimization.

Update July 10, 2021: This repository now supports an additional symbolic optimization task: learning symbolic policies for reinforcement learning. Th

Brenden Petersen 290 Dec 25, 2022
All supplementary material used by me while TA-ing CS3244: Machine Learning

CS3244-Tutorial-Material All supplementary material used by me while TA-ing CS3244: Machine Learning at NUS School of Computing. What is this? I teach

Rishabh Anand 18 Sep 23, 2022
Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers (arXiv2021)

Polyp-PVT by Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, & Ling Shao. This repo is the official implementation of "Polyp-PVT: Polyp Se

Deng-Ping Fan 102 Jan 05, 2023