Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'

Related tags

Deep Learninggrokking
Overview

Re-implementation of the paper 'Grokking: Generalization beyond overfitting on small algorithmic datasets'

Paper

Original paper can be found here

Datasets

I'm not super clear on how they defined their division. I am using integer division:

  • $$x\circ y = (x // y) mod p$$, for some prime $$p$$ and $$0\leq x,y \leq p$$
  • $$x\circ y = (x // y) mod p$$ if y is odd else (x - y) mod p, for some prime $$p$$ and $$0\leq x,y \leq p$$

Hyperparameters

The default hyperparameters are from the paper, but can be adjusted via the command line when running train.py

Running experiments

To run with default settings, simply run python train.py. The first time you train on any dataset you have to specify --force_data.

Arguments:

optimizer args

  • "--lr", type=float, default=1e-3
  • "--weight_decay", type=float, default=1
  • "--beta1", type=float, default=0.9
  • "--beta2", type=float, default=0.98

model args

  • "--num_heads", type=int, default=4
  • "--layers", type=int, default=2
  • "--width", type=int, default=128

data args

  • "--data_name", type=str, default="perm", choices=[
    • "perm_xy", # permutation composition x * y
    • "perm_xyx1", # permutation composition x * y * x^-1
    • "perm_xyx", # permutation composition x * y * x
    • "plus", # x + y
    • "minus", # x - y
    • "div", # x / y
    • "div_odd", # x / y if y is odd else x - y
    • "x2y2", # x^2 + y^2
    • "x2xyy2", # x^2 + y^2 + xy
    • "x2xyy2x", # x^2 + y^2 + xy + x
    • "x3xy", # x^3 + y
    • "x3xy2y" # x^3 + xy^2 + y ]
  • "--num_elements", type=int, default=5 (choose 5 for permutation data, 97 for arithmetic data)
  • "--data_dir", type=str, default="./data"
  • "--force_data", action="store_true", help="Whether to force dataset creation."

training args

  • "--batch_size", type=int, default=512
  • "--steps", type=int, default=10**5
  • "--train_ratio", type=float, default=0.5
  • "--seed", type=int, default=42
  • "--verbose", action="store_true"
  • "--log_freq", type=int, default=10
  • "--num_workers", type=int, default=4
Owner
Tom Lieberum
Master student in AI at the University of Amsterdam. Effective altruist, rationalist, and transhumanist. Got my B.Sc. in Physics from RWTH Aachen Uni
Tom Lieberum
113 Nov 28, 2022
Python-kafka-reset-consumergroup-offset-example - Python Kafka reset consumergroup offset example

Python Kafka reset consumergroup offset example This is a simple example of how

Willi Carlsen 1 Feb 16, 2022
Real-time multi-object tracker using YOLO v5 and deep sort

This repository contains a two-stage-tracker. The detections generated by YOLOv5, a family of object detection architectures and models pretrained on the COCO dataset, are passed to a Deep Sort algor

Mike 3.6k Jan 05, 2023
Graph-based community clustering approach to extract protein domains from a predicted aligned error matrix

Using a predicted aligned error matrix corresponding to an AlphaFold2 model , returns a series of lists of residue indices, where each list corresponds to a set of residues clustering together into a

Tristan Croll 24 Nov 23, 2022
Determined: Deep Learning Training Platform

Determined: Deep Learning Training Platform Determined is an open-source deep learning training platform that makes building models fast and easy. Det

Determined AI 2k Dec 31, 2022
Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery (ICCV 2021)

Change is Everywhere Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery by Zhuo Zheng, Ailong Ma, Liangpei Zhang and Yanfei

Zhuo Zheng 125 Dec 13, 2022
Official implementation of GraphMask as presented in our paper Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking.

GraphMask This repository contains an implementation of GraphMask, the interpretability technique for graph neural networks presented in our ICLR 2021

Michael Schlichtkrull 29 Sep 02, 2022
Towards Implicit Text-Guided 3D Shape Generation (CVPR2022)

Towards Implicit Text-Guided 3D Shape Generation Towards Implicit Text-Guided 3D Shape Generation (CVPR2022) Code for the paper [Towards Implicit Text

55 Dec 16, 2022
Repo for CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

CReST in Tensorflow 2 Code for the paper: "CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning" by Chen Wei, Ki

Google Research 75 Nov 01, 2022
Implementation of a Transformer using ReLA (Rectified Linear Attention)

ReLA (Rectified Linear Attention) Transformer Implementation of a Transformer using ReLA (Rectified Linear Attention). It will also contain an attempt

Phil Wang 49 Oct 14, 2022
Python with OpenCV - MediaPip Framework Hand Detection

Python HandDetection Python with OpenCV - MediaPip Framework Hand Detection Explore the docs » Contact Me About The Project It is a Computer vision pa

2 Jan 07, 2022
This is an official implementation for "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"

DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation This repo is the official implementation of "DeciWatch: A Simple Baseline for

117 Dec 24, 2022
SurfEmb (CVPR 2022) - SurfEmb: Dense and Continuous Correspondence Distributions

SurfEmb SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation with Learnt Surface Embeddings Rasmus Laurvig Haugard, A

Rasmus Haugaard 56 Nov 19, 2022
"MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction" (CVPRW 2022) & (Winner of NTIRE 2022 Challenge on Spectral Reconstruction from RGB)

MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction (CVPRW 2022) Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Z

Yuanhao Cai 274 Jan 05, 2023
An open source Jetson Nano baseboard and tools to design your own.

My Jetson Nano Baseboard This basic baseboard gives the user the foundation and the flexibility to design their own baseboard for the Jetson Nano. It

NVIDIA AI IOT 57 Dec 29, 2022
CSD: Consistency-based Semi-supervised learning for object Detection

CSD: Consistency-based Semi-supervised learning for object Detection (NeurIPS 2019) By Jisoo Jeong, Seungeui Lee, Jee-soo Kim, Nojun Kwak Installation

80 Dec 15, 2022
Official implementation for the paper "Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection"

Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection PyTorch code release of the paper "Attentive Prototypes for Sour

Deepti Hegde 23 Oct 17, 2022
A Dataset of Python Challenges for AI Research

Python Programming Puzzles (P3) This repo contains a dataset of python programming puzzles which can be used to teach and evaluate an AI's programming

Microsoft 850 Dec 24, 2022
Keqing Chatbot With Python

KeqingChatbot A public running instance can be found on telegram as @keqingchat_bot. Requirements Python 3.8 or higher. A bot token. Local Deploy git

Rikka-Chan 2 Jan 16, 2022
Implementation of C-RNN-GAN.

Implementation of C-RNN-GAN. Publication: Title: C-RNN-GAN: Continuous recurrent neural networks with adversarial training Information: http://mogren.

Olof Mogren 427 Dec 25, 2022