Differentially private ImageNet training

Overview

Code for the tech report Toward Training at ImageNet Scale with Differential Privacy by Alexey Kurakin, Steve Chien, Shuang Song, Roxana Geambasu, Andreas Terzis and Abhradeep Thakurta.

This is not an officially supported Google product.

Repository structure

  • benchmarks directory contains the code we used to compare the performance of various DP-SGD frameworks on CIFAR10 and MNIST.
  • imagenet directory contains the ImageNet training code.

Installation

  1. If you are going to use an NVIDIA GPU, then install the latest NVIDIA drivers, CUDA and cuDNN. While the latest versions are not strictly necessary to run the code, we sometimes observed slower performance with older versions of CUDA and cuDNN. You can verify the installation with the commands below.
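
     The following commands report the installed versions (nvidia-smi and nvcc ship with the driver and the CUDA toolkit; the cuDNN header path is a common default and may differ on your system):

    # Check driver version and GPU visibility
    nvidia-smi
    # Check CUDA toolkit version
    nvcc --version
    # Check cuDNN version (header location may vary by install)
    grep -A 2 '#define CUDNN_MAJOR' /usr/local/cuda/include/cudnn_version.h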

  2. Set up a Python virtual environment with all the necessary libraries:

    # Create virtualenv
    virtualenv -p python3 ~/.venv/dp_imagenet
    source ~/.venv/dp_imagenet/bin/activate
    # Install Objax with CUDA
    pip install --upgrade objax
    pip install --upgrade "jax[cuda11_cudnn82]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
    # Tensorflow and TFDS (for datasets readers)
    pip install tensorflow
    pip install tensorflow-datasets
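
     After installing, you can check that JAX actually sees the GPUs (jax.devices() is the standard JAX API; it should list GPU devices rather than only the CPU):

    # Verify that JAX was installed with working GPU support
    python3 -c "import jax; print(jax.devices())"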
  3. Extra libraries for TF and Opacus benchmarks:

    pip install tensorflow-privacy
    pip install opacus
    pip install torchvision
    pip install tensorboard
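
     A quick import smoke test for the benchmark dependencies (package import names as published on PyPI):

    # Verify the benchmark dependencies import cleanly
    python3 -c "import tensorflow_privacy, opacus, torchvision; print('ok')"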
  4. Follow instructions at https://www.tensorflow.org/datasets/catalog/imagenet2012 to download the ImageNet dataset for TFDS; a sketch of the typical flow is shown below.
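
     Since imagenet2012 requires a manual download, the flow usually looks like this (archive names and the TFDS_DATA_DIR layout follow the TFDS instructions linked above; adjust paths to your setup):

    # TFDS_DATA_DIR is the directory where TFDS stores datasets,
    # e.g. export TFDS_DATA_DIR="${HOME}/tensorflow_datasets"
    mkdir -p "${TFDS_DATA_DIR}/downloads/manual"
    # Put the manually downloaded ILSVRC2012 archives where TFDS expects them
    cp ILSVRC2012_img_train.tar ILSVRC2012_img_val.tar "${TFDS_DATA_DIR}/downloads/manual/"
    # Build the dataset (reads TFDS_DATA_DIR from the environment)
    python3 -c "import tensorflow_datasets as tfds; tfds.builder('imagenet2012').download_and_prepare()"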

Before running any code, make sure to enter the virtual environment and set up PYTHONPATH:

# Enter virtual env, set up path
source ~/.venv/dp_imagenet/bin/activate
cd ${REPOSITORY_DIRECTORY}
export PYTHONPATH=$PYTHONPATH:.
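# Point the dataset readers at your TFDS data directory
# (example path; use wherever you prepared the TFDS datasets)
export TFDS_DATA_DIR="${HOME}/tensorflow_datasets"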

Training ImageNet models with DP

Here are a few examples showing how to run ImageNet training with and without DP:

# Resnet50 without DP
python imagenet/imagenet_train.py --tfds_data_dir="${TFDS_DATA_DIR}" --max_eval_batches=10 --eval_every_n_steps=100 --train_device_batch_size=64 --disable_dp

# Resnet18 without DP
python imagenet/imagenet_train.py --tfds_data_dir="${TFDS_DATA_DIR}" --max_eval_batches=10 --eval_every_n_steps=100 --model=resnet18 --train_device_batch_size=64 --disable_dp

# Resnet18 with DP
python imagenet/imagenet_train.py --tfds_data_dir="${TFDS_DATA_DIR}" --max_eval_batches=10 --eval_every_n_steps=100 --model=resnet18 --train_device_batch_size=64
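
The training script exposes more flags than the examples show (model choice, batch sizes, DP parameters such as clip norm and noise). Assuming the script uses standard flag parsing, the full list can be printed with:

# List all flags supported by the training script
python imagenet/imagenet_train.py --help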

To pre-train a model on Places365 and then finetune it with differential privacy on ImageNet, use the following commands:

# Prepare directory for Places365 checkpoint
PLACES_CHECKPOINT_DIR="${HOME}/experiments/places365"
mkdir -p "${PLACES_CHECKPOINT_DIR}"

# Pre-train model on Places365 without differential privacy
# This will train a model to about 55% accuracy on Places365
# when run on 8 GPUs.
python imagenet/imagenet_train.py \
  --tfds_data_dir="${TFDS_DATA_DIR}" \
  --dataset=places365 \
  --eval_every_n_steps=1024 \
  --model=resnet18 \
  --num_train_epochs=80 \
  --lr_warmup_epochs=4 \
  --base_learning_rate=0.05 \
  --disable_dp \
  --train_device_batch_size=128 \
  --model_dir="${PLACES_CHECKPOINT_DIR}"

# Prepare directory for Imagenet checkpoint
IMAGENET_DP_CHECKPOINT_DIR="${HOME}/experiments/imagenet_dp"
mkdir -p "${IMAGENET_DP_CHECKPOINT_DIR}"

# Finetune the model on ImageNet with differential privacy.
# This will train a differentially private ImageNet model
# to approximately 48% accuracy with epsilon ~10, delta ~10^{-6}
# when run on 8 GPUs.
# If the number of GPUs is different, adjust the --grad_acc_steps argument
# such that number_of_gpus * grad_acc_steps = 512 (see the helper snippet
# after this command).
python imagenet/imagenet_train.py \
  --tfds_data_dir="${TFDS_DATA_DIR}" \
  --eval_every_n_steps=1024 \
  --model=resnet18 \
  --num_train_epochs=70 \
  --dp_clip_norm=1.0 \
  --dp_sigma=0.058014 \
  --grad_acc_steps=64 \
  --base_learning_rate=0.03 \
  --lr_warmup_epochs=1 \
  --num_layers_to_freeze=6 \
  --finetune_path="${PLACES_CHECKPOINT_DIR}/ckpt/0000141312.npz" \
  --model_dir="${IMAGENET_DP_CHECKPOINT_DIR}"

Running DP-SGD benchmarks

The following commands were used to obtain the benchmark numbers of various frameworks for the tech report. All of them were run on an n1-standard-96 Google Cloud machine with 8 V100 GPUs. All numbers were obtained with CUDA 11.4 and cuDNN 8.2.2.26.
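
To time a run yourself, one simple option (our suggestion, not the exact harness used for the report) is to wrap a command with GNU time, which reports wall-clock time and peak memory:

# Example: time a single benchmark run
/usr/bin/time -v python benchmarks/mnist_objax.py --disable-dp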

Objax benchmarks:

# MNIST benchmark without DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/mnist_objax.py --disable-dp

# MNIST benchmark with DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/mnist_objax.py

# CIFAR10 benchmark without DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/cifar10_objax.py --disable-dp

# CIFAR10 benchmark with DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/cifar10_objax.py

# ImageNet benchmark, Resnet18 without DP
python imagenet/imagenet_train.py --tfds_data_dir="${TFDS_DATA_DIR}" --disable_dp --base_learning_rate=0.2

# ImageNet benchmark, Resnet18 with DP
python imagenet/imagenet_train.py --tfds_data_dir="${TFDS_DATA_DIR}" --base_learning_rate=2.0

Opacus benchmarks:

# MNIST benchmark without DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/mnist_opacus.py --disable-dp

# MNIST benchmark with DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/mnist_opacus.py

# CIFAR10 benchmark without DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/cifar10_opacus.py --disable-dp

# CIFAR10 benchmark with DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/cifar10_opacus.py

Tensorflow benchmarks:

# MNIST benchmark without DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/mnist_tf.py --dpsgd=False

# MNIST benchmark with DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/mnist_tf.py

# CIFAR10 example without DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/cifar10_tf.py --dpsgd=False

# CIFAR10 example with DP
CUDA_VISIBLE_DEVICES=0 python benchmarks/cifar10_tf.py