Privacy-Preserving Machine Learning (PPML) Tutorial Presented at PyConDE 2022

Last update: Aug 16, 2022

Related tags

Deep Learning ppml-pyconde

Overview

PPML: Machine Learning on Data you cannot see

Repository for the tutorial on Privacy-Preserving Machine Learning (PPML) presented at PyConDE 2022

Abstract
- Outline
Get the Material
- Set up the Environment
Colophon
- Acknowledgments and Fundings
Contacts

Abstract

Privacy guarantees are one of the most crucial requirements when it comes to analyse sensitive information. However, data anonymisation techniques alone do not always provide complete privacy protection; moreover Machine Learning (ML) models could also be exploited to leak sensitive data when attacked and no counter-measure is put in place.

Privacy-preserving machine learning (PPML) methods hold the promise to overcome all those issues, allowing to train machine learning models with full privacy guarantees.

This workshop will be mainly organised in two parts. In the first part, we will explore one example of ML model exploitation (i.e. inference attack ) to reconstruct original data from a trained model, and we will then see how differential privacy can help us protecting the privacy of our model, with minimum disruption to the original pipeline. In the second part of the workshop, we will examine a more complicated ML scenario to train Deep learning networks on encrypted data, with specialised distributed federated learning strategies.

Outline

Introduction: Brief Intro to PPML and to the workshop (slides)
Part 1: Strengthening Deep Neural Networks
- Model vulnerabilities:
  - Adversarial Examples and FGSM (Fast Gradient Sign Method) notebook
  - Model Inference attack notebooks: training | reconstruction
- Deep Learning with Differential Privacy
  - Model Inference attack with OPACUS notebooks: training | reconstruction
Part 2: Primer on Privacy-Preserving Machine Learning
- Introduction to Federated Learning notebook
- DL training on (Homomorphically) Encrypted Data notebook
- OpenMined and PrivateAI series notebook
  - Introduction to Remote Data Science notebooks
  - SplitNN notebooks

Note: the material has been updated after the conference, to match the flow of the presentation as delivered during the conference, as well as to incorporate feedbacks received afterwards.

Video recording of the session presented at PyCon DE

Get the material

Clone the current repository, in order to get the course materials. To do so, once connected to your remote machine (via SSH), execute the following instructions:

cd $HOME  # This will make sure you'll be in your HOME folder
git clone https://github.com/leriomaggio/ppml-pyconde.git

Note: This will create a new folder named ppml-pyconde. Move into this folder by typing:

cd ppml-pyconde

Well done! Now you should do be in the right location. Bear with me another few seconds, following instructions reported below 🙏

Set up your Environment

To execute the notebooks in this repository, it is necessary to set up the environment.

Please refer to the Get-Ready.ipynb notebook for a step-by-step guide on how to setup the environment, and check that all is working, and ready to go.

Note: You could run this notebook directly in VSCode, or in your existing Jupyter notebook/lab environment:

jupyter notebook Get-Ready.ipynb

Colophon

Author: Valerio Maggio (@leriomaggio), Senior Research Associate, University of Bristol.

All the Code material is distributed under the terms of the Apache License. See LICENSE file for additional details.

All the instructional materials in this repository are free to use, and made available under the [Creative Commons Attribution license][https://creativecommons.org/licenses/by/4.0/]. The following is a human-readable summary of (and not a substitute for) the full legal text of the CC BY 4.0 license.

You are free:

to Share---copy and redistribute the material in any medium or format
to Adapt---remix, transform, and build upon the material

for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

Attribution---You must give appropriate credit (mentioning that your work is derived from work that is Copyright © Software Carpentry and, where practical, linking to http://software-carpentry.org/), provide a [link to the license][cc-by-human], and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

No additional restrictions---You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

Acknowledgment and funding

The material developed in this tutorial has been supported by the University of Bristol, and by the Software Sustainability Institute (SSI), as part of my SSI fellowship on PETs (Privacy Enchancing Technologies).

Please see this deck to know more about my fellowship plans.

I would also like to thank all the people at OpenMined for all the encouragement and support with the preparation of this tutorial. I hope the material in this repository could contribute to raise awareness about all the amazing work on PETs it's being provided to the Open Source and the Python communities.

Contacts

For any questions or doubts, feel free to open an issue in the repository, or drop me an email @ valerio.maggio_at_gmail_dot_com

Releases(pyconde)

pyconde(Jun 14, 2022)

Tutorial on Privacy-Preserving Machine Learning as presented at PyCon DE 2022 (https://2022.pycon.de/program/QHJ7SX/)

Full Changelog: https://github.com/leriomaggio/ppml-tutorial/commits/pyconde
Source code(tar.gz)
Source code(zip)

Official implementation of GraphMask as presented in our paper Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking.

GraphMask This repository contains an implementation of GraphMask, the interpretability technique for graph neural networks presented in our ICLR 2021

29 Sep 2, 2022

Official implementation of the network presented in the paper "M4Depth: A motion-based approach for monocular depth estimation on video sequences"

M4Depth This is the reference TensorFlow implementation for training and testing depth estimation models using the method described in M4Depth: A moti

76 Jan 3, 2023

Code for the Population-Based Bandits Algorithm, presented at NeurIPS 2020.

Population-Based Bandits (PB2) Code for the Population-Based Bandits (PB2) Algorithm, from the paper Provably Efficient Online Hyperparameter Optimiza

22 Nov 16, 2022

Projects for AI/ML and IoT integration for games and other presented at re:Invent 2021.

Playground4AWS Projects for AI/ML and IoT integration for games and other presented at re:Invent 2021. Architecture Minecraft and Lamps This project i

5 Nov 30, 2022

Prototypical python implementation of the trust-region algorithm presented in Sequential Linearization Method for Bound-Constrained Mathematical Programs with Complementarity Constraints by Larson, Leyffer, Kirches, and Manns.

3 Dec 2, 2022

Collection of TensorFlow2 implementations of Generative Adversarial Network varieties presented in research papers.

TensorFlow2-GAN Collection of tf2.0 implementations of Generative Adversarial Network varieties presented in research papers. Model architectures will

41 Apr 28, 2022

Tensorflow implementation of the paper "HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences", CVPR 2021.

HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences Tensorflow implementation of the paper "HumanGPS: Geodesic PreServing Feature fo

50 Dec 21, 2022

clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation

README clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation CVPR 2021 Authors: Suprosanna Shit and Johannes C. Paetzo

110 Dec 29, 2022

A Python implementation of the Locality Preserving Matching (LPM) method for pruning outliers in image matching.

LPM_Python A Python implementation of the Locality Preserving Matching (LPM) method for pruning outliers in image matching. The code is established ac

11 Nov 7, 2022

Privacy-Preserving Machine Learning (PPML) Tutorial Presented at PyConDE 2022

Related tags

Overview

PPML: Machine Learning on Data you cannot see

Abstract

Outline

Get the material

Set up your Environment

Colophon

Acknowledgment and funding

Contacts

You might also like...

Official implementation of GraphMask as presented in our paper Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking.

Official implementation of the network presented in the paper "M4Depth: A motion-based approach for monocular depth estimation on video sequences"

Code for the Population-Based Bandits Algorithm, presented at NeurIPS 2020.

Projects for AI/ML and IoT integration for games and other presented at re:Invent 2021.

Prototypical python implementation of the trust-region algorithm presented in Sequential Linearization Method for Bound-Constrained Mathematical Programs with Complementarity Constraints by Larson, Leyffer, Kirches, and Manns.

Collection of TensorFlow2 implementations of Generative Adversarial Network varieties presented in research papers.

Tensorflow implementation of the paper "HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences", CVPR 2021.

clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation

A Python implementation of the Locality Preserving Matching (LPM) method for pruning outliers in image matching.

Releases(pyconde)

pyconde(Jun 14, 2022)

Owner

Valerio Maggio

Seeing Dynamic Scene in the Dark: High-Quality Video Dataset with Mechatronic Alignment (ICCV2021)

Official code of ICCV2021 paper "Residual Attention: A Simple but Effective Method for Multi-Label Recognition"

CVPR2021 Content-Aware GAN Compression

Using Tensorflow Object Detection API to detect Waymo open dataset

🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

Code implementing "Improving Deep Learning Interpretability by Saliency Guided Training"

Information Gain Filtration (IGF) is a method for filtering domain-specific data during language model finetuning. IGF shows significant improvements over baseline fine-tuning without data filtration.

This repo in the implementation of EMNLP'21 paper "SPARQLing Database Queries from Intermediate Question Decompositions" by Irina Saparina, Anton Osokin

Self-supervised learning on Graph Representation Learning (node-level task)

The Power of Scale for Parameter-Efficient Prompt Tuning

Code release for DS-NeRF (Depth-supervised Neural Radiance Fields)

This program uses trial auth token of Azure Cognitive Services to do speech synthesis for you.

Source code and data in paper "MDFEND: Multi-domain Fake News Detection (CIKM'21)"

This implements one of result networks from Large-scale evolution of image classifiers

Testing and Estimation of structural breaks in Stata

Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

Code for "Learning Graph Cellular Automata"

Info and sample codes for "NTU RGB+D Action Recognition Dataset"

Repository for MuSiQue: Multi-hop Questions via Single-hop Question Composition

Tracking code for the winner of track 1 in the MMP-Tracking Challenge at ICCV 2021 Workshop.