This repository contains the code used to quantitatively evaluate counterfactual examples in the associated paper.

Last update: Jan 16, 2022

On Quantitative Evaluations of Counterfactuals

Install

To install required packages with conda, run the following command:

> conda env create -f requirements.yml

Code

The repository contains all the evaluation metrics used in the paper, together with the models and the data.

To evaluate the methods, choose a config from the `configs` directory and select which metrics to apply. The code then evaluates the chosen metrics on counterfactuals from all three methods (GB, GL, GEN) and stores the results in a matching subdirectory of `output`. For example, to run all metrics on the MNIST dataset, use the following command:

(cfeval) > python main.py --eval -c configs/mnist/mnist.ini -a

Afterwards, you can list the result directories with

(cfeval) > python main.py --list

to get an output like the following:

> Listing dirs
000: ./output/celeba_makeup_[0]
001: ./output/fake_mnist_[0]
002: ./output/mnist_0_1_[0]
003: ./output/mnist_[0]
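
If the index has to be looked up from a script rather than by eye, the listing format above is simple enough to parse. The following is only a sketch under the assumption that every entry has the form `NNN: path` exactly as shown; `parse_listing` is a hypothetical helper and not part of this repository.

```python
import re

# Matches entries such as "003: ./output/mnist_[0]" from the listing above.
LISTING_LINE = re.compile(r"^(\d+):\s+(\S+)$")


def parse_listing(text):
    """Map each integer index to its output directory (hypothetical helper)."""
    index_to_dir = {}
    for line in text.splitlines():
        match = LISTING_LINE.match(line.strip())
        if match:  # header lines such as "> Listing dirs" are skipped
            index_to_dir[int(match.group(1))] = match.group(2)
    return index_to_dir


listing = """> Listing dirs
000: ./output/celeba_makeup_[0]
001: ./output/fake_mnist_[0]
002: ./output/mnist_0_1_[0]
003: ./output/mnist_[0]
"""

dirs = parse_listing(listing)
print(dirs[3])
```

The integer key found this way is the value passed to `--print -c` in the next step.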

The results for the MNIST dataset (index 3 above) can now be printed with

(cfeval) > python main.py --print -c 3

This produces output like the following:

# # # # # # # # # # # # # # # # # # # # 
# MNIST
# # # # # # # # # # # # # # # # # # # # 
Method \ Metric    TargetClassValidity    ElasticNet    IM1          IM2             FID  Oracle
-----------------  ---------------------  ------------  -----------  -----------  ------  ------------
GB                 99.59 (0.13)           16.07 (0.18)  0.99 (0.00)  0.55 (0.01)   50.23  73.38 (0.87)
GL                 100.00 (0.00)          42.76 (0.31)  0.99 (0.00)  0.53 (0.00)  308.43  37.71 (0.95)
GEN                99.97 (0.03)           99.17 (0.58)  0.88 (0.00)  0.17 (0.00)   90.73  93.13 (0.50)
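
To post-process the printed table, each cell can be split into its main value and the value in parentheses. The sketch below assumes that columns are separated by two or more spaces, as in the output above, and that the parenthesized number is some measure of spread; the README does not state which one. `parse_cell` and `parse_row` are hypothetical helpers, not functions from this repository.

```python
import re

# A cell is either "99.59 (0.13)" or a bare number such as "50.23".
CELL = re.compile(r"^\s*(-?\d+(?:\.\d+)?)(?:\s*\((\d+(?:\.\d+)?)\))?\s*$")


def parse_cell(cell):
    """Return (value, spread); spread is None when no parentheses are present."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    value = float(match.group(1))
    spread = float(match.group(2)) if match.group(2) is not None else None
    return value, spread


def parse_row(line):
    """Split one table row into the method name and its parsed cells."""
    # Assumption: columns are separated by at least two spaces.
    method, *cells = re.split(r"\s{2,}", line.strip())
    return method, [parse_cell(c) for c in cells]


row = "GB                 99.59 (0.13)           16.07 (0.18)  0.99 (0.00)  0.55 (0.01)   50.23  73.38 (0.87)"
method, cells = parse_row(row)
print(method, cells[0], cells[4])
```

In the table above only the FID column is printed without a parenthesized value, which is why `spread` is allowed to be `None`.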

Directory overview:

File	Description
`ckpts`	Contains all the (Keras) models used by the various metrics.
`data`	Contains the data used, stored as numpy arrays: counterfactual examples from GB, GL, and GEN, and the original input data.
`configs`	Contains config files specifying experimental details like dataset, normalization, etc.
`dataset`	Code for loading data.
`evaluate`	Implementations of all the metrics.
`output`	Directory to hold computed results. Directory already contains results from paper.
`config.py`	Reads config files from `configs`.
`constants.py`	Method and metric names.
`listing.py`	Utility for indexing output dirs (see the `--list` example above).
`main.py`	Main file to run all code through.
`print_results.py`	Utility function for printing results from json files in the `output` directory.
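
Since the config files are `.ini` files, they can be inspected with Python's standard `configparser` module. The sketch below is illustrative only: the section name `experiment` and the keys `dataset` and `normalization` are assumptions chosen to mirror the description above, and the real files in `configs` (as well as the actual logic in `config.py`) may differ.

```python
import configparser

# Hypothetical config text; the real files in `configs` may use other
# section names and keys.
EXAMPLE_INI = """
[experiment]
dataset = mnist
normalization = zero_one
"""


def read_config(text):
    """Parse INI text into a plain nested dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


cfg = read_config(EXAMPLE_INI)
print(cfg["experiment"]["dataset"])
```

To read a file from disk instead, `parser.read(path)` can replace `parser.read_string(text)`.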

Owner

Frederik Hvilshøj
