Official implementation of ETH-XGaze dataset baseline

Last update: Jan 03, 2023

ETH-XGaze baseline

Official implementation of ETH-XGaze dataset baseline.

ETH-XGaze dataset

The ETH-XGaze dataset is a gaze estimation dataset consisting of over one million high-resolution images of varying gaze under extreme head poses. We established a simple baseline on ETH-XGaze and other datasets. This repository includes the code and the pre-trained model. More details about the dataset are available on our project page.

License

The code is licensed under CC BY-NC-SA 4.0.

Requirements

Python 3.5
PyTorch 1.1.0, torchvision
opencv-python

For model training

h5py to load the training data
configparser

For testing

dlib for face and facial landmark detection.
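
A quick way to confirm that the packages listed above are importable is sketched below. It assumes the usual import names for these packages (torch for PyTorch, cv2 for opencv-python); the helper missing_packages and the two module lists are hypothetical and not part of this repository. It only checks that each module can be found, not which version is installed.

```python
import importlib.util

# Import names assumed for the requirements listed above
# (cv2 is the module provided by opencv-python).
TRAINING_MODULES = ["torch", "torchvision", "cv2", "h5py", "configparser"]
TESTING_MODULES = ["torch", "torchvision", "cv2", "dlib"]


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, modules in (("training", TRAINING_MODULES),
                           ("testing", TESTING_MODULES)):
        missing = missing_packages(modules)
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(label + ": " + status)
```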

Training

Download the ETH-XGaze dataset for training. Make sure you get the pre-processed version with 224×224-pixel face patches, and put the data under '\data\xgaze'.
Run 'python main.py' to train the model.
The model will be saved in the 'ckpt' folder.

Test

The demo.py file shows how to perform gaze estimation from an input image. The example image is already in the 'example/input' folder.

First, download the pre-trained model and put it under the 'ckpt' folder.
Then run 'python demo.py' to test.

Data normalization

'normalization_example.py' gives an example of data normalization, going from the raw dataset to the normalized data.

Citation

If using this code-base and/or the ETH-XGaze dataset in your research, please cite the following publication:

@inproceedings{Zhang2020ETHXGaze,
  author    = {Xucong Zhang and Seonwook Park and Thabo Beeler and Derek Bradley and Siyu Tang and Otmar Hilliges},
  title     = {ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation},
  year      = {2020},
  booktitle = {European Conference on Computer Vision (ECCV)}
}

FAQ

Q: Where are the test set labels?
You can submit your test results to our leaderboard to get the scores. Please complete the registration first; otherwise, your request will be ignored. Link to the leaderboard.

Q: What is the data normalization?
As described in our paper, data normalization is a method to crop the face/eye image so that head rotation around the roll axis is removed. Please refer to the following paper for details: Revisiting Data Normalization for Appearance-Based Gaze Estimation.
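
To illustrate the roll-cancelling idea, here is a minimal sketch of the rotation step only, following the construction in the referenced paper as we understand it. It assumes the face center and the head's x-axis are given as 3D vectors in camera coordinates. All function names are hypothetical; the scaling and image warping steps are not shown, and 'normalization_example.py' remains the reference for the full pipeline.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def normalization_rotation(face_center, head_x_axis):
    """Rotation (as three rows) that points the camera z-axis at the face
    center and keeps the head's x-axis in the camera x-z plane.

    Degenerate if head_x_axis is parallel to face_center (not handled here).
    """
    forward = normalize(face_center)                # new z-axis: look at the face
    down = normalize(cross(forward, head_x_axis))   # new y-axis: orthogonal to head x-axis
    right = normalize(cross(down, forward))         # new x-axis: completes the frame
    return (right, down, forward)


def rotate(rotation, v):
    """Apply a rotation given as three row vectors to a 3D vector."""
    return tuple(sum(row[i] * v[i] for i in range(3)) for row in rotation)
```

After this rotation the face center lies on the camera's z-axis, and the head's x-axis has no y-component, which is what removes the roll.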

Q: Why convert 3D gaze direction (vector) to 2D gaze direction (pitch and yaw)? How to convert between 3D and 2D gaze directions?
Essentially, 2D pitch and yaw are enough to describe the gaze direction in the head coordinate system, and using 2D instead of 3D can make model training easier. The "utils.py" file contains code examples for converting between them: pitchyaw_to_vector and vector_to_pitchyaw.
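
A gaze direction is a unit vector, so it has only two degrees of freedom, which is why two angles suffice. The sketch below shows one such conversion for a single sample. It assumes a convention where pitch is the elevation (from the y component) and yaw is measured in the x-z plane; the function names are hypothetical, and the sign and axis conventions in "utils.py" may differ, so treat that file as the reference.

```python
import math


def pitchyaw_to_unit_vector(pitch, yaw):
    """Convert (pitch, yaw) in radians to a 3D unit vector (assumed convention)."""
    x = math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = math.cos(pitch) * math.cos(yaw)
    return (x, y, z)


def unit_vector_to_pitchyaw(vector):
    """Convert a non-zero 3D vector to (pitch, yaw) in radians (same convention)."""
    norm = math.sqrt(sum(c * c for c in vector))
    x, y, z = (c / norm for c in vector)
    # Clamp to guard against rounding slightly outside [-1, 1].
    pitch = math.asin(max(-1.0, min(1.0, y)))
    yaw = math.atan2(x, z)
    return (pitch, yaw)


if __name__ == "__main__":
    v = pitchyaw_to_unit_vector(0.1, -0.3)
    print(v, unit_vector_to_pitchyaw(v))
```

Under this convention the round trip recovers the original angles as long as pitch stays within (-pi/2, pi/2).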

Owner

Xucong Zhang
