[ WSDM '22 ] On Sampling Collaborative Filtering Datasets

Last update: Dec 08, 2022

This repository contains the implementation of many popular sampling strategies, along with various explicit/implicit/sequential feedback recommendation algorithms. The code accompanies the paper "On Sampling Collaborative Filtering Datasets" [ACM] [Public PDF] where we compare the utility of different sampling strategies for preserving the performance of various recommendation algorithms.

We also provide code for Data-Genie, which can automatically predict how well any sampling strategy will perform for a given collaborative filtering dataset. We refer the reader to the full paper for more details. Kindly send me an email if you're interested in obtaining access to the pre-trained weights of Data-Genie.

If you find any module of this repository helpful for your own research, please consider citing the WSDM '22 paper below. Thanks!

@inproceedings{sampling_cf,
  author = {Noveen Sachdeva and Carole-Jean Wu and Julian McAuley},
  title = {On Sampling Collaborative Filtering Datasets},
  url = {https://doi.org/10.1145/3488560.3498439},
  booktitle = {Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining},
  series = {WSDM '22},
  year = {2022}
}

Code Author: Noveen Sachdeva ([email protected])

Setup

Environment Setup

$ pip install -r requirements.txt

Data Setup

Once you've set up the Python environment, run the command below. It creates the required data/experiment directories and downloads and preprocesses the Amazon Magazine and MovieLens-100K datasets. To use other datasets, download them from http://jmcauley.ucsd.edu/data/amazon/ and adjust the setup.sh and preprocess.py files accordingly.

$ ./setup.sh

How to train a model on a sampled/complete CF-dataset?

Edit the hyper_params.py file, which lists all config parameters, including what type of model to run. The table below lists the currently supported sampling strategies:

Sampling Strategy	What is sampled?	Paper Link
Random	Interactions
Stratified	Interactions
Temporal	Interactions
SVP-CF w/ MF	Interactions	LINK & LINK
SVP-CF w/ Bias-only	Interactions	LINK & LINK
SVP-CF-Prop w/ MF	Interactions	LINK & LINK
SVP-CF-Prop w/ Bias-only	Interactions	LINK & LINK
Random	Users
Head	Users
SVP-CF w/ MF	Users	LINK & LINK
SVP-CF w/ Bias-only	Users	LINK & LINK
SVP-CF-Prop w/ MF	Users	LINK & LINK
SVP-CF-Prop w/ Bias-only	Users	LINK & LINK
Centrality	Graph	LINK
Random-Walk	Graph	LINK
Forest-Fire	Graph	LINK
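
To make the simpler rows of the table concrete, here is a minimal sketch of four of the strategies (Random, Stratified, and Temporal over interactions, and Head over users). It is not the repository's implementation: the (user, item, timestamp) tuple layout, the function names, and the reading of Stratified as per-user sampling and Temporal as per-user most-recent sampling are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Assumed layout: each interaction is a (user, item, timestamp) tuple.
# This is illustrative only, not the repository's data format.

def group_by_user(interactions):
    """Map each user to the list of their interactions."""
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)
    return by_user

def random_interactions(interactions, frac, seed=0):
    """Keep a uniformly random fraction of all interactions."""
    rng = random.Random(seed)
    return rng.sample(interactions, round(frac * len(interactions)))

def stratified_interactions(interactions, frac, seed=0):
    """Keep a random fraction of each user's interactions."""
    rng = random.Random(seed)
    kept = []
    for rows in group_by_user(interactions).values():
        kept.extend(rng.sample(rows, round(frac * len(rows))))
    return kept

def temporal_interactions(interactions, frac):
    """Keep the most recent fraction of each user's interactions."""
    kept = []
    for rows in group_by_user(interactions).values():
        k = round(frac * len(rows))
        rows = sorted(rows, key=lambda r: r[2])
        kept.extend(rows[len(rows) - k:])
    return kept

def head_users(interactions, frac):
    """Keep every interaction of the most active fraction of users."""
    by_user = group_by_user(interactions)
    k = round(frac * len(by_user))
    top = sorted(by_user, key=lambda u: len(by_user[u]), reverse=True)[:k]
    return [row for u in top for row in by_user[u]]

# Toy data: user u1 has four interactions, user u2 has two.
toy = [("u1", f"i{j}", j) for j in range(4)]
toy += [("u2", f"i{j}", j) for j in range(2)]

print(temporal_interactions(toy, 0.5))
print(head_users(toy, 0.5))
```

On the toy data, the temporal sampler keeps the later half of each user's history, while the head sampler keeps only u1, the more active of the two users. The SVP-CF and graph-based strategies depend on a trained proxy model or on the interaction graph and are not sketched here.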

Finally, type the following command to run:

$ CUDA_VISIBLE_DEVICES=<SOME_GPU_ID> python main.py

Alternatively, to train several recommendation algorithms on several CF datasets/subsets, edit the configuration in grid_search.py and then run:

$ python grid_search.py
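
As a rough illustration of what a grid over algorithms and datasets/subsets amounts to, the sketch below expands a dictionary of options into one configuration per combination. The keys and values are hypothetical placeholders, not the options actually defined in grid_search.py, and the script itself may organize its grid differently.

```python
from itertools import product

# Hypothetical grid: these keys and values are placeholders chosen for
# illustration, not the settings defined in grid_search.py.
grid = {
    "dataset": ["ml-100k", "magazine"],
    "model": ["MF", "bias_only"],
    "sampling": ["complete", "random_interactions", "head_users"],
}

def expand(grid):
    """Yield one config dict per combination of grid values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(expand(grid))
print(len(configs))
print(configs[0])
```

Each resulting dictionary would correspond to one training run, so the number of runs grows multiplicatively with every option added to the grid.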

How to train Data-Genie?

Edit the data_genie/data_genie_config.py file, which lists all config parameters, including which datasets, CF scenarios, and samplers to train Data-Genie on. Then use the following command to train Data-Genie:

$ python data_genie.py

License

MIT
