Linear programming solver for paper-reviewer matching and mind-matching

Overview

Paper-Reviewer Matcher

A python package for paper-reviewer matching algorithm based on topic modeling and linear programming. The algorithm is implemented based on this article). This package solves problem of assigning paper to reviewers with constrains by solving linear programming problem. We minimize global distance between papers and reviewers in topic space (e.g. topic modeling can be Principal component, Latent Semantic Analysis (LSA), etc.).

Here is a diagram of problem setup and how we solve the problem.

Mind-Match Command Line

Mind-Match is a session we run at Cognitive Computational Neuroscience (CCN) conference. We use a combination of topic modeling and linear programming to solve optimal matching problem. To run example Mind-Match algorithm on sample of 500 people, you can clone the repository and run the following

python mindmatch.py data/mindmatch_example.csv --n_match=6 --n_trim=50

in the root of this repo. This should produce a matching output output_match.csv in this relative location. However, when people get much larger this script takes quite a long time to run. We use pre-cluster into groups before running the mind-matching to make the script runs faster. Below is an example script for pre-clustering and mind-matching on all data:

python mindmatch_cluster.py data/mindmatch_example.csv --n_match=6 --n_trim=50 --n_clusters=4

Example script for the conferences

Here, I include a recent scripts for our Mind Matching session for CCN conference.

  • ccn_mind_matching_2019.py contains script for Mind Matching session (match scientists to scientists) for CCN conference
  • ccn_paper_reviewer_matching.py contains script for matching publications to reviewers for CCN conference, see example of CSV files in data folder

The code makes the distance metric of topics between incoming papers with reviewers (for ccn_paper_reviewer_matching.py) and between people with people (for ccn_mind_matching_2019). We trim the metric so that the problem is not too big to solve using or-tools. It then solves linear programming problem to assign the best matches which minimize the global distance between papers to reviewers. After that, we make the output that can be used by the organizers of the CCN conference -- pairs of paper and reviewers or mind-matching schedule between people to people in the conference. You can see of how it works below.

Dependencies

Use pip to install dependencies

pip install -r requirements.txt

Please see Stackoverflow if you have a problem installing or-tools on MacOS. You can use pip to install protobuf before installing or-tools

pip install protobuf==3.0.0b4
pip install ortools

for Python 3.6,

pip install --user --upgrade ortools

Citations

If you use Paper-Reviewer Matcher in your work or conference, please cite us as follows

@misc{achakulvisut2018,
    author = {Achakulvisut, Titipat and Acuna, Daniel E. and Kording, Konrad},
    title = {Paper-Reviewer Matcher},
    year = {2018},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/titipata/paper-reviewer-matcher}},
    commit = {9d346ee008e2789d34034c2b330b6ba483537674}
}

Members

Owner
Titipat Achakulvisut
Science of Science & Applied NLP | Mahidol University | Former @KordingLab, University of Pennsylvania, and intern @allenai, organizer/co-founder of neuromatch.
Titipat Achakulvisut
Multi Task Vision and Language

12-in-1: Multi-Task Vision and Language Representation Learning Please cite the following if you use this code. Code and pre-trained models for 12-in-

Meta Research 711 Jan 08, 2023
Tensorflow implementation of paper: Learning to Diagnose with LSTM Recurrent Neural Networks.

Multilabel time series classification with LSTM Tensorflow implementation of model discussed in the following paper: Learning to Diagnose with LSTM Re

Aaqib 552 Nov 28, 2022
ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost

Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost LOVE is accpeted by ACL22 main conference as a long pape

Lihu Chen 32 Jan 03, 2023
Persian-lexicon - A lexicon of 70K unique Persian (Farsi) words

Persian Lexicon This repo uses Uppsala Persian Corpus (UPC) to construct a lexic

Saman Vaisipour 7 Apr 01, 2022
ETM - R package for Topic Modelling in Embedding Spaces

ETM - R package for Topic Modelling in Embedding Spaces This repository contains an R package called topicmodels.etm which is an implementation of ETM

bnosac 37 Nov 06, 2022
Checking spelling of form elements

Checking spelling of form elements. You can check the source files of external workflows/reports and configuration files

СКБ Контур (команда 1с) 15 Sep 12, 2022
NLP made easy

GluonNLP: Your Choice of Deep Learning for NLP GluonNLP is a toolkit that helps you solve NLP problems. It provides easy-to-use tools that helps you l

Distributed (Deep) Machine Learning Community 2.5k Jan 04, 2023
Tools, wrappers, etc... for data science with a concentration on text processing

Rosetta Tools for data science with a focus on text processing. Focuses on "medium data", i.e. data too big to fit into memory but too small to necess

207 Nov 22, 2022
Sentiment Classification using WSD, Maximum Entropy & Naive Bayes Classifiers

Sentiment Classification using WSD, Maximum Entropy & Naive Bayes Classifiers

Pulkit Kathuria 173 Jan 04, 2023
NLP - Machine learning

Flipkart-product-reviews NLP - Machine learning About Product reviews is an essential part of an online store like Flipkart’s branding and marketing.

Harshith VH 1 Oct 29, 2021
TalkNet: Audio-visual active speaker detection Model

Is someone talking? TalkNet: Audio-visual active speaker detection Model This repository contains the code for our ACM MM 2021 paper, TalkNet, an acti

142 Dec 14, 2022
Augmenty is an augmentation library based on spaCy for augmenting texts.

Augmenty: The cherry on top of your NLP pipeline Augmenty is an augmentation library based on spaCy for augmenting texts. Besides a wide array of high

Kenneth Enevoldsen 124 Dec 29, 2022
A large-scale (194k), Multiple-Choice Question Answering (MCQA) dataset designed to address realworld medical entrance exam questions.

MedMCQA MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering A large-scale, Multiple-Choice Question Answe

MedMCQA 24 Nov 30, 2022
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Kashgari Overview | Performance | Installation | Documentation | Contributing 🎉 🎉 🎉 We released the 2.0.0 version with TF2 Support. 🎉 🎉 🎉 If you

Eliyar Eziz 2.3k Dec 29, 2022
🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Hugging Face 15k Jan 02, 2023
Pytorch NLP library based on FastAI

Quick NLP Quick NLP is a deep learning nlp library inspired by the fast.ai library It follows the same api as fastai and extends it allowing for quick

Agis pof 283 Nov 21, 2022
Guide to using pre-trained large language models of source code

Large Models of Source Code I occasionally train and publicly release large neural language models on programs, including PolyCoder. Here, I describe

Vincent Hellendoorn 947 Dec 28, 2022
Python package for Turkish Language.

PyTurkce Python package for Turkish Language. Documentation: https://pyturkce.readthedocs.io. Installation pip install pyturkce Usage from pyturkce im

Mert Cobanov 14 Oct 09, 2022
Natural Language Processing for Adverse Drug Reaction (ADR) Detection

Natural Language Processing for Adverse Drug Reaction (ADR) Detection This repo contains code from a project to identify ADRs in discharge summaries a

Medicines Optimisation Service - Austin Health 21 Aug 05, 2022
The NewSHead dataset is a multi-doc headline dataset used in NHNet for training a headline summarization model.

This repository contains the raw dataset used in NHNet [1] for the task of News Story Headline Generation. The code of data processing and training is available under Tensorflow Models - NHNet.

Google Research Datasets 31 Jul 15, 2022