Crawling-CV-Conference-Papers

News

  • 2021-6-21 Support CVPR-2021

Download all CVPR-2021 papers in one click: just set the local download directory in download_cvpr2021.py and run it! Don't forget to get your ChromeDriver ready (i.e., the version that matches your Chrome browser).

  • 2021-6-20 Support resuming the download from where the program was interrupted (no need to re-download from scratch).
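Resuming essentially means skipping papers whose PDF is already on disk. Below is a minimal sketch of that idea, not the project's actual code. It assumes one PDF per title stored under root, and safe_filename, fetch_bytes and download_all are hypothetical names. The project lists slugify as a dependency, presumably for turning titles into file names; the sketch uses a small regex stand-in so it only needs the standard library.

```python
import os
import re
import urllib.request


def safe_filename(title: str) -> str:
    """Turn a paper title into a file-system-safe name (regex stand-in for slugify)."""
    name = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower()
    return (name or "untitled") + ".pdf"


def fetch_bytes(url: str) -> bytes:
    """Download a URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def download_all(titles, urls, root, fetch=fetch_bytes):
    """Download each paper into root, skipping files that already exist.

    Returns the list of paths that were actually downloaded in this run.
    """
    os.makedirs(root, exist_ok=True)
    downloaded = []
    for title, url in zip(titles, urls):
        path = os.path.join(root, safe_filename(title))
        if os.path.exists(path):
            continue  # finished in an earlier run: don't fetch it again
        tmp_path = path + ".part"
        with open(tmp_path, "wb") as f:
            f.write(fetch(url))
        # Rename only after the write succeeded, so an interrupted download
        # leaves a .part file instead of a truncated PDF that would be skipped.
        os.replace(tmp_path, path)
        downloaded.append(path)
    return downloaded
```

Writing to a .part file first is the important detail: if the program is killed mid-download, the next run finds no finished PDF for that paper and fetches it again instead of keeping a truncated file.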

Introduction

Python code to crawl computer vision papers from top CV conferences. Currently it supports CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR and SIGGRAPH. It uses Selenium, a website testing framework, to crawl the paper titles and PDF URLs from the conference website, and then downloads the papers one by one with some simple anti-anti-crawler tricks.
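The README doesn't spell out which anti-anti-crawler tricks are used. Two common ones are sending a browser-like User-Agent header and pausing for a random interval between requests. Here is a minimal standard-library sketch of both; the helper names and the User-Agent string are assumptions, not taken from the project.

```python
import random
import time
import urllib.request

# Assumed browser-like User-Agent string; any recent desktop browser UA works similarly.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0 Safari/537.36"
)


def build_request(url: str) -> urllib.request.Request:
    """Create a request that identifies itself like a normal browser."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def polite_delay(min_s=1.0, max_s=3.0, rng=random) -> float:
    """Pick a random pause length so requests don't arrive at a fixed rhythm."""
    return rng.uniform(min_s, max_s)


def fetch_politely(url: str, min_s=1.0, max_s=3.0) -> bytes:
    """Sleep for a random interval, then download url with browser-like headers."""
    time.sleep(polite_delay(min_s, max_s))
    with urllib.request.urlopen(build_request(url), timeout=60) as resp:
        return resp.read()
```

The random pause also keeps the load on the conference server low, which matters more than evading detection.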

Crawling the websites of older conferences is not guaranteed to work, since this project is written for the newest website structure.

It is recommended to use this together with Mendeley. You will get a juicy academic corpus.

Currently only single-threaded downloading is implemented, so downloading thousands of papers is slow (it takes several hours). It is suggested that you run the script before bed; it should be finished by the time you get back to work :)

Multi-threaded downloading is coming soon!
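Multi-threaded downloading isn't implemented in the project yet. One possible shape for it, sketched with concurrent.futures.ThreadPoolExecutor from the standard library; save_one and download_parallel are hypothetical helpers, not the project's code, and the worker count is kept small on purpose so the server isn't hammered.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_bytes(url: str) -> bytes:
    """Download a URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def save_one(url: str, path: str, fetch=fetch_bytes) -> str:
    """Download one paper to path unless it already exists; return the path."""
    if not os.path.exists(path):
        # Fetch completely before opening the file, so a failed download
        # leaves no file behind.
        data = fetch(url)
        with open(path, "wb") as f:
            f.write(data)
    return path


def download_parallel(jobs, max_workers=4, fetch=fetch_bytes):
    """Download (url, path) pairs with a small thread pool.

    Returns (finished_paths, failures), where failures maps url -> exception.
    """
    finished, failures = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(save_one, url, path, fetch): url for url, path in jobs}
        for fut in as_completed(futures):
            try:
                finished.append(fut.result())
            except Exception as exc:  # keep going if one paper fails
                failures[futures[fut]] = exc
    return finished, failures
```

Downloads are I/O-bound, so threads help despite the GIL; collecting failures instead of raising lets a long run finish and report the few papers to retry.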

Requirements

pip install selenium slugify

Besides, download chromedriver.exe from the link and save it to any local path you like.

Usage

To run the crawler, execute download.py or download.ipynb (they are basically the same). Before running it, set up the following settings and paths:

conference = 'neurips' # the conference name, e.g. 'cvpr', 'eccv', 'iclr' (see the examples below)
conference_url = "https://papers.nips.cc/paper/2019" # the conference url to download papers from
chromedriver_path = '.../chromedriver.exe' # the chromedriver.exe path
root = './NeurIPS-2019-ALL' # file path to save the downloaded papers
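Before starting a run that takes hours, it can be worth sanity-checking these settings. Here is a hypothetical helper (not part of the project) that assumes the lowercase conference names used in the URL examples in this README.

```python
import os

# Assumed conference keys, taken from the URL examples in this README.
SUPPORTED = {"cvpr", "iccv", "eccv", "neurips", "icml", "iclr", "siggraph"}


def check_settings(conference, conference_url, chromedriver_path, root):
    """Return a list of problems with the settings; an empty list means OK.

    Also creates the output directory root if it is missing.
    """
    problems = []
    if conference not in SUPPORTED:
        problems.append(f"unknown conference {conference!r}")
    if not conference_url.startswith(("http://", "https://")):
        problems.append(f"conference_url does not look like a URL: {conference_url!r}")
    if not os.path.isfile(chromedriver_path):
        problems.append(f"chromedriver not found at {chromedriver_path!r}")
    os.makedirs(root, exist_ok=True)
    return problems
```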

Here are some example conference URLs:

cvpr: https://openaccess.thecvf.com/CVPR2020 (CVPR 2020)
eccv: https://openaccess.thecvf.com/ECCV2018 (ECCV 2018; the ECCV site changed in 2020)
eccv: https://www.ecva.net/papers.php (ECCV 2020)
iccv: https://openaccess.thecvf.com/ICCV2019 (ICCV 2019)
icml: http://proceedings.mlr.press/v119/ (ICML 2020)
neurips: https://papers.nips.cc/paper/2020 (NeurIPS 2020)
iclr: https://openreview.net/group?id=ICLR.cc/2021/Conference (ICLR 2021)
siggraph: https://dl.acm.org/toc/tog/2020/39/4 (SIGGRAPH 2020)

Replace the URL and the conference name with the ones of your choice.
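Most of the example URLs above follow a per-year pattern. Below is a hypothetical helper that rebuilds them, assuming the patterns carry over to other years. These sites do change (as ECCV did in 2020), so open the generated URL in a browser before crawling. ICML (PMLR volume numbers) and SIGGRAPH (ACM TOG issues) don't follow a year-only pattern and are left out.

```python
def conference_url(conference: str, year: int) -> str:
    """Build a proceedings URL from the example patterns (assumed to hold for other years)."""
    conference = conference.lower()
    if conference in ("cvpr", "iccv") or (conference == "eccv" and year < 2020):
        return f"https://openaccess.thecvf.com/{conference.upper()}{year}"
    if conference == "eccv":
        # ECCV moved off the CVF site in 2020; assumed to stay on this page.
        return "https://www.ecva.net/papers.php"
    if conference == "neurips":
        return f"https://papers.nips.cc/paper/{year}"
    if conference == "iclr":
        return f"https://openreview.net/group?id=ICLR.cc/{year}/Conference"
    # ICML and SIGGRAPH URLs have to be looked up by hand.
    raise ValueError(f"no URL pattern known for {conference!r}")
```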

If you want to crawl papers from another conference website, all you need to do is write a retrieve function like the ones in retrieve_titles_urls_from_websites.py that parses the HTML and collects the paper titles and PDF URLs into two lists.
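The project crawls with Selenium, and the README doesn't show its retrieve functions. Below is an independent, hypothetical sketch that instead parses already-fetched HTML with the standard-library html.parser, so it can be tried without a browser. It assumes a page where each title is the link inside a dt element with class "ptitle" and each PDF is a link whose href ends in ".pdf". A real conference page needs its own rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PaperListParser(HTMLParser):
    """Collect paper titles and PDF links from a conference listing page.

    Assumed layout (hypothetical): each title is the text of the <a> inside
    <dt class="ptitle">, and each PDF is an <a> whose href ends in ".pdf".
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.titles, self.pdf_urls = [], []
        self._in_title_dt = False
        self._in_title_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "dt" and "ptitle" in (attrs.get("class") or "").split():
            self._in_title_dt = True
        elif tag == "a":
            href = attrs.get("href") or ""
            if self._in_title_dt:
                self._in_title_link = True
                self.titles.append("")  # start a new title; text arrives in handle_data
            elif href.lower().endswith(".pdf"):
                self.pdf_urls.append(urljoin(self.base_url, href))  # make relative links absolute

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_title_link = False
        elif tag == "dt":
            self._in_title_dt = False

    def handle_data(self, data):
        if self._in_title_link:
            self.titles[-1] += data  # titles may arrive in several chunks


def retrieve_titles_urls(html, base_url):
    """Return (titles, pdf_urls) as two parallel lists."""
    parser = PaperListParser(base_url)
    parser.feed(html)
    parser.close()
    titles = [" ".join(t.split()) for t in parser.titles]  # collapse whitespace
    return titles, parser.pdf_urls
```

Because titles and links are collected separately, the two lists only line up if every entry has exactly one PDF link. For pages where that doesn't hold, pair title and link per entry instead.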

Others

Warning: Crawling conference websites can reportedly get your IP banned (this hasn't happened to me so far). I'm not sure how high the risk is.

Warning: This project is for learning purposes only. Do not crawl the same website frequently, as this puts a burden on the server.

Pull requests are welcome if you find any bugs or would like to add support for other conferences!

Maintainer

Xiaoyang Huang
