Screen scraping and web crawling framework

Last update: Jun 21, 2021

Overview

Pomp

Pomp is a screen scraping and web crawling framework. Pomp is inspired by and similar to Scrapy, but has a simpler implementation that lacks the hard Twisted dependency.

Features:

Pure python
Only one dependency on Python 2.x: concurrent.futures (the backport of the Python 3 standard-library package)
Supports single-file applications; Pomp doesn't force a specific project layout or other restrictions.
Pomp is a meta framework like Paste: you may use it to create your own scraping framework.
Extensible networking: you may use any sync or async method.
No parsing libraries in the core; use your preferred approach.
Pomp instances may be distributed and are designed to work with an external queue.
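
The features above (pluggable networking, user-supplied parsing, an external queue) can be illustrated with a minimal sketch. This is not Pomp's actual API: the names LinkCrawler, run, extract_items, next_urls, and FAKE_SITE are hypothetical and exist only to show the shape of a meta framework in which the downloader, the parsing logic, and the queue are all supplied by the user.

```python
from collections import deque
from typing import Callable, Iterable, List


class LinkCrawler:
    """Hypothetical crawler: decides what to extract and what to fetch next."""

    ENTRY_URL = "http://example.test/"

    def extract_items(self, url: str, body: str) -> List[dict]:
        # Parsing is left to the user; here we only record the page size.
        return [{"url": url, "length": len(body)}]

    def next_urls(self, url: str, body: str) -> Iterable[str]:
        # Toy link extraction: every whitespace-separated token starting with "http".
        return [token for token in body.split() if token.startswith("http")]


def run(crawler, download: Callable[[str], str], queue=None) -> List[dict]:
    """Drive the crawl loop with a pluggable downloader and a pluggable queue."""
    queue = deque() if queue is None else queue
    queue.append(crawler.ENTRY_URL)
    seen, items = set(), []
    while queue:
        url = queue.popleft()
        if url in seen:
            continue  # skip pages that were already fetched
        seen.add(url)
        body = download(url)
        items.extend(crawler.extract_items(url, body))
        queue.extend(crawler.next_urls(url, body))
    return items


# An in-memory "network" so the sketch runs offline (assumed data, not a real site).
FAKE_SITE = {
    "http://example.test/": "home http://example.test/a http://example.test/b",
    "http://example.test/a": "page a http://example.test/",
    "http://example.test/b": "page b",
}

items = run(LinkCrawler(), download=lambda url: FAKE_SITE.get(url, ""))
print([item["url"] for item in items])
```

Because the download argument is just a callable, a synchronous HTTP client, a thread pool, or an async wrapper could be swapped in without touching the crawl loop. Likewise, the queue argument could be replaced by an object backed by an external queue service, as long as it offers append, extend, and popleft.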

Pomp makes no attempt to accommodate:

redirects
proxies
caching
database integration
cookies
authentication
etc.

If you want proxies, redirects, or similar, you may use the excellent requests library as the Pomp downloader.
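
As a rough sketch of that idea, the class below wraps a requests Session so that redirects, proxies, and cookies are handled by requests rather than by the framework. It does not use Pomp's real downloader base class; RequestsDownloader and Page are hypothetical names, and how such an object would be registered with Pomp is left to the Pomp docs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical response container handed back to the crawler."""
    url: str
    status: int
    body: str


class RequestsDownloader:
    """Hypothetical downloader that delegates HTTP details to requests."""

    def __init__(self, session=None, proxies: Optional[dict] = None,
                 timeout: float = 10.0):
        if session is None:
            # Third-party dependency; imported lazily so a stub session
            # can be injected in tests without requests installed.
            import requests
            session = requests.Session()
        if proxies:
            # requests.Session keeps proxies in a plain dict attribute.
            session.proxies.update(proxies)
        self.session = session
        self.timeout = timeout

    def get(self, url: str) -> Page:
        # The session follows redirects and persists cookies between calls.
        response = self.session.get(
            url, timeout=self.timeout, allow_redirects=True
        )
        return Page(
            url=response.url,  # final URL after any redirects
            status=response.status_code,
            body=response.text,
        )
```

A caller might construct it as RequestsDownloader(proxies={"http": "http://127.0.0.1:8080"}) and call get(url) for each request; the proxy address here is a placeholder, not a recommended value.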

Pomp examples

Pomp docs

Pomp is written and maintained by Evgeniy Tatarkin and is licensed under the BSD license.

Owner

Evgeniy Tatarkin

This project uses Python and Flask to scrape a music site

robobrowser - A simple, Pythonic library for browsing the web without a standalone web browser.

Google Maps crawler using Selenium

FilmMikirAPI - A simple REST API for scraping the Kincir website, built with Python and Flask

Web scraper built using Python.

A simple, configurable and expandable combined shop scraper to minimize the costs of ordering several items

A dead simple crawler to get books information from Douban.

Unja is a fast and light tool for fetching known URLs from the Wayback Machine

Web-scraping - Program that scrapes a website for a collection of quotes, picks one at random and displays it

This program scrapes information and images for movies and TV shows.

A Python Covid-19 cases tracker that scrapes data off the web and presents the number of Cases, Recovered Cases, and Deaths that occurred because of the pandemic.

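Snapping up Moutai on JD.com: discussion of many successful flash-sale purchases, Tmall flash sales, money-making tips, and more.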

Python scraper to check for earlier appointments in Clalit Health Services

A distributed crawler for Weibo, built with Celery and requests.

Grab the changelog from releases on GitHub

Moutai flash-sale purchasing on JD.com

Batch download all of a Douyin user's videos without watermarks

Find papers by keywords and venues, then download them automatically

Web Scraping Framework

Script used to download data for stocks.