A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Sponsors
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the domains most commonly used in machine learning and data science. It scrapes directly and asynchronously from public API endpoints, which removes the heavy browser overhead and makes Scrapera fast and robust to DOM changes. Scrapera currently supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

The main aim of this package is to group common scraping tasks in one place so that ML researchers and engineers can focus on their models rather than on the data collection process.
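The asynchronous, browser-free approach described above can be illustrated with a minimal sketch. This is not Scrapera's actual implementation: `fetch_json`, `scrape_all`, and `run` are hypothetical names, and the network call is replaced by a short sleep so the example runs offline. It only shows the general idea of issuing many endpoint requests concurrently instead of driving a browser.

```python
import asyncio


async def fetch_json(endpoint):
    """Hypothetical stand-in for an HTTP request to a public JSON endpoint."""
    await asyncio.sleep(0.01)  # simulates network latency; no real request is made
    return {"endpoint": endpoint, "status": "ok"}


async def scrape_all(endpoints):
    """Start every request at once and wait for all of them to finish."""
    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(fetch_json(e) for e in endpoints))


def run(endpoints):
    """Synchronous entry point for callers that are not already async."""
    return asyncio.run(scrape_all(endpoints))
```

Because the requests overlap, the total waiting time is close to that of the slowest request rather than the sum of all of them.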

    DISCLAIMER: The owner and contributors take no responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

    Prerequisites

    Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git

    Usage

    To use any sub-module, import its scraper class, instantiate it, and call it:

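    # Import the scraper class from its sub-module
    from scrapera.video.vimeo import VimeoScraper
    # Create the scraper, then scrape the given Vimeo URL ('540p' is the second argument)
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')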

    For more examples, refer to the test folders inside the respective modules.
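When scraping several URLs in one run, it can help to keep going after a single failure and report the failures at the end. The helper below is a sketch, not part of Scrapera: `scrape_many` is a hypothetical name, and it assumes only that the scraper object exposes `scrape(url, quality)` as in the Vimeo example and that failures surface as exceptions.

```python
def scrape_many(scraper, urls, quality="540p"):
    """Call scraper.scrape(url, quality) for each URL and collect failures.

    Returns a dict mapping each failed URL to the exception it raised.
    """
    failures = {}
    for url in urls:
        try:
            scraper.scrape(url, quality)
        except Exception as exc:  # assumption: a failed scrape raises an exception
            failures[url] = exc
    return failures


# Possible usage with the scraper from the example above (requires scrapera):
#   from scrapera.video.vimeo import VimeoScraper
#   failed = scrape_many(VimeoScraper(), ['https://vimeo.com/191955190'])
#   for url, error in failed.items():
#       print(f"{url} failed: {error}")
```

Passing the scraper in as an argument keeps the helper independent of any particular sub-module, so the same loop could wrap any scraper with a compatible `scrape` method.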

    Contributing

    Scrapera welcomes all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors


    Contact

    Feel free to reach out with any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements
