This script is intended to crawl license information of repositories through the GitHub API.

Last update: Oct 25, 2022

Overview

GithubLicenseCrawler

This script crawls the license information of repositories through the GitHub API. Given an input file in requirements.txt format, the script returns a CSV file with the associated license information.
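As a rough illustration of the lookup described above, here is a minimal sketch of resolving a package name to a license through the GitHub REST API. It is not the script's actual implementation: the use of the repository search endpoint, the choice of the top search hit, and the helper names build_search_url, license_from_search and fetch_license are all assumptions made for this example.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def build_search_url(package_name):
    """Build a repository-search URL that matches on the repository name.

    Assumption: the package name is close to the repository name.
    """
    query = urllib.parse.quote(f"{package_name} in:name")
    return f"{API_ROOT}/search/repositories?q={query}&per_page=1"


def license_from_search(payload):
    """Return (full_name, spdx_id) of the top search hit, or (None, None)."""
    items = payload.get("items") or []
    if not items:
        return None, None
    top = items[0]
    # "license" is null in the API response when GitHub detects no license.
    license_info = top.get("license") or {}
    return top.get("full_name"), license_info.get("spdx_id")


def fetch_license(package_name):
    """Query the GitHub API without authentication (strictly rate-limited)."""
    request = urllib.request.Request(
        build_search_url(package_name),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "license-lookup-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return license_from_search(json.load(response))
```

Calling fetch_license("DeepDive") would perform one network request. The top search hit is not guaranteed to be the intended repository, so results from a lookup like this should be checked by hand.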

Input

The input file is expected to be a requirements.txt file. The expected format looks like this, for two example repositories:

HeartSeg-Dataset==0.0.1
DeepDive==0.0.1
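Reading this format can be sketched as follows. This is a minimal sketch that assumes every entry uses the name==version form shown above; the function name parse_requirements is hypothetical and not taken from the script.

```python
def parse_requirements(lines):
    """Yield (name, version) pairs from 'name==version' lines."""
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        # Entries without a pinned version get None as their version.
        yield name.strip(), (version.strip() if sep else None)


sample = ["HeartSeg-Dataset==0.0.1", "DeepDive==0.0.1"]
print(list(parse_requirements(sample)))
# [('HeartSeg-Dataset', '0.0.1'), ('DeepDive', '0.0.1')]
```

Other specifiers that requirements.txt allows, such as >= or extras in brackets, are deliberately not handled in this sketch.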

Output

The output file is generated on the fly and named licenses.csv. The columns depict:

Running the script should look like this:

Contact and Contribute

[email protected] The GitHub API is far more powerful than what is used here. Feel free to extend this code or, preferably, contribute directly here.

Owner

schutera

GitHub Repository

Web Scraping Instagram photos with Selenium by only using a hashtag.

Web-Scraping-Instagram This project is used to automatically obtain images by web scraping Instagram with Selenium in Python. The required input will

3 Nov 24, 2022

Bigdata - This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster

Scrapy Cluster This Scrapy project uses Redis and Kafka to create a distributed

0 Jan 06, 2022

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

Gerapy Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Documentation

2.9k Jan 03, 2023

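Assorted check-ins for iQIYI membership, Tencent Video, Bilibili, Baidu, and more

My-Actions A personal collection of assorted check-in scripts adapted for GitHub Actions. Please don't fork; a ⭐️ star is enough. Usage: create a new repository and sync the code. Click Settings - Secrets - click the green button (if there is no green button, it is already activated; go straight to the next step). Add a new secret and set Secr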

280 Dec 30, 2022

tweet random sand cat pictures

sandcatbot setup pip3 install --user -r requirements.txt cp sandcatbot.example.conf sandcatbot.conf vim sandcatbot.conf running the first parameter i

8 Aug 07, 2022

Minecraft Item Scraper

Minecraft Item Scraper To run, first ensure you have the BeautifulSoup module: pip install bs4 Then run, python minecraft_items.py folder-to-save-ima

1 Dec 29, 2021

Works very well and you can ask for the type of image you want the scraper to collect.

Works very well and you can ask for the type of image you want the scraper to collect. Also follows a specific URL path depending on keyword selection.

1 Feb 17, 2022

Scrape and display grades onto the console

WebScrapeGrades About The Project This Project is a personal project where I learned how to webscrape using python requests. Being able to get request

1 Oct 23, 2021

Snowflake database loading utility with Scrapy integration

Snowflake Stage Exporter Snowflake database loading utility with Scrapy integration. Meant for streaming ingestion of JSON serializable objects into S

0 Dec 06, 2021

Examine.com supplement research scraper!

ExamineScraper Examine.com supplement research scraper! Why I want to be able to search pages for a specific term. For example, I want to be able to s

15 Dec 06, 2022

This code will be able to scrape movies from a movie website and also provide download links to newly uploaded movies.

Movies-Scraper You are probably tired of navigating through a movie website to get the right movie you'd want to watch during the weekend. There may e

1 Jan 31, 2022

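Brute-forcing sites protected by CAPTCHAs, for safe and legal testing

Usage: python3 main.py + a prepared configuration file. python3 main.py Verify.json python3 main.py NoVerify.json These correspond to the demo with a CAPTCHA and the demo without one, respectively. Tips: you can load a configuration file named after a domain: python3 main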

47 Nov 09, 2022

Automatically scrapes all menu items from the Taco Bell website

Automatically scrapes all menu items from the Taco Bell website. Returns as PANDAS dataframe.

2 Jan 15, 2022

Scrape all the media from an OnlyFans account - Updated regularly

3.2k Dec 29, 2022

Python Web Scrapper Project

Web Scrapper A project developed in Python, mainly with Selenium, BeautifulSoup and Pandas. It is a web scraper that pulls a table with the main

2 Jan 04, 2022

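Automatic off-campus check-in on Wanmei Campus (完美校园) for Henan University of Technology

HAUT-checkin Automatic off-campus check-in for Henan University of Technology. Because GitHub Actions has noticeable delays, using Tencent Cloud Functions directly is recommended. Features: multi-user check-in; simple to use, requiring only an account, a password, and a uid for WeChat push notifications; automatically retrieves the previous check-in information to use for the new check-in; pushes the check-in status to each member individually via WeChat; automatically retries when a check-in fails because the Wanmei Campus server is busy.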

36 Oct 27, 2022

A simplistic scraper made to download tons of random screenshots made by people.

printStealer 1.1 What is this tool? This tool is developed to show the insecurity of the screenshot utility called prnt sc. It is a site that stores s

4 Jul 26, 2022

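Scheduled HITsz epidemic reporting script based on GitHub Actions, ready to use out of the box

HITsz Daily Report A scheduled automatic reporting script for the "HITsz epidemic system" access portal, based on GitHub Actions and ready to use out of the box. Thanks to @JellyBeanXiewh for the original script and idea. Thanks to @bugstop for refactoring the script and adding on-campus proxy access via Easy Connect.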

56 Nov 27, 2022

This repo has the source code for the crawler and data crawled from auto-data.net

This repo contains the source code for crawler and crawled data of cars specifications from autodata. The data has roughly 45k cars

5 Nov 22, 2022

A web crawler script that crawls the target website and lists its links

A web crawler script that crawls the target website and lists its links || A web crawler script that lists links by scanning the target website.

2 Apr 29, 2022