Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Last update: Nov 05, 2021

Overview

This repository provides two web crawlers. One labels domain names using the McAfee API (https://www.trustedsource.org/sources/index.pl), and the other labels IP reputation using the TALOS API (https://talosintelligence.com/).

Requirements

BeautifulSoup

Usage

The demonstration scripts are used as follows.

To label a set of domains, put the domain list in 'data/domain_list.txt' and run 'demo_domain_label.py'. Using the McAfee API, the program labels each domain with (1) a category (e.g., Malicious Sites- Parked Domain) and (2) a risk level (e.g., High Risk), and saves the results in 'res/domain_labels.txt'. If the program repeatedly outputs '-Retry-', stop it and wait a while before starting it again. On restart, the program automatically skips the domains that are already labeled and continues with the remaining ones.
To label the reputation of a set of IP addresses, put the IP list in 'data/IP_list.txt' and run 'demo_IP_label.py'. The program labels each IP address with (1) an email reputation and (2) a web reputation, each at one of three levels (Poor, Neutral, or Good), and saves the results in 'res/IP_labels.txt'. If the program repeatedly outputs 'None', stop it and wait a while before starting it again. On restart, the program automatically skips the IPs that are already labeled and continues with the remaining ones.
An example domain name list (with 21,820 effective second-level domains) and an example IP list (with 67,751 IP addresses) are given in 'data/examples/example_domain_list.txt' and 'data/examples/example_IP_list.txt', respectively. The corresponding labeled results are saved in 'res/examples/example_domain_labels.txt' and 'res/examples/example_IP_labels.txt', respectively.
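
The stop-and-restart behaviour described above can be sketched roughly as follows. This is not the code from 'demo_domain_label.py' or 'demo_IP_label.py'. It is a minimal illustration that assumes each result line starts with the labeled item followed by a tab, and that a failed lookup returns None. The names load_done, label_all, fetch_label, and max_failures are hypothetical.

```python
import os


def load_done(result_path):
    """Return the set of items that already have a line in the results file."""
    done = set()
    if os.path.exists(result_path):
        with open(result_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    # Assumed layout: "<item>\t<label>"; the real files may differ.
                    done.add(line.split("\t")[0])
    return done


def label_all(list_path, result_path, fetch_label, max_failures=5):
    """Label every item in list_path, skipping items already in result_path.

    fetch_label(item) is a hypothetical callable that returns a label string,
    or None when the lookup fails (for example, when the service throttles us).
    Returns the number of items labeled during this run.
    """
    done = load_done(result_path)
    labeled = 0
    failures = 0
    with open(list_path, encoding="utf-8") as src, \
            open(result_path, "a", encoding="utf-8") as out:
        for line in src:
            item = line.strip()
            if not item or item in done:
                continue  # blank line, or labeled in an earlier run
            label = fetch_label(item)
            if label is None:
                failures += 1
                if failures >= max_failures:
                    break  # repeated failures: stop now and rerun later
                continue
            failures = 0
            out.write(f"{item}\t{label}\n")
            out.flush()  # keep progress on disk in case the run is interrupted
            done.add(item)
            labeled += 1
    return labeled
```

Because results are appended and flushed one item at a time, rerunning label_all with the same paths picks up where the previous run stopped. Items whose lookup failed are not written, so they are retried on the next run.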

If you have questions regarding this repository, you can contact the author via [[email protected]].
