Python script to check whether an application responds differently when the request comes from a search engine's crawler.

Last update: Dec 27, 2022

Overview

crawlersuseragents

This Python script checks whether an application responds differently when the request comes from a search engine's crawler. It requests the same URL once per crawler User-Agent string and compares the responses.

Features

30 crawler user agent strings.
Multithreading.
JSON export with --jsonfile outputfile.json.
Auto-detection of responses that stand out.
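
The first two features can be pictured with a minimal sketch that requests one URL once per User-Agent string from a thread pool. This is not the script's actual source: the names `CRAWLER_USER_AGENTS`, `fetch_length` and `probe` are hypothetical, the list holds only two sample strings rather than the script's 30, and the use of the standard library's `urllib` is an assumption made to keep the example dependency-free.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

# Hypothetical sample; the script itself ships its own list of 30 strings.
CRAWLER_USER_AGENTS = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
]


def fetch_length(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent and return (status, body length).

    Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses;
    a fuller tool would catch that and record the status instead.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        return response.status, len(response.read())


def probe(url, user_agents, threads=5, fetch=fetch_length):
    """Request url once per User-Agent, using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so results line up with user_agents.
        outcomes = pool.map(lambda ua: fetch(url, ua), user_agents)
        return dict(zip(user_agents, outcomes))
```

Calling `probe("https://example.com", CRAWLER_USER_AGENTS)` would return a dictionary that maps each User-Agent string to its status code and response length. The `fetch` parameter exists so the network call can be swapped out, for example in tests.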

Usage

$ ./crawlersuseragents.py -h
[~] Access web pages as web crawlers User-Agents, v1.1

usage: crawlersuseragents.py [-h] [-v] [-t THREADS] [-x PROXY] [-k] [-L] [-j JSONFILE] url

This Python script can be used to check if there is any differences in responses of an application
when the request comes from a search engine's crawler.

positional arguments:
  url                   e.g. https://example.com:port/path

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         arg1 help message
  -t THREADS, --threads THREADS
                        Number of threads (default: 5)
  -x PROXY, --proxy PROXY
                        Specify a proxy to use for requests (e.g., http://localhost:8080)
  -k, --insecure        Allow insecure server connections when using SSL (default: False)
  -L, --location        Follow redirects (default: False)
  -j JSONFILE, --jsonfile JSONFILE
                        Save results to specified JSON file.
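
For readers who want to script around these options, the help text above is consistent with an `argparse` parser along the following lines. This is a reconstruction from the help output only, not the script's actual source: the function name `build_parser` is hypothetical, and details such as `type=int` for `--threads` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the help output."""
    parser = argparse.ArgumentParser(
        prog="crawlersuseragents.py",
        description="Check whether an application responds differently "
                    "when the request comes from a search engine's crawler.",
    )
    parser.add_argument("url", help="e.g. https://example.com:port/path")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Verbose output")
    # type=int is an assumption; the help text only states the default.
    parser.add_argument("-t", "--threads", type=int, default=5,
                        help="Number of threads (default: 5)")
    parser.add_argument("-x", "--proxy", default=None,
                        help="Proxy to use for requests")
    parser.add_argument("-k", "--insecure", action="store_true",
                        help="Allow insecure server connections when using SSL")
    parser.add_argument("-L", "--location", action="store_true",
                        help="Follow redirects")
    parser.add_argument("-j", "--jsonfile", default=None,
                        help="Save results to specified JSON file.")
    return parser


# Example: 10 threads, skip certificate checks, export to out.json.
args = build_parser().parse_args(
    ["-t", "10", "-k", "-j", "out.json", "https://example.com"]
)
print(args.threads, args.insecure, args.location, args.jsonfile, args.url)
```

The example argument list corresponds to running `./crawlersuseragents.py -t 10 -k -j out.json https://example.com` on the command line.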

Auto-detecting responses that stand out

Results are sorted by how unique their response length is. Results with a response length that no other result shares appear at the top, and results whose response length occurs multiple times appear at the bottom.

[Screenshots: "Two different result lengths" and "Four different result lengths"]
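
The ordering described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's own code: the function name `sort_by_length_uniqueness`, the `(user_agent, length)` tuple layout and the sample values are all hypothetical.

```python
from collections import Counter


def sort_by_length_uniqueness(results):
    """Order (user_agent, length) pairs so the rarest lengths come first."""
    # Count how many results share each response length.
    counts = Counter(length for _, length in results)
    # sorted() is stable, so results with equally common lengths
    # keep their original relative order.
    return sorted(results, key=lambda item: counts[item[1]])


# Hypothetical sample: three agents get the same page, one gets a different one.
results = [
    ("agent-a", 5120),
    ("agent-b", 5120),
    ("agent-c", 312),
    ("agent-d", 5120),
]
for user_agent, length in sort_by_length_uniqueness(results):
    print(user_agent, length)
```

With this sample, `agent-c` is listed first because its length of 312 occurs only once, followed by the three agents that share the length 5120. A response that differs for a single crawler therefore surfaces at the top of the output.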

Contributing

Pull requests are welcome. Feel free to open an issue if you want to add other features.

Releases (1.1)

1.1 (Nov 15, 2021)

Owner

Podalirius
