A scrapy pipeline that provides an easy way to store files and images using various folder structures.

Last update: Oct 23, 2022

Overview

scrapy-folder-tree

This is a Scrapy pipeline that stores downloaded files and images in one of several folder structures.

Supported folder structures:

Given a scraped file named 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg, you can choose one of the following folder structures:

Using file name

full
├── 0
.   ├── 5
.   .   ├── b
.   .   .   ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
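The file-name layout above can be sketched in a few lines of Python. This is only an illustration of the idea, not the package's actual implementation: the function name `name_tree_path` and its `depth` and `root` parameters are hypothetical, and the sketch assumes each folder level is one leading character of the file name.

```python
from pathlib import PurePosixPath


def name_tree_path(filename: str, depth: int = 3, root: str = "full") -> str:
    """Illustrative only: nest a file under one folder per leading character."""
    stem = PurePosixPath(filename).stem   # file name without its extension
    levels = list(stem[:depth])           # first `depth` characters, one per folder
    return str(PurePosixPath(root, *levels, filename))


print(name_tree_path("05b40af07cb3284506acbf395452e0e93bfc94c8.jpg"))
# full/0/5/b/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
```

Because scraped file names are typically hex digests, spreading files over their leading characters keeps any single directory from growing too large.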

Using crawling time

full
├── 0
.   ├── 11
.   .   ├── 48
.   .   .   ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg

Using crawling date

full
├── 2022
.   ├── 1
.   .   ├── 24
.   .   .   ├── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
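The two timestamp-based layouts above can be sketched the same way. The sketch assumes the crawling-time folders are hour, minute, and second, and the crawling-date folders are year, month, and day, all without zero padding. That reading matches the sample trees but is not confirmed by the source. `timestamp_tree_path` and its parameters are hypothetical names, not part of the package.

```python
from datetime import datetime
from pathlib import PurePosixPath


def timestamp_tree_path(filename: str, crawled_at: datetime,
                        scheme: str = "date", root: str = "full") -> str:
    """Illustrative only: nest a file under folders taken from its crawl timestamp."""
    if scheme == "date":
        parts = (crawled_at.year, crawled_at.month, crawled_at.day)
    elif scheme == "time":
        parts = (crawled_at.hour, crawled_at.minute, crawled_at.second)
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    # Folder names are the plain integers, with no zero padding (assumption).
    return str(PurePosixPath(root, *map(str, parts), filename))


crawled_at = datetime(2022, 1, 24, 0, 11, 48)  # hypothetical crawl timestamp
name = "05b40af07cb3284506acbf395452e0e93bfc94c8.jpg"
print(timestamp_tree_path(name, crawled_at, "date"))
print(timestamp_tree_path(name, crawled_at, "time"))
```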

Installation

pip install scrapy_folder_tree

Usage

Use the following settings in your project:

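ITEM_PIPELINES = {
    # Enable the pipeline; 300 sets its position among the enabled pipelines.
    'scrapy_folder_tree.FilesHashTreePipeline': 300,
}

# Presumably the number of nested folder levels (the trees above show three).
FOLDER_TREE_DEPTH = 3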
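For context, a slightly fuller settings file might look like the sketch below. It assumes the pipeline follows the conventions of Scrapy's built-in FilesPipeline, which the source does not state: `FILES_STORE` is the standard Scrapy setting for the download directory, and the `downloads` path is a hypothetical value.

```python
# settings.py (sketch under the assumptions stated above)
ITEM_PIPELINES = {
    "scrapy_folder_tree.FilesHashTreePipeline": 300,
}

FOLDER_TREE_DEPTH = 3

# Assumption: as with Scrapy's FilesPipeline, a storage root must be set.
FILES_STORE = "downloads"  # hypothetical directory for downloaded files
```

Under the same assumption, items yielded by the spider would list their download URLs in a `file_urls` field, as Scrapy's FilesPipeline expects.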

Owner

Panagiotis Simakis
