a small library for extracting rich content from urls

Last update: Dec 27, 2022

Overview

A small library for extracting rich content from URLs.

what does it do?

micawber supplies a few methods for retrieving rich metadata about a variety of links, such as links to YouTube videos. It also provides functions for parsing blocks of text and HTML and replacing links to videos with rich embedded content.
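To make the idea concrete, here is a minimal offline sketch of a URL-pattern registry that swaps standalone links for embed markup. It is not micawber's implementation: `SketchRegistry`, `fake_video_fetch`, and the `video.example` host are hypothetical names invented for illustration, and the stub fabricates its response locally where a real provider would query a remote endpoint.

```python
import re


class SketchRegistry:
    """Hypothetical stand-in for a provider registry; not micawber's code."""

    def __init__(self):
        self._providers = []  # list of (compiled pattern, fetch function)

    def register(self, pattern, fetch):
        self._providers.append((re.compile(pattern), fetch))

    def request(self, url):
        # Return metadata from the first provider whose pattern matches.
        for regex, fetch in self._providers:
            if regex.match(url):
                return fetch(url)
        raise LookupError('no provider for %s' % url)

    def parse_text(self, text):
        # Replace each line that consists solely of a known URL with its
        # embed HTML; every other line is passed through unchanged.
        out = []
        for line in text.splitlines():
            try:
                out.append(self.request(line.strip())['html'])
            except LookupError:
                out.append(line)
        return '\n'.join(out)


def fake_video_fetch(url):
    # Assumption: a real provider would make an HTTP request here. This stub
    # builds the response locally so the sketch runs without a network.
    video_id = url.rsplit('=', 1)[-1]
    return {
        'type': 'video',
        'url': url,
        'html': '<iframe src="http://video.example/embed/%s"></iframe>' % video_id,
    }


registry = SketchRegistry()
registry.register(r'https?://video\.example/watch\?v=\S+$', fake_video_fetch)

result = registry.parse_text('this is a test:\nhttp://video.example/watch?v=abc123')
print(result)
```

The registry tries patterns in registration order, so a more specific pattern would need to be registered before a broader one in this sketch.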

examples

here is a quick example:

import micawber

# load up rules for some default providers, such as youtube and flickr
providers = micawber.bootstrap_basic()

providers.request('http://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following dictionary:
{
    'author_name': 'pascalbrax',
    'author_url': u'http://www.youtube.com/user/pascalbrax',
    'height': 344,
    'html': u'<iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>',
    'provider_name': 'YouTube',
    'provider_url': 'http://www.youtube.com/',
    'title': 'Future Crew - Second Reality demo - HD',
    'type': u'video',
    'thumbnail_height': 360,
    'thumbnail_url': u'http://i2.ytimg.com/vi/54XHDUOHuzU/hqdefault.jpg',
    'thumbnail_width': 480,
    'url': 'http://www.youtube.com/watch?v=54XHDUOHuzU',
    'width': 459,
    'version': '1.0',
}

providers.parse_text('this is a test:\nhttp://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following string:
# this is a test:
# <iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>

providers.parse_html('<p>http://www.youtube.com/watch?v=54XHDUOHuzU</p>')

# returns the following html:
# <p><iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&amp;feature=oembed" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
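The dictionary returned by a request can also be consumed directly. The following is a small sketch, assuming a response shaped like the one shown above (keys such as 'html', 'type', 'url' and 'title'). `render_embed` is a hypothetical helper written for this example, not part of micawber, and the sample dictionaries are made up.

```python
from html import escape


def render_embed(info):
    """Pick display markup for a response dictionary (hypothetical helper)."""
    # Video-style responses carry ready-made markup under 'html'.
    if info.get('html'):
        return info['html']
    # Assumption: photo-style responses carry an image address under 'url'.
    if info.get('type') == 'photo' and info.get('url'):
        return '<img src="%s" alt="%s">' % (
            escape(info['url']), escape(info.get('title', '')))
    # Otherwise fall back to a plain link, escaping values used in markup.
    url = info.get('url', '')
    return '<a href="%s">%s</a>' % (escape(url), escape(info.get('title') or url))


video = {'type': 'video', 'html': '<iframe></iframe>', 'url': 'http://video.example/1'}
photo = {'type': 'photo', 'url': 'http://img.example/a.jpg', 'title': 'A & B'}
link = {'type': 'link', 'url': 'http://example.com/?a=1&b=2', 'title': 'Example'}

for item in (video, photo, link):
    print(render_embed(item))
```

Escaping the URL and title keeps characters such as & from breaking the generated markup, in the same way the parse_html output above shows &amp; inside the src attribute.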

Owner

Charles Leifer
