Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)

Overview

Uses Beautiful Soup to read Wikipedia pages, Gensim to summarize, and NLTK to process the text, and extracts keywords based on entropy, all in one program. I looked for similar code on GitHub, but most of it was difficult to understand and use. I'm building this repo to provide a simple yet effective solution for extractive summarization and keyword identification.
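
The keyword scoring itself is not shown in this README, so the following is only a minimal sketch of one common entropy-based idea, not necessarily what summarize.py does. It assumes a word is a good keyword when its occurrences are concentrated in a few parts of the text (low entropy across chunks) rather than spread evenly like "the" (high entropy). The function name `entropy_keywords` and its parameters are hypothetical, and it uses only the standard library instead of NLTK.

```python
import math
import re
from collections import Counter


def entropy_keywords(text, n_chunks=8, min_count=3, top_k=5):
    """Rank words by how unevenly they are spread across the text.

    Hypothetical helper: the score is 1 - H/H_max, where H is the Shannon
    entropy of the word's distribution over equal-sized chunks of the text.
    """
    words = re.findall(r"[a-z]+", text.lower())
    if n_chunks < 2 or len(words) < n_chunks:
        return []

    # Split the token stream into roughly equal consecutive chunks.
    chunk_size = math.ceil(len(words) / n_chunks)
    chunks = [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]
    if len(chunks) < 2:
        return []

    totals = Counter(words)
    per_chunk = [Counter(chunk) for chunk in chunks]
    max_entropy = math.log2(len(chunks))

    scores = {}
    for word, total in totals.items():
        # Skip rare and very short words; their distribution is too noisy.
        if total < min_count or len(word) < 3:
            continue
        entropy = 0.0
        for counts in per_chunk:
            p = counts[word] / total
            if p > 0:
                entropy -= p * math.log2(p)
        # Close to 1.0: concentrated in one chunk. Close to 0.0: spread evenly.
        scores[word] = 1.0 - entropy / max_entropy

    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

A word that appears in every chunk equally often scores near zero, so common function words drop out without a stop-word list, although a real pipeline would likely still filter them.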

The program works best for summaries of 300 words or more.
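
To illustrate how a target word count can bound an extractive summary, here is a small standard-library sketch. It is an assumption-laden stand-in, not the repository's code: it scores sentences by average word frequency rather than with Gensim's TextRank-based summarizer, and the name `summarize_by_budget` is hypothetical.

```python
import re
from collections import Counter


def summarize_by_budget(text, word_count=300):
    """Pick the highest-scoring sentences that fit within a word budget."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freqs = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        # Average document frequency of the words in the sentence.
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freqs[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Best sentences first; earlier sentences win ties.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))

    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length > word_count:
            continue  # This sentence would exceed the budget; try a shorter one.
        chosen.append(i)
        used += length

    # Emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Gensim releases before 4.0 exposed `gensim.summarization.summarize` with a `word_count` argument that plays a similar role; whether summarize.py calls it this way is an assumption.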

License

Please follow the license guidelines when using this code: GNU General Public License v3.0.

Requirements

  • Gensim
  • NLTK
  • and others

I provided a requirements.txt file. Simply run the command below in the terminal.

    pip install -r requirements.txt

How to Use

    python summarize.py <wikipedia-url> <word-count>

output:

Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a business partnership. The company's first product is the Apple I, a computer designed and hand-built entirely by Wozniak. To finance its creation, Jobs sold his only motorized means of transportation, a VW Microbus, for a few hundred dollars, and Wozniak sold his HP-65 calculator for US$500 . Wozniak debuted the first prototype at the Homebrew Computer Club in July 1976. The Apple I was sold as a motherboard with CPU, RAM, and basic textual-video chips—a base kit concept which would not yet be marketed as a complete personal computer. It went on sale soon after debut for US$666.66 .:180 Wozniak later said he was unaware of the coincidental mark of the beast in the number 666, and that he came up with the price because he liked "repeating digits". During his keynote speech at the Macworld Expo on January 9, 2007, Jobs announced that Apple Computer, Inc. would thereafter be known as "Apple Inc.", because the company had shifted its emphasis from computers to consumer electronics. This event also saw the announcement of the iPhone and the Apple TV. The company sold 270,000 iPhone units during the first 30 hours of sales, and the device was called "a game changer for the industry". Apple would achieve widespread success with its iPhone, iPod Touch, and iPad products, which introduced innovations in mobile phones, portable music players, and personal computers respectively. Furthermore, by early 2007, 800,000 Final Cut Pro users were registered.

keywords:

'iphone', 'ipad', 'jobs', 'macintosh', 'stores'

Examples

Python (programming language) (300 words)

    python summarize.py https://en.wikipedia.org/wiki/Python_\(programming_language\) 300

output-summary:

Python was conceived in the late 1980s by Guido van Rossum at Centrum Wiskunde & Informatica in the Netherlands as a successor to the ABC language , capable of exception handling and interfacing with the Amoeba operating system. Its implementation began in December 1989. Van Rossum shouldered sole responsibility for the project, as the lead developer, until 12 July 2018, when he announced his "permanent vacation" from his responsibilities as Python's Benevolent Dictator For Life, a title the Python community bestowed upon him to reflect his long-term commitment as the project's chief decision-maker. He now shares his leadership as a member of a five-person steering council. In January 2019, active Python core developers elected Brett Cannon, Nick Coghlan, Barry Warsaw, Carol Willing and Van Rossum to a five-member "Steering Council" to lead the project. Python uses dynamic typing and a combination of reference counting and a cycle-detecting garbage collector for memory management. It also features dynamic name resolution , which binds method and variable names during program execution. Python's developers strive to avoid premature optimization, and reject patches to non-critical parts of the CPython reference implementation that would offer marginal increases in speed at the cost of clarity. When speed is important, a Python programmer can move time-critical functions to extension modules written in languages such as C, or use PyPy, a just-in-time compiler. The long-term plan is to support gradual typing and from Python 3.5, the syntax of the language allows specifying static types but they are not checked in the default implementation, CPython. Examples of the use of this prefix in names of Python applications or libraries include Pygame, a binding of SDL to Python ; PyQt and PyGTK, which bind Qt and GTK to Python respectively; and PyPy, a Python implementation originally written in Python.

output-keywords:

'python', 'class', 'classes', 'division', 'round', 'type'

Steve Jobs (350 words)

    python summarize.py https://en.wikipedia.org/wiki/Steve_Jobs 350

output-summary:

He worked closely with designer Jony Ive to develop a line of products that had larger cultural ramifications, beginning in 1997 with the "Think different" advertising campaign and leading to the iMac, iTunes, iTunes Store, Apple Store, iPod, iPhone, App Store, and the iPad. In 2001, the original Mac OS was replaced with a completely new Mac OS X , based on NeXT's NeXTSTEP platform, giving the OS a modern Unix-based foundation for the first time. 1931), grew up in Homs, Syria, and was born into an Arab Muslim household. While an undergraduate at the American University of Beirut, Lebanon, he was a student activist and spent time in prison for his political activities. He pursued a PhD at the University of Wisconsin, where he met Joanne Carole Schieble, a Catholic of Swiss and German descent. As a doctoral candidate, Jandali was a teaching assistant for a course Schieble was taking, although both were the same age. Mona Simpson, Jobs's biological sister, notes that her maternal grandparents were not happy that their daughter was dating a Muslim. Walter Isaacson, author of the Steve Jobs biography, additionally states that Schieble's father "threatened to cut Joanne off completely" if she continued the relationship. The location of the Los Altos home meant that Jobs would be able to attend nearby Homestead High School, which had strong ties to Silicon Valley. He began his first year there in late 1968 along with Bill Fernandez. Neither Jobs nor Fernandez came from engineering households and thus decided to enroll in John McCollum's "Electronics 1." McCollum and the rebellious Jobs would eventually clash and Jobs began to lose interest in the class.

output-keywords:

'brennan', 'apple', 'macintosh', 'disney', 'next', 'ipod', 'jandali', 'wozniak'

University of Pennsylvania (300 words)

    python summarize.py https://en.wikipedia.org/wiki/University_of_Pennsylvania 300

output-summary:

In 2019, the university had an endowment of $14.65 billion, the sixth-largest endowment of all colleges in the United States, as well as a research budget of $1.02 billion. The university's athletics program, the Quakers, fields varsity teams in 33 sports as a member of the NCAA Division I Ivy League conference. As of 2018, distinguished alumni include three U.S. Supreme Court justices, 32 U.S. senators, 46 U.S. governors, 163 members of the U.S. House of Representatives, eight signers of the Declaration of Independence, 12 signers of the U.S. Constitution, 24 members of the Continental Congress, 14 foreign heads of state, and two presidents of the United States, including the incumbent, Donald Trump. As of October 2019, 36 Nobel laureates, 80 members of the American Academy of Arts and Sciences, 64 billionaires, 29 Rhodes Scholars, 15 Marshall Scholars, and 16 Pulitzer Prize winners have been affiliated with the university. Penn has three claims to being the first university in the United States, according to university archives director Mark Frazier Lloyd: the 1765 founding of the first medical school in America made Penn the first institution to offer both "undergraduate" and professional education; the 1779 charter made it the first American institution of higher learning to take the name of "University"; and existing colleges were established as seminaries. Penn's educational innovations include the nation's first medical school in 1765; the first university teaching hospital in 1874; the Wharton School, the world's first collegiate business school, in 1881; the first American student union building, Houston Hall, in 1896; the country's second school of veterinary medicine; and the home of ENIAC, the world's first electronic, large-scale, general-purpose digital computer in 1946.

output-keywords:

'rugby', 'team', 'football', 'research', 'programs', 'founder', 'school', 'cricket', 'located', 'former'

Owner
Kevin Lai