Package for controllable summarization

Last update: Dec 07, 2022

Overview

summarizers

summarizers is a package for controllable summarization based on CTRLsum.
Currently, it supports only English; it does not work with other languages.

Installation

pip install summarizers

Usage

1. Create Summarizers

First of all, create a Summarizers object to summarize your own article.

>>> from summarizers import Summarizers
>>> summ = Summarizers()

You can select the type of source article from [normal, paper, patent].
If you don't pass any parameter, the default type is normal.

>>> from summarizers import Summarizers
>>> summ = Summarizers('normal')  # <-- default.
>>> summ = Summarizers('paper')
>>> summ = Summarizers('patent')

If you want GPU acceleration, set the parameter device='cuda'.

>>> from summarizers import Summarizers
>>> summ = Summarizers('normal', device='cuda')
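
When the same script has to run on machines with and without a GPU, the device can be chosen at run time. The sketch below is an illustration, not part of summarizers: `pick_device` and `build_summarizer` are hypothetical helper names, and it assumes that `device='cpu'` is accepted alongside the `device='cuda'` value shown above.

```python
ARTICLE_TYPES = ("normal", "paper", "patent")  # the types listed above


def pick_device() -> str:
    """Return 'cuda' if PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def build_summarizer(article_type: str = "normal"):
    """Create a Summarizers object for a known article type (hypothetical helper)."""
    if article_type not in ARTICLE_TYPES:
        raise ValueError(f"unknown article type: {article_type!r}")
    # Imported lazily so the helpers above can be used without loading a model.
    from summarizers import Summarizers
    # Assumption: device='cpu' is a valid value, like device='cuda' above.
    return Summarizers(article_type, device=pick_device())


print(pick_device())
```

Calling `build_summarizer('paper')` would then return an object that is used exactly like `summ` in the snippets above.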

2. Basic Summarization

If you input only a source article, basic summarization is performed.

>>> contents = """
Tunip is the Octonauts' head cook and gardener. 
He is a Vegimal, a half-animal, half-vegetable creature capable of breathing on land as well as underwater. 
Tunip is very childish and innocent, always wanting to help the Octonauts in any way he can. 
He is the smallest main character in the Octonauts crew.
"""

>>> summ(contents)
'Tunip is a Vegimal, a half-animal, half-vegetable creature'
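
To summarize several articles, the same object can be called once per article. This is a minimal sketch that assumes only the call form `summ(contents)` shown above; `summarize_all` is a hypothetical helper, and `fake_summ` is a stand-in that returns the first sentence so the example runs without loading a model.

```python
from typing import Callable, Dict


def summarize_all(summ: Callable[[str], str], articles: Dict[str, str]) -> Dict[str, str]:
    """Map each article name to its summary, skipping empty articles."""
    return {
        name: summ(text.strip())
        for name, text in articles.items()
        if text.strip()
    }


def fake_summ(contents: str) -> str:
    """Stand-in for a Summarizers object: returns the first sentence."""
    return contents.split(". ")[0].rstrip(".") + "."


articles = {
    "tunip": "Tunip is the Octonauts' head cook and gardener. He is a Vegimal.",
    "empty": "   ",
}
print(summarize_all(fake_summ, articles))
```

With a real object, `fake_summ` would be replaced by the `summ` instance created in step 1.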

3. Query focused Summarization

If you input a query together with the article, query-focused summarization is performed.

>>> summ(contents, query="main character of Octonauts")
'Tunip is the smallest main character in the Octonauts crew.'
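
Because the query steers the summary, one article can yield a different summary per query. The sketch below assumes only the call form `summ(contents, query=...)` shown above; `summarize_by_query` is a hypothetical helper, and `fake_summ` is a stand-in that picks the sentence sharing the most words with the query, which is far cruder than the real model.

```python
from typing import Callable, Dict, Iterable


def summarize_by_query(
    summ: Callable[..., str], contents: str, queries: Iterable[str]
) -> Dict[str, str]:
    """Return one query-focused summary per query."""
    return {query: summ(contents, query=query) for query in queries}


def fake_summ(contents: str, query: str = "") -> str:
    """Stand-in for a Summarizers object: best word-overlap sentence."""
    sentences = [s.strip() for s in contents.split(".") if s.strip()]
    words = set(query.lower().split())
    best = max(sentences, key=lambda s: len(words & set(s.lower().split())))
    return best + "."


contents = (
    "Tunip is the head cook and gardener. "
    "He is the smallest main character in the crew."
)
for query, summary in summarize_by_query(
    fake_summ, contents, ["head cook", "main character"]
).items():
    print(f"{query!r}: {summary}")
```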

3. Abstractive QA (Auto Question Detection)

If you input a question as the query, abstractive QA is performed.

>>> summ(contents, query="What is Vegimal?")
'Half-animal, half-vegetable'

You can turn off this feature by setting the parameter question_detection=False.

>>> summ(contents, query="SOME_QUERY", question_detection=False)
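
How summarizers decides that a query is a question is not described here. Purely as an illustration of the idea, the sketch below uses a simple heuristic: a trailing question mark or a leading interrogative word. `looks_like_question` is a hypothetical function and is not the library's implementation.

```python
QUESTION_WORDS = {
    "what", "who", "whom", "whose", "when", "where", "why", "which", "how",
}


def looks_like_question(query: str) -> bool:
    """Heuristic only: True for a trailing '?' or a leading question word."""
    text = query.strip()
    if not text:
        return False
    if text.endswith("?"):
        return True
    first_word = text.split()[0].lower()
    return first_word in QUESTION_WORDS


print(looks_like_question("What is Vegimal?"))             # True
print(looks_like_question("main character of Octonauts"))  # False
```

A check of this kind could be used to decide when to pass `question_detection=False` explicitly, for example when a query ends in a question mark but a summary is still wanted.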

4. Prompt based Summarization

You can generate a summary that begins with a given sequence by using the parameter prompt.
It works like GPT-3's prompt-based generation (but it doesn't work very well).

>>> summ(contents, prompt="Q:Who is Tunip? A:")
"Q:Who is Tunip? A: Tunip is the Octonauts' head"
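
As the output above shows, the returned text includes the prompt itself. If only the generated continuation is needed, the prompt can be removed afterwards. `strip_prompt` is a hypothetical helper written for this example, not a function provided by summarizers.

```python
def strip_prompt(summary: str, prompt: str) -> str:
    """Remove a leading prompt from a generated summary, if present."""
    if summary.startswith(prompt):
        return summary[len(prompt):].lstrip()
    return summary


output = "Q:Who is Tunip? A: Tunip is the Octonauts' head"
print(strip_prompt(output, "Q:Who is Tunip? A:"))  # Tunip is the Octonauts' head
```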

5. Query focused Summarization with Prompt

You can also input both a query and a prompt.
In this case, a query-focused summary that starts with the prompt is generated.

>>> summ(contents, query="personality of Tunip", prompt="Tunip is very")
"Tunip is very childish and innocent, always wanting to help the Octonauts."

6. Options for Decoding Strategy

For generative models, the decoding strategy is very important.
summarizers supports a variety of options for the decoding strategy.

>>> summ(
...     contents=contents,
...     num_beams=10,
...     top_k=30,
...     top_p=0.85,
...     no_repeat_ngram_size=3,
... )
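
Decoding options can be kept in a dictionary and unpacked into the call, which makes it easy to reuse one configuration across articles. The sketch uses only the keyword arguments shown above; the values are illustrative rather than recommended, and `recording_summ` is a stand-in that records what it receives so the example runs without a model.

```python
DECODING_OPTIONS = {
    "num_beams": 10,            # number of beams for beam search
    "top_k": 30,                # keep the 30 most likely next tokens
    "top_p": 0.85,              # nucleus sampling threshold
    "no_repeat_ngram_size": 3,  # never repeat the same 3-gram
}

calls = []


def recording_summ(contents, **options):
    """Stand-in for a Summarizers object: records the call, returns a dummy summary."""
    calls.append((contents, options))
    return contents[:20]


# With a real object this would be: summ(contents=contents, **DECODING_OPTIONS)
summary = recording_summ(
    contents="Tunip is the Octonauts' head cook.", **DECODING_OPTIONS
)
print(calls[0][1])
```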

License

Copyright 2021 Hyunwoong Ko.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Owner

Hyunwoong Ko
