SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training

Introduction

This is a PyTorch implementation of "SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training".

The paper proposes a novel text detection system termed SelfText Beyond Polygon (SBP) with Bounding Box Supervision (BBS) and Dynamic Self-Training (DST), which trains a polygon-based text detector with only a limited set of upright bounding box annotations. As shown in the figure, SBP achieves performance comparable to strong supervision while greatly reducing data annotation costs.

For more details, please refer to our arXiv paper.

Environments

python 3
torch = 1.1.0
torchvision
Pillow
numpy
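
A quick way to check which of the packages listed above are installed is sketched below. It is not part of this repository: the helper name `installed_versions` and the `REQUIRED` list are assumptions made for illustration, and the script only reports versions without enforcing the `torch = 1.1.0` pin.

```python
from importlib import metadata

# Distribution names taken from the Environments list above.
REQUIRED = ["torch", "torchvision", "Pillow", "numpy"]


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

`importlib.metadata` is in the standard library from Python 3.8, so the check itself needs no extra dependency.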

ToDo List

Dataset

Supported:

model zoo

Supported text detection:

Bounding Box Supervision(BBS)

Train

The training strategy includes three steps: (1) training SASN with synthetic data; (2) generating pseudo labels on real data with SASN, based on the bounding box annotations; (3) training the detectors (EAST and PSENet) with the pseudo labels.

training SASN with SynthText or Curved SynthText

(TBD)

generating pseudo labels on real data with SASN

(TBD)
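
Until the official command is published, the idea of this step can be illustrated with a minimal sketch. It is not the repository's code. It assumes that a segmentation network such as SASN yields a per-pixel text probability map for the image, and that each annotation is an upright box `(x_min, y_min, x_max, y_max)` in pixel coordinates. The function name `pseudo_mask_from_box` and the 0.5 threshold are hypothetical.

```python
import numpy as np


def pseudo_mask_from_box(prob_map, box, threshold=0.5):
    """Binarize predicted text probabilities inside one upright box.

    prob_map:  2-D float array (H, W), assumed to come from the segmentation net.
    box:       (x_min, y_min, x_max, y_max), max coordinates exclusive.
    threshold: hypothetical cut-off, not a value taken from the paper.
    """
    height, width = prob_map.shape
    x_min, y_min, x_max, y_max = box

    # Clip the box to the image so that slicing stays in bounds.
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)

    mask = np.zeros((height, width), dtype=np.uint8)
    if x_min >= x_max or y_min >= y_max:
        return mask  # empty or fully out-of-image box

    # Pixels outside the annotated box never become text in the pseudo label.
    crop = prob_map[y_min:y_max, x_min:x_max]
    mask[y_min:y_max, x_min:x_max] = (crop >= threshold).astype(np.uint8)
    return mask
```

The box acts as a hard constraint: confident predictions outside it are discarded, so the pseudo label can only refine the annotated region, never extend it.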

training EAST or PSENet with the pseudo labels

(TBD)

Eval

For example (batch size = 2):

(TBD)

Visualization

Dynamic Self Training

Train

(TBD)

Eval

For example (batch size = 2):

(TBD)

Visualization

Experiments

Bounding Box Supervision

The performance of EAST on ICDAR15

Method	Dataset	Pretrain	precision	recall	f-score
EAST_box	ICDAR15	-	65.8	63.8	64.8
EAST	ICDAR15	-	76.9	77.1	77.0
EAST_pseudo(SynthText)	ICDAR15	-	77.8	78.2	78.0
EAST_box	ICDAR15	SynthText	70.8	72.0	71.4
EAST	ICDAR15	SynthText	82.0	82.4	82.2
EAST_pseudo(SynthText)	ICDAR15	SynthText	81.3	82.2	81.8
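
The f-score values reported in these tables are consistent with the harmonic mean of precision and recall. A minimal sketch for recomputing them is given below; the function name `f_score` is ours and is not part of this repository.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# First row of the ICDAR15 table above: EAST_box, precision 65.8, recall 63.8.
print(round(f_score(65.8, 63.8), 1))  # 64.8
```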

The performance of EAST on MSRA-TD500

Method	Dataset	Pretrain	precision	recall	f-score
EAST_box	MSRA-TD500	-	40.49	31.05	35.15
EAST	MSRA-TD500	-	71.76	69.05	70.38
EAST_pseudo(SynthText)	MSRA-TD500	-	71.27	67.54	69.36
EAST_box	MSRA-TD500	SynthText	48.34	42.37	45.16
EAST	MSRA-TD500	SynthText	77.91	76.45	77.17
EAST_pseudo(SynthText)	MSRA-TD500	SynthText	77.42	73.85	75.59

The performance of PSENet on ICDAR15

Method	Dataset	Pretrain	precision	recall	f-score
PSENet_box	ICDAR15	-	70.17	69.09	69.63
PSENet	ICDAR15	-	81.6	79.5	80.5
PSENet_pseudo(SynthText)	ICDAR15	-	82.9	77.6	80.2
PSENet_box	ICDAR15	SynthText	72.65	74.29	73.46
PSENet	ICDAR15	SynthText	86.42	83.54	84.96
PSENet_pseudo(SynthText)	ICDAR15	SynthText	86.77	83.34	85.02

The performance of PSENet on MSRA-TD500

Method	Dataset	Pretrain	precision	recall	f-score
PSENet_box	MSRA-TD500	-	47.17	36.90	41.41
PSENet	MSRA-TD500	-	80.86	77.72	79.13
PSENet_pseudo(SynthText)	MSRA-TD500	-	80.32	77.26	78.86
PSENet_box	MSRA-TD500	SynthText	47.45	39.49	43.11
PSENet	MSRA-TD500	SynthText	84.11	84.97	84.54
PSENet_pseudo(SynthText)	MSRA-TD500	SynthText	84.03	84.03	84.03

The performance of PSENet on Total Text

Method	Dataset	Pretrain	precision	recall	f-score
PSENet_box	Total Text	-	46.5	43.6	45.0
PSENet	Total Text	-	80.4	76.5	78.4
PSENet_pseudo(SynthText)	Total Text	-	80.33	73.54	76.78
PSENet_pseudo(Curved SynthText)	Total Text	-	81.68	74.61	78.0
PSENet_box	Total Text	SynthText	51.94	47.45	49.59
PSENet	Total Text	SynthText	83.4	78.1	80.7
PSENet_pseudo(SynthText)	Total Text	SynthText	81.57	75.54	78.44
PSENet_pseudo(Curved SynthText)	Total Text	SynthText	82.51	77.57	80.0

The visualization of bounding-box annotation and the pseudo labels generated by BBS on Total-Text

links

https://github.com/SakuraRiven/EAST

https://github.com/WenmuZhou/PSENet.pytorch

License

For academic use, this project is licensed under the Apache License - see the LICENSE file for details. For commercial use, please contact the authors.

Citations

Please consider citing our paper in your publications if the project helps your research.

Email: [email protected]

Owner

weijiawu