🇰🇷 Text to Image in Korean

Overview

KoDALLE



KoDALLE uses a pretrained language model's token embedding layer and position embedding layer as DALLE's text encoder.

Background

  • Training a DALLE model from scratch requires a large paired dataset of images and captions. For example, OpenAI's DALLE was trained on more than 250 million text-image pairs.
  • If the dataset is not large enough or is limited to a specific domain, the trained DALLE model's vocabulary is too small. For instance, the 1 million text captions of the K-Fashion dataset contain only about 300 distinct tokens.
  • As a result, inference with such DALLE models can be problematic when the query sentence is unrelated to the captions the model was originally trained on.
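To make the vocabulary problem concrete, here is a toy sketch that builds a vocabulary from training captions and measures what fraction of a query's tokens were never seen during training. It assumes plain whitespace tokenization and uses hypothetical English stand-in captions; KoDALLE itself tokenizes with the klue/roberta-large tokenizer, so real numbers would differ.

```python
from collections import Counter


def build_vocab(captions):
    """Count every whitespace-separated token seen in the training captions."""
    vocab = Counter()
    for caption in captions:
        vocab.update(caption.split())
    return vocab


def oov_rate(query, vocab):
    """Fraction of query tokens that never appeared in the training captions."""
    tokens = query.split()
    if not tokens:
        return 0.0
    unseen = [t for t in tokens if t not in vocab]
    return len(unseen) / len(tokens)


# Hypothetical captions imitating the attribute-style fashion captions.
train_captions = [
    "top color white category blouse",
    "bottom color navy category jeans fit skinny",
    "top color black category tshirt fit loose",
]
vocab = build_vocab(train_captions)
print(len(vocab))                                         # 13
print(oov_rate("top color white category blouse", vocab))  # 0.0
print(oov_rate("a red dress on a sunny beach", vocab))     # 1.0
```

An in-domain query is fully covered, while an everyday sentence falls entirely outside the learned vocabulary, which is the situation the Background describes.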

KoDALLE's Results on a Small Fashion Dataset

|                    | OpenAI's DALLE            | KoDALLE of HappyFace                       |
|--------------------|---------------------------|--------------------------------------------|
| Train Dataset Size | 250 Million Pairs         | 0.8 Million Pairs                          |
| #Params            | 12 Billion                | 428 Million                                |
| #Layers            | 64 Layers                 | 16 Layers                                  |
| Computing Resource | 1024 x V100 16GB          | 1 x V100 32GB                              |
| Text Encoder       | 16384 Vocab x 512 Dim BPE | 32000 Vocab x 1024 Dim klue/roberta-large  |
| Image Encoder      | VQVAE                     | VQGAN                                      |
| Optimizer          | AdamW                     | AdamW                                      |
| Learning Rate      | 4.5e-5                    | 3.0e-5                                     |
| Weight Decay       | 4.5e-3                    | 3.0e-3                                     |
| LR Scheduler       | ReduceLROnPlateau         | -                                          |
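The Text Encoder row translates directly into embedding-table sizes: a token-embedding table stores one vector of length Dim per vocabulary entry, so it holds Vocab x Dim weights. The sketch below only counts the token-embedding table, not position embeddings or the transformer layers.

```python
def embedding_params(vocab_size: int, dim: int) -> int:
    """Number of weights in a token-embedding table (one dim-sized row per token)."""
    return vocab_size * dim


openai_text = embedding_params(16384, 512)    # OpenAI's DALLE text embedding
kodalle_text = embedding_params(32000, 1024)  # klue/roberta-large token embedding
print(f"{openai_text:,} vs {kodalle_text:,}")  # 8,388,608 vs 32,768,000
print(round(kodalle_text / openai_text, 2))   # 3.91
```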

The team built a Korean text-to-fashion-design DALLE model with fewer than 100k sampled text-image pairs.

Caption: 하의에서 색상은 스카이블루이다. 상의에서 기장은 롱이다. 색상은 화이트이다. 카테고리는 블라우스이다. 디테일에는 셔링이다. 소매기장은 반팔이다. 소재에는 실크이다. 프린트에는 무지이다. 넥라인은 브이넥이다. 핏은 노멀
(English: Bottoms: the color is sky blue. Top: the length is long, the color is white, the category is blouse, the detail is shirring, the sleeves are short, the material is silk, the print is plain, the neckline is V-neck, the fit is normal)
Generated Image: [image]
Caption: 아우터는 색상이 카키 소재가 우븐 핏이 루즈인 코트이다. 하의는 색상이 네이비 소재가 데님 핏이 스키니인 청바지이다.
(English: The outerwear is a khaki coat made of woven fabric with a loose fit. The bottoms are navy jeans made of denim with a skinny fit.)
Generated Image: [image]
Caption: 하의에서 기장은 발목이다. 색상은 블루이다. 카테고리는 스커트이다. 소재에는 데님이다. 핏은 와이드이다. 상의에서 색상은 화이트이다. 카테고리는 블라우스이다. 디테일에는 셔링이다. 소매기장은 반팔이다. 소재에는 우븐이다.
(English: Bottoms: the length is ankle-length, the color is blue, the category is skirt, the material is denim, the fit is wide. Top: the color is white, the category is blouse, the detail is shirring, the sleeves are short, the material is woven.)
Generated Image: [image]
Caption: 상의에서 기장은 노멀이다. 상의에서 색상은 화이트이다. 상의에서 서브색상은 블랙이다. 상의에서 카테고리는 티셔츠이다. 상의에서 소매기장은 반팔이다. 상의에서 소재에는 저지이다. 상의에서 프린트에는 레터링이다. 상의에서 넥라인은 라운드넥이다. 상의에서 핏은 루즈이다.
(English: Top: the length is normal, the color is white, the sub-color is black, the category is T-shirt, the sleeves are short, the material is jersey, the print is lettering, the neckline is round neck, the fit is loose.)
Generated Image: [image]

Methodology

Experiments were conducted with the embedding layers of several Korean Transformers models. Considering model size, the team selected klue/roberta-large as the baseline in this repository.

KoDALLE uses the wpe (position embedding) and wte (token embedding) layers of klue/roberta-large and is trainable in a Google Colab environment with a 16GB GPU. The hyperparameters related to the DALLE model size are as follows:

'BATCH_SIZE': 32
'DEPTH': 2
'TEXT_SEQ_LEN': 128
'VOCAB_SIZE': 32000
'MODEL_DIM': 1024
'ATTN_TYPES': 'full'
'DIM_HEAD': 64
'HEADS': 8
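The core idea, reusing a pretrained model's token embedding (wte) and position embedding (wpe) as the DALLE text side, can be sketched framework-agnostically in NumPy. This is not KoDALLE's code: `check_compatible` and `embed_text` are hypothetical names, the toy matrices stand in for the real klue/roberta-large weights, and `position_offset` reflects the assumption that RoBERTa-style models in Hugging Face Transformers start non-padding position ids at `padding_idx + 1` (that is, 2). The LayerNorm, dropout and token-type embeddings of the original embedding layer are omitted. In PyTorch the same lookups would typically use `nn.Embedding.from_pretrained`.

```python
import numpy as np

# Model-size hyperparameters taken from the list above.
CONFIG = {"VOCAB_SIZE": 32000, "MODEL_DIM": 1024, "TEXT_SEQ_LEN": 128}


def check_compatible(config, wte_shape, wpe_shape, position_offset=0):
    """Raise ValueError if pretrained embedding shapes don't fit the DALLE config."""
    vocab, dim = wte_shape
    n_positions, pos_dim = wpe_shape
    if vocab != config["VOCAB_SIZE"]:
        raise ValueError(f"vocab size {vocab} != VOCAB_SIZE {config['VOCAB_SIZE']}")
    if dim != config["MODEL_DIM"] or pos_dim != config["MODEL_DIM"]:
        raise ValueError("embedding width must equal MODEL_DIM")
    if position_offset + config["TEXT_SEQ_LEN"] > n_positions:
        raise ValueError("position table too short for TEXT_SEQ_LEN")


def embed_text(token_ids, wte, wpe, position_offset=0):
    """Sum token and position embeddings.

    token_ids: int array, shape (batch, seq_len)
    wte:       token-embedding table, shape (vocab, dim)
    wpe:       position-embedding table, shape (n_positions, dim)
    returns:   float array, shape (batch, seq_len, dim)
    """
    seq_len = token_ids.shape[1]
    positions = np.arange(position_offset, position_offset + seq_len)
    # Broadcast the (1, seq_len, dim) position block over the batch.
    return wte[token_ids] + wpe[positions][None, :, :]


# Shape check against the config; 512 is an illustrative position-table size,
# not a verified klue/roberta-large value.
check_compatible(CONFIG, wte_shape=(32000, 1024), wpe_shape=(512, 1024), position_offset=2)

# Toy tables standing in for the real pretrained weights.
rng = np.random.default_rng(0)
wte = rng.standard_normal((100, 16)).astype(np.float32)
wpe = rng.standard_normal((34, 16)).astype(np.float32)
ids = np.array([[5, 7, 9], [1, 2, 3]])
out = embed_text(ids, wte, wpe, position_offset=2)
print(out.shape)  # (2, 3, 16)
```

The shape check explains why VOCAB_SIZE and MODEL_DIM in the list above must equal the pretrained model's vocabulary size and hidden width: the tables are copied in as-is rather than learned from the small fashion dataset.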

Significance

  • Offers promising results for training from scratch on a specific domain with a small dataset.
  • Introduces a way to make domain-specific DALLE & CLIP models robust to arbitrary input sentences.
  • Recommends an adequate text-to-image model size for a given computing budget.
  • Suggests an effortless method for creating DALLE & CLIP models for one's own language when a pretrained language model is available.

WIP

  • Add an image-caption reranker (EfficientNet + klue/roberta-large).
  • Train a model with 500k text-image pairs.
  • Modularize into Python code.
  • Update the inference code.
  • Update FID and IS metrics on the test and validation datasets.
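The reranker is still a work in progress, so the following is only a CLIP-style sketch of how such a component is commonly used: embed the caption once, embed every generated image, and sort the images by cosine similarity to the caption. It assumes the image encoder (EfficientNet) and text encoder (klue/roberta-large) have been projected into a shared embedding space; the vectors below are toy values, and `rerank` is a hypothetical name.

```python
import numpy as np


def rerank(text_emb, image_embs):
    """Sort candidate images from best to worst caption match.

    text_emb:   shape (dim,)    caption embedding
    image_embs: shape (n, dim)  one embedding per generated image
    returns:    (indices best-first, their cosine similarities)
    """
    t = text_emb / np.linalg.norm(text_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = imgs @ t              # cosine similarity of each image to the caption
    order = np.argsort(-scores)    # highest similarity first
    return order, scores[order]


text = np.array([1.0, 0.0])
images = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.1]])
order, scores = rerank(text, images)
print(order)  # [2 1 0]
```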
Comments
  • Koclip apply in KoDALLE

    Changes

    add) model.py

    A file whose code was modified so that Hyeonsoo's KoCLIP works with DALLE Roberta.

    It still needs to be revised by comparing it against the model.py in the dev branch.

    add) generate.ipynb

    Code for checking that KoCLIP works.

    opened by JoonHong-Kim
  • add: KoCLIP codes

    Changes:

    refactor) clipmodel.py

    • Updated CLIPModel to the final version
    • Moved it into the clip folder

    add) clip/train_clip.py

    • Code used to train the CLIP model

    add) clip/dataloader.py

    • Dataloader function used to train the CLIP model.
    opened by shawnhyeonsoo
  • add skip_sample in TextImageDataset

    Changes

    modify) loader.py

    • Handles the error raised when TextImageDataset loads texts and images and the data is missing
    • When an error occurs, uses the skip_sample function to skip to a random index or to the next index
    • Modified based on the existing train_dalle_gpt_roberta.py
    opened by jjonhwa
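The skip_sample change can be illustrated with a simplified, dependency-free stand-in for TextImageDataset. Everything here is hypothetical: the class only mimics the described behaviour (on a load error, move to a random index when shuffling, otherwise to the next index), and `pick_replacement_index` plays the role of skip_sample without claiming its real signature. A bounded retry loop is used instead of recursion so that a long run of missing files cannot recurse indefinitely.

```python
import random


class TextImageDataset:
    """Toy dataset: `samples[i]` is (text, image), or None to simulate missing data."""

    def __init__(self, samples, shuffle=False, max_retries=10):
        self.samples = samples
        self.shuffle = shuffle
        self.max_retries = max_retries

    def __len__(self):
        return len(self.samples)

    def _load(self, ind):
        item = self.samples[ind]
        if item is None:
            raise FileNotFoundError(f"missing data at index {ind}")
        return item

    def pick_replacement_index(self, ind):
        """Random index when shuffling, otherwise the next index (wrapping around)."""
        if self.shuffle:
            return random.randint(0, len(self) - 1)
        return (ind + 1) % len(self)

    def __getitem__(self, ind):
        for _ in range(self.max_retries):
            try:
                return self._load(ind)
            except FileNotFoundError:
                ind = self.pick_replacement_index(ind)
        raise RuntimeError("too many missing samples in a row")


ds = TextImageDataset([("caption a", "img_a"), None, ("caption c", "img_c")])
print(ds[1])  # ('caption c', 'img_c')
```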
Releases: v0.1.0-beta