This repository contains the code used in the paper "Prompt-Based Multi-Modal Image Segmentation".

Last update: Dec 30, 2022

Overview

Prompt-Based Multi-Modal Image Segmentation

The system allows creating segmentation models without training, based on:

An arbitrary text query
Or an image with a mask highlighting stuff or an object

Quick Start

The Quickstart.ipynb notebook provides the code for using a pre-trained CLIPSeg model. It can also be run interactively on MyBinder (please note that the VM does not use a GPU, so inference takes a few seconds).

Dependencies

This code base depends on pytorch, torchvision and clip (pip install git+https://github.com/openai/CLIP.git). Additional dependencies are hidden for double blind review.
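
Before running the notebook, it can help to confirm that the listed packages are installed. The following is a minimal sketch, not part of the repository: the helper `find_missing` is hypothetical, and the import names (`torch` for pytorch, `torchvision`, `clip`) are assumptions about how the packages above are exposed to Python.

```python
import importlib.util

# Assumed import names for the dependencies listed above:
# pytorch -> "torch", torchvision -> "torchvision", OpenAI CLIP -> "clip".
REQUIRED_MODULES = ("torch", "torchvision", "clip")


def find_missing(modules=REQUIRED_MODULES):
    """Return the top-level modules in `modules` that cannot be located."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

The check only locates the modules without importing them, so it stays fast even when pytorch is installed.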

Datasets

PhraseCut and PhraseCutPlus: Referring expression dataset
PFEPascalWrapper: Wrapper class for PFENet's Pascal-5i implementation
PascalZeroShot: Wrapper class for PascalZeroShot
COCOWrapper: Wrapper class for COCO.

Models

CLIPDensePredT: CLIPSeg model with transformer-based decoder.
ViTDensePredT: CLIPSeg model with transformer-based decoder.

Third Party Dependencies

Some of the datasets require third-party dependencies. Run the following commands in the third_party folder.

git clone https://github.com/cvlab-yonsei/JoEm
git clone https://github.com/Jia-Research-Lab/PFENet.git
git clone https://github.com/ChenyunWu/PhraseCutDataset.git
git clone https://github.com/juhongm999/hsnet.git
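
The same clone steps can be scripted so that repositories already present are skipped. This is a sketch under stated assumptions: the helpers `repo_dir_name` and `clone_missing` are hypothetical, the folder name `third_party` is taken from the text above, and `git` is assumed to be on the PATH. By default the function performs a dry run and only reports the commands it would execute.

```python
import subprocess
from pathlib import Path

# Repository URLs copied from the commands listed above.
THIRD_PARTY_REPOS = [
    "https://github.com/cvlab-yonsei/JoEm",
    "https://github.com/Jia-Research-Lab/PFENet.git",
    "https://github.com/ChenyunWu/PhraseCutDataset.git",
    "https://github.com/juhongm999/hsnet.git",
]


def repo_dir_name(url):
    """Directory name git creates for `url`: last path part without '.git'."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def clone_missing(target="third_party", dry_run=True):
    """Clone each repository into `target` unless its directory exists.

    Returns the list of git commands that were (or, in a dry run, would be) run.
    """
    target_path = Path(target)
    commands = []
    for url in THIRD_PARTY_REPOS:
        if (target_path / repo_dir_name(url)).exists():
            continue  # already cloned, nothing to do
        cmd = ["git", "clone", url]
        commands.append(cmd)
        if not dry_run:
            target_path.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, cwd=target_path, check=True)
    return commands


if __name__ == "__main__":
    for cmd in clone_missing(dry_run=True):
        print(" ".join(cmd))
```

Calling `clone_missing(dry_run=False)` would actually run the clones; the dry-run default keeps the sketch free of side effects.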

Weights

CLIPSeg-D64 (4.1MB, without CLIP weights)
CLIPSeg-D16 (1.1MB, without CLIP weights)

Training

See the experiment folder for yaml definitions of the training configurations. The training code is in experiment_setup.py.

Usage of PFENet Wrappers

In order to use the dataset and model wrappers for PFENet, the PFENet repository needs to be cloned to the root folder.

git clone https://github.com/Jia-Research-Lab/PFENet.git

Citation

@article{lueddecke21,
    title={Prompt-Based Multi-Modal Image Segmentation},
    author={Timo Lüddecke and Alexander Ecker},
    journal={arXiv preprint arXiv:2112.10003},
    year={2021}
}

Owner

Timo Lüddecke
