Multi-Modal Machine Learning toolkit based on PyTorch.

Related tags

Deep LearningTorchMM
Overview

简体中文 | English

TorchMM

简介

多模态学习工具包 TorchMM 旨在于提供模态联合学习和跨模态学习算法模型库,为处理图片文本等多模态数据提供高效的解决方案,助力多模态学习应用落地。

近期更新

  • 2022.1.5 发布 TorchMM 初始版本 v1.0

特性

  • 丰富的任务场景:工具包提供多模态融合、跨模态检索、图文生成等多种多模态学习任务算法模型库,支持用户自定义数据和训练。
  • 成功的落地实践:基于工具包算法已有相关落地应用,如球鞋真伪鉴定、球鞋风格迁移、家具图片自动描述、舆情监控等。

应用展示

  • 球鞋真伪鉴定

更多信息欢迎访问我们的网站 Ysneaker

框架

TorchMM 包括以下模块:

  • 数据处理:提供统一的数据接口和多种数据处理格式
  • 模型库:包括多模态融合、跨模态检索、图文生成、多任务算法
  • 训练器:对每种任务设置统一的训练流程和相关指标计算

使用

下载工具包

git clone https://github.com/njustkmg/TorchMM.git

使用示例:

from torchmm import TorchMM

# config: Model running parameters, see configs/
# data_root: Path to dataset
# image_root: Path to images
# gpu: Which gpu to use

runner = PaddleMM(config='configs/cmml.yml',
                  data_root='data/COCO', 
                  image_root='data/COCO/images', 
                  cuda=0)

或者

python run.py --config configs/cmml.yml --data_root data/COCO --image_root data/COCO/images --cuda 0

模型库 (更新中)

[1] Comprehensive Semi-Supervised Multi-Modal Learning

[2] Stacked Cross Attention for Image-Text Matching

[3] Similarity Reasoning and Filtration for Image-Text Matching

[4] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

[5] Attention on Attention for Image Captioning

[6] VQA: Visual Question Answering

[7] ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

实验结果

多模态融合

Average_Precision Coverage Example_AUC Macro_AUC Micro_AUC Ranking_loss
CMML 0.682 18.827 0.948 0.927 0.950 0.052
Early(add) ResNet+LSTM
Early(concat) ResNet+GRU

许可证书

本项目的发布受 Apache 2.0 license 许可认证。

Owner
njustkmg
njustkmg
Code for the paper One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation, CVPR 2021.

One Thing One Click One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation (CVPR2021) Code for the paper One Thi

44 Dec 12, 2022
Image classification for projects and researches

This is a tool to help you quickly solve classification problems including: data analysis, training, report results and model explanation.

Nguyễn Trường Lâu 2 Dec 27, 2021
System Combination for Grammatical Error Correction Based on Integer Programming

System Combination for Grammatical Error Correction Based on Integer Programming This repository contains the code and scripts that implement the syst

NUS NLP Group 0 Mar 29, 2022
Monocular 3D pose estimation. OpenVINO. CPU inference or iGPU (OpenCL) inference.

human-pose-estimation-3d-python-cpp RealSenseD435 (RGB) 480x640 + CPU Corei9 45 FPS (Depth is not used) 1. Run 1-1. RealSenseD435 (RGB) 480x640 + CPU

Katsuya Hyodo 8 Oct 03, 2022
This project uses Template Matching technique for object detecting by detection of template image over base image.

Object Detection Project Using OpenCV This project uses Template Matching technique for object detecting by detection the template image over base ima

Pratham Bhatnagar 7 May 29, 2022
A Deep learning based streamlit web app which can tell with which bollywood celebrity your face resembles.

Project Name: Which Bollywood Celebrity You look like A Deep learning based streamlit web app which can tell with which bollywood celebrity your face

BAPPY AHMED 20 Dec 28, 2021
Model of an AI powered sign language interpreter.

TEXT AND SPEECH TO SIGN LANGUAGE. A web application which takes in text or live audio speech recording as input, converts and displays the relevant Si

Mark Gatere 4 Mar 30, 2022
An Unsupervised Graph-based Toolbox for Fraud Detection

An Unsupervised Graph-based Toolbox for Fraud Detection Introduction: UGFraud is an unsupervised graph-based fraud detection toolbox that integrates s

SafeGraph 99 Dec 11, 2022
Use graph-based analysis to re-classify stocks and to improve Markowitz portfolio optimization

Dynamic Stock Industrial Classification Use graph-based analysis to re-classify stocks and experiment different re-classification methodologies to imp

Sheng Yang 10 Dec 05, 2022
🔥RandLA-Net in Tensorflow (CVPR 2020, Oral & IEEE TPAMI 2021)

RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds (CVPR 2020) This is the official implementation of RandLA-Net (CVPR2020, Oral

Qingyong 1k Dec 30, 2022
Tools for investing in Python

InvestOps Original repository on GitHub Original author is Magnus Erik Hvass Pedersen Introduction This is a Python package with simple and effective

24 Nov 26, 2022
Unofficial Alias-Free GAN implementation. Based on rosinality's version with expanded training and inference options.

Alias-Free GAN An unofficial version of Alias-Free Generative Adversarial Networks (https://arxiv.org/abs/2106.12423). This repository was heavily bas

dusk (they/them) 75 Dec 12, 2022
PyTorch for Semantic Segmentation

PyTorch for Semantic Segmentation This repository contains some models for semantic segmentation and the pipeline of training and testing models, impl

Zijun Deng 1.7k Jan 06, 2023
The dynamics of representation learning in shallow, non-linear autoencoders

The dynamics of representation learning in shallow, non-linear autoencoders The package is written in python and uses the pytorch implementation to ML

Maria Refinetti 4 Jun 08, 2022
Source for the paper "Universal Activation Function for machine learning"

Universal Activation Function Tensorflow and Pytorch source code for the paper Yuen, Brosnan, Minh Tu Hoang, Xiaodai Dong, and Tao Lu. "Universal acti

4 Dec 03, 2022
AI-Fitness-Tracker - AI Fitness Tracker With Python

AI-Fitness-Tracker We have build a AI based Fitness Tracker using OpenCV and Pyt

Sharvari Mangale 5 Feb 09, 2022
AFL binary instrumentation

E9AFL --- Binary AFL E9AFL inserts American Fuzzy Lop (AFL) instrumentation into x86_64 Linux binaries. This allows binaries to be fuzzed without the

242 Dec 12, 2022
Code for our WACV 2022 paper "Hyper-Convolution Networks for Biomedical Image Segmentation"

Hyper-Convolution Networks for Biomedical Image Segmentation Code for our WACV 2022 paper "Hyper-Convolution Networks for Biomedical Image Segmentatio

Tianyu Ma 17 Nov 02, 2022
Make your master artistic punk avatar through machine learning world famous paintings.

Master-art-punk Make your master artistic punk avatar through machine learning world famous paintings. 通过机器学习世界名画制作属于你的大师级艺术朋克头像 Nowadays, NFT is beco

Philipjhc 53 Dec 27, 2022
基于Paddle框架的fcanet复现

fcanet-Paddle 基于Paddle框架的fcanet复现 fcanet 本项目基于paddlepaddle框架复现fcanet,并参加百度第三届论文复现赛,将在2021年5月15日比赛完后提供AIStudio链接~敬请期待 参考项目: frazerlin-fcanet 数据准备 本项目已挂

QuanHao Guo 7 Mar 07, 2022