Deep Semisupervised Multiview Learning With Increasing Views (IEEE TCYB 2021, PyTorch Code)

Last update: Nov 19, 2022

Deep Semisupervised Multiview Learning With Increasing Views (ISVN, IEEE TCYB)

Peng Hu, Xi Peng, Hongyuan Zhu, Liangli Zhen, Jie Lin, Huaibai Yan, Dezhong Peng, Deep Semisupervised Multiview Learning With Increasing Views[J]. IEEE Transactions on Cybernetics, Online. (PyTorch Code)

Abstract

In this article, we study two challenging problems in semisupervised cross-view learning. On the one hand, most existing methods assume that the samples in all views have a pairwise relationship, that is, it is necessary to capture or establish the correspondence of different views at the sample level. Such an assumption is easily violated even in the semisupervised setting wherein only a few samples have labels that could be used to establish the correspondence. On the other hand, almost all existing multiview methods, including semisupervised ones, usually train a model using a fixed dataset, which cannot handle the data of increasing views. In practice, the view number will increase when new sensors are deployed. To address the above two challenges, we propose a novel method that employs multiple independent semisupervised view-specific networks (ISVNs) to learn representation for multiple views in a view-decoupling fashion. The advantages of our method are two-fold. Thanks to our specifically designed autoencoder and pseudolabel learning paradigm, our method shows an effective way to utilize both the labeled and unlabeled data while relaxing the data assumption of the pairwise relationship, that is, correspondence. Furthermore, with our view decoupling strategy, the proposed ISVNs could be separately trained, thus efficiently handling the data of increasing views without retraining the entire model. To the best of our knowledge, our ISVN could be one of the first attempts to make handling increasing views in the semisupervised setting possible, as well as an effective solution to the noncorresponding problem. To verify the effectiveness and efficiency of our method, we conduct comprehensive experiments by comparing with 13 state-of-the-art approaches on four multiview datasets in terms of retrieval and classification.

Framework

Figure 1. Difference between (a) existing joint multiview learning and (b) our independent multiview learning. In brief, traditional methods use all views to learn the common space. They struggle to handle increasing views because their models are optimized jointly over all views, so the whole model must be retrained to accommodate new views, which is inefficient and discards the already trained model. In contrast, our method independently trains k view-specific models for k new views, thus handling increasing views efficiently.

Figure 2. Pipeline of our ISVN for the i-th view. All views can be separately projected into the common space without any inter-view constraints, so new views can be handled easily and efficiently.
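The abstract mentions a pseudolabel learning paradigm, and the usage section exposes a threshold parameter. As a rough illustration only, the sketch below shows one common way a confidence threshold is used to pick pseudo-labels for unlabeled samples: keep a prediction only when its top softmax probability reaches the threshold. This is a generic technique, not a confirmed description of how ISVN uses its threshold; the function names `softmax` and `select_pseudo_labels` and the toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(batch_logits, threshold):
    """Return (sample_index, label, confidence) for confident predictions.

    Assumption: a sample is kept when its largest class probability is at
    least `threshold`. The actual ISVN selection rule may differ.
    """
    selected = []
    for i, logits in enumerate(batch_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((i, probs.index(confidence), confidence))
    return selected


# Toy classifier outputs for three unlabeled samples (made-up numbers).
toy_logits = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0], [0.0, 0.0, 5.0]]
picked = select_pseudo_labels(toy_logits, threshold=0.9)
print([(i, label) for i, label, _ in picked])
```

In this toy batch, the first and third samples pass the threshold and the second, whose class probabilities are nearly uniform, is left unlabeled.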

Usage

To train a view-specific model on $datasets, run train_ISVN.py as follows:

python train_ISVN.py --datasets $datasets --epochs $epochs --batch_size $batch_size --view_id $view --output_shape $output_shape --beta $beta --alpha $alpha --threshold $threshold --K $K --gpu_id $gpu_id

where $datasets, $epochs, $batch_size, $view, $output_shape, $beta, $alpha, $threshold, $K, and $gpu_id are the dataset name, the number of epochs, the batch size, the view ID, the output dimensionality, β, α, γ (the threshold), the number of labeled samples, and the GPU ID, respectively.
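Since each view is trained by its own invocation, the training command above can be scripted once per view. The sketch below only assembles the argument lists; it does not run anything. The flag names come from the command above, while the helper name `build_train_command`, the dataset name `my_dataset`, the 0-based view IDs, and all numeric values are placeholders chosen for illustration.

```python
import shlex


def build_train_command(dataset, view_id, epochs, batch_size, output_shape,
                        beta, alpha, threshold, K, gpu_id):
    """Assemble the documented training command for one view as an argv list."""
    flags = {
        "--datasets": dataset,
        "--epochs": epochs,
        "--batch_size": batch_size,
        "--view_id": view_id,
        "--output_shape": output_shape,
        "--beta": beta,
        "--alpha": alpha,
        "--threshold": threshold,
        "--K": K,
        "--gpu_id": gpu_id,
    }
    argv = ["python", "train_ISVN.py"]
    for flag, value in flags.items():
        argv += [flag, str(value)]
    return argv


# Placeholder settings; substitute the values appropriate for your dataset.
commands = [
    build_train_command("my_dataset", view_id=v, epochs=10, batch_size=64,
                        output_shape=128, beta=0.1, alpha=0.1,
                        threshold=0.9, K=100, gpu_id=0)
    for v in (0, 1)  # assumption: views are numbered from 0
]
for argv in commands:
    print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argv list can be passed to `subprocess.run(argv, check=True)` to launch training for that view.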

To evaluate the trained models, you could run train_ISVN.py as follows:

python train_ISVN.py --mode eval --datasets $datasets --view -1 --output_shape $output_shape --beta $beta --alpha $alpha --K $K --gpu_id $gpu_id --num_workers 0
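The training command uses `--view_id` while the evaluation command uses `--view`. The source does not say how train_ISVN.py parses its arguments. If it relies on Python's `argparse` with default settings, unambiguous prefixes such as `--view` resolve to the full flag `--view_id`, and `-1` is accepted as a value. The sketch below is a hypothetical stand-in for such a parser: the flag names are taken from the two commands above, but every type and default is an assumption.

```python
import argparse


def build_parser():
    # Flag names mirror the documented commands; types and defaults are guesses.
    p = argparse.ArgumentParser(
        description="Illustrative stand-in for the train_ISVN.py command line")
    p.add_argument("--mode", choices=["train", "eval"], default="train")
    p.add_argument("--datasets", type=str, required=True)
    p.add_argument("--epochs", type=int, default=1)
    p.add_argument("--batch_size", type=int, default=1)
    p.add_argument("--view_id", type=int, default=-1)
    p.add_argument("--output_shape", type=int, default=1)
    p.add_argument("--beta", type=float, default=0.0)
    p.add_argument("--alpha", type=float, default=0.0)
    p.add_argument("--threshold", type=float, default=0.0)
    p.add_argument("--K", type=int, default=0)
    p.add_argument("--gpu_id", type=str, default="0")
    p.add_argument("--num_workers", type=int, default=0)
    return p


# "--view" is an unambiguous prefix of "--view_id", so argparse accepts it.
eval_args = build_parser().parse_args(
    ["--mode", "eval", "--datasets", "my_dataset", "--view", "-1",
     "--num_workers", "0"]
)
print(eval_args.mode, eval_args.view_id)
```

If the real script disables abbreviations or defines another flag starting with `--view`, the short form would not resolve this way, so the full flag name is the safer choice.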

Comparison with the State-of-the-Art

Table 1. Performance comparison in terms of mAP scores on the XMediaNet dataset. The highest score is shown in boldface.

Table 2. Performance comparison in terms of mAP scores on the NUS-WIDE dataset. The highest score is shown in boldface.

Table 3. Performance comparison in terms of mAP scores on the INRIA-Websearch dataset. The highest score is shown in boldface.
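For readers unfamiliar with the metric in Tables 1 to 3, the sketch below computes mean average precision (mAP) for cross-view retrieval over a full ranked gallery. It assumes single-label data, where a gallery item is relevant when its label equals the query's label, and a precomputed similarity matrix. The evaluation protocol used in the paper (similarity measure, cutoff, multi-label handling) is not given here, so this is a generic definition rather than the authors' evaluation code; all names and numbers are illustrative.

```python
def average_precision(relevance):
    """AP for one query, given 0/1 relevance flags in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / hits if hits else 0.0


def mean_average_precision(scores, query_labels, gallery_labels):
    """mAP where scores[i][j] is the similarity of query i to gallery item j."""
    aps = []
    for row, q_label in zip(scores, query_labels):
        # Rank gallery items from most to least similar.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        relevance = [gallery_labels[j] == q_label for j in order]
        aps.append(average_precision(relevance))
    return sum(aps) / len(aps)


# Toy example: 2 queries from one view, 3 gallery items from another view.
toy_scores = [[0.9, 0.2, 0.8],
              [0.5, 0.7, 0.3]]
toy_query_labels = [0, 1]
toy_gallery_labels = [0, 1, 1]
toy_map = mean_average_precision(toy_scores, toy_query_labels, toy_gallery_labels)
print(round(toy_map, 4))
```

In the toy example, the first query ranks its only relevant item first, while the second query retrieves its two relevant items at ranks 1 and 3, which lowers its average precision.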

Table 4. Performance comparison in terms of cross-view top-1 classification on the MNIST-SVHN dataset. The highest score is shown in boldface.

Table 5. Ablation study on different datasets. X denotes training ISVN without X, and X could be autoencoder (AE) and pseudo-label (PL). This table shows the experimental results of cross-view retrieval on XMediaNet and NUS-WIDE, and of cross-view classification on MNIST-SVHN. The highest score is shown in boldface.

Citation

If you find ISVN useful in your research, please consider citing:

@article{hu2021ISVN,
  author={Hu, Peng and Peng, Xi and Zhu, Hongyuan and Zhen, Liangli and Lin, Jie and Yan, Huaibai and Peng, Dezhong},
  journal={IEEE Transactions on Cybernetics},
  title={Deep Semisupervised Multiview Learning With Increasing Views},
  year={2021},
  volume={},
  number={},
  pages={1-12},
  doi={10.1109/TCYB.2021.3093626}
}
