The implementation of "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement"

Last update: Dec 02, 2022

Overview

SF-Net for fullband SE

This is the repo for the manuscript "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement", submitted to Interspeech 2022. Some audio samples are provided here, and the code for GCRN-full, DS-Net-full, and CTS-Net-full, along with the network configuration of SF-Net, is released.

Abstract: Due to the high computational complexity to model more frequency bands, it is still intractable to conduct real-time full-band speech enhancement based on deep neural networks. Recent studies typically utilize the compressed perceptually motivated features with relatively low frequency resolution to filter the full-band spectrum by one-stage networks, leading to limited speech quality improvements. In this paper, we propose a coordinated sub-band fusion network for full-band speech enhancement, which aims to recover the low- (0-8 kHz), middle- (8-16 kHz), and high-band (16-24 kHz) in a step-wise manner. Specifically, a dual-stream network is first pretrained to recover the low-band complex spectrum, and another two sub-networks are designed as the middle- and high-band noise suppressors in the magnitude-only domain. To fully capitalize on the information intercommunication, we employ a sub-band interaction module to provide external knowledge guidance across different frequency bands. Extensive experiments show that the proposed method yields consistent performance advantages over state-of-the-art full-band baselines.
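
The band split described in the abstract can be illustrated with a small sketch. This is not the repo's code: the function name `split_subbands`, the 48 kHz sample rate (inferred from the 24 kHz upper band edge), and the FFT size of 960 are all assumptions chosen for illustration. It only shows how a full-band spectrum might be cut into the low (0-8 kHz), middle (8-16 kHz), and high (16-24 kHz) bands, keeping the low band complex and reducing the other two to magnitudes.

```python
import numpy as np

def split_subbands(spec, n_fft=960, sample_rate=48000, edges_hz=(8000, 16000)):
    """Split a (freq_bins, frames) complex spectrum into three sub-bands.

    Hypothetical helper: n_fft, sample_rate and the function name are
    illustrative assumptions, not values taken from the SF-Net code.
    """
    # Convert each band edge in Hz to a frequency-bin index.
    lo_bin, hi_bin = (int(round(f * n_fft / sample_rate)) for f in edges_hz)
    low = spec[:lo_bin]                 # 0-8 kHz, kept as a complex spectrum
    mid = np.abs(spec[lo_bin:hi_bin])   # 8-16 kHz, magnitude only
    high = np.abs(spec[hi_bin:])        # 16-24 kHz, magnitude only
    return low, mid, high

# Dummy spectrum: n_fft // 2 + 1 = 481 frequency bins, 10 frames.
rng = np.random.default_rng(0)
spec = rng.standard_normal((481, 10)) + 1j * rng.standard_normal((481, 10))
low, mid, high = split_subbands(spec)
print(low.shape, mid.shape, high.shape)
```

With these assumed settings each bin spans 50 Hz, so the two band edges fall exactly on bin boundaries; other FFT sizes would round to the nearest bin.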

Demo page of audio samples

System flowchart of SF-Net

Results:

Ablation study

Comparison with SOTA

Visualization of spectrograms

VB dataset

DNS blind set

Owner

Guochen Yu
