
Deep learning speech synthesis (TTS) / vocoder: a collection of papers and GitHub repos

chldkato 2019. 9. 15. 12:04

Tacotron

arxiv.org/abs/1703.10135

 

Tacotron: Towards End-to-End Speech Synthesis

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle design choices.

https://github.com/keithito/tacotron

 

An unofficial TensorFlow implementation of Google's Tacotron speech synthesis, with a pre-trained model.
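All of the models collected here either predict mel spectrograms or consume them as vocoder input. As a common reference point, the sketch below shows how such features are typically extracted with librosa; the parameter values are ordinary TTS defaults chosen for illustration, not the exact settings of keithito/tacotron or any other repo on this page.

```python
# Mel-spectrogram extraction sketch with librosa. The parameter values
# (22.05 kHz, 1024-point FFT, 256-sample hop, 80 mel bands) are common
# defaults in TTS repos, not the exact config of any project linked here.
import librosa
import numpy as np

def wav_to_mel(path, sr=22050, n_fft=1024, hop_length=256, n_mels=80):
    y, _ = librosa.load(path, sr=sr)  # load audio and resample to sr
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    return np.log(np.clip(mel, 1e-5, None))  # log compression for stable training

# mel = wav_to_mel("example.wav")  # shape: (n_mels, n_frames)
```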

 

WaveNet vocoder

arxiv.org/abs/1609.03499

 

WaveNet: A Generative Model for Raw Audio

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio.

https://github.com/r9y9/wavenet_vocoder

 

A WaveNet vocoder implementation (PyTorch).
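WaveNet models raw audio sample by sample, building its long receptive field from stacked dilated causal convolutions. The snippet below is only a conceptual sketch of that stacking idea in PyTorch (kernel size 2 and dilations 1 to 16 chosen for illustration); it omits the gated residual blocks and conditioning of a real WaveNet vocoder.

```python
# Conceptual sketch of WaveNet's dilated causal convolutions in PyTorch.
# Only the causal stacking idea is shown, not the gated residual blocks,
# skip connections, or the mel conditioning used by an actual vocoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, channels, dilation, kernel_size=2):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):
        # pad only on the left so the output at time t never sees inputs after t
        return torch.relu(self.conv(F.pad(x, (self.left_pad, 0))))

stack = nn.Sequential(*[CausalConv1d(64, d) for d in (1, 2, 4, 8, 16)])
x = torch.randn(1, 64, 16000)  # (batch, channels, samples)
print(stack(x).shape)          # length preserved; receptive field is 32 samples
```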

 

Tacotron2

arxiv.org/abs/1712.05884

 

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms.

https://github.com/Rayhane-mamah/Tacotron-2

 

An unofficial TensorFlow implementation of Tacotron 2.
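As the abstract says, Tacotron 2 splits synthesis into a feature-prediction network (characters to mel frames) and a neural vocoder (mel frames to samples). The sketch below only illustrates that two-stage data flow; `text_to_mel` and `mel_to_audio` are hypothetical placeholders, not functions from the repos listed here.

```python
# Schematic of the two-stage pipeline. `text_to_mel` and `mel_to_audio` are
# hypothetical placeholders for a trained feature-prediction network and a
# trained neural vocoder; they are not functions from the repos above.
import numpy as np

def synthesize(text, text_to_mel, mel_to_audio):
    mel = text_to_mel(text)    # characters -> (n_mels, n_frames) mel spectrogram
    audio = mel_to_audio(mel)  # mel spectrogram -> waveform samples
    return audio

# Toy stand-ins just to show the data flow and shapes:
toy_text_to_mel = lambda s: np.zeros((80, 10 * len(s)))
toy_mel_to_audio = lambda m: np.zeros(m.shape[1] * 256)  # assume a 256-sample hop
print(synthesize("hello", toy_text_to_mel, toy_mel_to_audio).shape)  # (12800,)
```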

 

WaveGlow

arxiv.org/abs/1811.00002

 

WaveGlow: A Flow-based Generative Network for Speech Synthesis

In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression.

https://github.com/NVIDIA/waveglow

 

NVIDIA's official PyTorch implementation of WaveGlow.
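WaveGlow gets its speed by being non-autoregressive: it is a normalizing flow built from invertible affine coupling layers, so audio is generated in a single pass and training maximizes exact likelihood. Below is a minimal coupling-layer sketch; the tiny conv net and channel counts are assumptions standing in for WaveGlow's actual transform networks.

```python
# Minimal affine coupling layer, the invertible block that flow-based vocoders
# like WaveGlow stack. The small conv net is a placeholder for WaveGlow's
# WaveNet-like transform; channel counts are illustrative.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels // 2, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, channels, 3, padding=1))  # predicts log_s and b

    def forward(self, x):
        xa, xb = x.chunk(2, dim=1)                # first half passes through unchanged
        log_s, b = self.net(xa).chunk(2, dim=1)
        zb = xb * torch.exp(log_s) + b            # affine transform of the second half
        return torch.cat([xa, zb], dim=1), log_s.sum()  # log-det term for the flow loss

    def inverse(self, z):
        za, zb = z.chunk(2, dim=1)
        log_s, b = self.net(za).chunk(2, dim=1)
        xb = (zb - b) * torch.exp(-log_s)
        return torch.cat([za, xb], dim=1)

layer = AffineCoupling(8)
x = torch.randn(2, 8, 100)
z, logdet = layer(x)
print(torch.allclose(layer.inverse(z), x, atol=1e-5))  # True: exactly invertible
```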

 

Multi-speaker Tacotron

https://github.com/carpedm20/multi-speaker-tacotron-tensorflow

 

Multi-speaker Tacotron in TensorFlow.
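Multi-speaker Tacotron variants condition the network on a learned per-speaker embedding (this repo roughly follows the Deep Voice 2 scheme). The sketch below shows the simplest form of that idea with made-up dimensions: look up a speaker vector and broadcast-concatenate it to the encoder outputs; real implementations inject it at more places.

```python
# Minimal sketch of multi-speaker conditioning: look up a learned speaker
# embedding and broadcast-concatenate it to the text encoder outputs.
# Dimensions are illustrative; real repos inject the embedding at several points.
import torch
import torch.nn as nn

n_speakers, spk_dim, enc_dim = 4, 16, 256
speaker_table = nn.Embedding(n_speakers, spk_dim)

encoder_out = torch.randn(2, 50, enc_dim)   # (batch, text_length, enc_dim)
speaker_id = torch.tensor([0, 3])           # one speaker id per utterance

spk = speaker_table(speaker_id)             # (batch, spk_dim)
spk = spk.unsqueeze(1).expand(-1, encoder_out.size(1), -1)
conditioned = torch.cat([encoder_out, spk], dim=-1)
print(conditioned.shape)                    # (2, 50, enc_dim + spk_dim)
```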

 

 

Tacotron + WaveNet

https://github.com/hccho2/Tacotron-Wavenet-Vocoder

 

Korean TTS built from Tacotron plus a WaveNet vocoder.

https://github.com/hccho2/Tacotron2-Wavenet-Korean-TTS

 

Korean TTS built from Tacotron 2 plus a WaveNet vocoder.

 

DCTTS

arxiv.org/abs/1710.08969

 

Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention

This paper describes a novel text-to-speech (TTS) technique based on deep convolutional neural networks (CNN), without use of any recurrent units. Recurrent neural networks (RNN) have become a standard technique to model sequential data recently, and this technique has been used in some cutting-edge neural TTS techniques.

https://github.com/Kyubyong/dc_tts

 

A TensorFlow implementation of DC-TTS.

https://github.com/Kyubyong/kss

 

KSS: Korean Single Speaker Speech dataset.
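The "guided attention" in the DCTTS title is a loss that pushes the character-to-frame alignment toward the diagonal, which makes the attention converge much faster. A sketch of that loss, using g = 0.2 as in the paper and a random matrix in place of a real alignment:

```python
# Sketch of DCTTS's guided attention loss: attention weights far from the
# text/time diagonal are penalized. g = 0.2 follows the paper; the random
# matrix here just stands in for a real character-to-frame alignment.
import torch

def guided_attention_loss(attn, g=0.2):
    # attn: (N, T) attention weights over N characters and T mel frames
    N, T = attn.shape
    n = torch.arange(N, dtype=torch.float32).unsqueeze(1) / N  # (N, 1)
    t = torch.arange(T, dtype=torch.float32).unsqueeze(0) / T  # (1, T)
    W = 1.0 - torch.exp(-((n - t) ** 2) / (2.0 * g * g))       # 0 on the diagonal
    return (attn * W).mean()

attn = torch.softmax(torch.randn(60, 200), dim=1)  # fake alignment matrix
print(guided_attention_loss(attn))
```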

 

MelGAN

arxiv.org/abs/1910.06711

 

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Previous works (Donahue et al., 2018a; Engel et al., 2019a) have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by introducing a set of architectural changes and simple training techniques.

https://github.com/descriptinc/melgan-neurips

 

The official MelGAN code: a GAN-based mel-spectrogram inversion network for text-to-speech synthesis.

https://github.com/seungwonpark/melgan

 

A MelGAN vocoder implementation, compatible with NVIDIA/tacotron2.
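MelGAN's generator is trained against multi-scale discriminators with an added feature-matching loss: an L1 distance between discriminator activations for real and generated audio. The sketch below shows that loss with a toy discriminator; it is not MelGAN's actual architecture.

```python
# Sketch of the feature-matching loss used when training MelGAN's generator:
# L1 distance between discriminator feature maps for real and generated audio.
# The toy discriminator is a stand-in, not MelGAN's multi-scale architecture.
import torch
import torch.nn as nn

class ToyDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv1d(1, 16, 15, stride=4, padding=7),
            nn.Conv1d(16, 32, 15, stride=4, padding=7),
            nn.Conv1d(32, 1, 3, padding=1)])

    def forward(self, x):
        feats = []
        for layer in self.layers:
            x = torch.relu(layer(x))
            feats.append(x)        # keep every intermediate feature map
        return feats

def feature_matching_loss(disc, real, fake):
    loss = 0.0
    for f_real, f_fake in zip(disc(real), disc(fake)):
        loss = loss + torch.mean(torch.abs(f_real - f_fake))  # L1 per feature map
    return loss

disc = ToyDiscriminator()
real, fake = torch.randn(1, 1, 8192), torch.randn(1, 1, 8192)
print(feature_matching_loss(disc, real, fake))
```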

 

Tacotron2 + WaveGlow

https://github.com/NVIDIA/tacotron2

 

A PyTorch implementation of Tacotron 2 with faster-than-realtime inference.

 

VocGAN

arxiv.org/abs/2007.15256

 

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

We present a novel high-fidelity real-time neural vocoder called VocGAN. A recently developed GAN-based vocoder, MelGAN, produces speech waveforms in real-time. However, it often produces a waveform that is insufficient in quality or inconsistent with acoustic characteristics of the input mel spectrogram.

https://github.com/rishikksh20/VocGAN

 

An unofficial implementation of VocGAN.

 

TFGAN

arxiv.org/abs/2011.12206

 

TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis

Recently, GAN based speech synthesis methods, such as MelGAN, have become very popular. Compared to conventional autoregressive based methods, parallel structures based generators make waveform generation process fast and stable.

https://github.com/rishikksh20/TFGAN

 

An unofficial implementation of TFGAN.
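TFGAN's pitch is combining time-domain and frequency-domain objectives. A common way to realize the frequency-domain side is a multi-resolution STFT loss (spectral convergence plus log-magnitude L1), sketched below; the resolutions are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of a multi-resolution STFT loss (spectral convergence + log-magnitude
# L1) as a generic frequency-domain objective. The FFT sizes and hops are
# assumptions for illustration, not TFGAN's exact configuration.
import torch

def stft_magnitude(x, n_fft, hop):
    window = torch.hann_window(n_fft, device=x.device)
    spec = torch.stft(x, n_fft, hop_length=hop, window=window, return_complex=True)
    return spec.abs().clamp(min=1e-7)

def multi_resolution_stft_loss(real, fake,
                               resolutions=((512, 128), (1024, 256), (2048, 512))):
    loss = 0.0
    for n_fft, hop in resolutions:
        mag_r = stft_magnitude(real, n_fft, hop)
        mag_f = stft_magnitude(fake, n_fft, hop)
        sc = torch.norm(mag_r - mag_f) / torch.norm(mag_r)            # spectral convergence
        log_mag = torch.mean(torch.abs(torch.log(mag_r) - torch.log(mag_f)))
        loss = loss + sc + log_mag
    return loss / len(resolutions)

real, fake = torch.randn(2, 16000), torch.randn(2, 16000)
print(multi_resolution_stft_loss(real, fake))
```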

 

HiFi-GAN

arxiv.org/abs/2010.05646

 

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve the sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive and flow-based generative models.

https://github.com/jik876/hifi-gan

 

The official HiFi-GAN implementation.

https://github.com/rishikksh20/HiFi-GAN

 

An unofficial implementation of HiFi-GAN.
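HiFi-GAN's multi-period discriminator treats the waveform as a 2-D grid: for each period p the signal is folded into shape (T/p, p) so that equally spaced samples share a column, then judged with 2-D convolutions. That reshape, the easy part to get wrong, is sketched below without the conv stack.

```python
# Sketch of the input reshaping inside HiFi-GAN's multi-period discriminator:
# a 1-D waveform is folded into a (T/p, p) grid so that samples p steps apart
# line up for 2-D convolutions. Only the reshape is shown; conv layers omitted.
import torch
import torch.nn.functional as F

def fold_for_period(wav, period):
    # wav: (batch, 1, T) -> (batch, 1, T // period, period)
    b, c, t = wav.shape
    if t % period != 0:  # right-pad so the length divides evenly
        pad = period - (t % period)
        wav = F.pad(wav, (0, pad), mode="reflect")
        t = t + pad
    return wav.view(b, c, t // period, period)

wav = torch.randn(1, 1, 22050)
for p in (2, 3, 5, 7, 11):  # the prime periods used in the paper
    print(p, fold_for_period(wav, p).shape)
```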

 

WaveGrad

arxiv.org/abs/2009.00713

 

WaveGrad: Estimating Gradients for Waveform Generation

This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density. The model is built on prior work on score matching and diffusion probabilistic models. It starts from a Gaussian white noise signal and iteratively refines the signal via a gradient-based sampler conditioned on the mel-spectrogram.

https://github.com/ivanvovk/WaveGrad

 

An implementation of Google Brain's WaveGrad high-fidelity vocoder.
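WaveGrad is a diffusion-style vocoder: training teaches a network to predict the noise added to clean audio at a sampled noise level (conditioned on the mel spectrogram), and generation iteratively denoises starting from white noise. A heavily simplified training-step sketch with a placeholder model follows; it approximates, rather than reproduces, the paper's exact parameterization.

```python
# Heavily simplified WaveGrad-style training step: corrupt clean audio with
# Gaussian noise at a random level and train a model to predict that noise.
# The Conv1d is a placeholder for the real mel-conditioned network, and the
# continuous noise-level sampling here only approximates the paper's schedule.
import torch
import torch.nn as nn

model = nn.Conv1d(1, 1, 3, padding=1)  # placeholder; ignores the mel conditioning

def training_step(audio, optimizer):
    # audio: (batch, 1, T) clean waveform segments
    noise_level = torch.rand(audio.size(0), 1, 1)            # plays the role of sqrt(alpha_bar)
    noise = torch.randn_like(audio)
    noisy = noise_level * audio + (1.0 - noise_level ** 2).sqrt() * noise
    loss = torch.mean(torch.abs(model(noisy) - noise))       # L1 noise-prediction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
print(training_step(torch.randn(4, 1, 8000), opt))
```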

 
