Deep Learning Speech Synthesis (TTS) / Vocoder: A Collection of GitHub Repositories and Papers

chldkato 2019. 9. 15. 12:04

Tacotron

arxiv.org/abs/1703.10135

Tacotron: Towards End-to-End Speech Synthesis

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle design c...

https://github.com/keithito/tacotron

keithito/tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)

WaveNet Vocoder

arxiv.org/abs/1609.03499

WaveNet: A Generative Model for Raw Audio

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that...

https://github.com/r9y9/wavenet_vocoder

r9y9/wavenet_vocoder

WaveNet vocoder.

Tacotron2

arxiv.org/abs/1712.05884

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed...

https://github.com/Rayhane-mamah/Tacotron-2

Rayhane-mamah/Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation.

WaveGlow

arxiv.org/abs/1811.00002

WaveGlow: A Flow-based Generative Network for Speech Synthesis

In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need...

https://github.com/NVIDIA/waveglow

NVIDIA/waveglow

A Flow-based Generative Network for Speech Synthesis

Multi-speaker Tacotron

https://github.com/carpedm20/multi-speaker-tacotron-tensorflow

carpedm20/multi-speaker-tacotron-tensorflow

Multi-speaker Tacotron in TensorFlow.

Tacotron + WaveNet

https://github.com/hccho2/Tacotron-Wavenet-Vocoder

hccho2/Tacotron-Wavenet-Vocoder

Tacotron, Korean, Wavenet-Vocoder, Korean TTS.

https://github.com/hccho2/Tacotron2-Wavenet-Korean-TTS

hccho2/Tacotron2-Wavenet-Korean-TTS

Korean TTS, Tacotron2, Wavenet.

DCTTS

arxiv.org/abs/1710.08969

Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention

This paper describes a novel text-to-speech (TTS) technique based on deep convolutional neural networks (CNN), without use of any recurrent units. Recurrent neural networks (RNN) have become a standard technique to model sequential data recently, and this...

https://github.com/Kyubyong/dc_tts

Kyubyong/dc_tts

A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

https://github.com/Kyubyong/kss

Kyubyong/kss


MelGAN

arxiv.org/abs/1910.06711

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Previous works (Donahue et al., 2018a; Engel et al., 2019a) have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by i...

https://github.com/descriptinc/melgan-neurips

descriptinc/melgan-neurips

GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis

https://github.com/seungwonpark/melgan

seungwonpark/melgan

MelGAN vocoder (compatible with NVIDIA/tacotron2).

Tacotron2 + WaveGlow

https://github.com/NVIDIA/tacotron2

NVIDIA/tacotron2

Tacotron 2 - PyTorch implementation with faster-than-realtime inference

VocGAN

arxiv.org/abs/2007.15256

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

We present a novel high-fidelity real-time neural vocoder called VocGAN. A recently developed GAN-based vocoder, MelGAN, produces speech waveforms in real-time. However, it often produces a waveform that is insufficient in quality or inconsistent with acou...

github.com/rishikksh20/VocGAN

rishikksh20/VocGAN

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

TFGAN

arxiv.org/abs/2011.12206

TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis

Recently, GAN based speech synthesis methods, such as MelGAN, have become very popular. Compared to conventional autoregressive based methods, parallel structures based generators make waveform generation process fast and stable. However, the quality of ge...

github.com/rishikksh20/TFGAN

rishikksh20/TFGAN

TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis

HiFi-GAN

arxiv.org/abs/2010.05646

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Several recent work on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve the sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive a...

github.com/jik876/hifi-gan

jik876/hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

github.com/rishikksh20/HiFi-GAN

rishikksh20/HiFi-GAN

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

WaveGrad

arxiv.org/abs/2009.00713

WaveGrad: Estimating Gradients for Waveform Generation

This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density. The model is built on prior work on score matching and diffusion probabilistic models. It starts from a Gaussian white noise signal a...

github.com/ivanvovk/WaveGrad

ivanvovk/WaveGrad

Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
