Training Tacotron Korean TTS on Windows


chldkato


chldkato 2020. 3. 25. 17:51

TensorFlow 1

github.com/chldkato/Tacotron-Korean

TensorFlow 2

github.com/chldkato/Tacotron-Korean-Tensorflow2

PyTorch

https://github.com/chldkato/Tacotron-pytorch

1. Preprocessing the Korean speech dataset (KSS)

https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset

We use the KSS dataset linked above (about 4 GB).

After downloading it, extract it into the following layout:

Tacotron_Korean
  |- kss
      |- 1
      |- 2
      |- 3
      |- 4
      |- transcript.v.1.x.txt
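Before running preprocess.py, it can help to confirm that the folders are where the script expects them. The following is a minimal sketch and is not part of the repo. The function name check_kss_layout is hypothetical. It assumes you run it from inside the Tacotron_Korean folder. Because the exact version number in the transcript file name may differ, it matches the transcript by the pattern transcript.v.1.*.txt.

```python
from pathlib import Path


def check_kss_layout(root):
    """Return a list of problems with the KSS layout under `root` (empty if OK).

    Hypothetical helper: expects root/kss/1..4 plus a transcript.v.1.*.txt file.
    """
    kss = Path(root) / "kss"
    problems = []
    for sub in ("1", "2", "3", "4"):
        if not (kss / sub).is_dir():
            problems.append(f"missing folder: {kss / sub}")
    # The version part of the transcript name may vary, so match by pattern.
    transcripts = list(kss.glob("transcript.v.1.*.txt")) if kss.is_dir() else []
    if not transcripts:
        problems.append(f"missing transcript.v.1.x.txt in {kss}")
    return problems


if __name__ == "__main__":
    issues = check_kss_layout(".")
    print("\n".join(issues) if issues else "KSS layout looks OK")
```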

To build the training data, run preprocess.py:

python preprocess.py

When it finishes, the files needed for training are created in the data folder.

text, mel, and spec hold the text, mel spectrogram, and spectrogram, respectively. dec holds the mel spectrogram used as the decoder input.
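During teacher-forced training, the Tacotron decoder receives the ground-truth mel frame from the previous step as input, and an all-zero "go" frame at the first step. The following is a minimal sketch of how a dec array could be derived from mel under that scheme. It assumes a reduction factor r, meaning the decoder predicts r frames per step and the last frame of each group is fed to the next step. The function name make_decoder_input and the value of r are hypothetical, and the repo's preprocess.py may build dec differently.

```python
def make_decoder_input(mel, r=5):
    """Build teacher-forcing decoder inputs from a mel spectrogram.

    mel: list of frames, each a list of n_mels floats.
    r:   reduction factor (frames predicted per decoder step); hypothetical value.
    Returns one input frame per decoder step.
    """
    n_mels = len(mel[0])
    # Pad with zero frames so the number of frames is a multiple of r.
    pad = (-len(mel)) % r
    mel = mel + [[0.0] * n_mels for _ in range(pad)]
    # The last frame of each group of r frames is what the next step sees.
    last_frames = mel[r - 1::r]
    # Shift right by one step: the first step gets an all-zero "go" frame.
    go_frame = [0.0] * n_mels
    return [go_frame] + last_frames[:-1]


if __name__ == "__main__":
    toy_mel = [[float(t)] * 2 for t in range(1, 8)]  # 7 frames, 2 mel bins
    print(make_decoder_input(toy_mel, r=3))
```

With 7 frames and r=3, the mel is padded to 9 frames. That gives 3 decoder steps, whose inputs are the go frame followed by frames 3 and 6.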

text_len is the length of each text, which is passed to the dynamic RNN. mel_len is the length of each mel spectrogram, which is used to minimize zero padding.
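To show why per-utterance lengths help, the following minimal sketch zero-pads a batch to its longest item and records the true lengths. It then compares how much padding is needed with and without sorting utterances by length before batching. All names are hypothetical, and whether the repo sorts or buckets by mel_len exactly this way is an assumption.

```python
def pad_batch(seqs, pad_value=0):
    """Zero-pad sequences to the longest one; return (padded, lengths)."""
    lengths = [len(s) for s in seqs]
    max_len = max(lengths)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    return padded, lengths


def count_padding(seqs, batch_size, sort_by_length):
    """Total padded positions when seqs are split into consecutive batches."""
    if sort_by_length:
        seqs = sorted(seqs, key=len)
    total = 0
    for i in range(0, len(seqs), batch_size):
        padded, lengths = pad_batch(seqs[i:i + batch_size])
        total += sum(len(p) - n for p, n in zip(padded, lengths))
    return total


if __name__ == "__main__":
    # Toy utterances with mixed lengths (stand-ins for mel frame counts).
    utterances = [[1] * n for n in (10, 200, 12, 190, 11, 205)]
    print("unsorted padding:", count_padding(utterances, 2, sort_by_length=False))
    print("sorted padding:  ", count_padding(utterances, 2, sort_by_length=True))
```

Grouping similar lengths means short utterances are no longer padded out to the length of long ones. In this toy case, sorting cuts the padded positions from 562 to 184.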

2. Train

Run train1.py and then train2.py, and training starts right away.

train1 trains the network from the embedding through the decoder, and train2 trains the post CBHG.
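The original Tacotron paper trains both the decoder output (mel spectrogram) and the post-processing network output (linear spectrogram) with an L1 loss. The following is a minimal sketch of an L1 loss that skips zero-padded frames using per-utterance lengths such as mel_len. It can be read as stage 1 (decoder output vs. mel) or stage 2 (post CBHG output vs. spec). The name masked_l1 is hypothetical, and whether this repo masks its loss this way is an assumption.

```python
def masked_l1(pred, target, lengths):
    """Mean absolute error over real (unpadded) frames only.

    pred, target: [batch][time][n_bins] nested lists of floats.
    lengths:      number of real frames per utterance (e.g. mel_len).
    """
    total, count = 0.0, 0
    for p_utt, t_utt, n in zip(pred, target, lengths):
        # Only the first n frames are real; the rest are zero padding.
        for p_frame, t_frame in zip(p_utt[:n], t_utt[:n]):
            for p, t in zip(p_frame, t_frame):
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    pred = [[[1.0], [2.0], [9.0]]]
    target = [[[0.0], [2.0], [0.0]]]
    print(masked_l1(pred, target, lengths=[2]))  # the padded third frame is ignored
```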

# tensorflow
python train1.py
python train2.py

# pytorch
python train1.py -n <name>
python train2.py -n <name>

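After each script runs, its trained model is saved in the checkpoint folder, along with attention alignment graphs.

For PyTorch, I set it up so that you choose a name and the model is saved under ckpt/<name>.

With TensorFlow 2, if a saved model exists, the most recent one is loaded and training resumes from it.

With TensorFlow 1 or PyTorch, resume training from a previously trained model as follows: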

# tensorflow
python train1.py --step 100000
python train2.py --step 100000

# pytorch
python train1.py -n <name> -c ckpt/<name>/1/ckpt-<step>.pt
python train2.py -n <name> -c ckpt/<name>/2/ckpt-<step>.pt

Change the numbers to match the model you trained.
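If you are not sure which step to pass, a small helper can find the newest checkpoint. The following is a minimal sketch that assumes the PyTorch naming used above (ckpt/<name>/1/ckpt-<step>.pt). The helper latest_checkpoint is hypothetical and not part of the repo.

```python
import re
from pathlib import Path

CKPT_PATTERN = re.compile(r"ckpt-(\d+)\.pt$")


def latest_checkpoint(ckpt_dir):
    """Return the ckpt-<step>.pt path with the largest step, or None if none exist."""
    best_step, best_path = -1, None
    for path in Path(ckpt_dir).glob("ckpt-*.pt"):
        match = CKPT_PATTERN.search(path.name)
        # Compare steps as integers so ckpt-100000 beats ckpt-99000.
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


if __name__ == "__main__":
    print(latest_checkpoint("ckpt/my_run/1"))  # "my_run" is a placeholder name
```

Steps are compared numerically because a plain string sort would put ckpt-99000.pt after ckpt-100000.pt. Pass the returned path to the -c option.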

3. Synthesize

Open test1.py and put the sentences you want to synthesize in the sentences variable.
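Korean Tacotron implementations commonly split each sentence into jamo (consonant and vowel units) before mapping it to symbol IDs, which keeps the symbol set small. Whether this repo's text processing works exactly this way is an assumption. The following sketch only shows the standard Unicode arithmetic for splitting precomposed Hangul syllables (U+AC00 to U+D7A3) into conjoining jamo. The name decompose_hangul is hypothetical.

```python
HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3
NUM_MEDIALS, NUM_FINALS = 21, 28


def decompose_hangul(text):
    """Split precomposed Hangul syllables into conjoining jamo; keep other chars."""
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            initial = index // (NUM_MEDIALS * NUM_FINALS)
            medial = (index % (NUM_MEDIALS * NUM_FINALS)) // NUM_FINALS
            final = index % NUM_FINALS
            out.append(chr(0x1100 + initial))      # leading consonant
            out.append(chr(0x1161 + medial))       # vowel
            if final:                              # trailing consonant is optional
                out.append(chr(0x11A7 + final))
        else:
            out.append(ch)
    return out


if __name__ == "__main__":
    print(decompose_hangul("한국어"))
```

For precomposed syllables this gives the same result as unicodedata.normalize("NFD", text) from the standard library.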

Specify the model to load, then run test1.py followed by test2.py.

As in training, TensorFlow 2 loads the latest model, so you only need to type python test1.py (no --step argument).

# tensorflow
python test1.py --step 100000
python test2.py --step 100000

# pytorch
python test1.py -c ckpt/<name>/1/ckpt-<step>.pt
python test2.py -c ckpt/<name>/2/ckpt-<step>.pt

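For TensorFlow 1, adjust the step number to match your own model, as when resuming training.

The alignment graph and the wav files are written to the output folder.

Below is the attention alignment graph recorded during training.

* Because the generated speech was sometimes cut off before the end, test2.py now trims the output instead of searching for an endpoint.

* The TensorFlow 2 and PyTorch code are organized differently. The PyTorch version is implemented to take advantage of Tacotron's autoregressive nature.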
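Trimming here means cutting off the near-silent samples at the start and end of the generated waveform, instead of predicting where speech stops. The following is a minimal, sample-level sketch. It assumes a waveform of float samples and a threshold given in dB below the peak. Libraries such as librosa offer a frame-based version (librosa.effects.trim with its top_db parameter). Which method test2.py actually uses is not stated here, and trim_silence is a hypothetical name.

```python
def trim_silence(wav, top_db=60.0):
    """Drop leading/trailing samples quieter than `top_db` dB below the peak.

    wav: list of float samples. Returns the trimmed list (empty if all silent).
    """
    peak = max((abs(x) for x in wav), default=0.0)
    if peak == 0.0:
        return []
    # Amplitude threshold corresponding to `top_db` below the peak.
    threshold = peak * 10 ** (-top_db / 20)
    loud = [i for i, x in enumerate(wav) if abs(x) > threshold]
    # `loud` is non-empty because the peak sample always exceeds the threshold.
    return wav[loud[0]:loud[-1] + 1]


if __name__ == "__main__":
    print(trim_silence([0.0, 0.0001, 0.5, -0.3, 0.0, 0.0], top_db=40))
```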

Attribution
