Recognition of eight-digit numbers spoken in Chinese.
- convert the speech signal into an image
- use a CNN to recognize the image
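
The README does not say which image representation is used. A common choice is a log-magnitude spectrogram: split the waveform into overlapping frames, apply a window, take the magnitude of each frame's DFT, and stack the frames into a 2-D array (time × frequency) that a CNN can treat as a one-channel image. Below is a minimal standard-library sketch under that assumption. The mono list-of-floats input, the 256-sample frame, the 128-sample hop, the Hann window and the 8 kHz sample rate are all illustrative choices, not the project's settings.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitude(frame: list[float]) -> list[float]:
    """Magnitudes of DFT bins 0..n/2 (naive O(n^2) DFT, for illustration only)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def log_spectrogram(signal: list[float], frame_len: int = 256, hop: int = 128) -> list[list[float]]:
    """Rows are time frames, columns are frequency bins: a 1-channel 'image'."""
    window = hann(frame_len)
    image = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        # Small offset avoids log(0) for silent frames.
        image.append([math.log(m + 1e-10) for m in dft_magnitude(frame)])
    return image


if __name__ == "__main__":
    sr = 8000  # hypothetical sample rate
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(sr // 10)]  # 0.1 s, 1 kHz
    spec = log_spectrogram(tone)
    print(len(spec), "frames x", len(spec[0]), "frequency bins")
```

With these settings the 0.1 s test tone gives 5 frames of 129 bins, and the strongest bin in each frame is bin 32 (1000 Hz divided by the 31.25 Hz bin spacing). A real pipeline would replace the naive DFT with `numpy.fft.rfft` or an audio library, and often uses a mel scale or MFCCs instead of raw bins.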

Single-digit recognition:
- input: 0_a.wav | 0_b.wav | 2_a.wav | 3_a.wav | 3_b.wav | ...
- output: 0 | 0 | 2 | 3 | 3 | ...
- model: VGG13 for 10 classes (the digits 0-9)
- data:
  - 2k+ single-digit speech recordings from 60+ speakers
  - 5 kinds of augmentation: crop, loudness, noise, pitch, and speed
  - final training set: 13k+ samples
- pretrained_model
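
VGG13 is the 13-weight-layer VGG variant: ten 3×3 convolutions in five blocks, each block followed by 2×2 max pooling, then three fully connected layers, the last of which has 10 outputs here (one per digit). The sketch below only walks that layer configuration and tracks the feature-map shape. It assumes a 1-channel spectrogram input of 64×64 and the original VGG's 4096-unit hidden layers; the README specifies neither, so treat both as hypothetical.

```python
# VGG13 ("configuration B"): numbers are 3x3 conv output channels, "M" is 2x2 max pooling.
VGG13_CFG = [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]


def vgg13_shapes(in_channels: int, height: int, width: int, num_classes: int = 10):
    """Yield (layer description, output shape) for each layer of a VGG13 classifier."""
    c, h, w = in_channels, height, width
    for item in VGG13_CFG:
        if item == "M":
            h, w = h // 2, w // 2  # 2x2 max pooling with stride 2 halves each side
            yield "maxpool 2x2", (c, h, w)
        else:
            c = item  # 3x3 conv with padding 1 keeps height and width unchanged
            yield f"conv3x3 -> {c}", (c, h, w)
    flat = c * h * w  # flattened feature map feeds the fully connected layers
    for units in (4096, 4096, num_classes):
        yield f"fc {flat} -> {units}", (units,)
        flat = units


if __name__ == "__main__":
    for name, shape in vgg13_shapes(1, 64, 64):
        print(name, shape)
```

For a 64×64 input the five poolings leave a 512×2×2 map, so the first fully connected layer sees 2048 features; a different spectrogram size changes only that number.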
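
The README names the five augmentation kinds but not how each is done. The sketch below shows one simple standard-library way to produce four of them on a raw waveform (a list of float samples); every parameter value is hypothetical. Speed change here is plain resampling by linear interpolation, which also raises or lowers the pitch. A pitch shift that keeps the duration needs an extra time-stretch step (for example a phase vocoder), so it is left out of this sketch rather than faked.

```python
import math
import random


def random_crop(signal: list[float], length: int, rng: random.Random) -> list[float]:
    """Take a random window of `length` samples, zero-padding short signals."""
    if len(signal) <= length:
        return signal + [0.0] * (length - len(signal))
    start = rng.randrange(len(signal) - length + 1)
    return signal[start:start + length]


def change_loudness(signal: list[float], gain_db: float) -> list[float]:
    """Scale the amplitude by a gain given in decibels."""
    gain = 10 ** (gain_db / 20)
    return [s * gain for s in signal]


def add_noise(signal: list[float], snr_db: float, rng: random.Random) -> list[float]:
    """Add white Gaussian noise at the requested signal-to-noise ratio."""
    power = sum(s * s for s in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, noise_std) for s in signal]


def change_speed(signal: list[float], rate: float) -> list[float]:
    """Resample by linear interpolation; rate > 1 gives a shorter, faster signal."""
    out = []
    for i in range(int(len(signal) / rate)):
        pos = i * rate
        j = int(pos)
        frac = pos - j
        nxt = signal[j + 1] if j + 1 < len(signal) else signal[j]
        out.append(signal[j] * (1 - frac) + nxt * frac)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    wave = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]  # 1 s test tone
    copies = [
        random_crop(wave, 6000, rng),
        change_loudness(wave, -6.0),
        add_noise(wave, 20.0, rng),
        change_speed(wave, 1.1),
    ]
    print([len(c) for c in copies])
```

Each function returns a new list, so several augmented copies can be generated from one recording; random draws go through an explicit `random.Random` so a run can be reproduced from its seed.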

Eight-digit recognition:
- input: 08416923_a.wav | 79684315_b.wav | 29368741_a.wav | ...
- output: 08416923 | 79684315 | 29368741 | ...
- model:
  - speech recognition uses two models: an acoustic model and a language model
  - plan B uses only the acoustic model: CNN + LSTM + CTC
- data:
  - there is not enough continuous (whole eight-digit) speech data
- pretrained_model
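
With CTC, the CNN + LSTM acoustic model emits, for every time frame, a probability distribution over the 10 digits plus a special blank symbol, and the digit string is read off by merging repeated symbols and then dropping blanks. Below is a minimal greedy (best-path) decoder sketch. It assumes the blank is class index 10 and that the model output is a list of per-frame probability lists; both are hypothetical conventions (some frameworks put the blank at index 0 instead). A full system might use beam search, optionally with a language model, instead of this greedy pass.

```python
BLANK = 10  # hypothetical: classes 0-9 are digits, index 10 is the CTC blank


def ctc_greedy_decode(frame_probs: list[list[float]], blank: int = BLANK) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    digits = []
    prev = None
    for label in best:
        # A new symbol starts only when the label changes; blanks are never emitted.
        if label != prev and label != blank:
            digits.append(str(label))
        prev = label
    return "".join(digits)


def one_hot(label: int, num_classes: int = 11) -> list[float]:
    """Helper for the example: a frame that is certain about `label`."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


if __name__ == "__main__":
    # Frame-level path "0 0 _ 8 _ 4 1 1 _ 6 _ 9 9 2 _ 3" (underscore = blank)
    path = [0, 0, BLANK, 8, BLANK, 4, 1, 1, BLANK, 6, BLANK, 9, 9, 2, BLANK, 3]
    print(ctc_greedy_decode([one_hot(c) for c in path]))  # prints 08416923
```

The blank is what lets CTC output the same digit twice in a row: the path "3 _ 3" decodes to "33", while "3 3" decodes to a single "3".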