1. Overview

 This repo separates the two speakers in a telephone recording.
 If your recording contains more than two speakers, I can't guarantee that this method will work.
 In addition, to get good results, try to make sure that both speakers have about the same amount of speech.


2. Implementation

 1. Split the waveform into audio clips by removing silence
 2. Compute an id-vector for every clip using a pre-trained speaker recognition model
 3. Cluster the clips' id-vectors with K-means, with K=2
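
 The three steps above can be sketched roughly as follows. This is only a minimal illustration and not the code in this repo: `split_on_silence`, `kmeans_two` and `diarize` are hypothetical names, the silence detector is a simple RMS energy threshold, K-means uses a deterministic farthest-point initialization instead of random starts, and the `embed` argument stands in for the pre-trained speaker recognition model (the demo uses a toy loudness feature instead of a real id-vector).

```python
import math


def split_on_silence(samples, frame_len=4, threshold=0.1):
    """Step 1: return (start, end) sample ranges whose RMS energy is above threshold."""
    clips, start = [], None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = i              # a non-silent run begins here
        elif start is not None:
            clips.append((start, i))   # the run ended at this silent frame
            start = None
    if start is not None:
        clips.append((start, len(samples)))
    return clips


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_two(vectors, iterations=20):
    """Step 3: plain K-means with K=2; returns one label (0 or 1) per vector."""
    if not vectors:
        return []
    # Assumption: start from the first vector and the vector farthest from it.
    first = vectors[0]
    centers = [list(first), list(max(vectors, key=lambda v: dist2(v, first)))]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min((0, 1), key=lambda k: dist2(v, centers[k])) for v in vectors]
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def diarize(samples, embed, frame_len=4, threshold=0.1):
    """Steps 1-3: split, embed each clip (step 2), then cluster into two speakers."""
    clips = split_on_silence(samples, frame_len, threshold)
    vectors = [embed(samples[s:e]) for s, e in clips]
    return list(zip(clips, kmeans_two(vectors)))


def toy_embed(clip):
    """Stand-in for the speaker model: mean absolute amplitude as a 1-D vector."""
    return [sum(abs(x) for x in clip) / len(clip)]


# Toy signal: loud speaker, silence, quiet speaker, silence, loud speaker again.
loud, quiet, silence = [0.9, -0.9, 0.9, -0.9], [0.3, -0.3, 0.3, -0.3], [0.0] * 4
demo = loud + silence + quiet + silence + loud
for (start, end), speaker in diarize(demo, toy_embed):
    print(f"samples {start}-{end}: speaker {speaker}")
```

 In real use you would replace `toy_embed` with a call into the speaker recognition model and choose `frame_len` and `threshold` to suit your sample rate and noise level.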


3. Result

image


4. Appendix

 1. The pre-trained speaker recognition model comes from WeidiXie's repo VGG-Speaker-Recognition. Thanks for open-sourcing it!
 2. This method is essentially unsupervised, so you can also try a supervised method, or even an end-to-end one. You can find more information about speaker diarization Here