AccentComparison

Accent Comparison From Pitch

Data Statistics

Total Statistics

Number of recordings 148
Total duration of data 46:36:57.26
Size of data (numpy files only) 78M
Size of recordings 14G

west

Number of recordings 51
Total duration of data 09:13:55.38
Size of data (numpy files only) 16M
Size of recordings 2.7G

skane

Number of recordings 26
Total duration of data 06:34:45.97
Size of data (numpy files only) 12M
Size of recordings 1.8G

norwegian

Number of recordings 25
Total duration of data 11:39:47.27
Size of data (numpy files only) 20M
Size of recordings 3.5G

danish

Number of recordings 46
Total duration of data 19:08:28.64
Size of data (numpy files only) 32M
Size of recordings 5.7G

The Script for Pitch Extraction

The list of sound files to extract from should be in a file called "list.txt". Then call the script. The output is two files: one with the extension .f0, containing the pitch information, and another with the extension .frm, containing the formant information.

The pitch information can be in one of two formats; see here.
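A minimal sketch of how "list.txt" could be generated is shown below. It assumes the recordings are .wav files and that the script expects one path per line; neither is stated in this README. The helper name `write_file_list` is hypothetical.

```python
from pathlib import Path


def write_file_list(audio_dir, list_path="list.txt", pattern="*.wav"):
    """Write the paths of all matching sound files to list_path, one per line.

    Assumptions (not confirmed by the README): recordings are .wav files and
    the extraction script reads one path per line.
    """
    # Search the directory tree and sort for a reproducible order.
    paths = sorted(str(p) for p in Path(audio_dir).rglob(pattern))
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

For example, `write_file_list("west")` would collect the recordings under the west folder into "list.txt" in the current directory.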

Preprocessing

Pitch values will be:

log scale
centered

Probability of voicing:

untouched

Power (already in log scale):

untouched
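The preprocessing listed above can be sketched as follows. This is an illustration under assumptions, not the project's actual code: the column order (pitch, probability of voicing, power), the convention that unvoiced frames carry a pitch of 0, and the function name `preprocess_pitch` are all hypothetical.

```python
import numpy as np


def preprocess_pitch(features):
    """Log-scale and center the pitch column; leave the other columns untouched.

    features: array of shape (n_frames, 3), assumed to hold
    [pitch_hz, prob_voicing, log_power] per frame (assumed layout).
    """
    out = np.asarray(features, dtype=float).copy()
    pitch = out[:, 0]
    voiced = pitch > 0  # assumption: unvoiced frames are stored as 0
    if voiced.any():
        log_pitch = np.log(pitch[voiced])
        # Center on the mean of the voiced frames of this recording.
        out[voiced, 0] = log_pitch - log_pitch.mean()
    return out
```

In this sketch unvoiced frames keep the value 0, which coincides with the centered mean; whether that is the right treatment for unvoiced frames is a design choice the README does not settle.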

If pitch alone is not enough to draw conclusions about our hypothesis, we will incorporate formant data into our features. The hypothesis is that neighboring dialects are more similar to each other than dialects spoken in more distant areas, even when the neighboring dialects belong to different official languages. We chose formants because, after pitch variation, the next most prominent feature characterizing a dialect is the pronunciation of its vowels, which is captured by their formants.

Classification

The classification can be performed with a recurrent neural network, since this kind of network handles time series well.

One particularly flexible network is the LSTM, whose inner workings are explained here: LSTM explanation. Its advantage is that it can model the long-term temporal structure of the data.

We also introduced a CNN, as explored in this paper, to improve the classification. While the LSTM models the general temporal structure of the data well, CNNs offer an efficient way of extracting short-term features from it. That is why it makes sense to combine both approaches.

To implement such a network in Python with the Keras library, we can refer to the pages here and here.
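The combination of an LSTM branch and a convolutional branch described above could look roughly like the Keras sketch below. It is not the project's actual model: the filter counts, kernel sizes, the `conv_scale` parameter, and the function name `build_lstm_cnn` are assumptions chosen for illustration, and the input is assumed to be a fixed-length sequence of feature frames.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_lstm_cnn(n_steps, n_features, n_classes=2, lstm_units=8, conv_scale=1.0):
    """Two parallel branches over the same input, merged before the dense layers."""
    inputs = keras.Input(shape=(n_steps, n_features))

    # LSTM branch: summarizes the long-term temporal structure.
    lstm_branch = layers.LSTM(lstm_units)(inputs)

    # Convolutional branch: extracts short-term local patterns.
    # Filter counts are illustrative and scaled by conv_scale (hypothetical).
    x = layers.Conv1D(int(64 * conv_scale), kernel_size=8, padding="same",
                      activation="relu")(inputs)
    x = layers.Conv1D(int(128 * conv_scale), kernel_size=5, padding="same",
                      activation="relu")(x)
    conv_branch = layers.GlobalAveragePooling1D()(x)

    # Merge both views of the sequence and classify.
    merged = layers.concatenate([lstm_branch, conv_branch])
    x = layers.Dense(256, activation="relu")(merged)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_lstm_cnn(n_steps=200, n_features=3)` returns a compiled two-class model; labels are expected as integers (0 or 1) because of the sparse categorical loss.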

Results

30 epochs, 200 sec, 10 loops, globally normalized

skane danish 59.1% +/- 3.6%
west norwegian 57.0% +/- 5.1%
west skane 63.5% +/- 4.6%
danish norwegian 67.2% +/- 6.2%

30 epochs, 200 sec, 10 loops, sequentially normalized

skane danish 58.2% +/- 3.8%
west norwegian 69.35% +/- 2.5%
west skane 65.1% +/- 1.1%
danish norwegian 56.3% +/- 3.8%
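The README does not define the two normalization modes named in the headings above. One plausible reading, shown here purely as an assumption, is that global normalization uses statistics computed over all sequences together, while sequential normalization uses each sequence's own statistics. The function names are hypothetical.

```python
import numpy as np


def normalize_globally(sequences):
    """Z-score every sequence with the mean and std of all sequences pooled."""
    stacked = np.concatenate(sequences)
    mean, std = stacked.mean(), stacked.std()
    return [(s - mean) / std for s in sequences]


def normalize_per_sequence(sequences):
    """Z-score each sequence with its own mean and std."""
    return [(s - s.mean()) / s.std() for s in sequences]
```

Under this reading, per-sequence normalization removes offsets specific to a recording (for example a speaker's overall pitch level), whereas global normalization preserves them.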

30 epochs, 50 sec, 10 loops, sequentially normalized

skane danish 56.1% +/- 2.6%
west norwegian 56.6% +/- 4.2%
west skane 49.3% +/- 6.1%
danish norwegian 49.4% +/- 6.0%

5 epochs, 200 sec, 10 loops, sequentially normalized

skane danish 58.0% +/- 4.1%
west norwegian 69.6% +/- 3.4%
west skane 62.87% +/- 5.89%
danish norwegian 57.7% +/- 3.7%
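Each figure above summarizes several training loops as an accuracy with a spread. A small sketch of how such a summary could be computed is given below; it assumes the spread is the sample standard deviation across loops, which the README does not state. The function name `summarize_runs` is hypothetical.

```python
import statistics


def summarize_runs(accuracies):
    """Format per-loop accuracies (fractions in [0, 1]) as 'mean% +/- std%'."""
    mean = statistics.mean(accuracies)
    # Assumption: the reported spread is the sample standard deviation.
    std = statistics.stdev(accuracies)
    return f"{100 * mean:.1f}% +/- {100 * std:.1f}%"
```

With ten loops, `accuracies` would hold the ten test accuracies of one dialect pair.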

Experiments

NO CONV

LSTM(64) + DENSE(2) -> skane danish 57.3 +/- 2.8%
LSTM(32) + DENSE(2) -> skane danish 49.4 +/- 6.0%

LSTM(64) + DENSE(2) -> skane danish 55.7 +/- 7.5%
LSTM(32) + DENSE(2) -> skane danish 53.1 +/- 10.3%

NO LSTM

CONV + DENSE(128) + DENSE(2) -> skane west 71.3 +/- 3.1%
0.5*CONV + DENSE(128) + DENSE(2) -> skane west 71.4 +/- 2.6%
0.25*CONV + DENSE(128) + DENSE(2) -> skane west 69.9 +/- 3.7%
0.25*CONV + DENSE(256) + DENSE(64) + DENSE(2) -> skane west 72.1 +/- 4.1%
CONV + DENSE(64) + DENSE(2) -> skane west 69.8 +/- 3.6%
CONV + DENSE(256) + DENSE(64) + DENSE(2) -> skane west 67.0 +/- 6.2%

CONV + DENSE(64) + DENSE(2) -> skane danish 60.7 +/- 3.8%

BOTH

LSTM(64) + CONV + DENSE(64) + DENSE(2) -> skane west 69.7 +/- 5.7%
LSTM(64) + CONV + DENSE(32) + DENSE(2) -> skane west 70.9 +/- 4.5%
LSTM(64) + DENSE(32) + DENSE(2) -> skane west 56.4 +/- 8.3%
LSTM(64) + CONV + DENSE(64) + DENSE(32) + DENSE(2) -> skane west 67.9 +/- 4.3%

LSTM(8) + 0.5*CONV + DENSE(256) + DENSE(64) + DENSE(2) -> skane west 72.2 +/- 1.8%
LSTM(8) + 0.5*CONV + DENSE(256) + DENSE(64) + DENSE(2) -> skane danish 61.1 +/- 2.7%
LSTM(8) + 0.5*CONV + DENSE(256) + DENSE(64) + DENSE(2) -> west norwegian 69.4 +/- 5.5%
LSTM(8) + 0.5*CONV + DENSE(256) + DENSE(64) + DENSE(2) -> danish norwegian 59.4 +/- 1.5%

CONVOLUTION IN SERIES WITH LSTM

0.5*CONV + LSTM(16) + DENSE(256) + DENSE(64) + DENSE(2) -> skane west 67.0 +/- 3.8%
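For comparison with the parallel setup, the series arrangement named above (convolution feeding the LSTM) could be sketched in Keras as follows. The filter count, kernel size, pooling step, and the function name `build_conv_then_lstm` are assumptions for illustration; the README only gives the layer order and the LSTM and dense sizes.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_conv_then_lstm(n_steps, n_features, n_classes=2):
    """Convolution first, then an LSTM over the convolved sequence."""
    model = keras.Sequential([
        keras.Input(shape=(n_steps, n_features)),
        # Short-term feature extraction (filter count is illustrative).
        layers.Conv1D(32, kernel_size=8, padding="same", activation="relu"),
        # Halve the time resolution before the recurrent layer (assumption).
        layers.MaxPooling1D(pool_size=2),
        # The LSTM sees the convolved features instead of the raw frames.
        layers.LSTM(16),
        layers.Dense(256, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Here the LSTM only receives what the convolution passes on, whereas in the parallel variant both branches read the raw input.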

