This project extracts emotion from speech audio by combining the meaning of the spoken sentence with the sound's acoustic features. The model can classify the emotions fear, anger, disgust, joy, surprise, sadness and neutral.
MELD (Multimodal EmotionLines Dataset) is used as the dataset. We chose this dataset specifically because we need different emotions paired with meaningful sentences for the sentiment analysis to work properly.
WhisperAI is used to transcribe text from the audio for sentiment analysis.
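A minimal sketch of the transcription step with the `openai-whisper` package (the model size and file path are placeholders, not necessarily what the project uses):

```python
import whisper

# Load a Whisper model; "base" is an assumed size, the project may use a different one.
whisper_model = whisper.load_model("base")

# Transcribe a single sound file and keep the recognized text for sentiment analysis.
result = whisper_model.transcribe("example_utterance.wav")
transcript = result["text"].strip()
print(transcript)
```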
An NLTK model is used to run sentiment analysis on the transcribed text, which yields a result of positive, negative or neutral.
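Assuming the NLTK model in question is the VADER sentiment analyzer (a common choice for positive/negative/neutral labels), the step could look like this sketch; the thresholds on the compound score follow VADER's usual convention:

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER lexicon

sia = SentimentIntensityAnalyzer()
scores = sia.polarity_scores(transcript)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}

# Map the compound score to the three labels used in the project.
if scores["compound"] >= 0.05:
    sentiment = "positive"
elif scores["compound"] <= -0.05:
    sentiment = "negative"
else:
    sentiment = "neutral"
```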
An LSTM model is used for the emotion analysis. The model uses the sentiment result together with the sound file's features to determine which emotion the sound file belongs to.
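The exact feature set and architecture are not described here, so the following is only an illustrative sketch: MFCC frames are used as the acoustic features, the sentence-level sentiment score is appended to every frame, and a small Keras LSTM maps the sequence to the seven emotion classes.

```python
import numpy as np
import librosa
import tensorflow as tf

def build_sequence(audio_path, sentiment_score, n_mfcc=40):
    """Build one input sequence: MFCC frames plus a repeated sentiment value (an assumption)."""
    y, sr = librosa.load(audio_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T          # (frames, n_mfcc)
    sentiment = np.full((mfcc.shape[0], 1), sentiment_score)          # (frames, 1)
    return np.concatenate([mfcc, sentiment], axis=1)                  # (frames, n_mfcc + 1)

# Hypothetical architecture; the real hyperparameters come from the optimization step.
model = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(None, 41)),  # variable-length sequences
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(7, activation="softmax"),                   # 7 emotion classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```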
First, the dataset must be downloaded for the project to run correctly. Since the dataset is too big to upload to GitHub, it is hosted on Google Drive.
Here's the link: https://drive.google.com/file/d/1547d1dz2_kgBUKx-AHKC110-18PeNbVl/view?usp=drive_link
After downloading the zip file from the Drive link, the dataset folder must be extracted into the project folder.
Second, WhisperAI does not support Python versions newer than 3.9.9, so we recommend using version 3.9.9. Otherwise, the project cannot transcribe text from the audio.
The project has 3 different Python files.
"delete_short_files.py" is a Python file that deletes sound files which uses only one word to form a sentence. This file clears dataset from sound file's with less meaning.
"model_optimization" uses TensorBoard to find the optimal hyperparameters for LSTM model. Results are saved as log files, so they can be examined if wanted.
"main_file.py" used to run the main project. It has a simple application UI. User can upload files and record its sound for the emotion extraction process. After the extraction process, resulting graphs will be seen on the application screen. Resulting graphs is saved in the Graph folder of the project. Also, user can see the performance of the model by running training function.
Accuracy results of all of the models in the optimization process:
From all of the models, the two most accurate ones are chosen. Accuracy of these two models:
Loss of these two models:
Accuracy results of the trained model:
Loss of the trained model:
Confusion matrix of the test set:
Scores of the model:
The user uploads more than one file:
Real values of these files are:
As the comparison of these results and the confusion matrix shows, anger and sadness, as well as surprise and joy, can be mixed up.
If there is only one sound file to extract emotion from, the graph changes: it shows each emotion's percentage for that sound file. Here are some of the results:
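A minimal sketch of how such a per-emotion percentage graph could be drawn with matplotlib; the probability values below are placeholders, while in the real application they come from the LSTM's softmax output for the uploaded file:

```python
import matplotlib.pyplot as plt

emotions = ["fear", "anger", "disgust", "joy", "surprise", "sadness", "neutral"]
probabilities = [0.05, 0.10, 0.05, 0.40, 0.15, 0.15, 0.10]  # placeholder softmax output

plt.bar(emotions, [p * 100 for p in probabilities])
plt.ylabel("Percentage (%)")
plt.title("Emotion distribution of the uploaded sound file")
plt.tight_layout()
plt.savefig("single_file_result.png")  # the project saves its graphs to the Graph folder
plt.show()
```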
Here are some images showing the GUI:
Accuracy of other projects that use the same dataset:
Accuracy of our project:
As the comparison above shows, the other projects' accuracy results are below 70%, while this project's accuracy ranges between 78% and 80%. By adding sentiment analysis to the emotion extraction, we built a better model.