Great Speech Analysis

This project aims to derive a decision rule for identifying whether a speech is likely to be considered great. It analyzes metrics such as emotions, polarity, lexical richness, proportion of named entities, complexity, imagery, stop words, and mean sentence length. The great speeches are those that 137 leading scholars classified as the most significant American political speeches of the 20th century.
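To make a few of these metrics concrete, here is a minimal sketch of how lexical richness, stop word ratio, and mean sentence length could be computed with the standard library alone. The function name `surface_metrics`, the tiny stop word list, and the choice of type-token ratio as the richness measure are assumptions for illustration; the project's own `Speech` class may compute these differently.

```python
import re

# Tiny illustrative stop word list (an assumption, not the project's list).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "we", "for", "it"}


def surface_metrics(text):
    """Compute a few simple surface metrics for one speech (hypothetical helper)."""
    # Split into sentences after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased word tokens made of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        return {"lexical_richness": 0.0, "stop_word_ratio": 0.0, "mean_sentence_length": 0.0}
    return {
        # Type-token ratio: distinct words divided by total words.
        "lexical_richness": len(set(words)) / len(words),
        # Share of tokens that appear in the stop word list.
        "stop_word_ratio": sum(w in STOP_WORDS for w in words) / len(words),
        # Average number of words per sentence.
        "mean_sentence_length": len(words) / len(sentences),
    }


sample = "We choose to go to the moon. We choose to go!"
metrics = surface_metrics(sample)
print(metrics)
```

Type-token ratio is sensitive to text length, so comparing speeches of very different lengths may call for a length-normalized richness measure instead.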

Seventy-seven great speeches were compared with 77 typical ones randomly selected from American Rhetoric (https://www.americanrhetoric.com/). Four classification algorithms were trained to distinguish great speeches from typical ones, and a random forest classifier was chosen for its high ROC AUC and accuracy.
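As an illustration of the two selection criteria, the sketch below computes accuracy and the area under the ROC curve by hand. Reading "ROC" as ROC AUC is an assumption, and the labels and scores are made up; they are not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = great, 0 = typical) and predicted probabilities of "great".
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
y_pred = [int(s >= 0.5) for s in scores]  # threshold at 0.5

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f}, roc_auc={auc:.3f}")
```

Accuracy depends on the chosen threshold, while ROC AUC summarizes ranking quality across all thresholds, which is one reason to report both when comparing classifiers.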

Dataset

The dataset contains the computed measures for both typical and great speeches. The speeches were scraped from the American Rhetoric website and must be placed in the dataset folder before running the code. Creating the dataset may take around three hours, depending on processing capacity. The imagery words used in this analysis come from the MRC Psycholinguistic Database (https://websites.psychology.uwa.edu.au/school/mrcdatabase/uwa_mrc.htm) and must be placed in the resources folder of this repository.

Project Structure

The project includes the scripts BasicDataset, DataLoader, Speech, SpeechDataset, preprocessors, polarity_graph, sentiments_per_position, and radar, along with the Speech Analysis notebook, which runs the analysis using these scripts. DataLoader and SpeechDataset work together to process the PDF speeches, while Speech contains the methods for feature computation. The preprocessors provide basic and advanced techniques such as stop word removal, lemmatization, and punctuation handling.
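For readers unfamiliar with these preprocessing steps, here is a minimal standard-library sketch of punctuation handling and stop word removal. The function `basic_preprocess` and its stop word list are hypothetical and do not reflect the actual interface of the preprocessors script; lemmatization is left out because it needs an NLP library.

```python
import string

# Small illustrative stop word list (an assumption, not the project's list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "for", "what", "your", "you"}


def basic_preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    # Translation table that deletes every ASCII punctuation character.
    # Note: this also removes apostrophes, so "don't" becomes "dont".
    table = str.maketrans("", "", string.punctuation)
    tokens = text.lower().translate(table).split()
    return [t for t in tokens if t not in stop_words]


tokens = basic_preprocess("Ask not what your country can do for you.")
print(tokens)
```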

Model Interpretation

Below is a SHAP value plot from this project's final random forest model. SHAP values help explain the impact of each feature on the model's predictions, allowing us to understand which factors contribute most to determining whether a speech is classified as "great" or "typical."

From the SHAP plot, we can see that features like sadness, fear, and the proportion of entities in the speech have a significant influence on the model's output.
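To give some intuition for what a SHAP value measures, the sketch below computes exact Shapley values for a toy scoring function by averaging each feature's marginal contribution over all feature orderings. The toy model, its weights, and the input values are invented for illustration; the feature names simply echo those mentioned above. This is a conceptual sketch only, not the procedure the shap library applies to a random forest.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input, averaged over all feature orderings.

    Features that have not yet been "revealed" are held at their baseline value.
    The cost grows factorially with the number of features, so this only suits toy cases.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            new = predict(current)
            phi[i] += new - prev       # marginal contribution of feature i
            prev = new
    return [p / len(orderings) for p in phi]


def toy_model(features):
    """Hypothetical scorer with made-up weights and one interaction term."""
    sadness, fear, entity_ratio = features
    return 0.5 * sadness + 0.3 * fear + 2.0 * entity_ratio + 1.0 * sadness * fear


x = [0.4, 0.2, 0.1]          # made-up feature values for one speech
baseline = [0.0, 0.0, 0.0]   # reference point the contributions are measured against

phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions always sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between the two features involved.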

Setup Instructions

Clone the Repository:

git clone https://github.com/asaines/Great-Speech-Analysis.git
cd Great-Speech-Analysis

Install Dependencies:
```
pip install -r requirements.txt
```
Run the Analysis: Follow the instructions in the Speech Analysis notebook to perform the speech analysis.

License

This project is licensed under the MIT License.

Contact

For collaboration or inquiries, please reach out to me at https://www.linkedin.com/in/asaines/.
