This repository contains the code and documentation for my bachelor's thesis, titled "Tracking Greenwashing: A Data-driven Analysis of Sustainability Reports from DAX 40 Companies using NLP". The thesis aims to explore and identify potential greenwashing tendencies in sustainability reports using Natural Language Processing (NLP) techniques.
The thesis focuses on analyzing sustainability reports from DAX 40 companies to uncover discrepancies between the stated sustainability practices and actual corporate behavior. Employing NLP, the project aims to provide a comprehensive understanding of the language used in these reports and identify potential instances of greenwashing.
The methodology involves data extraction, pre-processing, sentiment analysis, SDG alignment assessment using embeddings, and incorporating ESG scores. The final Greenwashing Tendency Score is calculated as:
Greenwashing Tendency Formula:
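with each metric $X \in \{SV, SDGA, ESGS\}$ min-max normalized as:

$$X_{norm} = \frac{X - \min(X)}{\max(X) - \min(X)}$$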
Where:
- SV : Sentiment Value
- SDGA : SDG Alignment Value
- ESGS : ESG Value
- SVnorm : Normalized Sentiment Value
- SDGAnorm : Normalized SDG-Alignment Value
- ESGSnorm : Normalized ESG Value
- w1 and w2 : Weight factors for the respective components
- min() and max() : Represent the minimum and maximum values of the respective metrics in the dataset
The individual components of the equation (Sentiment value, SDG-Alignment & ESG-Score) could have different effects on the greenwashing tendency. Therefore, it is useful to apply weight factors that reflect the relative importance of each component.
Normalizing the values is crucial for putting all three metrics on a consistent scale and making the results comparable. Min-max normalization rescales each value to the range 0 to 1.
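As a minimal sketch of the min-max normalization described above: the function name `min_max_normalize` and the sample ESG values are hypothetical and only serve as illustration; they are not taken from the thesis code or data.

```python
def min_max_normalize(values):
    """Rescale a sequence of numbers to the range [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so the range is zero. Returning 0.0 for
        # each value is an arbitrary convention chosen for this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical ESG scores for three companies (illustrative values only).
esg_scores = [40.0, 55.0, 70.0]
print(min_max_normalize(esg_scores))  # [0.0, 0.5, 1.0]
```

Because the minimum and maximum are taken over the whole dataset, a normalized value is only meaningful relative to the other companies in the same sample.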
The formula shows how much a company is "exaggerating". If the ESG score is very low (close to 0) but the Sentiment Value and SDG Alignment are high, the formula indicates a particularly high level of greenwashing. It emphasizes the discrepancy between a company's proclaimed and actual performance. However, the formula is sensitive to extreme values, so smoothing or capping the result may be appropriate.
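The behavior described above can be illustrated with a short sketch. The exact formula is not reproduced in this text, so the code assumes a ratio form (weighted communicated performance divided by the normalized ESG score), which matches the description but may differ from the thesis. The function name, the default weights, and the `eps` and `cap` parameters are all hypothetical; `eps` and `cap` are one possible way to implement the smoothing and limitation mentioned above.

```python
def greenwashing_tendency(sv_norm, sdga_norm, esgs_norm,
                          w1=0.5, w2=0.5, eps=0.05, cap=10.0):
    """Illustrative greenwashing tendency score (assumed ratio form).

    sv_norm, sdga_norm, esgs_norm: min-max normalized values in [0, 1].
    w1, w2: weight factors for sentiment and SDG alignment (assumed defaults).
    eps: small constant added to the denominator to avoid division by zero.
    cap: upper bound that limits the effect of extreme values.
    """
    communicated = w1 * sv_norm + w2 * sdga_norm
    score = communicated / (esgs_norm + eps)
    return min(score, cap)


# Same reporting tone and SDG alignment, very different ESG performance.
weak_esg = greenwashing_tendency(0.9, 0.8, esgs_norm=0.0)
strong_esg = greenwashing_tendency(0.9, 0.8, esgs_norm=1.0)
print(weak_esg, strong_esg)
```

With these inputs, the company with the lowest ESG score hits the cap, while the company with the highest ESG score receives a score below 1, which reflects the discrepancy the formula is meant to highlight.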
- data/: Contains the raw and processed data.
- code/: Holds the Python scripts for data processing, analysis, and visualization.
- results/: Stores the results and visualizations generated during the analysis.
- Clone the repository.
- Install the required dependencies using pip install -r requirements.txt.
- Follow the instructions in code/README.md for step-by-step execution.
Detailed instructions for running the analysis and reproducing the results can be found in the code/README.md file.
The findings and visualizations obtained during the analysis will be presented in the final thesis document.
This research aims to contribute valuable insights into the effectiveness of sustainability reporting and the prevalence of greenwashing practices among DAX 40 companies.
A comprehensive list of references and cited works will be provided in the final thesis document.