Text analysis of the speeches, conferences and interviews of the current president of Mexico

Wittline/text-analysis-speeches-amlo

A Text Analysis of Andres Manuel Lopez Obrador’s Speeches

This repository analyzes the text of the speeches, conferences and interviews of the current president of Mexico. Its aim is educational: this document serves no political interest, and you are free to interpret the data in your own way. I personally think that formalizing this type of practice helps us follow up on the political promises of Latin American presidents and could help us make decisions in advance for our countries. However, what I intend to do here is show a basic text analysis workflow using Python, visualizing aggregated data and gaining insight at every step.

image

Data source

The official AMLO website publishes a stenographic version of each speech, an exact and faithful transcription of what was said. We will use this URL for the experiments.

https://lopezobrador.org.mx/

Web scraping with Python

At the time of writing, AMLO’s site groups the speeches into 667 sections, and this number grows weekly. Each section holds a group of speeches by date. My scraping technique targets each element of a group, which represents one speech. See the images below:

image

image
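As a minimal sketch of the "target each element of a group" idea, the snippet below parses one section page with the standard library and collects a title and URL for each speech. The markup is an assumption: the `entry-title` class, the `h2` wrapper and the sample HTML are hypothetical stand-ins, not the site's confirmed structure, and downloading the pages is left out.

```python
from html.parser import HTMLParser


class SpeechLinkParser(HTMLParser):
    """Collect title/url pairs from links inside a hypothetical 'entry-title' heading."""

    def __init__(self):
        super().__init__()
        self.speeches = []      # collected {'title': ..., 'url': ...} dicts
        self._in_title = False  # True while inside the heading element
        self._href = None       # href of the link currently being read
        self._text = []         # text fragments of the current link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h2" and "entry-title" in classes:
            self._in_title = True
        elif tag == "a" and self._in_title:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.speeches.append({"title": title, "url": self._href})
            self._href = None
        elif tag == "h2":
            self._in_title = False


def parse_section(html):
    """Return the speeches found in the HTML of one section page."""
    parser = SpeechLinkParser()
    parser.feed(html)
    return parser.speeches


# Hypothetical section markup, standing in for one downloaded page
sample = """
<article><h2 class="entry-title"><a href="https://example.org/speech-1">Discurso uno</a></h2></article>
<article><h2 class="entry-title"><a href="https://example.org/speech-2">Discurso dos</a></h2></article>
<a href="https://example.org/about">About</a>
"""
speeches = parse_section(sample)
```

Running `parse_section` over every section page and writing the rows to a .csv file would give a table like the one read in the next step. Links outside the heading, such as navigation, are ignored.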

Let’s start looking for insight

The amlo_analysis.ipynb notebook contains all the analysis code. Here we explain each step and interpret each visualization.

Reading the .csv file generated in the previous step

Here we can see the speeches extracted from each box of each URL, with their dates, which is what I want.

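import pandas as pd

# Load the .csv file produced by the scraping step (path on the author's machine)
df = pd.read_csv('C:/Users/ramse/Downloads/amlo_speechs.csv')
columns = ['id_speech', 'date', 'title', 'url', 'content']
df = df[columns]
# Prepend the title to the speech text; a missing title or content yields NaN
df['content'] = df['title'] + ' ' + df['content']
# Drop the rows left without usable text
df = df[df['content'].notna()]
df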

image

Text Preprocessing

  • Convert to lowercase
  • Remove extra spaces, carriage returns and tabs
  • Remove punctuation marks and question marks
  • Remove stopwords
  • Apply stemming, lemmatization and NER
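The first four steps above can be sketched with the standard library alone. This is an illustration, not the notebook's code: the stopword set is a tiny hypothetical sample, the `preprocess` name is made up, and stemming, lemmatization and NER are left out because they need an external NLP library.

```python
import re
import string

# Tiny illustrative stopword set (assumption); a real run would use a full Spanish list
STOPWORDS = {"de", "la", "el", "que", "y", "a", "en", "los", "las", "del",
             "se", "por", "con", "un", "una", "es", "no", "para"}

# Punctuation to strip, including Spanish opening marks and typographic quotes
PUNCTUATION = string.punctuation + "¿¡“”‘’«»…"
_PUNCT_TABLE = str.maketrans({ch: " " for ch in PUNCTUATION})


def preprocess(text):
    """Lowercase, strip punctuation, normalize whitespace and drop stopwords."""
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)        # punctuation becomes a space
    text = re.sub(r"\s+", " ", text).strip()   # collapse spaces, tabs, \r and \n
    return [tok for tok in text.split() if tok not in STOPWORDS]


tokens = preprocess("¿Qué   es la\tcorrupción?\r\nEl pueblo de México no se rinde.")
```

Accents are kept on purpose, so "qué" is not treated as the stopword "que" here; whether to fold accents is a design choice that depends on the stopword list in use.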

image

Let’s delve into the count of words used by AMLO before and during his presidential term

image

The code above generates the visualization below. Three colors split the chart vertically into AMLO’s three periods. He began his campaign in 2011, lost the July 2012 election to Peña Nieto, who took office in December 2012, and remained active throughout Peña Nieto’s term. In 2018 he won the Mexican presidential election, and both the number of words per speech and the vocabulary used increased.

image
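A rough sketch of the per-speech numbers behind a chart like this one: count the words and distinct words of each speech and label it with one of the three periods. The boundaries used here are assumptions (the inauguration dates of Peña Nieto and AMLO), and the function names and sample speeches are made up for illustration.

```python
from datetime import date

# Period boundaries (assumption): the inauguration dates of Peña Nieto and AMLO
PENA_NIETO_START = date(2012, 12, 1)
AMLO_START = date(2018, 12, 1)


def period_of(day):
    """Label a date with one of the three periods used in the chart."""
    if day < PENA_NIETO_START:
        return "before Peña Nieto"
    if day < AMLO_START:
        return "Peña Nieto's term"
    return "AMLO's term"


def speech_stats(day, tokens):
    """Word count and vocabulary size for one tokenized speech."""
    return {
        "date": day,
        "period": period_of(day),
        "n_words": len(tokens),
        "n_unique": len(set(tokens)),
    }


# Made-up speeches, already tokenized
speeches = [
    (date(2011, 6, 5), "el pueblo quiere cambio el pueblo".split()),
    (date(2015, 3, 10), "la reforma energética no ayuda al pueblo".split()),
    (date(2019, 9, 1), "vamos a acabar con la corrupción y la pobreza en méxico".split()),
]
stats = [speech_stats(day, tokens) for day, tokens in speeches]
```

Plotting `n_words` and `n_unique` against `date`, colored by `period`, would give a chart of the same general shape.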

Let’s group by Year and Month and visualize the word clouds for each presidential term

This could help us understand what his plans and priorities were in each period, before and during his presidential term.

image
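The grouping step can be sketched as follows: accumulate token frequencies per year and month, then sum the months that fall inside a period. The helper names and sample data are hypothetical. The resulting frequency table is the kind of input a word cloud is drawn from.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_frequencies(speeches):
    """Map (year, month) to a Counter of token frequencies."""
    groups = defaultdict(Counter)
    for day, tokens in speeches:
        groups[(day.year, day.month)].update(tokens)
    return dict(groups)


def period_frequencies(monthly, start, end):
    """Sum the monthly counters whose (year, month) lies in [start, end)."""
    total = Counter()
    for key, counts in monthly.items():
        if start <= key < end:
            total.update(counts)
    return total


# Made-up tokenized speeches
speeches = [
    (date(2019, 1, 5), ["pueblo", "corrupción", "pueblo"]),
    (date(2019, 1, 20), ["pueblo", "salud"]),
    (date(2020, 12, 3), ["vacuna", "salud", "vacuna"]),
    (date(2014, 5, 1), ["petróleo", "reforma"]),
]
monthly = monthly_frequencies(speeches)
# AMLO's term up to and including September 2021
amlo_term = period_frequencies(monthly, (2018, 12), (2021, 10))
```

If the `wordcloud` package is used for drawing (an assumption about the notebook), a table like `amlo_term` can be passed to `WordCloud().generate_from_frequencies(...)`.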

Before Peña Nieto

image

Peña Nieto’s presidential term

image

AMLO’s presidential term until September 2021

image

Although there are NLP techniques for detecting topics, we will not use them here, because this study involves terms that are very common in the political environment of each country. Instead, I will manually choose the topics and the terms related to each one:

  • Economy: gasto, remesas, ahorro, economía, finanzas, inflación, salarios, inversión, impuesto, austeridad
  • Political institutions: pan, prd, pri, morena, pt
  • General institutions: onu, inm, ine, oms, imss, cfe, cndh, inegi, sep, conacyt, insabi, sat, profeco
  • Education: maestros, maestras, estudiante, educación, escuela, universidad, beca, reformaeducativa, libros, primaria, profesores
  • Energy: energética, aceite, gas, pemex, petróleo, gasolina, combustible
  • Migration: migración, extranjeros, frontera, centroamericanos, deportaciones, centroamérica
  • Food: hambre, alimentación, canastabásica, comida, alimentos, desnutrición, nutrición, pobreza
  • Southern federal entities: campeche, chiapas, quintanaroo, tabasco, veracruz, yucatán
  • Central federal entities: cdmx, edomex, guerrero, hidalgo, oaxaca, puebla, querétaro, tlaxcala
  • Northern federal entities: aguascalientes, bajacalifornia, bajacaliforniasur, chihuahua, coahuila, colima, durango, guanajuato, guerrero, jalisco, michoacán, morelos, nayarit, nuevoleón, sanluispotosí, sinaloa, sonora, tamaulipas, zacatecas
  • Health: bienestar, salud, médico, camas, vacuna, enfermedad, contagio, infección, pandemia, epidemia, covid, hospital, dosis, doctores, enfermeras, paramédicos, enfermeros, hospitalización
  • Corruption: ayotzinapa, justicia, corrupción, extorsión, soborno, desaparición, secuestro, narcotráfico, homicidio, anticorrupción
  • Political rivals: zedillo, salinas, peñanieto, anaya, meade, fox, calderon, margarita, videgaray, lozoya, chong, yunes
  • Political allies: gordillo, buylla, tclouthier, mclouthier, clouthier, aliados, aliado, ebrard, sheinbaum, monreal, gatell, alcocer, mueller
  • International people: trump, obama, biden
  • Other problems: 4t, indígenas, avión, tren, huachicol, méxico, pueblo
  • Countries: venezuela, chile, colombia, bolivia, argentina, cuba, honduras, nicaragua, canada, panama, españa, francia, alemania, china, rusia

Now let’s look at the frequency of each word by year within each topic
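A minimal sketch of that count, using two abbreviated topics from the list above. It assumes the speeches are already tokenized and that multi-word names were joined into single tokens during preprocessing (as terms like "peñanieto" suggest). The function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Two of the topics listed above, abbreviated for illustration
TOPICS = {
    "Economy": ["gasto", "remesas", "economía", "impuesto"],
    "Health": ["salud", "vacuna", "pandemia", "hospital"],
}


def topic_term_counts_by_year(speeches, topics):
    """Return {topic: {year: Counter(term -> count)}}, counting listed terms only."""
    result = {topic: defaultdict(Counter) for topic in topics}
    for year, tokens in speeches:
        counts = Counter(tokens)
        for topic, terms in topics.items():
            for term in terms:
                if counts[term]:
                    # A term listed under several topics is counted in each of them
                    result[topic][year][term] += counts[term]
    return result


# Made-up (year, tokens) pairs
speeches = [
    (2012, ["economía", "impuesto", "pueblo", "economía"]),
    (2020, ["vacuna", "salud", "economía", "vacuna"]),
    (2020, ["vacuna", "hospital"]),
]
by_year = topic_term_counts_by_year(speeches, TOPICS)
```

Looping over topics rather than building a single term-to-topic map keeps terms that appear in more than one topic, such as "guerrero" in the lists above.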

image

image

image

image

image

image

image

Funny, right? Let’s dig into other types of insights. Observing how the frequency evolves over time is very important as well
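One way to build such a series, sketched under the assumption that speeches are stored as (date, tokens) pairs: count a single term per month and fill the months without mentions with zero, so that gaps show up as drops rather than being skipped. The helper names and sample data are hypothetical.

```python
from collections import Counter
from datetime import date


def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def monthly_series(speeches, term):
    """Return [((year, month), count), ...] for one term, with empty months as 0."""
    counts = Counter()
    months = set()
    for day, tokens in speeches:
        key = (day.year, day.month)
        months.add(key)
        counts[key] += tokens.count(term)
    if not months:
        return []
    return [(key, counts[key]) for key in month_range(min(months), max(months))]


# Made-up tokenized speeches
speeches = [
    (date(2020, 11, 10), ["vacuna", "salud"]),
    (date(2021, 1, 5), ["vacuna", "vacuna", "dosis"]),
    (date(2021, 1, 20), ["vacuna"]),
]
series = monthly_series(speeches, "vacuna")
```

Raw counts grow with the number of speeches per month, so dividing by the total words in each month is a reasonable variant when comparing periods of different activity.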

image

image

Apparently, the word “economía” was already relevant in his speeches before his presidential term, but its use is stronger in his current term. Also take a look at the word “impuesto”.

image

The political party he mentioned most before his presidential term was “Morena”; at the same time, he mentioned rival political parties such as “PRD” and “PRI”.

image

CFE, IMSS and CNDH are the most mentioned words related to general institutions.

image

“educación” has been in use since the beginning of his campaign, and now, in his presidential term, “escuela” is the most relevant word related to education.

image

During the first years of Peña Nieto’s mandate, AMLO used the words “petróleo” and “energética” over a long period of time.

image

The word “extranjeros” has been common from the beginning of his campaign until today; lately, however, “migración” and “frontera” appear more strongly.

image

Veracruz and Tabasco

image

“CDMX” is the most mentioned area in the center of the country, but there is a sharp spike in the word “oaxaca” in the last months of 2019.

image

image

The words “pobreza” and “alimentación” have been very common since the beginning of his presidential campaign, and both are now stronger.

image

From the last months of 2020 until today, he has been using the word “vacuna”; it was his priority.

image

“corrupción” is one of his favorite words

image

From the beginning of his presidential campaign, “Peña Nieto” was the target.

image

Since the beginning of his presidential term, “ebrard” and “sheinbaum” have been the political allies he has mentioned the most.

image

“trump” was very relevant, and now “biden” is; the dates make sense.

image

His other favorite word is “pueblo” but it is also wor