Slides and exercises for my course "Natural Language Processing" (École Pour l'Informatique et les Techniques Avancées, 2024): a semester-long, 30-hour course taught to 70 final-year engineering students.
An introduction to the fundamentals of natural language processing. We explore various algorithms for text classification and generation. The goal is to start with simple language models like n-grams and progress towards understanding modern architectures like transformers. During exercises, we implement different algorithms from scratch.
0 - Project description
1 - Tokenization: regular expressions and the Byte-Pair Encoding algorithm
2 - N-grams
3 - Text classification with Naive Bayes
4 - Text classification with logistic regression
5 - Vector semantics: tf-idf and word2vec
6 - Feedforward neural networks
7 - Recurrent neural networks and attention mechanisms
8 - Transformer
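As a taste of the from-scratch exercises, here is a minimal sketch of the Byte-Pair Encoding merge loop from lesson 1. The function name, toy corpus, and end-of-word marker are illustrative choices, not course material:

```python
from collections import Counter

def bpe_merges(corpus, num_merges):
    """Learn BPE merge rules from a whitespace-tokenized corpus (toy sketch)."""
    # Represent each word as a tuple of symbols, with an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the vocabulary.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab

merges, vocab = bpe_merges("low low low lower lowest", 3)
# Learns ('l','o'), then ('lo','w'), then ('low','</w>') on this toy corpus.
```

Each merge greedily fuses the most frequent adjacent symbol pair, so frequent subwords like "low" emerge as single tokens.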
The final project involves applying the various algorithms covered in the course to multiple datasets and identifying limitations and possible improvements.
Students' projects have been very diverse: generating song lyrics and poems, automatically moderating Twitch conversations, detecting spoilers in movie comments, detecting generated texts, etc.