Algorithms for Massive Data final project - Università degli Studi di Milano
Authors: Mathias Cardarello Fierro & Lorenzo Polli
This project investigates techniques commonly used for market-basket analysis over very large datasets, with the goal of finding frequent itemsets. The dataset is taken from the public repository Kaggle and contains more than 16 million paragraphs from old newspapers. Although the newspapers were written in 67 different languages, the analysis is restricted to English-language newspapers. In total, a subset of more than 1 million articles published between 2005 and 2012 was analyzed using algorithms for massive datasets.
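To make the idea of frequent itemsets concrete, here is a minimal sketch in Python that treats each paragraph as a basket of words and finds frequent word pairs. The description above does not say which algorithm the project uses; the two-pass A-Priori scheme shown here is assumed only as a common illustrative choice, and the function name `frequent_pairs`, the `min_support` threshold, and the toy paragraphs are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return frequent items and frequent pairs using two A-Priori passes."""
    # Pass 1: count individual items, treating each basket as a set.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: count only pairs whose members are both frequent.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    return frequent_items, pairs


# Toy baskets (hypothetical data): each paragraph becomes a list of words.
paragraphs = [
    "the market opened higher",
    "the market closed lower",
    "the city council met",
]
baskets = [p.split() for p in paragraphs]
items, pairs = frequent_pairs(baskets, min_support=2)
print(items, pairs)
```

The second pass only counts pairs made of items that were already frequent on their own, which is what keeps the number of candidate pairs manageable on large collections. A real run over millions of paragraphs would need a distributed or streaming implementation rather than in-memory counters.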