This program shows how to assess the quality of a provided dataset and run preliminary analyses and comparisons on chunks of data, in order to prepare the data for data mining tasks.
We were provided with an almost perfect dataset (no missing values, no duplicated rows, etc.), so we had to dirty it a little before feeding it to the web application for analysis.
To see how we dirtied our datasets and how we created the quality attributes, run this notebook:
jupyter notebook Orginal-Data/File\ Conversion.ipynb
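For readers who want a feel for the idea before opening the notebook, here is a minimal sketch of how a clean dataset could be dirtied and then measured. It is not the notebook's actual code: the function names (`dirty_rows`, `completeness`, `duplicate_count`), the rates, and the sample columns are all assumptions made for illustration, and it uses only the Python standard library.

```python
import random


def dirty_rows(rows, missing_rate=0.1, duplicate_rate=0.05, seed=0):
    """Return a copy of rows with some values blanked and some rows duplicated.

    Hypothetical helper: the rates and the use of None as the missing
    marker are assumptions, not taken from the project.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    dirty = []
    for row in rows:
        # Blank each cell independently with probability missing_rate.
        new_row = {
            key: (None if rng.random() < missing_rate else value)
            for key, value in row.items()
        }
        dirty.append(new_row)
        # Append an exact copy of the row with probability duplicate_rate.
        if rng.random() < duplicate_rate:
            dirty.append(dict(new_row))
    return dirty


def completeness(rows):
    """Fraction of cells that are not missing (1.0 for an empty dataset)."""
    total = sum(len(row) for row in rows)
    if total == 0:
        return 1.0
    filled = sum(1 for row in rows for value in row.values() if value is not None)
    return filled / total


def duplicate_count(rows):
    """Number of rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = 0
    for row in rows:
        # Sort by column name only, so None values are never compared.
        key = tuple(sorted(row.items(), key=lambda item: item[0]))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


# Illustrative data: these column names are made up for the example.
clean = [{"id": i, "city": "Milano", "score": i * 1.5} for i in range(100)]
dirty = dirty_rows(clean, missing_rate=0.1, duplicate_rate=0.05, seed=42)

print("completeness (clean):", completeness(clean))
print("completeness (dirty):", completeness(dirty))
print("duplicates (dirty):", duplicate_count(dirty))
```

Completeness and duplicate counts are only two possible quality attributes; the notebook remains the reference for the ones actually used in this project.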
To start the webapp that contains the Data Quality Analyzer, simply run:
python webapp.py
Note: This command also starts a Flask server on port 5000. To access it, open this page in your browser:
http://localhost:5000/query
Students: Giacomo Astolfi, Leonardo Febbo. Project for the course 'Data and Information Quality' held at Politecnico di Milano by Prof. Cinzia Cappiello, A.Y. 2018/2019.