Data Profiler is a powerful and user-friendly web application built with Streamlit that allows you to analyze and visualize your datasets with ease. Simply upload your data in .csv or .xlsx format, and generate comprehensive profiling reports that help you detect anomalies, patterns, and trends within your data.
- Automated Data Analysis: Quickly generate detailed profiling reports by uploading your dataset.
- Customizable Reports: Choose between different display modes, including Primary, Dark, and Orange.
- Support for Multiple Formats: Upload .csv or .xlsx files (up to 10 MB) for analysis.
- Interactive UI: Easy-to-use interface with options to select specific sheets for .xlsx files.
- Downloadable Reports: Save the profiling report as an HTML file for offline analysis.
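The format and size limits listed above could be enforced before any parsing happens. The following is a minimal sketch, not the code in app.py: `check_upload`, `load_table`, and `MAX_UPLOAD_BYTES` are hypothetical names, and treating 10 MB as 10 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# 10 MB limit from the feature list; the binary interpretation is an assumption.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024
ALLOWED_SUFFIXES = {".csv", ".xlsx"}


def check_upload(filename: str, size_bytes: int) -> str:
    """Return the lowercase file suffix, or raise ValueError if the upload is rejected."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {suffix or 'none'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("File exceeds the 10 MB limit")
    return suffix


def load_table(file, filename: str, size_bytes: int, sheet_name=0):
    """Read a validated upload into a pandas DataFrame (hypothetical helper)."""
    import pandas as pd  # imported here so check_upload has no third-party dependency

    suffix = check_upload(filename, size_bytes)
    if suffix == ".csv":
        return pd.read_csv(file)
    # sheet_name may be an index or a sheet title; openpyxl reads .xlsx files.
    return pd.read_excel(file, sheet_name=sheet_name, engine="openpyxl")
```

Keeping the validation separate from the pandas call means a rejected file never reaches the parser, and the sheet selection for .xlsx files is passed through as a single argument.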
To run the Data Profiler application on your local machine, follow the steps below:
git clone https://github.com/srinibas-masanta/data-profiler.git
cd data-profiler
Create and activate a virtual environment to manage dependencies.
python -m venv dataprofile
.\dataprofile\Scripts\activate # On Windows
source dataprofile/bin/activate # On macOS/Linux
Install the required Python packages listed in the requirements.txt file.
pip install -r requirements.txt
Alternatively, manually install the necessary packages:
pip install numpy pandas scipy matplotlib streamlit ydata-profiling streamlit-pandas-profiling openpyxl xlrd
Start the Streamlit application by running the following command:
streamlit run app.py
Once the application is running, follow these steps:
- Upload Your Data: Use the sidebar to upload a .csv or .xlsx file (up to 10 MB).
- Select Options: Choose the report mode (Primary, Dark, Orange), and decide if you want a minimal report or a full report.
- Generate Report: Click to generate the report, which will be displayed within the app.
- Download Report (Optional): If desired, save the report as an HTML file using the download button.
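The option and report steps above can be sketched roughly as follows. This is an illustration under assumptions rather than the logic in app.py: `report_options` and `build_report_html` are hypothetical helpers, the report title is made up, and the sketch assumes the installed ydata-profiling version accepts the `minimal`, `dark_mode`, and `orange_mode` keyword arguments on `ProfileReport`.

```python
def report_options(mode: str = "Primary", minimal: bool = False) -> dict:
    """Translate the sidebar choices into ProfileReport keyword arguments."""
    if mode not in {"Primary", "Dark", "Orange"}:
        raise ValueError(f"Unknown report mode: {mode}")
    return {
        "minimal": minimal,               # True skips the more expensive computations
        "dark_mode": mode == "Dark",      # assumed ProfileReport flag
        "orange_mode": mode == "Orange",  # assumed ProfileReport flag
    }


def build_report_html(df, mode: str = "Primary", minimal: bool = False) -> str:
    """Profile a DataFrame and return the report as an HTML string."""
    from ydata_profiling import ProfileReport  # imported lazily; heavy dependency

    report = ProfileReport(
        df, title="Data Profiling Report", **report_options(mode, minimal)
    )
    return report.to_html()
```

In a Streamlit script, the returned string could then be offered for download with something like st.download_button("Download report", data=html, file_name="report.html", mime="text/html").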
- app.py: Main script containing the Streamlit application logic.
- media/DP Logo.jpg: Logo used in the welcome page of the application.
- requirements.txt: List of all the Python dependencies required to run the application.
This project is licensed under the MIT License - see the LICENSE.txt file for details.