This project contains a Python script that uses Pandas to compute descriptive statistics for a dataset and generate data visualizations. The script reads data from a CSV or Excel file and provides key summary statistics such as the mean, median, and standard deviation.
- Reads a dataset from a CSV or Excel file.
- Generates summary statistics:
  - Mean
  - Median
  - Standard deviation
- Produces at least one data visualization (e.g., a histogram or boxplot).
- Continuous Integration/Continuous Deployment (CI/CD) pipeline integrated with a build status badge.
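The loading and summary-statistics features listed above can be sketched as follows. This is a minimal illustration, not the project's actual `main.py`: the function names `load_dataset` and `summarize` are hypothetical, and choosing the reader by file extension is an assumption about how CSV and Excel inputs might be distinguished.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Read a CSV or Excel file into a DataFrame, chosen by file extension.

    Hypothetical helper; the real script may select the reader differently.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xls", ".xlsx"}:
        # read_excel relies on an optional engine (e.g. openpyxl for .xlsx)
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix}")


def summarize(df):
    """Return mean, median, and standard deviation for each numeric column."""
    numeric = df.select_dtypes(include="number")
    # std here is the sample standard deviation (pandas default, ddof=1)
    return numeric.agg(["mean", "median", "std"])
```

Under these assumptions, `summarize(load_dataset("data.csv"))` would return a table with one row per statistic and one column per numeric field, where `data.csv` is a placeholder file name.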
You can view the detailed summary statistics report generated in PDF format here:
- Python 3.x
- Pandas
- Matplotlib or Seaborn (for visualization)
- pytest (for testing)
- Clone the repository:
    git clone https://github.com/nogibjj/Ramil-Pandas-Descriptive-Statistics-Script
    cd Ramil-Pandas-Descriptive-Statistics-Script
- Install required dependencies. You can install all required packages by running:
    make setup
- Run the script. To run the descriptive statistics script, use the following command:
    python main.py
- Place your dataset in the project directory.
- Modify the script (main.py) to specify the path to your CSV or Excel file.
The script will automatically compute and display the descriptive statistics, and generate the data visualization.
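As a rough sketch of the visualization step, the snippet below draws a histogram of one numeric column with Matplotlib and saves it to an image file. The function name `save_histogram`, the default output file name, and the bin count are assumptions made for illustration; the actual script may plot different columns or use Seaborn instead.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the plot is written to a file

import matplotlib.pyplot as plt
import pandas as pd


def save_histogram(df, column, output_path="histogram.png"):
    """Plot a histogram of one numeric column and save it as an image.

    Hypothetical helper; column and output_path are chosen by the caller.
    """
    fig, ax = plt.subplots()
    ax.hist(df[column].dropna(), bins=10)  # drop missing values before plotting
    ax.set_title(f"Distribution of {column}")
    ax.set_xlabel(column)
    ax.set_ylabel("Frequency")
    fig.savefig(output_path)
    plt.close(fig)  # release the figure's memory
    return output_path
```

Saving to a file rather than calling `plt.show()` keeps the sketch usable in headless environments such as a CI runner.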
The project includes tests to ensure the correctness of the script. To run the tests, use:
make test
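A test in this style might look like the sketch below. It is not taken from the project's test suite: `summarize` is a hypothetical stand-in for whatever function `main.py` exposes, defined inline so the example is self-contained. pytest would collect the `test_` function automatically.

```python
import pandas as pd


def summarize(df):
    """Hypothetical stand-in for the function under test."""
    return df.select_dtypes(include="number").agg(["mean", "median", "std"])


def test_summarize_known_values():
    # Small input whose statistics are easy to verify by hand
    df = pd.DataFrame({"value": [1, 2, 3, 4]})
    stats = summarize(df)
    assert abs(stats.loc["mean", "value"] - 2.5) < 1e-9
    assert abs(stats.loc["median", "value"] - 2.5) < 1e-9
    # pandas uses the sample standard deviation (ddof=1): sqrt(5/3)
    assert abs(stats.loc["std", "value"] - (5 / 3) ** 0.5) < 1e-9
```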
Pylint is used to check code quality. You can check for linting issues by running:
make lint