lotfiferaga/EDA-Python

Exploratory Data Analysis on Car Features and MSRP using Python, NumPy, Seaborn, and Matplotlib

Introduction:

This project performs an Exploratory Data Analysis (EDA) on a dataset of car features and their Manufacturer's Suggested Retail Price (MSRP). The analysis is carried out in Python using Pandas, NumPy, Seaborn, and Matplotlib.

Dataset Overview:

The dataset contains car attributes such as make, model, year, horsepower, fuel efficiency, and MSRP. The focus is on understanding the relationships and patterns in the data, especially how the other features relate to MSRP.
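A minimal loading sketch, assuming the data is stored as a CSV file. The column names (`Make`, `Model`, `Year`, `Engine HP`, `highway MPG`, `city mpg`, `MSRP`) are assumptions, and the rows are made up for illustration rather than taken from the actual dataset. `load_cars` is a hypothetical helper; with the real data you would pass a file path instead of the in-memory string.

```python
import io

import pandas as pd

# Made-up rows for illustration only; the column names are assumed, not confirmed.
SAMPLE_CSV = """Make,Model,Year,Engine HP,highway MPG,city mpg,MSRP
Audi,A4,2017,250,31,24,39000
BMW,3 Series,2016,240,34,23,41000
Honda,Civic,2017,160,40,31,20000
Toyota,Camry,2017,180,33,24,24000
Ford,Mustang,2015,300,28,19,26000
"""


def load_cars(source) -> pd.DataFrame:
    """Read the car data from a file path or file-like object into a DataFrame."""
    df = pd.read_csv(source)
    # Normalise headers to snake_case so columns can be referenced as cars["engine_hp"].
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    return df


cars = load_cars(io.StringIO(SAMPLE_CSV))
print(cars.shape)   # (rows, columns)
print(cars.dtypes)  # confirm numeric columns were parsed as numbers
print(cars.head())
```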

Objectives:

  • Identify and handle missing data.
  • Explore the distribution of car features.
  • Examine the relationship between different features and MSRP.
  • Visualize patterns and trends in the dataset.
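For the first objective, one possible approach is sketched below. `missing_report` and `handle_missing` are hypothetical helpers; the 50% column-drop threshold, the median fill for numeric columns, and the "Unknown" label for text columns are assumed choices, not the project's documented ones. The sample frame is made up.

```python
import numpy as np
import pandas as pd

# Made-up sample with gaps; column names are assumptions about the real schema.
cars = pd.DataFrame({
    "make": ["Audi", "BMW", "Honda", "Toyota", "Toyota"],
    "engine_hp": [250.0, np.nan, 160.0, 180.0, 180.0],
    "market_category": ["Luxury", "Luxury", None, None, None],
    "msrp": [39000, 41000, 20000, 24000, 24000],
})


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most-missing first."""
    counts = df.isna().sum()
    report = pd.DataFrame({"missing": counts, "percent": 100 * counts / len(df)})
    return report.sort_values("missing", ascending=False)


def handle_missing(df: pd.DataFrame, drop_threshold: float = 50.0) -> pd.DataFrame:
    """Drop sparse columns and exact duplicate rows, then fill remaining gaps.

    Columns with more than `drop_threshold` percent missing are removed.
    Numeric gaps get the column median; other gaps get the label "Unknown".
    """
    percent_missing = 100 * df.isna().mean()
    sparse_cols = percent_missing[percent_missing > drop_threshold].index
    out = df.drop(columns=sparse_cols).drop_duplicates()
    fill_values = {
        col: out[col].median() if pd.api.types.is_numeric_dtype(out[col]) else "Unknown"
        for col in out.columns
    }
    return out.fillna(value=fill_values).reset_index(drop=True)


print(missing_report(cars))
clean = handle_missing(cars)
print(clean)
```

Median imputation is used here because it is less sensitive than the mean to the skew typical of price and horsepower columns; dropping rows instead is a reasonable alternative when gaps are rare.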

Tools and Libraries:

  • Python: The programming language used for the analysis.
  • Pandas: Used to load, clean, and manipulate the tabular data.
  • NumPy: Used for numerical operations and array handling.
  • Seaborn: Used for statistical graphics built on top of Matplotlib.
  • Matplotlib: Used for static, animated, and interactive visualizations.

Methodology:

  • Data Loading: Import the dataset into a Pandas DataFrame.
  • Data Cleaning: Address missing values and handle any inconsistencies in the data.
  • Descriptive Statistics: Obtain summary statistics to understand the central tendency and variability of the features.
  • Visualization:
    • Use Matplotlib and Seaborn to create histograms and box plots of feature distributions, and scatter plots of features against MSRP.
    • Explore pairwise relationships between features and the target variable (MSRP).
  • Correlation Analysis: Use correlation matrices and heatmaps to identify relationships between numerical variables.
  • Outlier Detection: Identify and handle outliers that may affect the overall analysis.
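The descriptive-statistics, correlation, and outlier-detection steps could look roughly like the sketch below. Outliers are flagged with the common 1.5 * IQR rule, which is an assumed choice since the README does not say which rule the project uses. `iqr_outliers` is a hypothetical helper, and the column names and values are illustrative only.

```python
import pandas as pd

# Made-up numeric sample; column names are assumptions about the real schema.
cars = pd.DataFrame({
    "engine_hp":   [150, 170, 180, 200, 250, 300, 320, 650],
    "highway_mpg": [38, 35, 34, 31, 28, 26, 25, 15],
    "msrp":        [18000, 21000, 24000, 28000, 36000, 45000, 50000, 250000],
})

# Descriptive statistics: count, mean, std, min, quartiles, and max per column.
summary = cars.describe()

# Pearson correlation matrix over numeric columns only (text columns are skipped).
corr = cars.select_dtypes(include="number").corr()
msrp_corr = corr["msrp"].drop("msrp").sort_values(ascending=False)


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Flag a row if any of its numeric values is an IQR outlier within its column.
outlier_mask = cars.select_dtypes(include="number").apply(iqr_outliers)
flagged = cars[outlier_mask.any(axis=1)]
trimmed = cars[~outlier_mask.any(axis=1)]

print(summary)
print(msrp_corr)
print(flagged)
```

Whether to drop, cap, or keep flagged rows is a judgment call: in a price dataset, extreme MSRPs may be genuine luxury or exotic cars rather than data errors.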

Key Visualizations:

  • Histograms: Display the distribution of numerical features.
  • Box Plots: Visualize the spread of features and identify potential outliers.
  • Scatter Plots: Explore the relationship between individual features and MSRP.
  • Heatmaps: Illustrate correlations between features.
