Skip to content

Welcome to the Pandas for Data Science repository! This course is designed to take you from beginner to proficient in using Pandas, the powerful data manipulation library in Python. Whether you're just starting your data science journey or looking to sharpen your skills, this repository contains all the resources

License

Notifications You must be signed in to change notification settings

rubydamodar/The-Ultimate-Pandas-Bootcamp

Repository files navigation

image

image


Module 1: Introduction to Pandas

  • Week 1: Understanding Data Science and Pandas

    • What is Data Science?
    • Introduction to Pandas: History and Evolution
    • The Role of Pandas in Data Analysis
  • Week 2: Setting Up the Environment

    • Installing Pandas via Anaconda and pip
    • Overview of Jupyter Notebooks
    • Importing Pandas and Common Libraries
      import pandas as pd

Module 2: Basics of Pandas Series

  • Week 3: Introduction to Pandas Series

    • What is a Pandas Series?
    • Creating a Series from a List
      s = pd.Series([1, 2, 3, 4])
    • Creating a Series from a Dictionary
      s = pd.Series({'a': 1, 'b': 2, 'c': 3})
  • Week 4: Accessing and Modifying Series

    • Accessing Elements by Index
      • Positional Indexing
      • Label Indexing
    • Modifying Series Data
      • Changing Values and Index
      s[0] = 10

Module 3: Series Attributes and Methods

  • Week 5: Understanding Series Attributes

    • Key Attributes: index, values, dtype, name
    • Using len() to Get Series Length
    • Data Types in Series
  • Week 6: Common Series Methods

    • Summary Statistics: .sum(), .mean(), .min(), .max()
    • Using .describe() for Quick Overview
    • String Methods: .str Accessor
      s = pd.Series(['apple', 'banana', 'cherry'])
      s.str.upper()

Module 4: Advanced Series Operations

  • Week 7: Handling Missing Data in Series

    • Identifying Missing Values: isnull(), notnull()
    • Filling Missing Values: fillna()
    • Dropping Missing Values: dropna()
  • Week 8: Vectorized Operations and Functions

    • Applying Functions: apply(), map(), applymap()
    • Common Mathematical Operations
    • Using Numpy Functions with Series

Module 5: Introduction to Pandas DataFrames

  • Week 9: What is a DataFrame?

    • Understanding the Structure of a DataFrame
    • Creating DataFrames from Various Data Structures (Lists, Dictionaries, Numpy Arrays)
      df = pd.DataFrame({
          'A': [1, 2, 3],
          'B': ['a', 'b', 'c']
      })
  • Week 10: DataFrame Attributes and Basic Operations

    • Key Attributes: shape, columns, index, dtypes
    • Basic Operations: .head(), .tail(), .info(), .describe()
    • Accessing and Modifying Columns
      df['C'] = [4, 5, 6]

Module 6: Indexing and Selecting Data in DataFrames

  • Week 11: Basic Indexing and Slicing

    • Selecting Columns and Rows
    • Label-Based Indexing with .loc[]
    • Position-Based Indexing with .iloc[]
  • Week 12: Advanced DataFrame Selection

    • Conditional Selection and Filtering
    • Using Boolean Indexing
    • Chaining Conditions

Module 7: Data Cleaning and Preparation

  • Week 13: Handling Missing Data in DataFrames

    • Identifying Missing Data
    • Imputing and Filling Missing Values
    • Dropping Rows and Columns with Missing Data
  • Week 14: Data Type Conversion and Transformation

    • Converting Data Types: astype()
    • DataFrame Transformation Methods
    • String Manipulations: Using .str Methods in DataFrames

Module 8: Grouping, Aggregating, and Merging DataFrames

  • Week 15: Grouping DataFrames

    • Introduction to groupby()
    • Aggregating Data: Mean, Sum, Count
    • Custom Aggregation Functions
  • Week 16: Merging and Joining DataFrames

    • Different Types of Joins: Inner, Outer, Left, Right
    • Using merge() and join() Methods
    • Concatenating DataFrames: concat()

Module 9: Time Series in DataFrames

  • Week 17: Time Series Basics

    • Creating DateTime Index
    • Resampling Time Series Data
    • Handling Time Zones
  • Week 18: Time Series Operations

    • Rolling and Expanding Windows
    • Time Series Visualization Techniques
    • Date Arithmetic

Module 10: Data Visualization with Pandas

  • Week 19: Introduction to Visualization

    • Basic Plotting with Pandas
    • Customizing Plots: Titles, Labels, Legends
    • Plot Types: Line, Bar, Histogram, Boxplot
  • Week 20: Advanced Visualization Techniques

    • Creating Subplots and Multiple Plots
    • Integrating with Matplotlib for Custom Visuals
    • Saving and Exporting Plots

Module 11: Best Practices and Optimization

  • Week 21: Performance Optimization Techniques

    • Profiling DataFrames: Memory Usage
    • Using Vectorization for Speed
    • Best Practices for Handling Large Datasets
  • Week 22: Writing Clean and Maintainable Code

    • Code Readability and Documentation
    • Error Handling Techniques
    • Developing Reusable Functions

Supplementary Resources

  • Recommended Books: Python for Data Analysis by Wes McKinney, Pandas Cookbook by Theodore Petrou
  • Online Courses: DataCamp, Coursera, edX
  • Practice Datasets: Kaggle, Open Data Sources

Final Assessment

  • Comprehensive Quiz on Series and DataFrame Concepts
  • Hands-on Project: Data Analysis using Series and DataFrames
  • Peer Review and Feedback Session

Special Thanks to the Contributors

About

Welcome to the Pandas for Data Science repository! This course is designed to take you from beginner to proficient in using Pandas, the powerful data manipulation library in Python. Whether you're just starting your data science journey or looking to sharpen your skills, this repository contains all the resources

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published