
This project focuses on exploring and analyzing data from different schools in the same district. It includes an ETL process, as well as data manipulation and analysis using pandas and NumPy.


mlachha/School_District_Analysis


School_District_Analysis

Resources

  • Python 3.9.0
  • Anaconda Navigator 1.9.12
  • Jupyter Notebook 6.0.3
  • pandas, NumPy
  • Data source: clean_students_complete.csv

Project Overview

Initial Analysis

In the first part of this analysis, we explore data from the different schools in the district to see how they compare with each other on several metrics. To do so, we produce:

  • A high-level snapshot of the district's key metrics, presented in a table format
  • An overview of the key metrics for each school, presented in a table format
  • Tables presenting each of the following metrics:
    • Top 5 and bottom 5 performing schools, based on the overall passing rate
    • The average math score received by students in each grade level at each school
    • The average reading score received by students in each grade level at each school
    • School performance based on the budget per student
    • School performance based on the school size
    • School performance based on the type of school
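The per-school metrics and the top 5 / bottom 5 ranking listed above can be sketched with a pandas `groupby`. This is a minimal sketch on hypothetical toy data, not the project's own code: it assumes a score of 70 or higher counts as passing and that "overall passing" means passing both math and reading. `PASSING_SCORE`, `students`, and `school_summary` are illustrative names.

```python
import pandas as pd

PASSING_SCORE = 70  # assumption: a score of 70 or higher counts as passing

# Hypothetical toy data; the real project reads clean_students_complete.csv.
students = pd.DataFrame({
    "school_name": ["A", "A", "B", "B", "C", "C"],
    "math_score": [90, 60, 75, 80, 50, 65],
    "reading_score": [85, 72, 78, 90, 71, 55],
})


def school_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Average scores and passing rates (in percent) for each school."""
    passed_math = df["math_score"] >= PASSING_SCORE
    passed_reading = df["reading_score"] >= PASSING_SCORE
    # Assumption: "overall passing" means passing both math and reading.
    flags = df.assign(
        passed_math=passed_math.astype(int),
        passed_reading=passed_reading.astype(int),
        passed_both=(passed_math & passed_reading).astype(int),
    )
    grouped = flags.groupby("school_name")
    # The mean of a 0/1 column is the share of passing students.
    return pd.DataFrame({
        "avg_math": grouped["math_score"].mean(),
        "avg_reading": grouped["reading_score"].mean(),
        "pct_passing_math": grouped["passed_math"].mean() * 100,
        "pct_passing_reading": grouped["passed_reading"].mean() * 100,
        "pct_overall_passing": grouped["passed_both"].mean() * 100,
    })


summary = school_summary(students)

# Best schools first for the top 5, worst schools first for the bottom 5.
top_5 = summary.sort_values("pct_overall_passing", ascending=False).head(5)
bottom_5 = summary.sort_values("pct_overall_passing").head(5)
```

In the real analysis, `students` would presumably come from `pd.read_csv("clean_students_complete.csv")`, and the ranking would cover every school in the district.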

Results

Output of the initial analysis:

  • District details: picture

  • Metrics per school: picture

Detailed metrics tables:
  • Top 5 schools based on overall passing rate. picture

  • Bottom 5 schools based on overall passing rate. picture

  • Average math score per grade per school. picture

  • Average reading score per grade per school. picture

  • Performance based on budget per student. picture

  • Performance by school size. picture

  • Performance by school type. picture
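Grouping schools into spending ranges, as in the budget-per-student table, can be sketched with `pd.cut`. The per-school table, the `per_student_budget` column, the bin edges, and the labels below are all hypothetical, chosen only to show the technique; the project's actual ranges are not given here.

```python
import pandas as pd

# Hypothetical per-school summary; names and values are illustrative only.
schools = pd.DataFrame({
    "school_name": ["A", "B", "C", "D"],
    "per_student_budget": [578.0, 615.0, 640.0, 655.0],
    "avg_math": [83.0, 81.0, 78.0, 77.0],
    "pct_overall_passing": [90.0, 85.0, 60.0, 55.0],
})

# Assumed bin edges; right=False makes each bin include its lower edge.
spending_bins = [0, 585, 630, 645, float("inf")]
spending_labels = [
    "under $585",
    "$585 to under $630",
    "$630 to under $645",
    "$645 and up",
]

schools["spending_range"] = pd.cut(
    schools["per_student_budget"],
    bins=spending_bins,
    labels=spending_labels,
    right=False,
)

# Average each metric over the schools in each spending range;
# observed=False keeps empty ranges in the output.
by_spending = schools.groupby("spending_range", observed=False)[
    ["avg_math", "pct_overall_passing"]
].mean()
```

The same pattern, with different edges and labels, would apply to the school-size table.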

Additional Analysis

After rumors of academic dishonesty at Thomas High School involving the 9th-grade math and reading scores, we were advised to exclude the data related to the incident and to rerun the same analysis on the altered data, to see whether this would change the results shown above.

To do so, we nullify the math and reading scores of all 9th graders at Thomas High School:

```python
import numpy as np

student_data_df.loc[(student_data_df["grade"] == "9th") & (student_data_df["school_name"] == "Thomas High School"), ["math_score", "reading_score"]] = np.nan
```
  • The nulled scores can be seen here: picture

  • The district details become: picture

  • The metrics per school become: picture
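The masking step can be checked on a small toy frame. The rows below are made up and only the column names come from the project; the sketch confirms that only Thomas High School's 9th graders are nulled, and that `Series.mean()` skips NaN by default, so averages are computed over the remaining students.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; only the column names match the project's code.
student_data_df = pd.DataFrame({
    "school_name": ["Thomas High School"] * 3 + ["Other High School"],
    "grade": ["9th", "10th", "11th", "9th"],
    "math_score": [95.0, 80.0, 70.0, 60.0],
    "reading_score": [90.0, 85.0, 75.0, 65.0],
})

# Same boolean mask as in the project: 9th grade AND Thomas High School.
is_thomas_9th = (student_data_df["grade"] == "9th") & (
    student_data_df["school_name"] == "Thomas High School"
)
student_data_df.loc[is_thomas_9th, ["math_score", "reading_score"]] = np.nan

# Only the masked row becomes NaN; the other school's 9th grader is untouched.
nulled_rows = int(student_data_df["math_score"].isna().sum())

# mean() skips NaN by default, so this averages the 10th and 11th graders only.
thomas_avg_math = student_data_df.loc[
    student_data_df["school_name"] == "Thomas High School", "math_score"
].mean()
```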

We can see that Thomas High School's average scores went down by about a third.

To preserve integrity and fairness, we replace Thomas High School's averages with new averages that exclude the 9th-grade scores.
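One way such a drop can arise, and why recomputing helps, assuming (this is not stated in the source) that each passing percentage is computed as passing students divided by total enrollment: a NaN score fails the `>= 70` comparison, so the nulled 9th graders still count in the denominator and pull the rate down. The sketch below, on hypothetical data and with an assumed passing threshold of 70, contrasts that rate with one recomputed over 10th to 12th graders only.

```python
import numpy as np
import pandas as pd

PASSING_SCORE = 70  # assumption: 70 or higher counts as passing

# Hypothetical Thomas High School rows after the 9th-grade scores were nulled.
thomas = pd.DataFrame({
    "grade": ["9th", "9th", "10th", "11th", "12th", "12th"],
    "math_score": [np.nan, np.nan, 88.0, 92.0, 75.0, 64.0],
})

# NaN >= 70 evaluates to False, so nulled students count as "not passing"
# while still being included in the denominator.
passing = (thomas["math_score"] >= PASSING_SCORE).sum()
rate_all_students = passing / len(thomas) * 100

# Recomputed rate: divide only by the students whose scores are still counted.
scored = thomas[thomas["grade"] != "9th"]
rate_without_9th = (scored["math_score"] >= PASSING_SCORE).sum() / len(scored) * 100
```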

picture

Detailed metrics tables:
  • New top 5 schools based on overall passing rate. picture

We can see that Thomas High School is still among the top 5 schools, even without taking the contested data into account.

  • Bottom 5 schools based on overall passing rate. picture

We can see no effect on the bottom schools.

  • Average scores per grade per school. picture

We see a NaN for the 9th grade at Thomas High School.
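The NaN in the per-grade table follows from how pandas averages a group whose values are all missing. A minimal sketch on hypothetical rows, assuming the table is built with a `groupby` on school and grade (the project may build it differently):

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; Thomas High School's 9th-grade score is already NaN.
student_data_df = pd.DataFrame({
    "school_name": ["Thomas High School", "Thomas High School",
                    "Other High School", "Other High School"],
    "grade": ["9th", "10th", "9th", "10th"],
    "math_score": [np.nan, 80.0, 70.0, 90.0],
})

# Mean math score per school (rows) and grade (columns).
# A group whose scores are all NaN has nothing to average, so its mean is NaN.
math_by_grade = (
    student_data_df.groupby(["school_name", "grade"])["math_score"]
    .mean()
    .unstack("grade")
)
```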

  • New performance based on budget per student. picture

  • New performance by school size. picture

  • New performance by school type. picture

Conclusion:

  • The changes had little to no effect on the overall results, because they were limited to one grade at one high school.

  • Thomas High School finished second in both analyses.

  • The changes also made little difference to Thomas High School's own results. Since the other grades follow the same trends as the 9th-grade scores, this raises two questions:

    1 - Were scores manipulated beyond the 9th graders?

    2 - Were the 9th graders' scores not manipulated at all?
