
EDA on King County house sales 2014 to 2015. EDA project for the neuefische Data Science, Machine Learning & AI Bootcamp 2024 in Hamburg.


jottemka/eda_neuefische


EDA Project: King County House Sales 2014-2015

This repo contains the results of the EDA project in the neuefische Data Science, Machine Learning & AI Bootcamp. It consists of 2 notebooks:

  1. The EDA notebook itself, containing a classical EDA and a client-focused EDA.

  2. A presentation notebook that was used to generate the corresponding Jupyter slides for the stakeholder meeting.

Data Insights

There are 3 interesting data insights that might be contrary to common views:

  1. More rooms do mean a higher price, but the relationship is weaker than one might expect.

  2. Older houses are not generally cheaper. The correlation between age and price is almost zero.

  3. Surprisingly, house prices show seasonal effects, much like agricultural products.
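The first two insights rest on correlations with price. A minimal sketch of such a check with pandas is shown here. It assumes a DataFrame with the columns `price`, `bedrooms`, `bathrooms`, and `yr_built`; `yr_built` and the helper name `price_correlations` are assumptions for illustration and are not taken from the notebooks.

```python
import pandas as pd


def price_correlations(df: pd.DataFrame, sale_year: int = 2015) -> pd.Series:
    """Pearson correlation of price with room counts and house age.

    Assumes the columns price, bedrooms, bathrooms and yr_built exist.
    yr_built is an assumed column name, not confirmed by this README.
    """
    # Approximate age as the difference between the sale year and the build year.
    features = df.assign(age=sale_year - df["yr_built"])
    # Correlate each feature column with the price column.
    return features[["bedrooms", "bathrooms", "age"]].corrwith(features["price"])
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate almost none. The notebooks may use a different correlation measure.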

Client Recommendations

When to buy?

We recommend buying in February and avoiding April.


We also recommend buying in the middle of the month and avoiding the beginning of the month.
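Timing recommendations like these could be derived by grouping sale prices by calendar period. The sketch below assumes a `date` column holding the sale date and uses the median as the summary statistic. Both are assumptions, as is the helper name `median_price_by_period`.

```python
import pandas as pd


def median_price_by_period(df: pd.DataFrame) -> tuple[pd.Series, pd.Series]:
    """Return the median price per calendar month and per day of month.

    Assumes a 'date' column (sale date) and a 'price' column.
    """
    sale_date = pd.to_datetime(df["date"])
    # Group the prices by month number (1-12) and by day of month (1-31).
    by_month = df.groupby(sale_date.dt.month.rename("month"))["price"].median()
    by_day = df.groupby(sale_date.dt.day.rename("day"))["price"].median()
    return by_month, by_day
```

Calling `by_month.idxmin()` on the result would then give the month with the lowest median price.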


Where to buy?

Based on our client's needs, we recommend low-fluctuation neighborhoods. The plot below shows all zipcode areas, ranked according to their fluctuation. Our client should pick from the neighborhoods on the left-hand side.

[Plot: zipcode areas ranked by fluctuation]
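This README does not define how fluctuation is measured. As one possible reading, the sketch below ranks zipcode areas by the coefficient of variation of price (standard deviation divided by mean). That metric, the `zipcode` column name, and the helper name are all assumptions.

```python
import pandas as pd


def rank_zipcodes_by_fluctuation(df: pd.DataFrame) -> pd.Series:
    """Rank zipcodes from lowest to highest relative price spread.

    Assumption: fluctuation = std(price) / mean(price) per zipcode.
    The notebooks may define fluctuation differently.
    """
    stats = df.groupby("zipcode")["price"].agg(["mean", "std"])
    fluctuation = (stats["std"] / stats["mean"]).rename("fluctuation")
    # Zipcodes with a single sale have an undefined std (NaN) and sort last.
    return fluctuation.sort_values()
```

The first entries of the returned Series correspond to the low-fluctuation areas.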

What to buy?

Instead of specific buying recommendations, we decided to propose the following methodology to our (fictional) client:

  1. Start with the most affordable house with at least 3 bedrooms and 2 bathrooms.

  2. Ask yourself: would you be willing to pay more for a neighborhood with lower fluctuation?

The first five results of this procedure are shown in the table below. The least expensive option resulting from this procedure is a house with ID 15796 in Rainier Beach with 5 bedrooms for 133,000 USD. Note that improving on the neighborhood can mean compromising on other aspects.

house_id     price (USD)  bedrooms  bathrooms  sqft_living
7129304540   133000       5         2          1430
1823049182   147400       3         2          1080
2976800749   150000       4         2          1460
3356403304   154000       3         3          1530
7129300595   158000       3         2          1090
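Step 1 of the methodology amounts to a filter followed by a sort on price. The sketch below applies it to the five rows of the table. The helper name `affordable_candidates` and its parameters are hypothetical.

```python
import pandas as pd

# The five rows shown in the table of this README.
houses = pd.DataFrame({
    "house_id": [7129304540, 1823049182, 2976800749, 3356403304, 7129300595],
    "price": [133000, 147400, 150000, 154000, 158000],
    "bedrooms": [5, 3, 4, 3, 3],
    "bathrooms": [2, 2, 2, 3, 2],
    "sqft_living": [1430, 1080, 1460, 1530, 1090],
})


def affordable_candidates(df: pd.DataFrame, min_bedrooms: int = 3,
                          min_bathrooms: int = 2, n: int = 5) -> pd.DataFrame:
    """Keep houses meeting the room minimums, cheapest first."""
    mask = (df["bedrooms"] >= min_bedrooms) & (df["bathrooms"] >= min_bathrooms)
    return df[mask].sort_values("price").head(n)


cheapest = affordable_candidates(houses).iloc[0]
```

For step 2, the same function could be called on a subset of the data restricted to lower-fluctuation neighborhoods, so the price difference can be compared directly.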

Environment Setup

This repo contains a requirements.txt file with a list of all the packages and dependencies you will need.

Before you can use plotly in Jupyter Lab, you have to install Node.js (if you haven't done so already). Check your Node version by running the following command:

node -v

If Node is not installed yet, begin at Step_1. Otherwise, proceed to Step_2.

On macOS, type the following commands:

Step_1: Update Homebrew and install Node with the following commands:

brew update
brew install node

Step_2: Create the virtual environment and install the required packages with the following commands:

pyenv local 3.11.3
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

On Windows, type the following commands:

Step_1: Update Chocolatey and install Node with the following commands:

choco upgrade chocolatey
choco install nodejs

Step_2: Create the virtual environment and install the required packages with the following commands.

For PowerShell:

pyenv local 3.11.3
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install --upgrade pip
pip install -r requirements.txt

For Git Bash:

pyenv local 3.11.3
python -m venv .venv
source .venv/Scripts/activate
pip install --upgrade pip
pip install -r requirements.txt

Note: If you encounter an error when trying to run pip install --upgrade pip, try using the following command:

python.exe -m pip install --upgrade pip
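After activating the environment, a short check can confirm that the interpreter is running inside `.venv` and matches the Python version pinned in the setup steps. This is an optional sketch that is not part of the repo, and the function names are hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, not the base install.
    return sys.prefix != sys.base_prefix


def python_matches(major: int = 3, minor: int = 11) -> bool:
    """Return True when the running Python has the given major.minor version."""
    return sys.version_info[:2] == (major, minor)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python 3.11:", python_matches())
```

If the first line prints `False`, the activation command for your shell has probably not been run in the current session.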
