EDA Project: King County House Sales 2014-2015

This repo contains the results of the EDA project in the neuefische Data Science, Machine Learning & AI Bootcamp. It consists of 2 notebooks:

  1. The EDA notebook itself containing a classical EDA and a client-focused EDA:

  2. A presentation notebook that was used to generate the corresponding Jupyter slides for the stakeholder meeting:

Data Insights

There are 3 interesting data insights that might be contrary to common views:

  1. More rooms do mean a higher price, but the relationship is not as strong as one might expect.

  2. Older houses are not generally cheaper. The correlation is almost zero.

  3. Surprisingly, just like agricultural products, house prices exhibit seasonality effects.

Client Recommendations

When to buy?

We recommend buying in February and avoiding April.

We also recommend buying in the middle of the month rather than at the beginning.
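A timing comparison of this kind can be sketched as a group-by on the sale month. The snippet below is a minimal illustration on made-up (date, price) pairs; the notebooks may aggregate differently, for example with the mean instead of the median.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Made-up sales for illustration only: (sale date, price in USD).
sales = [
    (date(2014, 5, 2), 450_000),
    (date(2014, 5, 20), 470_000),
    (date(2015, 2, 3), 410_000),
    (date(2015, 2, 17), 400_000),
    (date(2015, 4, 1), 480_000),
    (date(2015, 4, 15), 500_000),
]


def median_price_by_month(rows):
    """Group prices by calendar month (1-12) and return the median per month."""
    by_month = defaultdict(list)
    for sold_on, price in rows:
        by_month[sold_on.month].append(price)
    return {month: median(prices) for month, prices in sorted(by_month.items())}


monthly = median_price_by_month(sales)
cheapest_month = min(monthly, key=monthly.get)
print(monthly)
print("cheapest month:", cheapest_month)
```

Grouping on `sold_on.day` instead of `sold_on.month` gives the corresponding day-of-month view.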

Where to buy?

Based on our client's needs, we recommend low-fluctuation neighborhoods. The plot below shows all zipcode areas, ranked according to their fluctuation. Our client should pick from the neighborhoods on the left-hand side.

[Plot: zipcode areas ranked by fluctuation]
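The README does not define how fluctuation is measured. As a sketch under that caveat, the snippet below assumes it is the coefficient of variation (population standard deviation divided by the mean) of sale prices per zipcode, and ranks zipcodes from least to most volatile. The zipcode labels and prices are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Made-up (zipcode label, price in USD) pairs for illustration only.
sales = [
    ("zip_A", 250_000), ("zip_A", 260_000), ("zip_A", 255_000),
    ("zip_B", 900_000), ("zip_B", 1_500_000), ("zip_B", 600_000),
    ("zip_C", 300_000), ("zip_C", 420_000), ("zip_C", 360_000),
]


def rank_by_fluctuation(rows):
    """Return (zipcode, coefficient of variation) pairs, least volatile first.

    ASSUMPTION: "fluctuation" is modeled as population std / mean of prices.
    """
    prices = defaultdict(list)
    for zipcode, price in rows:
        prices[zipcode].append(price)
    scores = {z: pstdev(p) / mean(p) for z, p in prices.items()}
    return sorted(scores.items(), key=lambda item: item[1])


for zipcode, score in rank_by_fluctuation(sales):
    print(zipcode, round(score, 3))
```

Dividing by the mean keeps expensive and inexpensive areas comparable; a plain standard deviation would favor low-priced zipcodes.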

What to buy?

Instead of specific buying recommendations, we decided to propose the following methodology to our (fictional) client:

  1. Start with the most affordable house with at least 3 bedrooms and 2 bathrooms.

  2. Ask yourself: would you be willing to pay more for a neighborhood with lower fluctuation?

The first five results of this procedure are shown in the table below. The least expensive option is a house with ID 15796 in Rainier Beach with 5 bedrooms for 133,000 USD. Note that improving on the neighborhood can mean compromising on other aspects.

| house_id   | price (USD) | bedrooms | bathrooms | sqft_living |
|------------|-------------|----------|-----------|-------------|
| 7129304540 | 133000      | 5        | 2         | 1430        |
| 1823049182 | 147400      | 3        | 2         | 1080        |
| 2976800749 | 150000      | 4        | 2         | 1460        |
| 3356403304 | 154000      | 3        | 3         | 1530        |
| 7129300595 | 158000      | 3        | 2         | 1090        |
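Step 1 of the methodology amounts to a filter followed by a sort. The sketch below applies it to the five rows from the table above; the function name `shortlist` and its parameters are hypothetical and not taken from the notebooks. Step 2, the trade-off against neighborhood fluctuation, is a judgment call and is not modeled here.

```python
# The five rows from the table above (prices in USD).
houses = [
    {"house_id": 7129304540, "price": 133_000, "bedrooms": 5, "bathrooms": 2, "sqft_living": 1430},
    {"house_id": 1823049182, "price": 147_400, "bedrooms": 3, "bathrooms": 2, "sqft_living": 1080},
    {"house_id": 2976800749, "price": 150_000, "bedrooms": 4, "bathrooms": 2, "sqft_living": 1460},
    {"house_id": 3356403304, "price": 154_000, "bedrooms": 3, "bathrooms": 3, "sqft_living": 1530},
    {"house_id": 7129300595, "price": 158_000, "bedrooms": 3, "bathrooms": 2, "sqft_living": 1090},
]


def shortlist(rows, min_bedrooms=3, min_bathrooms=2):
    """Keep houses that meet the room requirements, cheapest first."""
    matches = [
        row for row in rows
        if row["bedrooms"] >= min_bedrooms and row["bathrooms"] >= min_bathrooms
    ]
    return sorted(matches, key=lambda row: row["price"])


best = shortlist(houses)[0]
print(best["house_id"], best["price"])
```

Raising `min_bedrooms` or `min_bathrooms` narrows the list, which makes the cost of each additional requirement visible.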

Environment Setup

This repo contains a requirements.txt file with a list of all the packages and dependencies you will need.

Before you can use plotly in JupyterLab, you need Node.js installed. Check whether it is installed, and which version, by running the following command:

node -v

If Node is not installed yet, begin at Step_1. Otherwise, proceed to Step_2.
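The same check can be scripted, which is handy when the setup is automated. This is a minimal sketch using only the standard library; the helper name `node_version` is hypothetical.

```python
import shutil
import subprocess


def node_version():
    """Return the output of `node -v`, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "-v"], capture_output=True, text=True, check=False)
    return result.stdout.strip() or None


version = node_version()
print(f"Node {version} found" if version else "Node not found: begin at Step_1")
```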

macOS: type the following commands:

Step_1: Update Homebrew and install Node with the following commands:

brew update
brew install node

Step_2: Install the virtual environment and the required packages with the following commands:

pyenv local 3.11.3
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Windows: type the following commands:

Step_1: Update Chocolatey and install Node with the following commands:

choco upgrade chocolatey
choco install nodejs

Step_2: Install the virtual environment and the required packages with the following commands.

For PowerShell:

pyenv local 3.11.3
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install --upgrade pip
pip install -r requirements.txt

For Git Bash:

pyenv local 3.11.3
python -m venv .venv
source .venv/Scripts/activate
pip install --upgrade pip
pip install -r requirements.txt

Note: If you encounter an error when trying to run pip install --upgrade pip, try using the following command:

python.exe -m pip install --upgrade pip
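After the installation, it can be useful to confirm that the interpreter in use is the one from `.venv`. The sketch below relies only on the standard library; the helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


print("Python", sys.version.split()[0])
print("virtual environment active:", in_virtualenv())
```

If the second line prints `False`, the activation command for your shell has most likely not been run in the current session.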