Anthony Straine
Junior Data Scientist
Christopher Ortiz
Junior Data Scientist
Predict the market value of single-unit properties using data on properties sold in May and June of 2017.
Working together, we found for our MVP that the features that appear to drive home value, as measured by taxvaluedollarcnt, are bathroomcnt, bedroomcnt, and calculatedfinishedsquarefeet. We arrived at this through an iterative, manual feature-selection process: a Pearson's r correlation test selected the top two features, bathroomcnt and calculatedfinishedsquarefeet, and industry knowledge led us to also include bedroomcnt and homes having more than 2 bathrooms.
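The correlation step can be sketched as follows. This is a minimal illustration, not the project's code: it runs on synthetic stand-in data (not the Zillow sample), ranks candidate features by the absolute value of `scipy.stats.pearsonr` against taxvaluedollarcnt, and adds a flag for homes with more than 2 bathrooms. The helper `rank_by_pearson` and the column `more_than_2_baths` are hypothetical names.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic stand-in data (NOT the Zillow sample), generated so the example
# runs on its own.
rng = np.random.default_rng(42)
n = 500
sqft = rng.uniform(800, 4000, n)
baths = np.round(sqft / 1000 + rng.normal(0, 0.5, n)).clip(1, 6)
beds = np.round(sqft / 800 + rng.normal(0, 1.0, n)).clip(1, 7)
value = 150 * sqft + 20_000 * baths + rng.normal(0, 80_000, n)

df = pd.DataFrame({
    "bathroomcnt": baths,
    "bedroomcnt": beds,
    "calculatedfinishedsquarefeet": sqft,
    "taxvaluedollarcnt": value,
})


def rank_by_pearson(frame, target, candidates):
    """Hypothetical helper: return (feature, r, p) tuples sorted by |r|, strongest first."""
    results = []
    for col in candidates:
        r, p = pearsonr(frame[col], frame[target])
        results.append((col, r, p))
    return sorted(results, key=lambda t: abs(t[1]), reverse=True)


candidates = ["bathroomcnt", "bedroomcnt", "calculatedfinishedsquarefeet"]
ranking = rank_by_pearson(df, "taxvaluedollarcnt", candidates)
top_two = [name for name, _, _ in ranking[:2]]

# Hypothetical engineered flag for "homes having more than 2 bathrooms".
df["more_than_2_baths"] = (df["bathroomcnt"] > 2).astype(int)

for name, r, p in ranking:
    print(f"{name:30s} r={r:.3f} p={p:.3g}")
print("top two:", top_two)
```

Ranking by |r| rather than r keeps strong negative correlations in consideration; on real data the p-values are worth checking alongside r, since a large sample makes even weak correlations "significant".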
After testing a few models, we found that a polynomial regression model performed best. Features used in our model:
- bathroomcnt
- bedroomcnt
- calculatedfinishedsquarefeet

Target variable:
- taxvaluedollarcnt
These features explain 38% of the variance in taxvaluedollarcnt, the tax assessed value. For our next iteration, we will look at additional features while controlling for outliers.
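A polynomial regression of this kind, and the "share of variance explained" (R²) it is scored by, can be sketched as below. This is a minimal sketch under stated assumptions: the README does not give the polynomial degree or pipeline, so degree 2 with scikit-learn's `PolynomialFeatures` feeding `LinearRegression` is an assumption, and the data is synthetic, so the printed R² will not match the 38% reported above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (NOT the Zillow sample).
rng = np.random.default_rng(0)
n = 1_000
X = pd.DataFrame({
    "bathroomcnt": rng.integers(1, 5, n).astype(float),
    "bedroomcnt": rng.integers(1, 6, n).astype(float),
    "calculatedfinishedsquarefeet": rng.uniform(800, 4000, n),
})
y = (60 * X["calculatedfinishedsquarefeet"]
     + 0.02 * X["calculatedfinishedsquarefeet"] ** 2
     + 15_000 * X["bathroomcnt"]
     + rng.normal(0, 100_000, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Assumed degree 2: squares and pairwise products of the three features.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(X_train, y_train)

# R^2 = 1 - SS_res / SS_tot: the share of target variance the model explains.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
print(f"R^2 on held-out data: {r2:.3f}")
```

Scoring on a held-out split, rather than the training rows, guards against the extra polynomial terms simply overfitting.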
Feature | Definition | Data Type |
---|---|---|
id | row index number, range: 0 - 2985216 | int64 |
parcelid | Unique numeric id assigned to each property: 10711725 - 169601949 | int64 |
bathroomcnt | Number of bathrooms a property has: 0 - 32 | float64 |
bedroomcnt | Number of bedrooms a property has: 0 - 25 | float64 |
calculatedfinishedsquarefeet | Calculated total finished living area of the home, in square feet: 1 - 952576 | float64 |
fips | Federal Information Processing Standards (FIPS) county code: a five-digit number whose first two digits are the state code. The leading 0 is dropped in the data: 6037=Los Angeles County, 6059=Orange County, 6111=Ventura County | float64 |
lotsizesquarefeet | Area of the lot the property occupies, in square feet: 100 - 371000512 | float64 |
propertylandusetypeid | Unique numeric id that identifies what the land is used for: 261=Single Family Residential, 262=Rural Residence, 273=Bungalow | float64 |
roomcnt | Total number of rooms in the principal residence | float64 |
yearbuilt | Year the property was built | float64 |
transactiondate | The most recent date the property was sold: yyyy-mm-dd | object |
Target | Definition | Data Type |
---|---|---|
taxamount | The total property tax assessed for that assessment year | float64 |
taxvaluedollarcnt | The total tax assessed value of the parcel | float64 |
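Two details in the data dictionary need handling before analysis: transactiondate arrives as a yyyy-mm-dd string, and fips is a float with its leading 0 dropped. The sketch below keeps only May and June 2017 sales and restores the five-digit FIPS code before mapping it to a county name. It is illustrative only: `prepare_sample`, `fips_code`, and `county` are hypothetical names, not necessarily what prepare.py or wrangle.py use, and it assumes nulls in fips were already dropped.

```python
import pandas as pd

# County names from the data dictionary, keyed by zero-padded FIPS code.
FIPS_TO_COUNTY = {
    "06037": "Los Angeles County",
    "06059": "Orange County",
    "06111": "Ventura County",
}


def prepare_sample(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep May-June 2017 sales and decode FIPS codes."""
    out = df.copy()
    # transactiondate is stored as an object (string) column: yyyy-mm-dd.
    out["transactiondate"] = pd.to_datetime(out["transactiondate"], format="%Y-%m-%d")
    in_window = out["transactiondate"].between(pd.Timestamp("2017-05-01"),
                                               pd.Timestamp("2017-06-30"))
    out = out[in_window].copy()
    # Assumes no nulls in fips: 6037.0 -> 6037 -> "6037" -> "06037".
    out["fips_code"] = out["fips"].astype(int).astype(str).str.zfill(5)
    out["county"] = out["fips_code"].map(FIPS_TO_COUNTY)
    return out


raw = pd.DataFrame({
    "parcelid": [1, 2, 3],
    "fips": [6037.0, 6059.0, 6111.0],
    "transactiondate": ["2017-05-02", "2017-06-30", "2017-07-01"],
})
prepared = prepare_sample(raw)
print(prepared[["parcelid", "fips_code", "county"]])
```

The July sale is filtered out, and the two remaining rows map to Los Angeles and Orange counties.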
```
├── README.md   <- The top-level README for developers using this project.
│
├── mvp.ipynb   <- The main notebook for the project
│
├── acquire.py  <- The script to download or generate data
│
├── prepare.py  <- The script for preparing the raw data
│
├── wrangle.py  <- The script for preparing the raw data for exploration
│
└── model.py    <- The script for preprocessing, modeling, and interpreting
```
- numpy >= 1.18.1
- pandas >= 1.1.2
- scipy >= 1.4.1
- scikit-learn >= 0.23.2
- matplotlib >= 3.3.1
- seaborn >= 0.11.0
- Download a zip file of the repository from GitHub, or
- Clone this repository using:

```bash
$ git clone git@github.com:Robust-Analytics/zillow-project.git
```
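To load zillow.csv into a pandas DataFrame in a Jupyter notebook, use the following code:

```python
import pandas as pd

# Read zillow.csv from the current working directory into a DataFrame
df = pd.read_csv('zillow.csv')
```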
- Codeup Data Science Team
- Darden Cohort
- Generated with ryans_codeup_data_science_mvp
How to reach Anthony
How to reach Chris