I am currently enrolled at the New York City Data Science Academy.
For my third project with the Academy, I experiment with a range of machine learning algorithms to predict house prices in Ames, Iowa (2006 to 2010).
There are several variations of the dataset available on Kaggle:
- Some datasets have fewer observations, with 1460 each for training and testing (https://www.kaggle.com/datasets/marcopale/housing?select=AmesHousing.csv).
- More recent datasets are larger in nature, with 2500 for training and 1500 for testing (https://www.kaggle.com/competitions/stat101ahouseprice/data?select=HTestW19Final+No+Y+values.csv).
We were provided with an in-house version of 2580 observations, which we could use for both training and testing. If you would like to test the results of this repo with our version of the dataset, please reach out to me and I will happily share it. Most of the models in my project can also be applied to other versions of the dataset, although results may vary slightly depending on the exact version used.