From 5c9563a501f2f35428f8a50dba45db67736daf7f Mon Sep 17 00:00:00 2001 From: Alexis Cook Date: Thu, 16 Jul 2020 11:23:54 -0500 Subject: [PATCH 1/4] first stab --- notebooks/ds_portfolio/raw/tut2.ipynb | 95 +++++++++++++++++++++++++++ 1 file changed, 95 insertions(+) create mode 100644 notebooks/ds_portfolio/raw/tut2.ipynb diff --git a/notebooks/ds_portfolio/raw/tut2.ipynb b/notebooks/ds_portfolio/raw/tut2.ipynb new file mode 100644 index 000000000..927e8842c --- /dev/null +++ b/notebooks/ds_portfolio/raw/tut2.ipynb @@ -0,0 +1,95 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Introduction\n", + "\n", + "The first thing to do is pick a topic for your project. This involves:\n", + "- selecting a dataset of interest, and \n", + "- deciding (at a high level) how you will examine and use your dataset. \n", + "\n", + "Picking a topic is usually quite difficult, because there are so many possible directions any project can take. For instance,\n", + "- \n", + "\n", + "How do you choose? \n", + "\n", + "In this tutorial, we provide some advice, along with a specific case study for inspiration.\n", + "\n", + "# Advice\n", + "\n", + "### Use interesting data.\n", + "\n", + "Popular datasets like [Titanic](https://www.kaggle.com/c/titanic/data), [Iris](https://www.kaggle.com/uciml/iris), and [Breast Cancer Wisconsin](https://www.kaggle.com/uciml/breast-cancer-wisconsin-data) are great for learning, because you can learn a lot from the detailed analyses that many people have written using them. However, they are not good candidates for building your portfolio project. This is because it is incredibly difficult to discover something that has not already been found by thousands of people before you. 
\n", + "\n", + "Instead of using one of these popular datasets, you should look for a dataset that hasn't been overly explored, so you are more likely to find interesting ...\n", + "\n", + "### Your project should inspire you.\n", + "\n", + "Choose a project that is interesting to you personally, so when you talk about it in an interview, you can be more engaging. \n", + "\n", + "### The success of your project should not depend on what you find.\n", + "\n", + "Jason Goodman, a Data Scientist at Airbnb, [describes](https://medium.com/@jasonkgoodman/advice-on-building-data-portfolio-projects-c5f96d8a0627) I did a project in college looking for the impact of fraud on neighboring nonprofits. It was only going to be interesting if there was an impact, but it turned out that there wasn’t.\n", + "\n", + "but should not use them to build a portfolio project. try to select dataset no one has worked with before / that hasn't been over-analyzed\n", + "\n", + "### Base your project on a desired job description.\n", + "\n", + "if you know what job you'd like, can tailor project to demonstrating specific skills. This skill lines up with what a real word data scientist does." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Case study\n", + "\n", + "Zillo is .... Say you're interested in a job with Zillow Offers as a Data Scientist.\n", + "\n", + "> Are you passionate about analyzing historical trends looking for patterns? Are you driven to understand not just WHAT happened, but more importantly, WHY? Our team is tasked with absorbing billions of rows of data from dozens of sources, organizing them, analyzing them, and visualizing them to help inform both short and long-term decision-making. As a Data Scientist on the Zillow Offers team, you will help improve our ability to accurately value homes thereby creating fairer offers for sellers and enabling us to lower the costs of the business. 
You will help us personalize and optimize the consumer experience for home sellers seeking a Zillow Offer. You will help us understand what’s working well, and where we need to improve.\n", + "\n", + "then i can loook for data on kaggle around y\n", + "\n", + "### Exploring Kaggle datasets\n", + "\n", + "how to search for that data, discover dataset. \n", + "\n", + "look at tasks ...\n", + "\n", + "### Exploring Kaggle notebooks\n", + "\n", + "look at kernels that people have written ... learn more in next tutorial how to work with them" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 425ec23438be05f2e7bcee98a33956a66bd00aed Mon Sep 17 00:00:00 2001 From: Alexis Cook Date: Tue, 21 Jul 2020 16:09:29 -0500 Subject: [PATCH 2/4] pushing changes before switch to another branch --- notebooks/ds_portfolio/raw/tut2.ipynb | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/notebooks/ds_portfolio/raw/tut2.ipynb b/notebooks/ds_portfolio/raw/tut2.ipynb index 927e8842c..19c5a391c 100644 --- a/notebooks/ds_portfolio/raw/tut2.ipynb +++ b/notebooks/ds_portfolio/raw/tut2.ipynb @@ -10,10 +10,7 @@ "- selecting a dataset of interest, and \n", "- deciding (at a high level) how you will examine and use your dataset. \n", "\n", - "Picking a topic is usually quite difficult, because there are so many possible directions any project can take. For instance,\n", - "- \n", - "\n", - "How do you choose? 
\n", + "Picking a topic is usually quite difficult, because there are so many possible directions any project can take. How do you choose? \n", "\n", "In this tutorial, we provide some advice, along with a specific case study for inspiration.\n", "\n", @@ -23,21 +20,21 @@ "\n", "Popular datasets like [Titanic](https://www.kaggle.com/c/titanic/data), [Iris](https://www.kaggle.com/uciml/iris), and [Breast Cancer Wisconsin](https://www.kaggle.com/uciml/breast-cancer-wisconsin-data) are great for learning, because you can learn a lot from the detailed analyses that many people have written using them. However, they are not good candidates for building your portfolio project. This is because it is incredibly difficult to discover something that has not already been found by thousands of people before you. \n", "\n", - "Instead of using one of these popular datasets, you should look for a dataset that hasn't been overly explored, so you are more likely to find interesting ...\n", + "Instead of using one of these popular datasets, you should look for a dataset that hasn't been overly explored, so you are more likely to find something that is interesting and novel. Having a project topic that feels unique is important: it demonstrates to a potential employer that you can proactively find opportunities to leverage analyze unstructured data, in a way that results in well-informed, data-driven decisions.\n", "\n", "### Your project should inspire you.\n", "\n", - "Choose a project that is interesting to you personally, so when you talk about it in an interview, you can be more engaging. \n", + "Choose a project that is interesting to you personally. This way, you're more likely to complete your project. 
If you're excited about it, you're more likely to be genuinely engaging when you talk about it in a job interview.\n", "\n", "### The success of your project should not depend on what you find.\n", "\n", - "Jason Goodman, a Data Scientist at Airbnb, [describes](https://medium.com/@jasonkgoodman/advice-on-building-data-portfolio-projects-c5f96d8a0627) I did a project in college looking for the impact of fraud on neighboring nonprofits. It was only going to be interesting if there was an impact, but it turned out that there wasn’t.\n", + "Jason Goodman, a Data Scientist at Airbnb, has a [Medium post](https://medium.com/@jasonkgoodman/advice-on-building-data-portfolio-projects-c5f96d8a0627) with practical advice for building DS portfolio projects. In the post, he describes a project in college where he investigated the impact of fraud on neighboring nonprofits. Unfortunately, he discovered that fraud had no impact, which ultimately resulted in a project that was not very interesting.\n", "\n", - "but should not use them to build a portfolio project. try to select dataset no one has worked with before / that hasn't been over-analyzed\n", + "### Nothing is truly original.\n", "\n", "### Base your project on a desired job description.\n", "\n", - "if you know what job you'd like, can tailor project to demonstrating specific skills. This skill lines up with what a real word data scientist does." + "If you know what kind of job you would like, you can tailor your project to demonstrate mastery of specific skills, or familiarity with certain types . 
For instance, " ] }, { From 823e03d4488024fcccdd2359be3bb32a4e56b5bc Mon Sep 17 00:00:00 2001 From: Alexis Cook Date: Tue, 21 Jul 2020 17:40:28 -0500 Subject: [PATCH 3/4] need to switch branches --- notebooks/ds_portfolio/raw/tut2.ipynb | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/notebooks/ds_portfolio/raw/tut2.ipynb b/notebooks/ds_portfolio/raw/tut2.ipynb index 19c5a391c..9dbbd883c 100644 --- a/notebooks/ds_portfolio/raw/tut2.ipynb +++ b/notebooks/ds_portfolio/raw/tut2.ipynb @@ -30,8 +30,6 @@ "\n", "Jason Goodman, a Data Scientist at Airbnb, has a [Medium post](https://medium.com/@jasonkgoodman/advice-on-building-data-portfolio-projects-c5f96d8a0627) with practical advice for building DS portfolio projects. In the post, he describes a project in college where he investigated the impact of fraud on neighboring nonprofits. Unfortunately, he discovered that fraud had no impact, which ultimately resulted in a project that was not very interesting.\n", "\n", - "### Nothing is truly original.\n", - "\n", "### Base your project on a desired job description.\n", "\n", "If you know what kind of job you would like, you can tailor your project to demonstrate mastery of specific skills, or familiarity with certain types . For instance, " @@ -43,9 +41,9 @@ "source": [ "# Case study\n", "\n", - "Zillo is .... Say you're interested in a job with Zillow Offers as a Data Scientist.\n", + "Say you're interested in a job with Zillow Offers as a Data Scientist. You read the following in a job description.\n", "\n", - "> Are you passionate about analyzing historical trends looking for patterns? Are you driven to understand not just WHAT happened, but more importantly, WHY? Our team is tasked with absorbing billions of rows of data from dozens of sources, organizing them, analyzing them, and visualizing them to help inform both short and long-term decision-making. 
As a Data Scientist on the Zillow Offers team, you will help improve our ability to accurately value homes thereby creating fairer offers for sellers and enabling us to lower the costs of the business. You will help us personalize and optimize the consumer experience for home sellers seeking a Zillow Offer. You will help us understand what’s working well, and where we need to improve.\n", + "> Are you passionate about **analyzing historical trends** looking for patterns? Are you driven to understand not just WHAT happened, but more importantly, WHY? Our team is tasked with absorbing billions of rows of data from dozens of sources, organizing them, analyzing them, and **visualizing them** to help inform both short and long-term decision-making. As a Data Scientist on the Zillow Offers team, you will help improve our ability to **accurately value homes** thereby creating fairer offers for sellers and enabling us to lower the costs of the business. You will help us personalize and optimize the consumer experience for home sellers seeking a Zillow Offer. You will help us understand what’s working well, and where we need to improve.\n", "\n", "then i can loook for data on kaggle around y\n", "\n", From eb7670a4631596b97be06bf82930dd7e0b3ce791 Mon Sep 17 00:00:00 2001 From: Alexis Cook Date: Wed, 22 Jul 2020 16:51:35 -0500 Subject: [PATCH 4/4] to switch branches --- notebooks/ds_portfolio/raw/tut2.ipynb | 30 ++++++++------------------- 1 file changed, 9 insertions(+), 21 deletions(-) diff --git a/notebooks/ds_portfolio/raw/tut2.ipynb b/notebooks/ds_portfolio/raw/tut2.ipynb index 9dbbd883c..10c1db1d4 100644 --- a/notebooks/ds_portfolio/raw/tut2.ipynb +++ b/notebooks/ds_portfolio/raw/tut2.ipynb @@ -12,13 +12,13 @@ "\n", "Picking a topic is usually quite difficult, because there are so many possible directions any project can take. How do you choose? 
\n", "\n", - "In this tutorial, we provide some advice, along with a specific case study for inspiration.\n", + "In this tutorial, we provide some advice, along with some specific tools you can use to guide your search.\n", "\n", "# Advice\n", "\n", "### Use interesting data.\n", "\n", - "Popular datasets like [Titanic](https://www.kaggle.com/c/titanic/data), [Iris](https://www.kaggle.com/uciml/iris), and [Breast Cancer Wisconsin](https://www.kaggle.com/uciml/breast-cancer-wisconsin-data) are great for learning, because you can learn a lot from the detailed analyses that many people have written using them. However, they are not good candidates for building your portfolio project. This is because it is incredibly difficult to discover something that has not already been found by thousands of people before you. \n", + "Popular datasets like [Titanic](https://www.kaggle.com/c/titanic/data), [Iris](https://www.kaggle.com/uciml/iris), and [Breast Cancer Wisconsin](https://www.kaggle.com/uciml/breast-cancer-wisconsin-data) are great for learning, because you can learn a lot from the detailed analyses that many people have written using them. However, they are not good candidates for building your portfolio project. This is because they are often covered in great detail in introductory courses, and so it's not clear to a viewer how much of your progress is \n", "\n", "Instead of using one of these popular datasets, you should look for a dataset that hasn't been overly explored, so you are more likely to find something that is interesting and novel. 
Having a project topic that feels unique is important: it demonstrates to a potential employer that you can proactively find opportunities to leverage analyze unstructured data, in a way that results in well-informed, data-driven decisions.\n", "\n", @@ -32,38 +32,26 @@ "\n", "### Base your project on a desired job description.\n", "\n", - "If you know what kind of job you would like, you can tailor your project to demonstrate mastery of specific skills, or familiarity with certain types . For instance, " + "If you know what kind of job you would like, you can tailor your project to demonstrate mastery of specific skills, or familiarity with certain types of data. Even if you end up ultimately working for a different company, many of the skills you'll demonstrate in your project will likely be transferable to other businesses.\n", + "\n", + "For instance, say you're interested in a job with Zillow as a Data Scientist. ([Zillow](https://www.zillow.com/) is an online real estate database company.)\n", + "- If there's an active Kaggle competition that is hosted by Zillow, this is an excellent opportunity to collaborate with an international community while working on a project that the company cares about. For instance, [Zillow's Home Value Prediction (Zestimate) competition](https://www.kaggle.com/c/zillow-prize-1/overview) was held a few years ago. \n", + "- After a competition has closed, the [notebooks](https://www.kaggle.com/c/zillow-prize-1/notebooks) written by other users can be good inspiration for your project. The [discussion](https://www.kaggle.com/c/zillow-prize-1/discussion) might help you to clarify doubts you have about working with similar data.\n", + "- If you'd prefer to work with data that is more raw and unstructured than what you typically find in a Kaggle competition, you can search for Zillow data in Kaggle Datasets. 
This [Zillow dataset](https://www.kaggle.com/paultimothymooney/zillow-house-price-data) is updated every month and downloaded directly from [https://www.zillow.com/research/data/](https://www.zillow.com/research/data/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Case study\n", - "\n", - "Say you're interested in a job with Zillow Offers as a Data Scientist. You read the following in a job description.\n", - "\n", - "> Are you passionate about **analyzing historical trends** looking for patterns? Are you driven to understand not just WHAT happened, but more importantly, WHY? Our team is tasked with absorbing billions of rows of data from dozens of sources, organizing them, analyzing them, and **visualizing them** to help inform both short and long-term decision-making. As a Data Scientist on the Zillow Offers team, you will help improve our ability to **accurately value homes** thereby creating fairer offers for sellers and enabling us to lower the costs of the business. You will help us personalize and optimize the consumer experience for home sellers seeking a Zillow Offer. You will help us understand what’s working well, and where we need to improve.\n", - "\n", - "then i can loook for data on kaggle around y\n", - "\n", - "### Exploring Kaggle datasets\n", + "# Working with Kaggle datasets\n", "\n", "how to search for that data, discover dataset. \n", "\n", "look at tasks ...\n", "\n", - "### Exploring Kaggle notebooks\n", - "\n", "look at kernels that people have written ... learn more in next tutorial how to work with them" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": {