[Building a DS Portfolio] New course #297

78 changes: 78 additions & 0 deletions notebooks/ds_portfolio/raw/tut2.ipynb
@@ -0,0 +1,78 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"The first thing to do is pick a topic for your project. This involves:\n",
"- selecting a dataset of interest, and \n",
"- deciding (at a high level) how you will examine and use your dataset. \n",
"\n",
"Picking a topic is usually quite difficult, because there are so many possible directions any project can take. How do you choose? \n",
"\n",
"In this tutorial, we provide some advice, along with some specific tools you can use to guide your search.\n",
"\n",
"# Advice\n",
"\n",
"### Use interesting data.\n",
"\n",
"Popular datasets like [Titanic](https://www.kaggle.com/c/titanic/data), [Iris](https://www.kaggle.com/uciml/iris), and [Breast Cancer Wisconsin](https://www.kaggle.com/uciml/breast-cancer-wisconsin-data) are great for learning, because you can learn a lot from the detailed analyses that many people have written using them. However, they are not good candidates for your portfolio project: they are often covered in great detail in introductory courses, so it's not clear to a viewer how much of the work is your own.\n",
"\n",
"Instead of using one of these popular datasets, you should look for a dataset that hasn't been overly explored, so you are more likely to find something that is interesting and novel. Having a project topic that feels unique is important: it demonstrates to a potential employer that you can proactively find opportunities to analyze unstructured data in a way that results in well-informed, data-driven decisions.\n",
"\n",
"### Your project should inspire you.\n",
"\n",
"Choose a project that is interesting to you personally. This way, you're more likely to complete it, and if you're excited about it, you're more likely to be genuinely engaging when you talk about it in a job interview.\n",
"\n",
"### The success of your project should not depend on what you find.\n",
"\n",
"Jason Goodman, a Data Scientist at Airbnb, has a [Medium post](https://medium.com/@jasonkgoodman/advice-on-building-data-portfolio-projects-c5f96d8a0627) with practical advice for building DS portfolio projects. In the post, he describes a project in college where he investigated the impact of fraud on neighboring nonprofits. Unfortunately, he discovered that fraud had no impact, which ultimately resulted in a project that was not very interesting. To avoid this, pick a question whose answer is worth reporting whichever way the results turn out.\n",
"\n",
"### Base your project on a desired job description.\n",
"\n",
"If you know what kind of job you would like, you can tailor your project to demonstrate mastery of specific skills, or familiarity with certain types of data. Even if you end up ultimately working for a different company, many of the skills you'll demonstrate in your project will likely be transferable to other businesses.\n",
"\n",
"For instance, say you're interested in a job with Zillow as a Data Scientist. ([Zillow](https://www.zillow.com/) is an online real estate database company.)\n",
"- If there's an active Kaggle competition that is hosted by Zillow, this is an excellent opportunity to collaborate with an international community while working on a project that the company cares about. For instance, [Zillow's Home Value Prediction (Zestimate) competition](https://www.kaggle.com/c/zillow-prize-1/overview) was held a few years ago. \n",
"- After a competition has closed, the [notebooks](https://www.kaggle.com/c/zillow-prize-1/notebooks) written by other users can be good inspiration for your project. The [discussion](https://www.kaggle.com/c/zillow-prize-1/discussion) might help you to clarify doubts you have about working with similar data.\n",
"- If you'd prefer to work with data that is more raw and unstructured than what you typically find in a Kaggle competition, you can search for Zillow data in Kaggle Datasets. This [Zillow dataset](https://www.kaggle.com/paultimothymooney/zillow-house-price-data) is updated every month, with the data downloaded directly from [https://www.zillow.com/research/data/](https://www.zillow.com/research/data/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Working with Kaggle datasets\n",
"\n",
"This section covers how to search for data and discover datasets on Kaggle. \n",
"\n",
"Look at tasks ...\n",
"\n",
"Look at the kernels that people have written ... You'll learn more about how to work with them in the next tutorial."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}