diff --git a/content/files/codespaces_screenshot.png b/content/files/codespaces_screenshot.png new file mode 100644 index 0000000..980acba Binary files /dev/null and b/content/files/codespaces_screenshot.png differ diff --git a/content/python_overview.ipynb b/content/python_overview.ipynb index a6401da..df2736c 100644 --- a/content/python_overview.ipynb +++ b/content/python_overview.ipynb @@ -1,205 +1,41 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "f1125d5b-1a63-4df3-8abd-5a2680c9892e", - "metadata": { - "tags": [] - }, - "source": [ - "# Python Overview\n", - "\n", - "- [Intro](#intro)\n", - "- [What is Python?](#what-is-python)\n", - "- [Coding contexts](#coding-contexts)\n", - " - [Python interactive interpreter](#python-interactive-interpreter)\n", - " - [Python scripts](#python-scripts)\n", - " - [Jupyter notebooks](#jupyter-notebooks)\n", - " - [Python in the Browser](#python-in-the-browser)\n", - " - [Coding in the Cloud](#coding-in-the-cloud)\n", - "\n", - "\n", - "## Intro\n", - "\n", - "Python is a general-purpose programming language, meaning that it's useful for a variety of tasks. Python is far from the only language used in newsrooms, but it *is* one of the most common because it's so versatile and relatively easy to learn.\n", - "\n", - "Newsrooms use Python in countless ways:\n", - "\n", - "* Scraping data from government websites\n", - "* Mining documents\n", - "* Accessing data in APIs\n", - "* Building data-driven web applications\n", - "* Automating workflows\n", - "* Provisioning servers in the cloud\n", - "* Generating automated news content\n", - "* Creating data gathering web admins\n", - "* Analyzing satellite imagery\n", - "* Visualizing data\n", - "\n", - "...the list goes on and on.\n", - "\n", - "## What is Python?\n", - "\n", - "But what, precisely, is Python? 
Here's a definition from [Automate the Boring Stuff](https://automatetheboringstuff.com/2e/chapter0/):\n", - "\n", - "> Python is a programming language (with syntax rules for writing what is considered valid Python code) and the Python interpreter software that reads source code (written in the Python language) and performs its instructions.\n", - "\n", - "Pretty concise, but still fairly abstract. Let's explore that definition with a simple example.\n", - "\n", - "Open a Terminal, type the following and hit `return`:\n", - "\n", - "```python\n", - "python -c \"print('hello world')\"\n", - "```\n", - "\n", - "Congratulations! You've just used both the Python *interpreter* and the Python *language*!\n", - "\n", - "Above, the `print('hello world')` portion of code represents one small bit of the Python programming language. On its own, this bit of code won't do anything. We need to use the [Python interpreter](https://docs.python.org/3/tutorial/interpreter.html) to...well...interpret and execute this bit of source code. We did this by passing the Python command -- in this case a simple [print](https://docs.python.org/3.5/library/functions.html#print) statement -- to the Python interpreter.\n", - "\n", - "> The `-c` flag allows us to pass commands to the Python interpreter directly on the command line.\n", - "\n", - "Yes, this is a very basic example. But it highlights an important point:\n", - "\n", - "**When we write code, we're creating instructions for a machine to interpret and execute. We need both pieces of the puzzle -- the instructions and the interpreter -- to do real work.**\n", - "\n", - "## Coding contexts\n", - "\n", - "Python programmers work in a variety of environments, or coding \"contexts\". The tools and workflow vary by coder and task.\n", - "\n", - "In this course, we'll introduce you to a few common contexts for writing and executing code. 
Each of these tools and associated workflows have their own strengths, as detailed below.\n", - "\n", - "For this course, we'll draw clear lines between these tools. In particular, we'll focus on using traditional Python scripts for data acquisition while performing data transformation and analysis in [Jupyter][] notebooks.\n", - "\n", - "This separation of concerns is subjective and the lines are often blurred in practice by data journalists. For example, it's common for data journalists to download CSVs or grab data from APIs in a Jupyter notebook. Similarly, they may perform sophisticated transformations and analyses directly in a script as part of a larger data processing pipeline.\n", - "\n", - "Individual or team preferences as well as practical concerns (e.g. the complexity or size of data) often dictate where journalists draw these lines.\n", - "\n", - "### Python interactive interpreter\n", - "\n", - "The Python interpreter can be run in an [interactive mode](https://docs.python.org/3/tutorial/interpreter.html#interactive-mode) on the command line by simply typing `python` or `python3`.\n", - "\n", - "This interactive environment allows you to write code and execute it in real time. It's a great tool for experimenting with Python syntax and libraries during active development of a larger script or library.\n", - "\n", - "```\n", - "~> python\n", - "Python 3.7.0 (default, Jun 7 2019, 14:35:44)\n", - "[Clang 9.1.0 (clang-902.0.39.2)] on darwin\n", - "Type \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n", - ">>> 2 + 2\n", - "4\n", - ">>> print('hello world')\n", - "hello world\n", - ">>> exit()\n", - "~>\n", - "```\n", - "\n", - "### Python scripts\n", - "\n", - "Python scripts are static text files with a `.py` extension that contain Python code. 
These files are created using a code editor such as Visual Studio Code.\n", - "\n", - "Similar to bash shell scripts, they are typically run from the [command line](https://docs.python.org/3/using/cmdline.html#command-line) and are quite handy for peforming automated tasks such as scraping a web site and updating a database.\n", - "\n", - "A script by itself is just Python code in a text file. You must pass the script to the Python interpreter in order to execute the code.\n", - "\n", - "```\n", - "# Create a toy python script\n", - "~> echo \"print('hello world')\" > myscript.py\n", - "\n", - "# See the script contents\n", - "~> cat myscript.py\n", - "print('hello world')\n", - "\n", - "# Run the script\n", - "~> python myscript.py\n", - "hello world\n", - "~>\n", - "```\n", - "\n", - "\n", - "### Jupyter notebooks\n", - "\n", - "[Jupyter][] is an interactive Python environment that runs in a web browser. Data journalists use Jupyter to create human-friendly notebooks that blend narrative explanations of their work with actual, working code.\n", - "\n", - "![jupyter example](files/jupyter_demo.png)\n", - "\n", - "The traditional way to run Jupyter Notebooks is to install the Jupyter Lab software on your machine and start the program from the command line. \n", - "\n", - "It's important to note, however, that there are also hosted Jupyter environments such as [Google Colab][], [Kaggle Notebooks][], etc where third parties run the Jupyter Notebook or Jupyter Lab software for you. These can be very convenient, providing a nice combination of zero overhead with the ability to do real work, in some cases including features such as real-time collaboration. However, these environments also have limitations (e.g. 
the amount of data you can process or analyze) as well as their own non-standard workflows.\n", - "\n", - "### Python in the Browser\n", - "\n", - "In the last few years, we've also seen the rise of a technology called [WebAssembly][], which among other things allows you to run more computationally heavy software that is not native web code (e.g. Javascript) directly in your browser. This power extends to lower-level programming languages such as Python and its Jupyter Lab environment, which traditionally have been run on our own machines, on virtual machines in the cloud, or hosted for us by third parties such as [Google Colab][].\n", - "\n", - "The ability to run Python and Jupyter directly in your browser means that you don't need to install the programming language or the Jupyter Lab software. Nor do you need to use a third party such as Google to host a Jupyter notebook for you.\n", - "\n", - "It's a super-convenient way to learn without having to slog through the process of setting up your own local installation. And in fact, many of the tutorials we'll use in this course -- including the one you're currently reading -- run on JupyterLite.\n", - "\n", - "However, there are drawbacks. JupyterLite installations are not intended for handling large quantities of data, and there are limitations and friction points when it comes to saving work and normal day-to-day usages of Python, such as idiosyncratic workflows for the very common case of obtaining files from other websites, e.g. when scraping a government agency for data or documents.\n", - "\n", - "\n", - "## Coding in the Cloud\n", - "\n", - "![codespaces](files/codespaces/codespaces_landing.png)\n", - "\n", - "Perhaps the most exciting development in the last few years has been the rise of cloud coding environment such as GitHub Codespaces. 
These environments combine the simplicity of setup with the flexibility to support standard and customized workflows, without the idiosyncracies common to platforms such as Google Colab and JupyterLite.\n", - "\n", - "They run on small virtual machines in the cloud, and they allow you to save work directly to a GitHub code repository and save the state of your virtual machine so you can pick up where you left off at your next coding session.\n", - "\n", - "And of course, there are a caveats. Most importantly, these environments typically operate on a freemium model, where you get a certain number of \"compute\" time for free (e.g. at the time of writing you can run a basic machine with 2GB of RAM on Codespaces for 60 hours before incurring hourly charges of 0.18 per hour. You also get 15GB of storage for free, with each additional GB costing .07 cents per month.\n", - "\n", - "In this course, we'll make regular use of GitHub Codespaces for our assignments, since they offer a nice balance of standardized workflows and a reasonable free tier. \n", - "\n", - "Equally important -- you can trust that as you learn to code in this environment, it transfers readily to a \"local\" workflow on your machine using the same tools and environments.\n", - "\n", - "## So what Python environment should I use?\n", - "\n", - "In our opinion, there's a time and a place for each of these different coding contexts.\n", - "\n", - "JupyterLite -- ie Python in your Browser -- is a great way to start ramping up immediately. It's so handy that the First Python Notebook is actually a JupyterLite instance that requires no installation of Python or related libraries for you to get started.\n", - "\n", - "But when you're working on projects, we prefer other options. A plain old code editor is handy for whipping up Python scripts or multi-step pipelines which need to run on a regular schedule on a virtual machine in the cloud. 
These types of machines typically have no graphical interface, and while you *can* run Jupyter Notebooks as scripts in a shell, it's far more common and convenient to use plain old Python scripts.\n", - "\n", - "For data analysis, we of course recommend Jupyter Notebooks/Lab, either running in your browser or using a third party provider such as Google Colab. \n", - "\n", - "When starting out, it can be tempting to choose convenience (e.g. Google Colab) over learning the slightly harder but more standard way of doing things. In this course, we'll take the latter route, primarily because we want you to learn standard workflows that most teams in the news use, and many of the tutorials and blog posts assume out on the wider Internet. That said, we're very excited about CodeSpaces, which combine standard workflows with a zero setup environment based entirely in the cloud. While it too has limitations in terms of pricing and resources, it's a convenient way to get up and running on real work, using standard practices.\n", - "\n", - "Last but not least, even the humble Python interpreter in your shell can be handy for quickly testing out code snippets and exploring a library, without the overhead of having to install and run a Jupyter Notebook.\n", - "\n", - "\n", - "[Jupyter]: https://jupyter.org/\n", - "[Google Colab]: https://research.google.com/colaboratory/\n", - "[Kaggle Notebooks]: https://www.kaggle.com/docs/notebooks\n", - "[WebAssembly]: https://webassembly.org/" - ] + "metadata": { + "kernelspec": { + "name": "python", + "display_name": "Python (Pyodide)", + "language": "python" + }, + "language_info": { + "codemirror_mode": { + "name": "python", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8" + } }, - { - "cell_type": "code", - "execution_count": null, - "id": "aa04f308-346b-4549-8a48-ee5fc1f7432a", - "metadata": {}, - 
"outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.4" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} + "nbformat_minor": 5, + "nbformat": 4, + "cells": [ + { + "cell_type": "markdown", + "source": "# Python Overview\n\n- [Intro](#intro)\n- [What is Python?](#what-is-python)\n- [Coding contexts](#coding-contexts)\n - [Python interactive interpreter](#python-interactive-interpreter)\n - [Python scripts](#python-scripts)\n - [Jupyter notebooks](#jupyter-notebooks)\n - [Python in the Browser](#python-in-the-browser)\n - [Coding in the Cloud](#coding-in-the-cloud)\n\n\n## Intro\n\nPython is a general-purpose programming language, meaning that it's useful for a variety of tasks. Python is far from the only language used in newsrooms, but it *is* one of the most common because it's so versatile and relatively easy to learn.\n\nNewsrooms use Python in countless ways:\n\n* Scraping data from government websites\n* Mining documents\n* Accessing data in APIs\n* Building data-driven web applications\n* Automating workflows\n* Provisioning servers in the cloud\n* Generating automated news content\n* Creating data gathering web admins\n* Analyzing satellite imagery\n* Visualizing data\n\n...the list goes on and on.\n\n## What is Python?\n\nBut what, precisely, is Python? 
Here's a definition from [Automate the Boring Stuff](https://automatetheboringstuff.com/2e/chapter0/):\n\n> Python is a programming language (with syntax rules for writing what is considered valid Python code) and the Python interpreter software that reads source code (written in the Python language) and performs its instructions.\n\nPretty concise, but still fairly abstract. Let's explore that definition with a simple example.\n\nOpen a Terminal, type the following and hit `return`:\n\n```python\npython -c \"print('hello world')\"\n```\n\nCongratulations! You've just used both the Python *interpreter* and the Python *language*!\n\nAbove, the `print('hello world')` portion of code represents one small bit of the Python programming language. On its own, this bit of code won't do anything. We need to use the [Python interpreter](https://docs.python.org/3/tutorial/interpreter.html) to...well...interpret and execute this bit of source code. We did this by passing the Python command -- in this case a simple [print](https://docs.python.org/3.5/library/functions.html#print) function call -- to the Python interpreter.\n\n> The `-c` flag allows us to pass commands to the Python interpreter directly on the command line.\n\nYes, this is a very basic example. But it highlights an important point:\n\n**When we write code, we're creating instructions for a machine to interpret and execute. We need both pieces of the puzzle -- the instructions and the interpreter -- to do real work.**\n\n## Coding contexts\n\nPython programmers work in a variety of environments, or coding \"contexts\". The tools and workflow vary by coder and task.\n\nIn this course, we'll introduce you to a few common contexts for writing and executing code. Each of these tools and associated workflows has its own strengths, as detailed below.\n\nFor this course, we'll draw clear lines between these tools. 
In particular, we'll focus on using traditional Python scripts for data acquisition while performing data transformation and analysis in [Jupyter][] notebooks.\n\nThis separation of concerns is subjective and the lines are often blurred in practice by data journalists. For example, it's common for data journalists to download CSVs or grab data from APIs in a Jupyter notebook. Similarly, they may perform sophisticated transformations and analyses directly in a script as part of a larger data processing pipeline.\n\nIndividual or team preferences as well as practical concerns (e.g. the complexity or size of data) often dictate where journalists draw these lines.\n\n### Python interactive interpreter\n\nThe Python interpreter can be run in an [interactive mode](https://docs.python.org/3/tutorial/interpreter.html#interactive-mode) on the command line by simply typing `python` or `python3`.\n\nThis interactive environment allows you to write code and execute it in real time. It's a great tool for experimenting with Python syntax and libraries during active development of a larger script or library.\n\n```\n~> python\nPython 3.7.0 (default, Jun 7 2019, 14:35:44)\n[Clang 9.1.0 (clang-902.0.39.2)] on darwin\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n>>> 2 + 2\n4\n>>> print('hello world')\nhello world\n>>> exit()\n~>\n```\n\n### Python scripts\n\nPython scripts are static text files with a `.py` extension that contain Python code. These files are created using a code editor such as Visual Studio Code.\n\nSimilar to bash shell scripts, they are typically run from the [command line](https://docs.python.org/3/using/cmdline.html#command-line) and are quite handy for performing automated tasks such as scraping a website and updating a database.\n\nA script by itself is just Python code in a text file. 
You must pass the script to the Python interpreter in order to execute the code.\n\n```\n# Create a toy python script\n~> echo \"print('hello world')\" > myscript.py\n\n# See the script contents\n~> cat myscript.py\nprint('hello world')\n\n# Run the script\n~> python myscript.py\nhello world\n~>\n```\n\n\n### Jupyter notebooks\n\n[Jupyter][] is an interactive Python environment that runs in a web browser. Data journalists use Jupyter to create human-friendly notebooks that blend narrative explanations of their work with actual, working code.\n\n![jupyter example](files/jupyter_demo.png)\n\nThe traditional way to run Jupyter Notebooks is to install the Jupyter Lab software on your machine and start the program from the command line. \n\nIt's important to note, however, that there are also hosted Jupyter environments such as [Google Colab][], [Kaggle Notebooks][], etc., where third parties run the Jupyter Notebook or Jupyter Lab software for you. These can be very convenient, providing a nice combination of zero overhead with the ability to do real work, in some cases including features such as real-time collaboration. However, these environments also have limitations (e.g. the amount of data you can process or analyze) as well as their own non-standard workflows.\n\n### Python in the Browser\n\nIn the last few years, we've also seen the rise of a technology called [WebAssembly][], which among other things allows you to run more computationally heavy software that is not native web code (e.g. JavaScript) directly in your browser. This power extends to lower-level programming languages such as Python and its Jupyter Lab environment, which traditionally have been run on our own machines, on virtual machines in the cloud, or hosted for us by third parties such as [Google Colab][].\n\nThe ability to run Python and Jupyter directly in your browser means that you don't need to install the programming language or the Jupyter Lab software. 
Nor do you need to use a third party such as Google to host a Jupyter notebook for you.\n\nIt's a super-convenient way to learn without having to slog through the process of setting up your own local installation. And in fact, many of the tutorials we'll use in this course -- including the one you're currently reading -- run on JupyterLite.\n\nHowever, there are drawbacks. JupyterLite installations are not intended for handling large quantities of data, and there are limitations and friction points when it comes to saving work and normal day-to-day usages of Python, such as idiosyncratic workflows for the very common case of obtaining files from other websites, e.g. when scraping a government agency for data or documents.\n\n\n## Coding in the Cloud\n\n![codespaces](files/codespaces_screenshot.png)\n\nPerhaps the most exciting development in the last few years has been the rise of cloud coding environments such as [GitHub Codespaces][]. These environments combine the simplicity of setup with the flexibility to support standard and customized workflows, without the idiosyncrasies common to platforms such as Google Colab and JupyterLite.\n\nThey run on small virtual machines in the cloud, and they allow you to save work directly to a GitHub code repository and save the state of your virtual machine so you can pick up where you left off at your next coding session.\n\nAnd of course, there are caveats. Most importantly, these environments typically operate on a freemium model, where you get a certain amount of \"compute\" time for free (e.g. at the time of writing you can run a basic machine with 2GB of RAM on Codespaces for 60 hours before incurring charges of $0.18 per hour). You also get 15GB of storage for free, with each additional GB costing $0.07 per month.\n\nIn this course, we'll make regular use of GitHub Codespaces for our assignments, since they offer a nice balance of standardized workflows and a reasonable free tier. 
\n\nEqually important -- you can trust that what you learn in this environment transfers readily to a \"local\" workflow on your machine using the same tools and environments.\n\n## So what Python environment should I use?\n\nIn our opinion, there's a time and a place for each of these different coding contexts.\n\nJupyterLite -- i.e. Python in your Browser -- is a great way to start ramping up immediately. It's so handy that the First Python Notebook is actually a JupyterLite instance that requires no installation of Python or related libraries for you to get started.\n\nBut when you're working on projects, we prefer other options. A plain old code editor is handy for whipping up Python scripts or multi-step pipelines which need to run on a regular schedule on a virtual machine in the cloud. These types of machines typically have no graphical interface, and while you *can* run Jupyter Notebooks as scripts in a shell, it's far more common and convenient to use plain old Python scripts.\n\nFor data analysis, we of course recommend Jupyter Notebooks/Lab, either running in your browser or using a third party provider such as Google Colab. \n\nWhen starting out, it can be tempting to choose convenience (e.g. Google Colab) over learning the slightly harder but more standard way of doing things. In this course, we'll take the latter route, primarily because we want you to learn the standard workflows that most news teams use, and that many of the tutorials and blog posts out on the wider Internet assume. That said, we're very excited about Codespaces, which combines standard workflows with a zero-setup environment based entirely in the cloud. 
While it too has limitations in terms of pricing and resources, it's a convenient way to get up and running on real work, using standard practices.\n\nLast but not least, even the humble Python interpreter in your shell can be handy for quickly testing out code snippets and exploring a library, without the overhead of having to install and run a Jupyter Notebook.\n\n\n[Jupyter]: https://jupyter.org/\n[Google Colab]: https://research.google.com/colaboratory/\n[Kaggle Notebooks]: https://www.kaggle.com/docs/notebooks\n[WebAssembly]: https://webassembly.org/\n[GitHub Codespaces]: https://github.com/features/codespaces", + "metadata": { + "tags": [] + }, + "id": "f1125d5b-1a63-4df3-8abd-5a2680c9892e" + }, + { + "cell_type": "code", + "source": "", + "metadata": {}, + "execution_count": null, + "outputs": [], + "id": "aa04f308-346b-4549-8a48-ee5fc1f7432a" + } + ] +} \ No newline at end of file