web scraping tweaks
zstumgoren committed Apr 7, 2024
1 parent 9cbca85 commit 9b987fe
Showing 2 changed files with 5 additions and 5 deletions.
4 changes: 2 additions & 2 deletions content/web_scraping/skip_scraping_cheat.ipynb
@@ -78,7 +78,7 @@
"- Clicking on the web request for the API call\n",
"- Heading over to the `Headers` tab for the web request\n",
"\n",
"In the information panel, you should see a downright awful URL. It contains a boatload of URL parameters after the `?` in the form `key=value` pairs, separated by ampersands (`&`). These are variables of sorts that instruct the API on what data to return. Normally, these parameters are configured by a web form filled out by a human visiting the website.\n",
"In the information panel, you should see a downright awful URL. It contains a boatload of URL parameters after the `?` in the form of `key=value` pairs, separated by ampersands (`&`). These are variables of sorts that instruct the API on what data to return. Normally, these parameters are configured by a web form filled out by a human visiting the website.\n",
"\n",
"If you look close, you may notice that the URL parameters include one particularly interesting morsel: `pageSize=20`\n",
"\n",
@@ -103,7 +103,7 @@
"\n",
"There was no need to scrape the search page, fill out a form, get the results back, and then page through the search results, extracting data points from HTML along the way. If that sounds painful and error-prone, you have good instincts. It's a workable solution, but in this case it's total overkill.\n",
"\n",
"Instead, we gave the site a phsyical exam (sorry, had to sneak one more in...) and realized that we could skip the scraping entirely and just grab the data.\n",
"Instead, we gave the site a [phsyical exam](dissecting_websites.ipynb) and realized that we could skip the scraping entirely and just grab the data.\n",
"\n",
"If you've never dissected a website like this before, all of the above likely seems like magic. It might even feel like this process would take just as long as writing a web scraper. But you'd be wrong. As you gain comfort with dissecting websites, the techniques described here will take you minutes -- perhaps even seconds -- on many sites.\n",
"\n",
6 changes: 3 additions & 3 deletions content/web_scraping/wysiwyg_scraping.ipynb
@@ -15,7 +15,7 @@
"\n",
"Why bring this up?\n",
"\n",
"Because it's a useful analogy for web pages. Back in the days of yore, many (perhaps most?) websites followed the WYSIWYG principle. These were simpler times, when the the content displayed on a web page closely matched the HTML in the underlying document for a page.\n",
"Because it's a useful analogy for web pages. Back in the days of yore, many (perhaps most?) websites followed the WYSIWYG principle. These were simpler times, when the content displayed on a web page closely matched the HTML in the underlying document for a page.\n",
"\n",
"If your web browser showed a table of data, it was quite likely that you'd find a `<table>` element somewhere in the page's HTML. \n",
"\n",
@@ -424,7 +424,7 @@
" fields[6].text.strip()\n",
" ]\n",
" # Mash up the headers with the field values into a dictionary\n",
" # - zip creates pairs each column header with the corresponding field in a two-element list\n",
" # - zip pairs each column header with the corresponding field in a two-element list\n",
" # - dict transforms the list of column/value pairs into a dictionary\n",
" bank_data = dict(zip(column_names, field_values))\n",
" all_banks.append(bank_data)\n",
@@ -456,7 +456,7 @@
"id": "d57695a8-f62b-4c6e-9615-0afdd73bc3c3",
"metadata": {},
"source": [
"Does that number match the count on the FDIC site? And in their download CSV?"
"Does that number match the count on the FDIC site? And in their downloadable CSV?"
]
},
{