diff --git a/content/web_scraping/skip_scraping_cheat.ipynb b/content/web_scraping/skip_scraping_cheat.ipynb
index c01c459..bac98f5 100644
--- a/content/web_scraping/skip_scraping_cheat.ipynb
+++ b/content/web_scraping/skip_scraping_cheat.ipynb
@@ -78,7 +78,7 @@
     "- Clicking on the web request for the API call\n",
     "- Heading over to the `Headers` tab for the web request\n",
     "\n",
-    "In the information panel, you should see a downright awful URL. It contains a boatload of URL parameters after the `?` in the form `key=value` pairs, separated by ampersands (`&`). These are variables of sorts that instruct the API on what data to return. Normally, these parameters are configured by a web form filled out by a human visiting the website.\n",
+    "In the information panel, you should see a downright awful URL. It contains a boatload of URL parameters after the `?` in the form of `key=value` pairs, separated by ampersands (`&`). These are variables of sorts that instruct the API on what data to return. Normally, these parameters are configured by a web form filled out by a human visiting the website.\n",
     "\n",
     "If you look close, you may notice that the URL parameters include one particularly interesting morsel: `pageSize=20`\n",
     "\n",
@@ -103,7 +103,7 @@
     "\n",
     "There was no need to scrape the search page, fill out a form, get the results back, and then page through the search results, extracting data points from HTML along the way. If that sounds painful and error-prone, you have good instincts. It's a workable solution, but in this case it's total overkill.\n",
     "\n",
-    "Instead, we gave the site a phsyical exam (sorry, had to sneak one more in...) and realized that we could skip the scraping entirely and just grab the data.\n",
+    "Instead, we gave the site a [physical exam](dissecting_websites.ipynb) and realized that we could skip the scraping entirely and just grab the data.\n",
     "\n",
     "If you've never dissected a website like this before, all of the above likely seems like magic. It might even feel like this process would take just as long as writing a web scraper. But you'd be wrong. As you gain comfort with dissecting websites, the techniques described here will take you minutes -- perhaps even seconds -- on many sites.\n",
     "\n",
diff --git a/content/web_scraping/wysiwyg_scraping.ipynb b/content/web_scraping/wysiwyg_scraping.ipynb
index 2d71733..3b72e53 100644
--- a/content/web_scraping/wysiwyg_scraping.ipynb
+++ b/content/web_scraping/wysiwyg_scraping.ipynb
@@ -15,7 +15,7 @@
     "\n",
     "Why bring this up?\n",
     "\n",
-    "Because it's a useful analogy for web pages. Back in the days of yore, many (perhaps most?) websites followed the WYSIWYG principle. These were simpler times, when the the content displayed on a web page closely matched the HTML in the underlying document for a page.\n",
+    "Because it's a useful analogy for web pages. Back in the days of yore, many (perhaps most?) websites followed the WYSIWYG principle. These were simpler times, when the content displayed on a web page closely matched the HTML in the underlying document for a page.\n",
     "\n",
     "If your web browser showed a table of data, it was quite likely that you'd find a `<table
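For reviewers following the `skip_scraping_cheat.ipynb` passage above: once the API URL and its `pageSize=20` parameter turn up in the `Headers` tab, the "just grab the data" step can be as short as the sketch below. This is a minimal illustration, not code from the notebook itself; the endpoint, query parameter names, and response shape are hypothetical stand-ins for whatever your browser's network panel reveals.

```python
# A minimal sketch of "skip the scraping and just grab the data."
# The URL and parameters below are hypothetical placeholders; substitute
# the actual API request copied from the browser's network panel.
import requests

API_URL = "https://example.com/api/search"  # hypothetical API endpoint

# Query parameters mirror the key=value pairs after the "?" in the URL.
# Raising pageSize above the default of 20 can often fetch every record
# in a single request, with no HTML parsing or pagination loop needed.
params = {
    "query": "inspections",  # hypothetical search term
    "pageSize": 500,
}

response = requests.get(API_URL, params=params, timeout=30)
response.raise_for_status()

data = response.json()  # the API hands back structured JSON directly
print(len(data.get("results", [])))
```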