Crawl Cambridge A Level papers from papers.gceguide.com.
- Amend subject.json or subject_dict to add more subjects.
- The crawl delay is set to 30 seconds, the minimum allowed by the website's robots.txt.
- Papers are saved in the current working directory by default. To save them elsewhere, modify save_path.
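The save_path note can be pictured as a small helper that falls back to the current working directory. This is a minimal sketch: the helper name save_paper and its signature are hypothetical and not taken from the downloader's source. Only the default (current working directory) comes from the notes above.

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical setting: None means "use the current working directory",
# mirroring the default described above. Set it to a folder to override.
save_path: Optional[Union[str, Path]] = None


def save_paper(content: bytes, filename: str,
               target: Optional[Union[str, Path]] = None) -> Path:
    """Write one downloaded paper to disk and return its path (hypothetical helper)."""
    folder = Path(target) if target is not None else Path.cwd()
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if it does not exist yet
    out = folder / filename
    out.write_bytes(content)
    return out
```

With requests, a call might look like `save_paper(response.content, "paper.pdf", save_path)`, where `response` is the result of `requests.get(...)`.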
Requirements:
- requests
- pyqt5 (needed only for the GUI version)
Lite version:
- git clone https://github.com/luke-tangh/a-level-paper-downloader.git, or download lite_downloader.py on its own
- run lite_downloader.py
- enter the subject and year
- wait for the download to complete
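Most of the waiting in this workflow comes from the 30-second crawl delay between requests. Below is a minimal sketch of such a throttled loop. It assumes a hypothetical `fetch` callable (for example a thin wrapper around `requests.get`) and a ready-made list of URLs, and it is not the script's actual code. Passing `sleep` in as a parameter lets the loop be tested without real waits.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")

CRAWL_DELAY = 30  # seconds, the Crawl-delay value in gceguide's robots.txt


def download_all(urls: Iterable[str], fetch: Callable[[str], T],
                 delay: float = CRAWL_DELAY,
                 sleep: Callable[[float], None] = time.sleep) -> List[T]:
    """Fetch each URL in order, pausing `delay` seconds between consecutive requests."""
    results: List[T] = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # wait before every request except the first
        results.append(fetch(url))
    return results
```

In real use, `fetch` could be `lambda url: requests.get(url, timeout=60).content`. Downloading n files then takes at least 30 × (n − 1) seconds.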
GUI version:
- git clone https://github.com/luke-tangh/a-level-paper-downloader.git
- run main.pyw
- set Subject, Year and Type, then click download
- wait for the download to complete
Paper types:
- qp: question paper
- ms: mark scheme
- gt: grade thresholds
- ci: confidential instructions
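These type codes appear inside Cambridge file names, which typically look like 9709_s21_qp_12.pdf. The parts are the syllabus code, a session letter (m = February/March, s = May/June, w = October/November) with a two-digit year, the paper type, and, for most types, a component number. Grade thresholds usually have no component. The sketch below builds and parses names in that pattern. It assumes this convention matches the files on papers.gceguide.com. The function names are hypothetical, and this is not the downloader's own logic.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: <syllabus>_<session><yy>_<type>[_<component>].pdf
PAPER_RE = re.compile(r"^(\d{4})_([msw])(\d{2})_(qp|ms|gt|ci)(?:_(\d{1,2}))?\.pdf$")


class Paper(NamedTuple):
    syllabus: str             # e.g. "9709"
    session: str              # m, s or w
    year: int                 # four-digit year
    kind: str                 # qp, ms, gt or ci
    component: Optional[str]  # e.g. "12"; None when the name has no component


def build_filename(p: Paper) -> str:
    """Assemble a file name such as 9709_s21_qp_12.pdf from its parts."""
    name = f"{p.syllabus}_{p.session}{p.year % 100:02d}_{p.kind}"
    if p.component is not None:
        name += f"_{p.component}"
    return name + ".pdf"


def parse_filename(name: str) -> Optional[Paper]:
    """Split a file name back into its parts; return None if it does not match."""
    m = PAPER_RE.match(name)
    if m is None:
        return None
    syllabus, session, yy, kind, component = m.groups()
    # Assumption: two-digit years refer to 2000-2099.
    return Paper(syllabus, session, 2000 + int(yy), kind, component)
```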
Supported subjects:
- 9231 Mathematics Further
- 9489 History
- 9608 Computer Science (old)
- 9618 Computer Science (new)
- 9696 Geography
- 9700 Biology
- 9701 Chemistry
- 9702 Physics
- 9708 Economics
- 9709 Mathematics
- 9990 Psychology
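To add a subject, extend subject_dict or subject.json. The layout of those files is not shown here, so this sketch assumes a flat mapping from syllabus code to subject name, which is a hypothetical format. It merges entries from a JSON file over built-in defaults taken from the list above. This lets a subject such as 9093 English Language be added by editing only the JSON file.

```python
import json
from pathlib import Path
from typing import Dict, Union

# Hypothetical format: syllabus code -> subject name, matching the list above.
DEFAULT_SUBJECTS: Dict[str, str] = {
    "9231": "Mathematics Further",
    "9489": "History",
    "9608": "Computer Science (old)",
    "9618": "Computer Science (new)",
    "9696": "Geography",
    "9700": "Biology",
    "9701": "Chemistry",
    "9702": "Physics",
    "9708": "Economics",
    "9709": "Mathematics",
    "9990": "Psychology",
}


def load_subjects(path: Union[str, Path] = "subject.json") -> Dict[str, str]:
    """Return the defaults, extended or overridden by entries in a JSON file if it exists."""
    subjects = dict(DEFAULT_SUBJECTS)  # copy so the defaults are never mutated
    p = Path(path)
    if p.is_file():
        subjects.update(json.loads(p.read_text(encoding="utf-8")))
    return subjects
```

Under this assumed format, a subject.json containing `{"9093": "English Language"}` would add 9093 alongside the defaults.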
robots.txt, retrieved from https://www.gceguide.com/robots.txt:
User-agent: *
Crawl-delay: 30
Disallow:
Disallow: /wp-includes/*
Disallow: /Books/*
Disallow: /assets/*
Disallow: /files/*
Disallow: /draft/*
Disallow: /wp-content/*
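Python's standard library can read the Crawl-delay value with urllib.robotparser. This is a minimal sketch that assumes the rules above are available as text. Calling `read()` on a `RobotFileParser` constructed with the robots.txt URL would fetch them live instead. The 30-second floor mirrors the minimum stated at the top, and the helper name effective_delay is hypothetical.

```python
from typing import Iterable, Optional
from urllib.robotparser import RobotFileParser

# The rules listed above, embedded so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
Disallow:
Disallow: /wp-includes/*
Disallow: /Books/*
Disallow: /assets/*
Disallow: /files/*
Disallow: /draft/*
Disallow: /wp-content/*
"""


def effective_delay(lines: Iterable[str], user_agent: str = "*",
                    minimum: float = 30) -> float:
    """Return the site's Crawl-delay for user_agent, but never less than `minimum` seconds."""
    parser = RobotFileParser()
    parser.parse(lines)
    delay: Optional[int] = parser.crawl_delay(user_agent)  # None if no Crawl-delay applies
    return max(delay or 0, minimum)
```

urllib.robotparser matches Disallow paths as plain prefixes and does not interpret the trailing `*` wildcards used above. For that reason, this sketch relies on it only for the delay.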