{"id":272,"date":"2026-02-27T16:41:21","date_gmt":"2026-02-27T08:41:21","guid":{"rendered":"http:\/\/nicole.wordplayer.top\/?page_id=272"},"modified":"2026-03-06T09:12:14","modified_gmt":"2026-03-06T01:12:14","slug":"cambridge-past-paper-web-scraper","status":"publish","type":"page","link":"http:\/\/nicole.wordplayer.top\/index.php\/cambridge-past-paper-web-scraper\/","title":{"rendered":"Cambridge Past Paper Web Scraper"},"content":{"rendered":"\n<div class=\"wp-block-file\"><a href=\"http:\/\/nicole.wordplayer.top\/wp-content\/uploads\/2026\/03\/tool_windows.zip\" class=\"wp-block-file__button wp-element-button\" download>Download for Windows<\/a><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">(The Mac version is too large to upload here.)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>## Activity type<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Personal pursuit (eg Art \/ Computing \/ Making \/ Music \/ Drama \/ Sport)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>## Description<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Developed an automated web scraping tool that earned me the Stand Up Award, my school&#8217;s highest honor. The project was inspired by a friend learning Python and addressed a significant pain point: manually downloading past papers from Cambridge&#8217;s official website was inefficient, because connections were unstable, VPN access was required, and every file had to be organized by hand.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Built a terminal-based Python scraper with extensive customization options. Users can select their curriculum level (IGCSE or A Level), specify subjects, choose years and paper types (Paper 1, Paper 2, etc.), and download exam papers, mark schemes, or audio MP3 files. 
Implemented multi-threaded downloading for speed and built an automatic classification system that sorts files into folders by subject and year without any manual intervention.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"697\" height=\"483\" src=\"http:\/\/nicole.wordplayer.top\/wp-content\/uploads\/2026\/02\/\u622a\u5c4f2026-02-27-16.34.20-1.png\" alt=\"\" class=\"wp-image-280\" srcset=\"http:\/\/nicole.wordplayer.top\/wp-content\/uploads\/2026\/02\/\u622a\u5c4f2026-02-27-16.34.20-1.png 697w, http:\/\/nicole.wordplayer.top\/wp-content\/uploads\/2026\/02\/\u622a\u5c4f2026-02-27-16.34.20-1-300x208.png 300w\" sizes=\"auto, (max-width: 697px) 100vw, 697px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"697\" height=\"483\" data-id=\"279\" src=\"http:\/\/nicole.wordplayer.top\/wp-content\/uploads\/2026\/02\/\u622a\u5c4f2026-02-27-16.34.48-1.png\" alt=\"\" class=\"wp-image-279\" srcset=\"http:\/\/nicole.wordplayer.top\/wp-content\/uploads\/2026\/02\/\u622a\u5c4f2026-02-27-16.34.48-1.png 697w, http:\/\/nicole.wordplayer.top\/wp-content\/uploads\/2026\/02\/\u622a\u5c4f2026-02-27-16.34.48-1-300x208.png 300w\" sizes=\"auto, (max-width: 697px) 100vw, 697px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"697\" height=\"483\" data-id=\"278\" src=\"http:\/\/nicole.wordplayer.top\/wp-content\/uploads\/2026\/02\/\u622a\u5c4f2026-02-27-16.35.10-1.png\" alt=\"\" class=\"wp-image-278\" srcset=\"http:\/\/nicole.wordplayer.top\/wp-content\/uploads\/2026\/02\/\u622a\u5c4f2026-02-27-16.35.10-1.png 697w, 
http:\/\/nicole.wordplayer.top\/wp-content\/uploads\/2026\/02\/\u622a\u5c4f2026-02-27-16.35.10-1-300x208.png 300w\" sizes=\"auto, (max-width: 697px) 100vw, 697px\" \/><\/figure>\n<\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">One week after completion, faced a major setback when Cambridge&#8217;s website\u2014which rarely updates\u2014underwent a complete redesign, breaking my custom-built scraper entirely. Despite initial frustration, I persisted through the challenge, spending an entire week analyzing the new website structure, rewriting scraping paths and logic, and successfully restoring full functionality.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Dedicated extensive effort to this project, regularly working from 4 PM after school until 3 AM for multiple consecutive days. This intensive development process significantly strengthened my problem-solving abilities and resilience. My Computer Science teacher became an enthusiastic user of the tool, validating its practical value for the school community.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>## Skills<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8211; [x] Critical thinking<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8211; [x] Planning<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8211; [ ] Artistic skills<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8211; [ ] Communication<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8211; [ ] Teamwork<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8211; [ ] Leadership<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8211; [x] Problem solving<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8211; [x] Creativity \/ Innovation<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8211; [x] Independence<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8211; [x] Adaptability \/ Resilience<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8211; [ ] Risk-taking \/ Courage<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8211; [x] Inquisitiveness<\/p>\n\n\n\n<p 
class=\"wp-block-paragraph\"><strong>## Date started<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Month: 3<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Year: 2025<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>## Date finished<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Month: 5<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Year: 2025<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>## Referee<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Computer Science Teacher<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(Mac&#8217;s version is too large&#8230;) ## Activity type Personal pursuit (eg Art \/ Computing \/ Making \/ Music \/ Drama \/ Sport) ## Description Developed an automated web scraping tool that earned me the Stand Up Award, my school&#8217;s highest honor. The project was inspired by a friend learning Python and addressed a significant pain [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-272","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/nicole.wordplayer.top\/index.php\/wp-json\/wp\/v2\/pages\/272","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/nicole.wordplayer.top\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/nicole.wordplayer.top\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/nicole.wordplayer.top\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/nicole.wordplayer.top\/index.php\/wp-json\/wp\/v2\/comments?post=272"}],"version-history":[{"count":8,"href":"http:\/\/nicole.wordplayer.top\/index.php\/wp-json\/wp\/v2\/pages\/272\/revisions"}],"predecessor-version":[{"id":399,"href":"http:\/\/nicole.wordplayer.top\/index.php\/wp-json\/wp\/v2\/pages\/272\/revisions\/399"}],"wp:attachment":[{"href":"http:\/\/nicole.wordplay
er.top\/index.php\/wp-json\/wp\/v2\/media?parent=272"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}