gsznaier/codeforce_parser


The following code was forked and modified from shashank-sharma's codeforces scraper.

I added a new script, CodeForceDataConstuctor.py, which creates a data set from the code. To run the script without using Scrapy to crawl for the problem set IDs:

python ./CodeForceDataConstuctor.py -crawl=no

To have the script use Scrapy to crawl for the problem IDs:

python ./CodeForceDataConstuctor.py -crawl=yes

Note: Crawling for the problem set IDs takes a long time, so it is best to do this only once!

codeforces-scraper

A Scrapy spider that scrapes the codeforces site and collects all successful submissions for one particular language. It can also scrape the top-rated users for each language.

Spider location: cfspider/spiders/cf.py

Python version: Python 3.6

How it works?

First, it makes one request to the given URL (example: http://codeforces.com/problemset/status/1/problem/A/) with the appropriate contestId and index, which are fetched from the Codeforces API.
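For context, the Codeforces API exposes every problem's contestId and index through its problemset.problems endpoint. A minimal sketch of how those status URLs could be built follows; this is an illustration only, not the spider's actual code, and the fetch_problem_urls helper is hypothetical:

import requests

def fetch_problem_urls():
    # problemset.problems is an official Codeforces API endpoint that lists
    # every problem together with its contestId and index.
    resp = requests.get("https://codeforces.com/api/problemset.problems")
    resp.raise_for_status()
    problems = resp.json()["result"]["problems"]
    # Build the status-page URL the spider starts from for each problem.
    return [
        "http://codeforces.com/problemset/status/{}/problem/{}/".format(
            p["contestId"], p["index"]
        )
        for p in problems
        if "contestId" in p  # a handful of problems have no contestId
    ]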

After that, it fills out the status filter form so that the results contain only solutions that are ACCEPTED and in the language given by the user. It then goes through each page, fetches all submission IDs, and yields them in the proper format. The page limit can be set in the program.
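The real logic lives in cfspider/spiders/cf.py. Purely as a sketch of this approach, a Scrapy spider that submits a filter form and follows pagination up to a page limit might look like the following; the form field names, CSS selectors, and spider name here are assumptions for illustration, not the repository's actual code:

import scrapy

class StatusSketchSpider(scrapy.Spider):
    name = "statusSketch"
    start_urls = ["http://codeforces.com/problemset/status/1/problem/A/"]
    page_limit = 4  # stop after this many result pages

    def parse(self, response):
        # Fill out the status filter form: ACCEPTED verdicts in one language.
        # (The field names below are guesses at the form's inputs.)
        yield scrapy.FormRequest.from_response(
            response,
            formdata={"verdictName": "OK", "programTypeForInvoker": "python.3"},
            callback=self.parse_results,
            cb_kwargs={"page": 1},
        )

    def parse_results(self, response, page):
        # Each row of the status table carries a submission id (assumed attribute).
        for sub_id in response.css(
            "tr[data-submission-id]::attr(data-submission-id)"
        ).getall():
            yield {"submission_id": sub_id}
        # Follow the next-page link until the page limit is reached.
        if page < self.page_limit:
            next_page = response.css("a.next::attr(href)").get()  # assumed selector
            if next_page:
                yield response.follow(
                    next_page,
                    callback=self.parse_results,
                    cb_kwargs={"page": page + 1},
                )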

To get the data in JSON format, run:

scrapy crawl cfSpider -o data.json

This will save the data to the data.json file.

Example: a screenshot showing accepted Python 3 submissions; here the page limit was set to 4.

Requirements

To run this on your local machine, create a virtual environment, clone this repository, and then run:

pip install Scrapy

Then you can run it. If you find any issue, feel free to create one here.
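Putting it together, a typical setup might look like this (assuming a Unix shell, Python 3, and that the Scrapy project configuration sits at the repository root):

python3 -m venv env
source env/bin/activate
git clone https://github.com/gsznaier/codeforce_parser.git
cd codeforce_parser
pip install Scrapy
scrapy crawl cfSpider -o data.json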
