
arxiv-checker

Released by adamdempsey90 on 01 Jul 20:01.


A Python module to search arXiv.org. You can cross-check a list of authors or keywords against either the most recent arXiv mailing or the listings for a given month and year. Works with Python 2.7 and Python 3.

To install::

    pip install arxiv-checker

To check the most recent mailings of several arXiv subjects against a list of authors::

    import arxivchecker
    papers = arxivchecker.check_authors(['astro-ph', 'gr-qc', 'physics'], ['Smith, John', 'Doe, Jane'])

When displayed, each paper shows its title, a clickable link to its abstract, and its author list.
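How a paper record produces both a plain-text and a clickable display can be sketched as follows. This is a minimal illustration, not arxivchecker's actual class: the Paper class and its title, url and authors attributes are hypothetical. The _repr_html_ method is the IPython/Jupyter rich-display hook, which is one common way to make a link clickable in a notebook; whether arxivchecker uses it is an assumption.

```python
import html


class Paper(object):
    """Hypothetical paper record; arxivchecker's real class may differ."""

    def __init__(self, title, url, authors):
        self.title = title
        self.url = url          # link to the abstract page
        self.authors = authors  # list of author-name strings

    def __str__(self):
        # Plain-text display: title, abstract link, comma-separated authors
        return "\n".join([self.title, self.url, ", ".join(self.authors)])

    def _repr_html_(self):
        # IPython/Jupyter calls this hook to render the object as HTML,
        # which turns the abstract URL into a clickable link
        return '<b>{0}</b><br><a href="{1}">{1}</a><br>{2}'.format(
            html.escape(self.title),
            html.escape(self.url),
            html.escape(", ".join(self.authors)),
        )


paper = Paper("A Hypothetical Title", "https://arxiv.org/abs/1606.00001",
              ["Jane Doe", "John Smith"])
print(paper)
```

In a terminal, print uses __str__; in a notebook, the HTML version is rendered instead.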

To check against a long list of names, pass the path to a text file instead::

    papers = arxivchecker.check_authors(['astro-ph', 'gr-qc', 'physics'], 'names.txt')
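The expected layout of the names file is not spelled out here. The sketch below assumes one name per line in the same "Last, First" form used in the list example, and additionally (as a hypothetical convenience) skips blank lines and lines starting with '#'. The load_names helper is illustrative only and is not part of arxivchecker.

```python
import io
import os
import tempfile


def load_names(path):
    """Read author names from a text file, one name per line.

    Hypothetical format: blank lines and lines starting with '#' are skipped.
    io.open is used so the same code runs on Python 2.7 and Python 3.
    """
    names = []
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            name = line.strip()
            if name and not name.startswith("#"):
                names.append(name)
    return names


# Usage: write a small example file, load it, then clean up
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("# colleagues\nSmith, John\n\nDoe, Jane\n")
names = load_names(tmp.name)
os.remove(tmp.name)
print(names)
```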

To check each paper's title and abstract against a list of keywords instead of author names, use::

    papers = arxivchecker.check_keywords('astro-ph', ['Planet Formation', 'Hot Jupiter'])
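A keyword check of this kind can be sketched as a case-insensitive substring search over the combined title and abstract. This is an assumption about the matching rule, not arxivchecker's documented behavior, and the paper dict with "title" and "abstract" keys is hypothetical.

```python
def matches_keywords(paper, keywords):
    """Return the keywords found, case-insensitively, in a paper's title or abstract.

    Hypothetical: assumes paper is a dict with "title" and "abstract" keys.
    """
    text = (paper["title"] + " " + paper["abstract"]).lower()
    return [kw for kw in keywords if kw.lower() in text]


paper = {"title": "Migration of hot Jupiters",
         "abstract": "We model planet formation in a gaseous disk."}
print(matches_keywords(paper, ["Planet Formation", "Hot Jupiter"]))
```

Because this is a plain substring test, "Hot Jupiter" also matches "hot Jupiters".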

You can also scrape all of the papers first with the scrape_arxiv function, then check them against a list of authors::

    papers = arxivchecker.scrape_arxiv('astro-ph')
    results = arxivchecker.check_authors_from_papers(papers, ['Doe, Jane'])
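Names are given as "Last, First", while author lists usually read "First Last" or "F. Last". One way to compare the two, sketched below, is to reduce every name to its last name plus first initial. This normalization rule, the name_key and find_author_papers helpers, and the paper dicts with an "authors" key are all hypothetical illustrations, not arxivchecker's own logic.

```python
def name_key(name):
    """Reduce a name to (last name, first initial), lowercased.

    Accepts "Last, First" or "First Last" forms. This rule is an assumption
    for illustration, not arxivchecker's documented matching.
    """
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
    else:
        parts = name.split()
        last, first = parts[-1], " ".join(parts[:-1])
    return (last.lower(), first[:1].lower())


def find_author_papers(papers, names):
    """Return papers with at least one author matching one of `names`."""
    wanted = set(name_key(n) for n in names)
    return [p for p in papers
            if any(name_key(a) in wanted for a in p["authors"])]


papers = [{"title": "A", "authors": ["J. Doe", "A. Author"]},
          {"title": "B", "authors": ["John Smith"]}]
print([p["title"] for p in find_author_papers(papers, ["Doe, Jane"])])
```

Matching on the first initial catches "J. Doe" but also any other J. Doe, so results from a rule like this should be checked by eye.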

Similarly, for checking keywords::

    papers = arxivchecker.scrape_arxiv('astro-ph')
    results = arxivchecker.check_keywords_from_papers(papers, ['GJ876', 'Gliese-876'])
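Listing spelling variants such as 'GJ876' is needed when matching is literal, since an abstract may write "GJ 876" instead. A hypothetical alternative, sketched below, is to strip spaces, hyphens and punctuation from both the keyword and the text before comparing. This is not arxivchecker's behavior; squash and matches_any_variant are illustrative helpers.

```python
import re


def squash(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def matches_any_variant(text, variants):
    """True if any variant appears in text, ignoring case, spaces and punctuation."""
    squashed = squash(text)
    return any(squash(v) in squashed for v in variants)


abstract = "We revisit the resonant planets of Gliese 876 (GJ 876)."
print(matches_any_variant(abstract, ["GJ876", "Gliese-876"]))
```

The trade-off is that removing word boundaries can create matches that span two words, so short keywords are more likely to give false positives.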

To scrape all of the papers from a given month, supply the year and month arguments (the month can be given as a number or as a name)::

    papers = arxivchecker.scrape_arxiv('astro-ph', year=2016, month=6)      # June 2016
    papers = arxivchecker.scrape_arxiv('astro-ph', year=2016, month='May')  # May 2016
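Accepting either a number or a month name implies some normalization step. A minimal sketch of one, using the standard calendar module, is below; month_number and yymm are hypothetical helpers, not arxivchecker functions. The YYMM code is the prefix used by arXiv identifiers such as 1606.00001.

```python
import calendar


def month_number(month):
    """Convert 6, "6", "Jun" or "June" to 6 (hypothetical helper)."""
    if isinstance(month, int):
        number = month
    elif str(month).isdigit():
        number = int(month)
    else:
        # calendar.month_name[0] and month_abbr[0] are empty strings,
        # so list indices line up with month numbers 1..12
        names = [m.lower() for m in calendar.month_name]
        abbrs = [m.lower() for m in calendar.month_abbr]
        key = str(month).strip().lower()
        if key in names:
            number = names.index(key)
        elif key in abbrs:
            number = abbrs.index(key)
        else:
            raise ValueError("unrecognized month: %r" % (month,))
    if not 1 <= number <= 12:
        raise ValueError("month out of range: %r" % (month,))
    return number


def yymm(year, month):
    """Two-digit year plus two-digit month, e.g. (2016, "May") -> "1605"."""
    return "%02d%02d" % (year % 100, month_number(month))


print(month_number(6), month_number("May"), yymm(2016, "June"))
```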

Or grab all of the papers for a given year::

    papers = arxivchecker.scrape_arxiv('astro-ph', year=2016, month='all')

Note, however, that this can take a long time to complete (there could be more than 10,000 papers), and arXiv discourages crawling its website.
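If you do scrape a full year, spacing out the requests reduces the load on arXiv's servers. The sketch below shows a throttled month-by-month loop. The fetch_month callable is a hypothetical stand-in for whatever fetches one month's listing, and the 3-second default delay is an illustrative value, not a limit taken from arxivchecker.

```python
import time


def scrape_year_politely(fetch_month, year, delay=3.0):
    """Call fetch_month(year, month) for months 1..12, pausing between calls.

    fetch_month is a hypothetical callable returning a list of papers for one
    month; delay is the pause in seconds before each request after the first.
    """
    papers = []
    for month in range(1, 13):
        if month > 1:
            time.sleep(delay)
        papers.extend(fetch_month(year, month))
    return papers


def fake_fetch(year, month):
    """Stand-in fetcher so the example runs without network access."""
    return ["%d-%02d paper" % (year, month)]


papers = scrape_year_politely(fake_fetch, 2016, delay=0)
print(len(papers), papers[0])
```

Passing the fetcher in as an argument keeps the throttling logic separate from the scraping itself, so it can be tested offline as above.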

Finally, to run straight from the command line::

    python -c "import arxivchecker; print(arxivchecker.check_authors(['astro-ph', 'gr-qc', 'physics'], 'authors.txt'))" > results.txt

arxivchecker requires the requests and bs4 (BeautifulSoup 4) modules.
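A quick way to confirm that both dependencies are importable before running a search is sketched below (Python 3 only, since it relies on importlib.util). The missing_modules helper is hypothetical. Note that bs4 is installed from PyPI under the name beautifulsoup4.

```python
import importlib.util


def missing_modules(names=("requests", "bs4")):
    """Return the module names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    # bs4 is distributed on PyPI as beautifulsoup4
    print("Missing dependencies:", ", ".join(missing))
```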