Motivation

Available tools to keep up with newly published research are often frustrating. Email alerts from publishers clutter the inbox, arrive at seemingly random intervals, and do not include abstracts. Publisher RSS feeds are not much better, since available RSS readers are either clunky or come with expensive subscriptions. Signing up for email alerts or tracking down the RSS feeds of even a handful of publishers can easily take an entire afternoon. Twitter/X: uhm.

Moritz Marbach built Paper Picnic to keep up with newly published research in Political Science. It relies on three key ideas:

Updates once a week at a known time.
Displays all new research on a single web page without clutter.
No registration, no ads and no personal data collection.

All data comes from the Crossref API. Crossref is the world’s largest registry of Digital Object Identifiers (DOIs) and metadata. Continuously updated by publishers, Crossref provides an easy way to get metadata for research articles.
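The crawler itself is written in R. As a rough illustration only, the Python sketch below builds a request URL for Crossref's public REST API. Its `/journals/{issn}/works` route lists the works of one journal, and its `filter` parameter takes comma-separated `name:value` pairs. The function name, the default `rows` value, and the ISSN in the example are placeholders and do not come from Paper Picnic.

```python
from typing import Optional
from urllib.parse import urlencode

CROSSREF_API = "https://api.crossref.org"

def journal_works_url(
    issn: str,
    filters: dict[str, str],
    rows: int = 100,
    mailto: Optional[str] = None,
) -> str:
    """Build a Crossref REST API URL listing the works of one journal."""
    # Crossref expects filters as comma-separated "name:value" pairs.
    params: dict[str, object] = {
        "filter": ",".join(f"{name}:{value}" for name, value in filters.items()),
        "rows": rows,
    }
    if mailto:
        # Identifies the caller; Crossref routes such requests to its "polite" pool.
        params["mailto"] = mailto
    return f"{CROSSREF_API}/journals/{issn}/works?{urlencode(params)}"
```

For example, `journal_works_url("1234-5678", {"from-created-date": "2024-05-03"}, mailto="you@example.org")` returns a URL that any HTTP client can fetch. Result sets larger than one page would also need Crossref's cursor-based paging (`cursor=*`).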

Backend

The backend is a crawler written in R that lives in a GitHub repository. Every Friday, GitHub Actions runs the crawler. Once it finishes, the crawled data is written to a JSON file, rendered into an HTML page, and published with GitHub Pages.
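Here is a hedged sketch of the hand-off between the crawl and the web page. New articles are grouped by journal and serialized to JSON. The JSON layout (`crawl_date`, `journals`, and the per-article keys) is a hypothetical schema chosen for illustration. The real file produced by the R crawler may be structured differently.

```python
import json
from collections import defaultdict

# Hypothetical per-article fields; the crawler's real schema may differ.
FIELDS = ("doi", "title", "authors", "url", "abstract")

def build_payload(articles: list[dict], crawl_date: str) -> dict:
    """Group one week's new articles by journal into a JSON-ready dict."""
    by_journal: dict[str, list[dict]] = defaultdict(list)
    for art in articles:
        # Missing fields (e.g. no abstract) become None, i.e. null in JSON.
        by_journal[art["journal"]].append({k: art.get(k) for k in FIELDS})
    return {
        "crawl_date": crawl_date,
        "journals": [
            {"name": name, "articles": arts}
            for name, arts in sorted(by_journal.items())
        ],
    }

def write_payload(payload: dict, path: str) -> None:
    """Write the payload to disk for the page-rendering step to pick up."""
    # ensure_ascii=False keeps non-ASCII author names readable in the file.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, ensure_ascii=False, indent=2)
```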

For each journal, the crawler retrieves all articles added in the previous week. To do so, it fetches all articles from the Crossref database whose “created” or “published” date falls within the last 14 days. This generous overlap between weekly crawls accounts for the occasional delay between when a record is created and when it becomes available to the crawler via the API. Once an article is crawled, its unique identifier (the DOI) is added to a list of seen DOIs, which the crawler checks on every run. Only articles that the crawler has not seen before are included in the data update. This ensures that articles from the previous week’s crawl are not included again despite the overlapping crawl windows. It also ensures that articles that appear first online and later in print are shown only once on Paper Picnic.
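The overlapping window and the list of seen DOIs can be sketched as follows. The 14-day window comes from the text. Everything else is an assumption. We read the Crossref documentation as combining different filter names with AND, so matching on "created" or "published" takes two queries. We also use `from-pub-date`/`until-pub-date` as a stand-in for the "published" field. How the seen-DOI list persists between runs is not specified here.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 14  # crawl window stated in the text

def crawl_window_filters(today: date, lookback_days: int = LOOKBACK_DAYS) -> list[dict[str, str]]:
    """Return one Crossref filter set per date field ("created", "published")."""
    start = (today - timedelta(days=lookback_days)).isoformat()
    end = today.isoformat()
    # Different filter names are ANDed by Crossref, so "created OR published"
    # is expressed as two separate queries whose results are merged later.
    return [
        {"from-created-date": start, "until-created-date": end},
        {"from-pub-date": start, "until-pub-date": end},
    ]

def select_new(articles: list[dict], seen_dois: set[str]) -> list[dict]:
    """Keep only articles whose DOI is not in seen_dois; record the new DOIs.

    seen_dois is assumed to hold lowercased DOIs.
    """
    new = []
    for art in articles:
        doi = art["doi"].lower()  # DOIs are case-insensitive
        if doi in seen_dois:
            continue
        seen_dois.add(doi)
        new.append(art)
    return new
```

Because `select_new` adds every DOI it keeps to `seen_dois`, it also drops duplicates within a single run. One example is an article returned by both the "created" and the "published" query.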

The crawler retrieves the title, authors, full-text link, and abstract of each article. Unfortunately, not all publishers deposit abstracts with Crossref. Elsevier and Taylor & Francis, for example, do not include abstracts for any of their journals (see this Crossref Blog post for details).
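A single Crossref work record might be parsed into these fields as shown below. Crossref returns the title as a list. Authors are objects with `given`/`family` (or `name` for organizations). Abstracts, when present, are JATS XML fragments. Two parts of this sketch are simplifying assumptions: it uses the `URL` field as the full-text link, and it strips JATS markup with regular expressions. A missing abstract simply becomes `None`.

```python
import re
from typing import Any, Optional

# Crossref abstracts are JATS XML fragments; drop section titles, then tags.
JATS_TITLE = re.compile(r"<jats:title>.*?</jats:title>", re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")

def clean_abstract(raw: Optional[str]) -> Optional[str]:
    """Turn a JATS abstract into plain text, or None if there is none."""
    if not raw:
        return None  # publisher did not deposit an abstract
    text = ANY_TAG.sub("", JATS_TITLE.sub("", raw))
    return " ".join(text.split()) or None  # collapse whitespace

def parse_work(item: dict[str, Any]) -> dict[str, Any]:
    """Map one Crossref work record to the fields named in the text."""
    titles = item.get("title") or []
    authors = []
    for a in item.get("author", []):
        # Persons have given/family names; organizations use "name".
        name = " ".join(p for p in (a.get("given"), a.get("family")) if p)
        authors.append(name or a.get("name", ""))
    return {
        "doi": item["DOI"],
        "title": titles[0] if titles else "",
        "authors": authors,
        "url": item.get("URL", ""),  # assumption: DOI landing-page link
        "abstract": clean_abstract(item.get("abstract")),
    }
```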

Since journals typically have two ISSNs (one for print and one for electronic, see here), the crawler retrieves articles for both and deduplicates the results. The ISSNs used by the crawler come from the Crossref lookup tool.
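Querying both ISSNs and deduplicating comes down to merging the result lists keyed on the lowercased DOI. In this sketch, `fetch` is a hypothetical callable that returns parsed article dicts for one ISSN. The first occurrence of each DOI wins.

```python
from typing import Callable, Iterable

def fetch_all_issns(
    issns: Iterable[str],
    fetch: Callable[[str], list[dict]],
) -> list[dict]:
    """Fetch articles for every ISSN of a journal and drop duplicate DOIs."""
    merged: dict[str, dict] = {}
    for issn in issns:
        for art in fetch(issn):
            # setdefault keeps the first record seen for each (lowercased) DOI.
            merged.setdefault(art["doi"].lower(), art)
    # dicts preserve insertion order, so output follows fetch order.
    return list(merged.values())
```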

When the title is generic, e.g., when it includes the word “Errata”, “Frontmatter”, or “Backmatter”, the crawler adds a filter tag. For articles from multidisciplinary journals, the crawler prompts GPT-4o mini: “You are given content from a new issue of a multidisciplinary scientific journal. Respond ‘Yes’ if the content is a research article in any social science discipline and ‘No’ otherwise”, and it also adds the filter tag when the answer is ‘No’. All content with a filter tag is hidden in the default view but can be displayed by clicking the +N button at the top left of each journal.
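The tagging rule can be sketched as a single predicate. The keywords and the prompt text come from the description above. The case-insensitive whole-word match is an assumption. So is the `classify(prompt, content)` callable, a hypothetical stand-in for the actual GPT-4o mini request. Passing only the title as the content is also a simplification.

```python
import re
from typing import Callable, Optional

# Generic titles named in the text; case-insensitive whole-word match (assumption).
GENERIC_TITLE = re.compile(r"\b(errata|frontmatter|backmatter)\b", re.IGNORECASE)

PROMPT = (
    "You are given content from a new issue of a multidisciplinary scientific "
    "journal. Respond 'Yes' if the content is a research article in any social "
    "science discipline and 'No' otherwise"
)

def needs_filter_tag(
    title: str,
    multidisciplinary: bool,
    classify: Optional[Callable[[str, str], str]] = None,
) -> bool:
    """Return True if the item should be hidden behind the +N button."""
    if GENERIC_TITLE.search(title):
        return True
    if multidisciplinary and classify is not None:
        # classify(prompt, content) is a hypothetical wrapper around the LLM call.
        answer = classify(PROMPT, title)
        return not answer.strip().lower().startswith("yes")
    return False
```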

Data on the number of retrieved articles per week and journal is available here.

Contribute

Find and fix bugs or add new features to the crawler/web page. GitHub repository: github.com/sumtxt/paper-picnic.
Use the crawled data for your own tool:
Build a better (and equally open source) version of this page.
Support The Initiative for Open Abstracts.
