A weekly basket with the latest published research in political science. On Fridays at 2 AM UTC, we query the Crossref API for new research articles that appeared in the previous 7 days across 61 journals in political science and adjacent fields.
Available tools for keeping up with newly published research are often frustrating. Email alerts from publishers clutter the inbox, arrive at seemingly random intervals, and do not include abstracts. Publisher RSS feeds are similarly frustrating to use, as available RSS readers are either clunky or come with expensive subscription models. Signing up for email alerts or finding the RSS feeds of a handful of publishers can easily take an entire afternoon. Twitter/X - uhm.
Moritz Marbach built Paper Picnic to keep up with newly published research in political science. It relies on three key ideas:
All data comes from the Crossref API. Crossref is the world’s largest registry of Digital Object Identifiers (DOIs) and metadata. Continuously updated by publishers, Crossref provides an easy way to get metadata for research articles.
The backend is a crawler written in R that lives in a GitHub repository. Every Friday, GitHub Actions executes the crawler. Once the crawler finishes, the crawled data is written to a JSON file and rendered into an HTML file using GitHub Pages.
For each journal, the crawler retrieves all articles added to a journal in the previous week. To that end, it requests all articles for which the field “created” or “published” in the Crossref database is within the last seven days.
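The date-window request described above can be sketched as follows. This is a minimal illustration in Python, not the project's R crawler: the function name `weekly_query_urls`, the `rows` default, and the placeholder ISSN are assumptions, while the route `/journals/{issn}/works` and the filters `from-created-date`, `until-created-date`, `from-pub-date`, and `until-pub-date` are part of the public Crossref REST API. The sketch only builds the request URLs and sends nothing over the network.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

CROSSREF_API = "https://api.crossref.org"


def weekly_query_urls(issn: str, today: date, days: int = 7, rows: int = 100) -> list[str]:
    """Build one Crossref query per date field ("created" and "published").

    Hypothetical helper: the real crawler may combine or page these
    requests differently.
    """
    since = (today - timedelta(days=days)).isoformat()
    until = today.isoformat()
    urls = []
    # Crossref names the publication-date filters "from-pub-date"/"until-pub-date".
    for field in ("created", "pub"):
        date_filter = f"from-{field}-date:{since},until-{field}-date:{until}"
        query = urlencode({"filter": date_filter, "rows": rows})
        urls.append(f"{CROSSREF_API}/journals/{issn}/works?{query}")
    return urls


if __name__ == "__main__":
    # "1234-5678" is a placeholder ISSN, not a real journal.
    for url in weekly_query_urls("1234-5678", date.today()):
        print(url)
```

Running the two queries separately and merging the results is one way to express "created or published"; whether the crawler does it this way is not stated on this page.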
The crawler retrieves the title, authors, full-text link, and abstract of each article. Unfortunately, not all publishers add abstracts. Elsevier and Taylor & Francis, for example, do not include abstracts for any of their journals (see this Crossref Blog for details).
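To illustrate what handling a missing abstract might look like, here is a hedged Python sketch that pulls the four fields out of a single Crossref work record. The keys `DOI`, `title`, `author`, `URL`, and `abstract` follow the Crossref work metadata format; the function name `extract_article`, the output layout, and the choice of the `URL` field as the article link are assumptions made for this example.

```python
def extract_article(work: dict) -> dict:
    """Reduce one Crossref work record to the fields shown on the page.

    Hypothetical helper; the output keys are chosen for illustration only.
    """
    authors = []
    for person in work.get("author", []):
        # Individual authors carry "given"/"family"; either part may be absent.
        parts = [person.get("given"), person.get("family")]
        name = " ".join(p for p in parts if p)
        if name:
            authors.append(name)

    titles = work.get("title") or [""]  # Crossref stores titles as a list
    return {
        "doi": work.get("DOI", "").lower(),
        "title": titles[0],
        "authors": authors,
        "url": work.get("URL"),
        # None when the publisher did not add an abstract to Crossref.
        "abstract": work.get("abstract"),
    }
```

Keeping `abstract` as `None` rather than an empty string lets the front end distinguish "no abstract available" from an abstract that happens to be blank.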
Since journals typically have two ISSNs (one for the print edition and one for the electronic edition, see here), the crawler retrieves articles for both ISSNs and deduplicates the results. The ISSNs used by the crawler come from the Crossref lookup tool.
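A minimal Python sketch of this deduplication step, assuming each article is a dictionary with a `doi` key (a layout invented for this example) and that the DOI is the deduplication key, which this page does not state explicitly. DOIs are case-insensitive, so the sketch lowercases them before comparing.

```python
def merge_by_doi(*batches: list[dict]) -> list[dict]:
    """Merge result batches (e.g. print ISSN and electronic ISSN) by DOI.

    Hypothetical helper: the first record seen for a DOI wins, and the
    original order of first appearance is preserved.
    """
    merged: dict[str, dict] = {}
    for batch in batches:
        for article in batch:
            key = article["doi"].lower()
            merged.setdefault(key, article)
    return list(merged.values())
```

For example, `merge_by_doi(print_results, electronic_results)` would return each article once even when both ISSN queries return it.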
Once an article has been crawled, its unique identifier (the DOI) is added to a list. The crawler checks this list on every run and includes only articles it has not seen before in the data update. This ensures that articles appearing first online and then again in print are included only once on Paper Picnic.
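The seen-list check can be sketched in Python as below. The JSON file format, the function name `select_unseen`, and the `doi` key are assumptions for illustration; this page does not say how the crawler stores its list.

```python
import json
from pathlib import Path


def select_unseen(articles: list[dict], seen_path: Path) -> list[dict]:
    """Return only articles whose DOI is not yet recorded, then update the record.

    Hypothetical helper: the list of seen DOIs is kept as a JSON array on disk.
    """
    seen = set(json.loads(seen_path.read_text())) if seen_path.exists() else set()
    fresh = []
    for article in articles:
        doi = article["doi"].lower()  # DOIs are case-insensitive
        if doi not in seen:
            seen.add(doi)
            fresh.append(article)
    # Persist the updated list so the next run can skip these DOIs.
    seen_path.write_text(json.dumps(sorted(seen)))
    return fresh
```

Calling the function twice with the same articles returns them on the first call and an empty list on the second, which is the behavior that keeps an online-first article from reappearing when its print version is registered.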
When the title is generic, e.g., when it includes the words “Errata”, “Frontmatter” or “Backmatter”, the crawler adds a filter tag. For articles from multidisciplinary journals, the crawler prompts GPT-4o mini: “You are given content from a new issue of a multidisciplinary scientific journal. Respond ‘Yes’ if the content is a research article in any social science discipline and ‘No’ otherwise”. Content for which the answer is ‘No’ also receives the filter tag. All content that carries this filter tag is hidden in the default view but can be displayed by clicking on the +N button at the top left for every journal.
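The title-based part of this filter could look roughly like the Python sketch below. The word list contains only the three examples named above, and the whole-word, case-insensitive matching rule is an assumption; the crawler's actual list and matching logic may differ. The GPT-4o mini step is not covered here.

```python
import re

# Illustrative list: only the three example words mentioned in the text.
GENERIC_TITLE = re.compile(r"\b(errata|frontmatter|backmatter)\b", re.IGNORECASE)


def needs_filter_tag(title: str) -> bool:
    """Return True when a title contains one of the generic words.

    Hypothetical helper: matching is whole-word and case-insensitive.
    """
    return bool(GENERIC_TITLE.search(title))
```

A front end could then hide every article for which `needs_filter_tag(title)` is true and count those articles for the +N button.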
Find and fix bugs or add new features to the crawler/web page. GitHub repository: github.com/sumtxt/paper-picnic.
Use the crawled data for your own tool:
Build a better (and equally open source) version of this page.
Support The Initiative for Open Abstracts.