Automated assessment of scientific papers


scienceverse.github.io/talks/2024-EAISI-papercheck/

Lisa DeBruine @debruine

Abstract

Researchers are increasingly aware of the need to share all aspects of the research cycle, from pre-registered analysis plans and study materials to the data and analysis code that produce the reported results. However, much of this digital information is in a format that makes it difficult to find, access, and reuse. Additionally, best practices are evolving at a pace that is difficult for researchers to keep up with. In this talk, I will discuss the potential for automated checks of scientific papers to address these problems, and introduce {papercheck}, an application that combines simple regular expression text searching, R code, machine learning, and generative AI to assess scientific papers. This tool can be used to suggest improvements pre-publication, or to more efficiently conduct meta-scientific research on large numbers of papers.

The Problem

Best Practices are Rapidly Evolving

Un-FAIR Meta-Data

  • All research outputs should be FAIR
  • PDFs are where data goes to die
  • Meta-data use cases:
    • facilitating meta-analyses
    • improving the re-use of reliable measures
    • meta-scientific research

Solutions

Checklists?

Reporting guidelines, such as CONSORT, PRISMA, and JARS, often provide extensive checklists.

  • Time-consuming
  • Requires expertise
  • Can be vague
  • Who checks the checklist?

Automated Checks

  • Time-efficient
  • Requires less expertise
  • Reproducible
  • Generates machine-readable metadata

Automation Strategies

Grobid: machine-learning software for extracting structured information from scholarly documents

And then…

Text Search

Code

Machine Learning

AI

R Package

Paper Import

library(papercheck)

file <- demopdf()          # demo PDF bundled with the package
xml <- pdf2grobid(file)    # convert the PDF to TEI XML via a Grobid server
paper <- read_grobid(xml)  # parse the XML into a paper object
---------------
to_err_is_human
---------------

* Sections: 4
* Sentences: 24
* References: 2
* Citations: 2

Batch Import

dir <- demodir()            # folder of demo Grobid XML files bundled with the package
papers <- read_grobid(dir)  # import every paper in the folder as a list of paper objects
--------
eyecolor
--------

* Sections: 8
* Sentences: 93
* References: 21
* Citations: 22
------
incest
------

* Sections: 7
* Sentences: 56
* References: 4
* Citations: 14
------
prereg
------

* Sections: 10
* Sentences: 180
* References: 23
* Citations: 31

ChatGPT

papers |>
  search_text(section = "method") |>                  # restrict to the method sections
  gpt(query = "How many subjects are in this study?") # query ChatGPT about each paper
id           | answer                                                                           | cost
eyecolor.xml | There were 150 women and 150 men, making a total of 300 subjects in this study. | 0.000598
incest.xml   | There were a total of 1998 participants in this study.                          | 0.000551

Modules

module_list()
 * ai-summarise: Generate a 1-sentence summary for each section
 * all-p-values: List all p-values in the text, returning the matched text (e.g., 'p = 0.04') and document location in a table.
 * all-urls: List all the URLs in the main text
 * imprecise-p: List any p-values reported with insufficient precision (e.g., p < .05 or p = n.s.)
 * marginal: List all sentences that describe an effect as 'marginally significant'.
 * osf-check: List all OSF links and whether they are open, closed, or do not exist.
 * ref-consistency: Check if all references are cited and all citations are referenced
 * retractionwatch: Flag any cited papers in the RetractionWatch database
 * sample-size-ml: [DEMO] Classify each sentence for whether it contains sample-size information, returning only sentences with probable sample-size info.
 * statcheck: Check consistency of p-values and test statistics
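Modules can also be run over the whole batch of papers imported earlier; a minimal sketch, assuming module_run() accepts a list of paper objects as well as a single paper:

module_run(papers, "all-p-values")  # extract every reported p-value across the batch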

Modules: StatCheck

This module uses the {statcheck} package to check the consistency of p-values and test statistics.

module_run(paper, "statcheck")

StatCheck

We detected possible errors in test statistics

reported_p | computed_p | raw                        | error | decision_error | one_tailed_in_txt | apa_factor
0.005      | 0.0046094  | t(97.7) = 2.9, p = 0.005   | FALSE | FALSE          | FALSE             | 1
0.152      | 0.0528594  | t(97.2) = -1.96, p = 0.152 | TRUE  | FALSE          | FALSE             | 1
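For intuition, the computed_p values above can be reproduced from the reported test statistics with base R (a worked check, not part of the module output):

# two-tailed p for t(97.7) = 2.9 is ~0.0046, consistent with the reported p = 0.005
2 * pt(2.9, df = 97.7, lower.tail = FALSE)

# two-tailed p for t(97.2) = -1.96 is ~0.053, inconsistent with the reported p = 0.152
2 * pt(abs(-1.96), df = 97.2, lower.tail = FALSE)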

Modules: Imprecise P-Values

This module scans the text for all p-values and flags those reported inexactly, such as p < .01, p < .10, or p = n.s.

module_run(paper, "imprecise-p")

Imprecise P-Values

You may have reported some imprecise p-values

text    | section | header  | div | p | s
p > .05 | results | Results | 3   | 2 | 2
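At its core this check is a regular-expression search; a simplified base-R sketch of the idea (not the module's actual pattern):

# flag p-values reported only relative to a threshold or as "n.s."
sentence <- "The difference was not significant (p > .05) and the interaction was p = n.s."
pattern  <- "p\\s*(>\\s*\\.\\d+|<\\s*\\.05|=\\s*n\\.?s\\.?)"
regmatches(sentence, gregexpr(pattern, sentence, perl = TRUE))[[1]]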

Modules: Marginal Significance

This module searches the text for phrases such as “marginally significant” or “borderline significance” and flags these as inappropriate descriptions.

module_run(paper, "marginal")

Marginal Significance

You described effects as marginally/borderline/close to significant. It is better to write ‘did not reach the threshold alpha for significance’.

text | section | header | div | p | s
The paper shows examples of (1) open and closed OSF links; (2) citation of retracted papers; (3) missing/mismatched citations and references; (4) imprecise reporting of p-values; and (5) use of “marginally significant” to describe non-significant findings. | abstract | Abstract | 0 | 1 | 3
On average researchers in the experimental condition found the app marginally significantly more useful (M = 5.06) than researchers in the control condition found the checklist (M = 4.5), t(97.2) = -1.96, p = 0.152. | results | Results | 3 | 2 | 1
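The same sentences could also be found with a plain text search; a sketch assuming search_text() accepts a regular-expression pattern as its second argument:

paper |>
  search_text("marginal(ly)? significan|borderline significan")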

Modules: Inaccessible Resources

This module scans the text for links to OSF projects and checks their status, flagging any that are broken or that lead to inaccessible private projects.

module_run(paper, "osf-check")

OSF Check

We detected closed OSF links

text                 | section | header                   | div | p | s | status
osf.io/5tbm9         | method  | Method and Participants  | 2   | 1 | 2 | closed
https://osf.io/5tbm9 | results | Results                  | 3   | 1 | 1 | closed
https://osf.io/629bx | results | Results                  | 3   | 1 | 1 | open
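The status check itself amounts to requesting each link and seeing how it resolves; a rough illustration with {httr} (the module's real implementation, and its handling of private projects, may differ):

library(httr)

resp <- GET("https://osf.io/629bx")  # a link reported as open above
status_code(resp)                    # 200 suggests an accessible project; errors suggest closed or missing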

Modules: Reference Consistency

This module checks for missing references or citations.

module_run(paper, "ref-consistency")

Reference Consistency

This module relies on Grobid correctly parsing the references. There may be some false positives. There are references that are not cited or citations that are not referenced.

bib_id         | doi                | ref                                                                                          | missing
b1             | 10.0000/0123456789 | NA                                                                                           | citation
(Smithy, 2020) | NA                 | From a human factors perspective, human error is a symptom of a poor design (Smithy, 2020). | reference
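Conceptually, the check is a two-way set difference between citation IDs and reference-list IDs; a minimal sketch with illustrative vectors (not papercheck internals):

cited_ids     <- c("b0", "b2")  # bib ids attached to in-text citations (illustrative)
reference_ids <- c("b0", "b1")  # bib ids present in the reference list (illustrative)

setdiff(reference_ids, cited_ids)  # references that are never cited
setdiff(cited_ids, reference_ids)  # citations with no matching reference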

Modules: Retracted Papers

This module searches the RetractionWatch database for all cited references in a paper and flags those that have been retracted.

module_run(paper, "retractionwatch")

RetractionWatch

You cited some papers in the Retraction Watch database; double-check that you are acknowledging their retracted status when citing them.

bib_id | doi                      | ref
b0     | 10.1177/0956797614520714 | NA
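The underlying idea is a DOI lookup against the Retraction Watch data; a sketch assuming a local CSV export with an OriginalPaperDOI column (the file name and column name are assumptions):

rw <- read.csv("retraction_watch.csv")           # assumed local copy of the database
cited_dois <- c("10.1177/0956797614520714")      # DOIs extracted from the paper's references
cited_dois[cited_dois %in% rw$OriginalPaperDOI]  # matches indicate retracted papers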

Shiny App

Modules

Reports

ChatGPT

Promoting Adoption

Center for Open Science

Workflows

Individual

Automated

Meta-Science

Systemic

Thank You!

papercheck

scienceverse

@debruine