6 Using the Paper Database Corpora

library(metacheck)

The scienceverse/papers GitHub repository hosts pre-built paperlist corpora: collections of open-access articles already converted to Metacheck’s internal format with GROBID, so that modules can be run on them as soon as they are loaded. The goal of these paper databases is to provide sets of open-access scientific papers against which modules can be validated. They are useful when developing new modules in Metacheck, and also for comparing automated screening tools such as Metacheck against each other. This vignette shows how to discover, download, and use these corpora, and demonstrates three fast text-based modules on real papers from two of them.

6.1 Discovering available corpora

papers_available() queries the GitHub Releases API and lists every corpus currently published, along with its size and whether you already have a cached local copy.

papers_available()
#>           name                    tag size_mb cached
#> 1         ece3        ece3-2026-06-21    55.3  FALSE
#> 2        elife       elife-2026-06-21    51.9  FALSE
#> 3     openmind    openmind-2026-06-21    12.6  FALSE
#> 4    frontiers   frontiers-2026-06-21    35.7  FALSE
#> 5          jdm         jdm-2026-06-21    24.9   TRUE
#> 6        iperc       iperc-2026-06-21    12.4  FALSE
#> 7         scan        scan-2026-06-21    29.1  FALSE
#> 8          joc         joc-2026-06-20    15.6  FALSE
#> 9      natcomm     natcomm-2026-06-20    39.2  FALSE
#> 10 psychsci_oa psychsci_oa-2026-06-20     8.2  FALSE
#> 11        jssm        jssm-2026-06-20    25.5  FALSE
#> 12     plosone     plosone-2026-06-20    30.3  FALSE
#> 13     plosbio     plosbio-2026-06-19    40.5  FALSE
#> 14     bmcoral     bmcoral-2026-06-19    21.4  FALSE
#> 15        ijos        ijos-2026-06-19    22.0  FALSE
#> 16      bmcmed      bmcmed-2026-06-19    30.5  FALSE
#> 17     plosmed     plosmed-2026-06-19    30.2   TRUE
#> 18    collabra    collabra-2026-06-19    31.7  FALSE
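The listing can be filtered with ordinary base R before deciding what to download. The sketch below assumes that papers_available() returns a regular data frame with the name, size_mb, and cached columns shown above; to stay runnable offline it uses a hand-built stand-in (four rows copied from the listing) rather than calling the function.

```r
# Stand-in for the data frame returned by papers_available(); the four rows
# are copied from the listing above so the example runs without network access.
available <- data.frame(
  name    = c("openmind", "jdm", "psychsci_oa", "plosmed"),
  size_mb = c(12.6, 24.9, 8.2, 30.2),
  cached  = c(FALSE, TRUE, FALSE, TRUE)
)

# Corpora that would still need to be downloaded, smallest first
to_download <- available[!available$cached, ]
to_download <- to_download[order(to_download$size_mb), ]
to_download$name

# Total download size in MB for the uncached corpora
total_mb <- sum(to_download$size_mb)
total_mb
```

With the real listing, replacing the stand-in by `available <- papers_available()` should give the same kind of summary for all published corpora.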

6.2 Loading a corpus

papers_load() downloads the corpus and returns it as a paperlist object. By default (cache = FALSE), the file is downloaded to a temporary location and discarded once loaded – the right choice if you only need the corpus for a single session.

jdm <- papers_load("jdm")
length(jdm)

If you expect to use a corpus repeatedly across sessions – for example, while developing and testing a new module – set cache = TRUE so the file is saved to your user data directory and reused on subsequent calls instead of being re-downloaded every time:

jdm <- papers_load("jdm", cache = TRUE)

Calling papers_load("jdm", cache = TRUE) again later in the same or a different session will load instantly from the cached copy rather than contacting GitHub. Pass overwrite = TRUE if you want to force a fresh download (e.g. after a corpus has been updated):

jdm <- papers_load("jdm", cache = TRUE, overwrite = TRUE)

Use papers_remove() to delete a cached copy and free up disk space:

papers_remove("jdm")

6.3 Corpus metadata

The corpus .rds file itself is just a list of papers – it carries no metadata about how it was built, what it covers, or its license. That provenance lives in a metadata.json file alongside each corpus on the papers repository, following the Dublin Core metadata standard. papers_metadata() fetches and parses it for you:

meta <- papers_metadata("jdm")
meta$dc_title
#> [1] "Judgment and Decision Making - Complete Open-Access Corpus (2006-2022)"
meta$dc_coverage
#> [1] "2006-01-01/2022-12-31"
meta$dc_rights
#> [1] "CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)"

Conversion provenance (which GROBID version, which tool, when) is nested under dc_provenance:

meta$dc_provenance$conversion_tool
#> [1] "GROBID 0.8"
meta$dc_provenance$doi_matching
#> [1] "CrossRef title search + manual verification (2026-06-17)"

papers_metadata() only covers the structured fields above. For the full narrative – known gaps, data-quality caveats, exact sampling procedure – see each corpus’s README.md on the papers repository, or fetch it directly with github_readme("scienceverse/papers").
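Because the metadata is returned as a nested list, its fields can be post-processed with base R. As a small sketch, the dc_coverage value shown earlier appears to use a "start/end" interval string, which can be split into two dates. The meta object below is a hand-built stand-in containing only that one field, not the full result of papers_metadata().

```r
# Stand-in for the list returned by papers_metadata("jdm"); only the
# dc_coverage field from the output above is reproduced here.
meta <- list(dc_coverage = "2006-01-01/2022-12-31")

# Split the "start/end" string and convert both halves to Date objects
coverage <- as.Date(strsplit(meta$dc_coverage, "/", fixed = TRUE)[[1]])
names(coverage) <- c("start", "end")

# Number of calendar years the corpus spans, counting both endpoints
n_years <- as.integer(format(coverage[["end"]], "%Y")) -
  as.integer(format(coverage[["start"]], "%Y")) + 1L
n_years
```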

6.4 Three fast, local modules

The modules below only search the paper’s text with regular expressions – no external API calls, no LLM queries – so they run quickly even across hundreds of papers.

Module	What it checks
`stat_check`	Recomputes p-values from reported test statistics (t-tests, F-tests) and flags inconsistencies
`stat_effect_size`	Checks whether t-tests and F-tests are reported with an effect size
`stat_p_nonsig`	Surfaces non-significant p-values for manual review of their interpretation

All three return a list with $traffic_light (“green”, “yellow”, “red”, or “na”), $summary_text, and $table with row-level detail. See vignette("stats", package = "metacheck") for a fuller walkthrough of all five statistical-reporting modules using the bundled psychsci dataset.

6.4.1 `stat_check` on JDM

stat_check runs statcheck on every t-test and F-test it finds and reports whether the stated p-value is mathematically consistent with the test statistic and degrees of freedom. The code below runs this check on only the first 20 papers in the corpus.

result <- module_run(jdm[1:20], "stat_check")
result$summary_text
#> [1] "6 possible errors in t-tests or F-tests"

Some of the flagged discrepancies are small (for example, p = .05 reported where the recomputed value is .056), but test statistics and p-values should still be reported accurately.

# the inconsistent test statistic and the sentence it appeared in
result$table[result$table$error, c("raw", "computed_p", "text")]
#>                             raw   computed_p
#> 34      t(56) = -3.43, p < .001 1.139850e-03
#> 86  F(1, 160) = 2.89, p = 0.092 9.107401e-02
#> 92        t(39) = 7.1, p = .001 1.562904e-08
#> 93         t(44) = 4.7, p = .01 2.579395e-05
#> 109  F (1, 144) = 3.72, p = .05 5.573168e-02
#> 119  F (1, 144) = 4.12, p = .05 4.422070e-02
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             text
#> 34                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Indeed, there is a strong negative correlation between this ratio and course ratings (r = -.42; t(56) = -3.43, p < .001; see also model 3, Table 1).
#> 86                                                                                                                                                                                                                                                                                                                                                               There was also a nearly significant interaction between perspective and uncertainty frame, F(1, 160) = 2.89, p = 0.092, with the no-flu-shot option receiving particularly low evaluations (treatment receiving particularly high evaluations) when the group perspective was coupled with frequency information (see Table 2).
#> 92                                                                                                                                                                                                                                                                                                                                                                                                                                  Actors and observers both took larger risks (i.e., turned over more cards) than was rational, as revealed by onesample t-tests against the rational choice (i.e., 5 cards), t(39) = 7.1, p = .001 (CI: 1.34 -2.41) and t(44) = 4.7, p = .01 (CI: .58 -1.46).
#> 93                                                                                                                                                                                                                                                                                                                                                                                                                                  Actors and observers both took larger risks (i.e., turned over more cards) than was rational, as revealed by onesample t-tests against the rational choice (i.e., 5 cards), t(39) = 7.1, p = .001 (CI: 1.34 -2.41) and t(44) = 4.7, p = .01 (CI: .58 -1.46).
#> 109 Secondly, a significant Direction X Mode interaction was obtained, F (1, 144) = 3.72, p = .05, η² = .03, indicating that participants who were instructed to engage in upward evaluation showed a larger increase in persistence (M = +225.67 sec, SD = 360.40) than did those who were instructed to engage in upward reflection (M = +86.85 sec, SD = 317.94), F (1, 144) = 3.94, p = .05, d = .41, whereas those who were instructed to engage in downward reflection showed a larger increase in persistence (M = +26.01 sec, SD = 274.79) than did those who were instructed to engage in downward evaluation (M = -31.63 sec, SD = 246.61), although not significantly, F < 1, d =.22.
#> 119                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Consistent with this hypothesis, downward reflection elicited a larger increase in persistence under prevention framing than it did under promotion framing, F (1, 144) = 4.12, p = .05, d =.48.
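The values in the computed_p column can be approximated by hand with base R’s distribution functions. The sketch below recomputes the p-values for rows 34 and 86, assuming a two-sided t-test and the usual upper-tail F-test; it illustrates the underlying arithmetic and is not the module’s own code.

```r
# Two-sided p-value for t(56) = -3.43
p_t <- 2 * pt(abs(-3.43), df = 56, lower.tail = FALSE)

# p-value for F(1, 160) = 2.89 (upper tail of the F distribution)
p_f <- pf(2.89, df1 = 1, df2 = 160, lower.tail = FALSE)

# Compare these with the computed_p values for rows 34 and 86 above
round(c(t = p_t, F = p_f), 5)
```

For row 34, the recomputed value lies slightly above .001, which is why the reported "p < .001" is flagged as inconsistent.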

We can repeat the stat_check analysis on a medical journal:

plosmed <- papers_load("plosmed")
length(plosmed)

result <- module_run(plosmed[1:20], "stat_check")
result$summary_text

This comparison shows why it is useful to check how modules behave across literatures. statcheck only detects statistics reported in APA format, and this reporting style is largely confined to psychology journals, so the module can miss tests that are reported in other styles.

6.4.2 `stat_p_nonsig` across corpora

stat_p_nonsig finds every non-significant p-value (p >= .05) so a reader can check whether the surrounding text correctly avoids interpreting it as evidence of no effect. The module makes no judgment on its own – it only surfaces the sentences.

result_nonsig <- module_run(plosmed, "stat_p_nonsig")
result_effect_size <- module_run(plosmed, "stat_effect_size")

result_nonsig$summary_text
#> [1] "We found 2483 non-significant p values that should be checked for appropriate interpretation."
result_effect_size$summary_text
#> [1] "We found 51 t-tests and/or F-tests where effect sizes are not reported. Check these tests in the table below, and consider adding effect sizes"

These modules do identify some tests in a medical journal, but medical journals may rely much more heavily on analyses other than t-tests and F-tests. This highlights the need to validate modules across domains, and to communicate clearly the domain within which each module has been validated.
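To make the regular-expression approach behind these text-based modules more concrete, the sketch below flags sentences that report p >= .05. The pattern is a deliberately simple, hypothetical one written for this illustration; it is not the pattern Metacheck uses, and it ignores cases such as "p > .05" written without a leading comparator or p-values in scientific notation.

```r
# Two example sentences, shortened from the stat_check output above
sentences <- c(
  "There was also a nearly significant interaction, F(1, 160) = 2.89, p = 0.092.",
  "The effect was reliable, t(39) = 7.1, p = .001."
)

# Hypothetical pattern: a standalone "p", an "=" or ">" sign, then a decimal
pattern <- "\\bp\\s*[=>]\\s*(0?\\.\\d+)"

# regexec() returns the full match and the captured number for each sentence
matches <- regmatches(sentences, regexec(pattern, sentences, perl = TRUE))

# Extract the captured p-value, or NA when the sentence has no match
p_values <- vapply(
  matches,
  function(m) if (length(m) >= 2) as.numeric(m[2]) else NA_real_,
  numeric(1)
)

# Sentences whose reported p-value is non-significant at the .05 level
nonsig <- !is.na(p_values) & p_values >= 0.05
sentences[nonsig]
```

Only the first sentence is returned here; as with stat_p_nonsig, deciding whether the surrounding interpretation is appropriate is left to the reader.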