16  StatCheck

16.1 What it checks

The stat_check module runs Statcheck on the paper. Statcheck recomputes the p-value from each reported test statistic and degrees of freedom, and flags cases where the reported p-value is inconsistent with the statistic — and, more seriously, where the inconsistency changes the significance decision (a decision error).
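The recomputation described above can be sketched in base R. This is a minimal illustration, not Statcheck's actual code: the helper names `recompute_p` and `is_decision_error` are hypothetical, the p-values are assumed to be two-sided, and the sketch ignores any allowance for rounding of the reported statistic. The two inputs are tests from the psychsci papers that appear later in this chapter.

```r
# Recompute a two-sided p-value from a reported statistic (base R only).
# `recompute_p` is a hypothetical helper, not part of statcheck or metacheck.
recompute_p <- function(test_type, value, df1, df2 = NULL) {
  switch(test_type,
    t = 2 * stats::pt(abs(value), df = df1, lower.tail = FALSE),
    F = stats::pf(value, df1 = df1, df2 = df2, lower.tail = FALSE),
    stop("unsupported test type: ", test_type)
  )
}

# A decision error: reported and recomputed p fall on opposite sides of alpha.
is_decision_error <- function(reported_p, computed_p, alpha = 0.05) {
  (reported_p < alpha) != (computed_p < alpha)
}

p_t <- recompute_p("t", 2.43, df1 = 48)            # t(48) = 2.43, about 0.0189
p_f <- recompute_p("F", 3.59, df1 = 1, df2 = 361)  # F(1, 361) = 3.59, about 0.0589

is_decision_error(0.020, p_t)  # reported p = .020, same side of .05: FALSE
is_decision_error(0.029, p_f)  # reported p = .029, opposite sides: TRUE
```

The second case shows why decision errors matter more: the reported value is significant at .05, while the recomputed one is not.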

Metacheck returns Statcheck results only for t-tests and F-tests, the tests for which the approach has been validated (Nuijten et al., 2016). Statcheck detects only statistics written in APA format, which is used mostly in psychology journals.

This module is fully offline (Statcheck runs locally).

16.2 Running the module

paper <- demopaper()
module_run(paper, "stat_check")

StatCheck: 1 possible error in t-tests or F-tests

The table has one row per detected test, comparing reported_p to computed_p, with error and decision_error flags:

mo <- module_run(paper, "stat_check")
mo$traffic_light
#> [1] "red"
mo$summary_text
#> [1] "1 possible error in t-tests or F-tests"
mo$table[, c("test_type", "reported_p", "computed_p", "error", "decision_error")] |>
  knitr::kable()
test_type  reported_p  computed_p  error  decision_error
t               0.005   0.0046094  FALSE  FALSE
t               0.152   0.0528594  TRUE   FALSE

16.3 Running on many papers

Running Statcheck on a large corpus can take a while (the full psychsci set has more than 27,000 sentences with numbers). Here we run it on a small subset:

mo <- module_run(psychsci[1:20], "stat_check")
head(mo$summary_table) |>
  knitr::kable()
paper_id          statcheck_found  statcheck_errors  statcheck_decision_errors
0956797613520608                6                 0                          0
0956797614522816               39                 0                          0
0956797614527830                0                 0                          0
0956797614557697                7                 0                          0
0956797614560771                4                 0                          0
0956797614566469                0                 0                          0
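A per-paper summary like this is easy to aggregate. The sketch below rebuilds the six rows shown above as a plain data frame so that it runs without metacheck; with a real result you would presumably start from `mo$summary_table` instead. The variable names `totals`, `n_without_tests`, and `error_rate` are illustrative only.

```r
# Stand-in for mo$summary_table, copied from the six rows printed above.
summary_table <- data.frame(
  paper_id = c("0956797613520608", "0956797614522816", "0956797614527830",
               "0956797614557697", "0956797614560771", "0956797614566469"),
  statcheck_found = c(6, 39, 0, 7, 4, 0),
  statcheck_errors = c(0, 0, 0, 0, 0, 0),
  statcheck_decision_errors = c(0, 0, 0, 0, 0, 0)
)

# Corpus-level totals of detected tests, errors, and decision errors.
totals <- colSums(summary_table[, -1])

# Papers in which no t-test or F-test was detected at all.
n_without_tests <- sum(summary_table$statcheck_found == 0)

# Share of detected tests that were flagged as inconsistent.
error_rate <- totals[["statcheck_errors"]] / totals[["statcheck_found"]]
```

A paper with `statcheck_found == 0` is not necessarily clean: it may report no t-tests or F-tests, or it may report them in a non-APA format that Statcheck does not detect.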

16.4 A clean example and one with problems

The flagged rows are the ones to inspect — especially decision_error == TRUE, where the reported and recomputed p-values fall on opposite sides of .05:

mo <- module_run(psychsci[1:20], "stat_check")
err <- mo$table[mo$table$error %in% TRUE, c("raw", "reported_p", "computed_p", "decision_error")]
head(err) |>
  knitr::kable()
     raw                           reported_p  computed_p  decision_error
106  t(48) = 2.43, p = .020             0.020   0.0188882  FALSE
160  F(1, 21) = 11.94, p < .001         0.001   0.0023684  FALSE
235  t(237) = 2.21, p = .014            0.014   0.0280615  FALSE
236  F(1, 361) = 3.59, p = .029         0.029   0.0589271  TRUE
237  F(1, 361) = 5.89, p < .01          0.010   0.0157159  FALSE

16.5 Options

stat_check takes only the paper argument.

Note

Statcheck treats p = .000 as an error (you should report p < .001), and may misflag one-sided tests, since it assumes two-sided p-values. Always read the flagged sentence before concluding there is a real error.
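One quick check for a flagged t-test is whether the reported p-value matches a one-sided test instead. The sketch below uses t(237) = 2.21, p = .014 from the flagged rows in 16.4. It is a base R illustration under the assumption that the reported value was rounded to three decimals; it is not something the module does for you, and a match does not prove that the authors ran a one-sided test.

```r
t_value    <- 2.21
df         <- 237
reported_p <- 0.014

# Two-sided p-value, which is what Statcheck assumes.
p_two_sided <- 2 * pt(abs(t_value), df, lower.tail = FALSE)

# One-sided p-value in the direction of the observed effect.
p_one_sided <- pt(abs(t_value), df, lower.tail = FALSE)

# Does either value round to the reported p at three decimals?
matches_two_sided <- abs(p_two_sided - reported_p) < 0.0005
matches_one_sided <- abs(p_one_sided - reported_p) < 0.0005
```

Here only the one-sided value agrees with the reported p = .014, so this flag may reflect a one-sided test. Reading the sentence in the paper is still the only way to confirm that.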

16.6 Validation

In a validation sample of 685 tests, 34 of which were inconsistently reported, Statcheck correctly detected all 34 and incorrectly flagged a further 26. So none of the true inconsistencies were missed (0%), but 26 of the 60 flagged tests (43%) were false positives. See https://osf.io/preprints/psyarxiv/tcxaj_v1/ for the full validation.
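The percentages follow directly from the counts. The sketch below reproduces them in R and also derives the false positive rate among consistent tests. That last figure is not stated in the source: it assumes that the 651 remaining tests were all consistently reported.

```r
n_tests           <- 685  # tests in the validation sample
true_inconsistent <- 34   # tests that really were inconsistently reported
detected_true     <- 34   # of those, how many Statcheck flagged
false_flags       <- 26   # tests flagged incorrectly

# Share of true inconsistencies that were missed: 0
miss_rate <- 1 - detected_true / true_inconsistent

# Share of all flags that were wrong: 26 / 60, about 0.43
false_discovery_rate <- false_flags / (detected_true + false_flags)

# Assumption: every test that is not truly inconsistent is consistent.
n_consistent <- n_tests - true_inconsistent            # 651
false_positive_rate <- false_flags / n_consistent      # about 0.04
```

The two error rates answer different questions. Roughly 4 in 100 consistent tests were flagged, yet because true inconsistencies are rare, those flags still make up 43% of everything Statcheck reported.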

Statcheck was developed by Michèle Nuijten and Sascha Epskamp.

Nuijten, M. B., Hartgerink, C. H. J., Assen, M. A. L. M. van, Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48(4), 1205–1226. https://doi.org/10.3758/s13428-015-0664-2