16 StatCheck

16.1 What it checks

The stat_check module runs Statcheck on the paper. Statcheck recomputes the p-value from each reported test statistic and its degrees of freedom, then flags cases where the reported p-value is inconsistent with the recomputed one. The more serious case is a decision error, where the inconsistency changes the significance decision.
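As a rough illustration of the recomputation step, the p-values can be reproduced in base R with pt() and pf(). This is a minimal sketch, not Statcheck's own code: the helper names recompute_p_t, recompute_p_f and is_consistent_t are hypothetical, and the rounding check is a simplified assumption about how rounding of the reported statistic could be tolerated.

```r
# Two-sided p-value from a reported t statistic and its degrees of freedom.
recompute_p_t <- function(t_value, df) {
  2 * stats::pt(-abs(t_value), df)
}

# Upper-tail p-value from a reported F statistic and its two degrees of freedom.
recompute_p_f <- function(f_value, df1, df2) {
  stats::pf(f_value, df1, df2, lower.tail = FALSE)
}

# Hypothetical helper (assumption): an exactly reported p-value counts as
# consistent if it lies within the range of p-values implied by the rounding
# interval of the reported t statistic, after rounding to p_dec decimals.
is_consistent_t <- function(t_value, df, reported_p, stat_dec = 2, p_dec = 3) {
  half_unit <- 0.5 * 10^(-stat_dec)
  p_upper <- recompute_p_t(abs(t_value) - half_unit, df)  # smaller |t|, larger p
  p_lower <- recompute_p_t(abs(t_value) + half_unit, df)  # larger |t|, smaller p
  tol <- 1e-12  # guard against floating-point noise in the comparison
  round(p_lower, p_dec) <= reported_p + tol &&
    reported_p <= round(p_upper, p_dec) + tol
}

recompute_p_t(2.43, 48)          # about 0.0189
is_consistent_t(2.43, 48, 0.020) # FALSE
is_consistent_t(2.43, 48, 0.019) # TRUE
```

Under these assumptions, a report of t(48) = 2.43 with p = .020 would be flagged as inconsistent, whereas p = .019 would pass.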

Metacheck only returns Statcheck results for t-tests and F-tests, the tests for which the approach has been validated (Nuijten et al., 2016). Statcheck only detects statistics written in APA format, which is mostly used in psychology journals.
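To make the APA-format restriction concrete, the sketch below extracts a t-test from a sentence with a regular expression. The pattern apa_t_pattern is a hypothetical, deliberately simplified stand-in, not the pattern Statcheck actually uses; it only shows why a result written in another style would go undetected.

```r
# Hypothetical, simplified pattern for an APA-style t-test such as
# "t(48) = 2.43, p = .020". Capture groups: df, t value, comparison sign, p-value.
apa_t_pattern <- paste0(
  "t\\((\\d+(?:\\.\\d+)?)\\)\\s*=\\s*(-?\\d*\\.?\\d+),",
  "\\s*p\\s*([<>=])\\s*(\\d?\\.\\d+)"
)

# Return the full match plus capture groups, or character(0) if nothing matches.
extract_t_test <- function(text) {
  regmatches(text, regexec(apa_t_pattern, text, perl = TRUE))[[1]]
}

m <- extract_t_test("The groups differed, t(48) = 2.43, p = .020.")
parsed <- data.frame(
  df = as.numeric(m[2]),
  t_value = as.numeric(m[3]),
  p_comparison = m[4],
  reported_p = as.numeric(m[5])
)

# The same result written in a non-APA style is not matched at all.
not_apa <- extract_t_test("t = 2.43 (48 df), p-value 0.02")
length(not_apa)  # 0
```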

This module is fully offline (Statcheck runs locally).

16.2 Running the module

paper <- demopaper()
module_run(paper, "stat_check")

StatCheck: 1 possible error in t-tests or F-tests

The module output contains a traffic light, a summary text, and a table with one row per detected test that compares reported_p to computed_p and carries error and decision_error flags:

mo <- module_run(paper, "stat_check")
mo$traffic_light

#> [1] "red"

mo$summary_text

#> [1] "1 possible error in t-tests or F-tests"

mo$table[, c("test_type", "reported_p", "computed_p", "error", "decision_error")] |>
  knitr::kable()

test_type	reported_p	computed_p	error	decision_error
t	0.005	0.0046094	FALSE	FALSE
t	0.152	0.0528594	TRUE	FALSE

16.3 Running on many papers

Running Statcheck on a large corpus can take a while (the full psychsci set has more than 27,000 sentences with numbers), so here we run it on a small subset:

mo <- module_run(psychsci[1:20], "stat_check")
head(mo$summary_table) |>
  knitr::kable()

paper_id	statcheck_found	statcheck_errors	statcheck_decision_errors
0956797613520608	6	0	0
0956797614522816	39	0	0
0956797614527830	0	0	0
0956797614557697	7	0	0
0956797614560771	4	0	0
0956797614566469	0	0	0
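When reading a summary like this, it can help to separate papers where nothing was detected from papers where tests were detected and found consistent. The sketch below rebuilds the six rows shown above as a plain data frame (an illustrative copy, not the object returned by module_run) and computes a few totals. A statcheck_found of 0 only means there was nothing to check, for example because the statistics were not reported in APA format.

```r
# Illustrative copy of the six rows printed above.
summary_table <- data.frame(
  paper_id = c("0956797613520608", "0956797614522816", "0956797614527830",
               "0956797614557697", "0956797614560771", "0956797614566469"),
  statcheck_found = c(6, 39, 0, 7, 4, 0),
  statcheck_errors = c(0, 0, 0, 0, 0, 0),
  statcheck_decision_errors = c(0, 0, 0, 0, 0, 0)
)

# Total number of tests Statcheck could read in these papers.
total_found <- sum(summary_table$statcheck_found)

# Papers with no detected tests: unchecked, not verified as error-free.
unchecked <- summary_table$paper_id[summary_table$statcheck_found == 0]

# Papers with at least one detected test and no flagged inconsistency.
checked_clean <- summary_table$paper_id[
  summary_table$statcheck_found > 0 & summary_table$statcheck_errors == 0
]
```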

16.4 A clean example and one with problems

The flagged rows are the ones to inspect, especially those with decision_error == TRUE, where the reported and recomputed p-values fall on opposite sides of .05:

mo <- module_run(psychsci[1:20], "stat_check")
err <- mo$table[mo$table$error %in% TRUE, c("raw", "reported_p", "computed_p", "decision_error")]
head(err) |>
  knitr::kable()

	raw	reported_p	computed_p	decision_error
106	t(48) = 2.43, p = .020	0.020	0.0188882	FALSE
160	F(1, 21) = 11.94, p < .001	0.001	0.0023684	FALSE
235	t(237) = 2.21, p = .014	0.014	0.0280615	FALSE
236	F(1, 361) = 3.59, p = .029	0.029	0.0589271	TRUE
237	F(1, 361) = 5.89, p < .01	0.010	0.0157159	FALSE
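The decision_error column in this table can be reproduced by comparing each pair of p-values against the .05 threshold. This is a simplified sketch using the values printed above; it ignores the comparison sign of the reported p-value (for example, p < .001 is treated as .001), which is an assumption, and Statcheck's own rule may differ in such edge cases.

```r
alpha <- 0.05

# Values copied from the flagged rows above.
flagged <- data.frame(
  raw = c("t(48) = 2.43, p = .020", "F(1, 21) = 11.94, p < .001",
          "t(237) = 2.21, p = .014", "F(1, 361) = 3.59, p = .029",
          "F(1, 361) = 5.89, p < .01"),
  reported_p = c(0.020, 0.001, 0.014, 0.029, 0.010),
  computed_p = c(0.0188882, 0.0023684, 0.0280615, 0.0589271, 0.0157159)
)

# A decision error: one p-value is below alpha and the other is not.
flagged$decision_error <- (flagged$reported_p < alpha) != (flagged$computed_p < alpha)

flagged$raw[flagged$decision_error]
# "F(1, 361) = 3.59, p = .029"
```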

16.5 Options

stat_check takes only the paper argument.

Note

Statcheck treats p = .000 as an error (you should report p < .001), and may misflag one-sided tests, since it assumes two-sided p-values. Always read the flagged sentence before concluding there is a real error.
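One quick way to spot a likely one-sided test among the flagged rows is to halve the recomputed two-sided p-value and see whether it reproduces the reported one. The helper looks_one_sided below is hypothetical and only a heuristic for exactly reported p-values; it does not replace reading the sentence.

```r
# Hypothetical heuristic: does half of the recomputed two-sided p-value,
# rounded to the reported precision, equal the reported p-value?
looks_one_sided <- function(reported_p, computed_p, digits = 3) {
  isTRUE(all.equal(round(computed_p / 2, digits), reported_p))
}

looks_one_sided(0.014, 0.0280615)  # TRUE:  t(237) = 2.21, p = .014
looks_one_sided(0.029, 0.0589271)  # TRUE:  F(1, 361) = 3.59, p = .029
looks_one_sided(0.020, 0.0188882)  # FALSE: t(48) = 2.43, p = .020
```

In the flagged rows from the psychsci subset, two of the five inconsistencies, including the only decision error, match this pattern, so they may be one-sided tests rather than reporting errors.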

16.6 Validation

In a validation sample of 685 tests containing 34 inconsistently reported results, Statcheck detected all 34 and also flagged 26 tests that were not inconsistent. So 0% of true inconsistencies were missed, but 43% of detections (26 of 60) were false positives. See https://osf.io/preprints/psyarxiv/tcxaj_v1/ for the full validation.
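These percentages follow directly from the counts. The short calculation below spells them out; the variable names are illustrative, and the last rate (false flags relative to all consistently reported tests) is derived here from the same counts rather than quoted from the validation report.

```r
n_tests <- 685            # tests in the validation sample
true_inconsistent <- 34   # tests that really were inconsistently reported
detected_true <- 34       # of those, how many Statcheck flagged
false_flags <- 26         # flagged tests that were not inconsistent

# Share of true inconsistencies that were missed: 0
miss_rate <- (true_inconsistent - detected_true) / true_inconsistent

# Share of all detections that were false positives: 26 / 60, about 0.43
false_detection_share <- false_flags / (detected_true + false_flags)

# Share of consistently reported tests that were wrongly flagged: 26 / 651, about 0.04
false_positive_rate <- false_flags / (n_tests - true_inconsistent)
```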

Statcheck was developed by Michèle Nuijten and Sascha Epskamp.

Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48(4), 1205–1226. https://doi.org/10.3758/s13428-015-0664-2