The stat_check module runs Statcheck on the paper. Statcheck recomputes the p-value from each reported test statistic and degrees of freedom, and flags cases where the reported p-value is inconsistent with the statistic — and, more seriously, where the inconsistency changes the significance decision (a decision error).
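The recomputation step can be sketched in base R. This is a minimal illustration, not Statcheck's own code: `recompute_p()` is a hypothetical helper, the t-test is assumed to be two-sided, and the allowance Statcheck makes for rounded test statistics is left out. The two inputs are taken from the results table further down this page.

```r
# Hypothetical helper (not part of Statcheck or metacheck): recompute a
# p-value from a reported test statistic and its degrees of freedom.
recompute_p <- function(test_type, statistic, df1, df2 = NULL) {
  if (test_type == "t") {
    # assumption: the reported t-test is two-sided
    2 * pt(abs(statistic), df = df1, lower.tail = FALSE)
  } else if (test_type == "F") {
    pf(statistic, df1 = df1, df2 = df2, lower.tail = FALSE)
  } else {
    stop("only t and F are handled in this sketch")
  }
}

p_t <- recompute_p("t", 2.43, df1 = 48)            # reported as p = .020
p_f <- recompute_p("F", 3.59, df1 = 1, df2 = 361)  # reported as p = .029

round(c(t = p_t, F = p_f), 4)
```

The recomputed values should line up with the computed_p column for the same two rows: the t-test stays below .05, while the F-test moves above it, which is what makes it a decision error.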
Metacheck only returns Statcheck results for t-tests and F-tests, the tests for which the approach has been validated (Nuijten et al., 2016). Statcheck only detects statistics written in APA format, which is mostly used in psychology journals.
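To give a feel for what "APA format" means in practice, the sketch below pulls t-tests out of a sentence with a regular expression. The pattern and the `extract_t_tests()` helper are simplified assumptions written for this example; Statcheck's real patterns are more permissive and cover more test types.

```r
# Simplified, hypothetical pattern for APA-style t-tests such as
# "t(48) = 2.43, p = .020". This is not the pattern Statcheck uses.
apa_t <- paste0(
  "\\bt\\((\\d+(?:\\.\\d+)?)\\)",   # t(df)
  "\\s*=\\s*(-?\\d*\\.?\\d+)",      # = statistic
  ",\\s*p\\s*([<>=])",              # , p followed by <, > or =
  "\\s*(\\d?\\.\\d+)"               # reported p-value
)

extract_t_tests <- function(text) {
  hits <- regmatches(text, gregexpr(apa_t, text, perl = TRUE))[[1]]
  if (length(hits) == 0) return(data.frame())
  # each element of parts: full match followed by the four capture groups
  parts <- regmatches(hits, regexec(apa_t, hits, perl = TRUE))
  data.frame(
    raw        = hits,
    df         = as.numeric(sapply(parts, `[`, 2)),
    statistic  = as.numeric(sapply(parts, `[`, 3)),
    comparison = sapply(parts, `[`, 4),
    reported_p = as.numeric(sapply(parts, `[`, 5))
  )
}

found <- extract_t_tests("The effect was reliable, t(48) = 2.43, p = .020.")
found
```

A result written in another style, for example "t = 2.43 with 48 degrees of freedom", would not match this pattern at all, which is the kind of miss to expect outside APA-style journals.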
This module is fully offline (Statcheck runs locally).
paper <- demopaper()
module_run(paper, "stat_check")
StatCheck: 1 possible error in t-tests or F-tests
The table has one row per detected test, comparing reported_p to computed_p, with error and decision_error flags:
mo <- module_run(paper, "stat_check")
mo$traffic_light
#> [1] "red"
mo$summary_text
#> [1] "1 possible error in t-tests or F-tests"
| test_type | reported_p | computed_p | error | decision_error |
|---|---|---|---|---|
| t | 0.005 | 0.0046094 | FALSE | FALSE |
| t | 0.152 | 0.0528594 | TRUE | FALSE |
Statcheck on a large corpus can take a while (the full psychsci set has more than 27,000 sentences with numbers). Here we run on a small subset:
The flagged rows are the ones to inspect — especially decision_error == TRUE, where the reported and recomputed p-values fall on opposite sides of .05:
| | raw | reported_p | computed_p | decision_error |
|---|---|---|---|---|
| 106 | t(48) = 2.43, p = .020 | 0.020 | 0.0188882 | FALSE |
| 160 | F(1, 21) = 11.94, p < .001 | 0.001 | 0.0023684 | FALSE |
| 235 | t(237) = 2.21, p = .014 | 0.014 | 0.0280615 | FALSE |
| 236 | F(1, 361) = 3.59, p = .029 | 0.029 | 0.0589271 | TRUE |
| 237 | F(1, 361) = 5.89, p < .01 | 0.010 | 0.0157159 | FALSE |
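Picking out the rows to inspect is a plain data frame filter. The sketch below assumes the results are available as a data frame with the columns shown in the table; here the `results` object is typed in by hand from that table rather than taken from a real run.

```r
# Rows copied by hand from the table above (illustrative only).
results <- data.frame(
  raw = c("t(48) = 2.43, p = .020",
          "F(1, 21) = 11.94, p < .001",
          "t(237) = 2.21, p = .014",
          "F(1, 361) = 3.59, p = .029",
          "F(1, 361) = 5.89, p < .01"),
  reported_p = c(0.020, 0.001, 0.014, 0.029, 0.010),
  computed_p = c(0.0188882, 0.0023684, 0.0280615, 0.0589271, 0.0157159),
  decision_error = c(FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Keep only the decision errors
flagged <- results[results$decision_error, ]

# Cross-check: a decision error means the two p-values fall on
# opposite sides of .05
crosses_alpha <- (results$reported_p < 0.05) != (results$computed_p < 0.05)

flagged$raw
```

Only the F(1, 361) = 3.59 row survives the filter, and the cross-check against .05 agrees with the decision_error column for every row.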
stat_check takes only the paper argument.
Statcheck treats p = .000 as an error (you should report p < .001), and may misflag one-sided tests, since it assumes two-sided p-values. Always read the flagged sentence before concluding there is a real error.
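The one-sided caveat is easy to see numerically. The sketch below assumes a directional t-test with the effect in the predicted direction, and uses t(237) = 2.21, reported with p = .014 in the results table. Halving the two-sided p-value lands very close to the reported value, which is consistent with a one-sided test, although only the original sentence can confirm that.

```r
# Two-sided p-value, which is what Statcheck assumes
two_sided <- 2 * pt(abs(2.21), df = 237, lower.tail = FALSE)

# One-sided p-value, assuming the effect is in the predicted direction
one_sided <- pt(2.21, df = 237, lower.tail = FALSE)

round(c(two_sided = two_sided, one_sided = one_sided), 3)
```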
In a sample of 685 tests containing 34 instances of inconsistent reporting, Statcheck correctly detected all 34 and incorrectly flagged a further 26 tests. So 0% of true instances were missed, and 43% of detections (26 of 60) were false positives. See https://osf.io/preprints/psyarxiv/tcxaj_v1/ for the full validation.
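The two percentages follow directly from the counts in the paragraph above. The variable names below are chosen for this example only.

```r
true_errors     <- 34  # inconsistencies present in the sample
true_positives  <- 34  # of those, detected by Statcheck
false_positives <- 26  # detections that were not real inconsistencies

# Share of real inconsistencies that went undetected
miss_rate <- (true_errors - true_positives) / true_errors

# Share of all detections that were false positives
false_positive_share <- false_positives / (true_positives + false_positives)

round(100 * c(miss_rate = miss_rate, false_positive_share = false_positive_share))
```

Note that the 43% is a share of the 60 detections, not of the 685 tests.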
Statcheck was developed by Michèle Nuijten and Sascha Epskamp.