1 Introducing metacheck

An Automated Tool to Check for Best Practices in Scientific Articles

1.1 The Problem

Metascientific research reveals substantial opportunities to improve how scientists design studies, report results, and implement open science practices. For example, researchers often use invalid and unreliable measures (Goos et al., 2024), misinterpret non-significant results (Aczel et al., 2018), write insufficiently specific preregistrations (Akker et al., 2024), make mistakes in power analyses (Thibault et al., 2024), and misuse Bayes factors (Tendeiro et al., 2024).

Despite the growing availability of best practices, their adoption remains slow (Sharpe, 2013). While scientists are responsible for staying informed, time constraints and the rapid pace of new developments make this difficult. Peer reviewers face similar time constraints and may overlook the absence of best practices. Checklists have been proposed to promote adherence, but in practice they have limited impact (Dexter & Shafer, 2017): researchers may be unaware of them, and even when checklists are used, evaluating adherence still requires considerable expertise.

1.2 A Solution

Human factors research suggests that automation that keeps researchers in the loop can offer a partial solution for checking whether best practices are followed. For example, algorithms could detect passages that describe an a priori power analysis and check whether the analysis is fully reported, specifying the alpha level, the desired power, the effect size metric, and a justification for the effect size (Lakens, 2022).
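A minimal sketch of such a check might use regular expressions to decide whether a paragraph describes a power analysis and, if so, which of the four reporting elements it mentions. The names `POWER_ANALYSIS`, `ELEMENT_PATTERNS`, and `check_power_analysis` are hypothetical, and the patterns are illustrative only; this is not how Metacheck itself implements the check, and simple patterns like these will miss many phrasings.

```python
import re
from typing import Optional

# Hypothetical trigger: does this paragraph describe a power analysis at all?
POWER_ANALYSIS = re.compile(r"\bpower analys[ie]s\b", re.IGNORECASE)

# Hypothetical, deliberately simple patterns for the reporting elements.
ELEMENT_PATTERNS = {
    # e.g. "alpha level of .05", "alpha = .05", "α = 0.01"
    "alpha": re.compile(r"\b(?:alpha|α)\s*(?:level\s*)?(?:of|=)?\s*0?\.\d+", re.IGNORECASE),
    # e.g. "80% power", "power of .90"
    "power": re.compile(r"\b\d{2}(?:\.\d+)?\s*%\s*power\b|\bpower\s*(?:of|=)?\s*0?\.\d+", re.IGNORECASE),
    # e.g. "d = 0.5", "f = .25", "r = .30" (case-sensitive so an F statistic is not matched)
    "effect_size": re.compile(r"\b(?:d|f|r)\s*=\s*-?\d*\.\d+"),
    # A few phrases that often introduce a justification for the effect size
    "justification": re.compile(
        r"\b(?:based on|derived from|smallest effect size of interest|"
        r"meta-analys[ie]s|previous(?:ly)? (?:study|studies|research))\b",
        re.IGNORECASE,
    ),
}


def check_power_analysis(paragraph: str) -> Optional[dict[str, bool]]:
    """Return which elements a power-analysis paragraph mentions,
    or None if the paragraph does not look like a power analysis."""
    if not POWER_ANALYSIS.search(paragraph):
        return None
    return {name: bool(p.search(paragraph)) for name, p in ELEMENT_PATTERNS.items()}


if __name__ == "__main__":
    text = ("An a priori power analysis indicated that 64 participants per group "
            "were needed to detect d = 0.5 with 80% power at an alpha level of .05.")
    print(check_power_analysis(text))
```

Because the function only reports which elements appear to be missing, a human still decides whether the paragraph is adequate, which keeps the researcher in the loop.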

Algorithms are also useful for automating straightforward but time-consuming tasks. For example, a reviewer could manually check whether all p-values in a manuscript are reported exactly (e.g., p = .007 instead of p < .05) and whether the authors unknowingly cite retracted papers. However, these tasks are easy to automate, as the reference manager Zotero already does for retracted papers.
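The p-value check can be sketched, under the same caveats, as one regular expression that finds p-values and flags those reported with an inequality. The pattern and the `find_inexact_p_values` function are hypothetical; the sketch also assumes, following common APA-style guidance, that "p < .001" is an acceptable exception and should not be flagged.

```python
import re

# Matches e.g. "p = .007", "p < .05", "p<0.001", "p > .10" (hypothetical pattern).
P_VALUE = re.compile(r"\bp\s*([=<>≤≥])\s*(0?\.\d+|1(?:\.0+)?)", re.IGNORECASE)


def find_inexact_p_values(text: str) -> list[str]:
    """Return p-values reported with an inequality instead of an exact value."""
    flagged = []
    for match in P_VALUE.finditer(text):
        operator, value = match.groups()
        if operator == "=":
            continue  # exactly reported, nothing to flag
        if operator in "<≤" and float(value) <= 0.001:
            continue  # "p < .001" is conventionally acceptable
        flagged.append(match.group(0))
    return flagged


if __name__ == "__main__":
    sample = ("The effect was significant, t(48) = 2.81, p = .007, but the interaction "
              "was not, F(1, 48) = 1.20, p > .05. The main effect was p < .001, "
              "and the covariate was p<.05.")
    print(find_inexact_p_values(sample))
```

A regex like this will produce some false positives and negatives (for example, p-values split across lines by PDF extraction), which is acceptable when a reviewer inspects each flagged passage.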

Recent progress in machine learning and artificial intelligence has made it increasingly viable to implement automated checks with sufficient accuracy, supporting broader adoption of, and adherence to, best practices in scientific manuscripts. For example, the GROBID machine learning library (GROBID, 2008–2025) can turn scientific PDFs into structured text files (TEI XML) from which specific content can be extracted. While many automated checks for best practices require only simple text matching based on regular expressions, more complex text extraction is possible with large language models (LLMs).
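To illustrate how structured output makes extraction simple, the sketch below parses a small TEI document, groups body paragraphs by section heading, and then applies a plain regular-expression search. It assumes the TEI namespace `http://www.tei-c.org/ns/1.0` that GROBID uses; `SAMPLE_TEI` is a heavily simplified, hypothetical stand-in for real GROBID output (which also contains a header, references, and much more), and `paragraphs_by_section` and `search_paragraphs` are hypothetical helpers.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, simplified stand-in for a TEI file produced from a PDF.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div><head>Method</head>
        <p>We preregistered the study on the OSF.</p>
        <p>An a priori power analysis indicated that 128 participants were needed.</p>
      </div>
      <div><head>Results</head>
        <p>The effect was significant, p = .003.</p>
      </div>
    </body>
  </text>
</TEI>"""


def paragraphs_by_section(tei_xml: str) -> list[tuple[str, str]]:
    """Return (section heading, paragraph text) pairs from the TEI body."""
    root = ET.fromstring(tei_xml)
    rows = []
    for div in root.iterfind(".//tei:body/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        section = head.text.strip() if head is not None and head.text else ""
        for p in div.iterfind("tei:p", TEI_NS):
            # itertext() also collects text inside nested elements (e.g. citations)
            rows.append((section, "".join(p.itertext()).strip()))
    return rows


def search_paragraphs(rows: list[tuple[str, str]], pattern: str) -> list[tuple[str, str]]:
    """Keep only the paragraphs whose text matches a regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [(section, text) for section, text in rows if regex.search(text)]


if __name__ == "__main__":
    rows = paragraphs_by_section(SAMPLE_TEI)
    print(search_paragraphs(rows, r"power analysis"))
```

Keeping the section heading alongside each paragraph lets a check restrict itself to, say, the Method section, something that is much harder to do on raw PDF text.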

1.3 Our Values

The values below guide our priorities and decision-making for all Scienceverse projects, including Metacheck.

Quality Control: The knowledge, data, code and software we create will be verified for accuracy, using publicly available methods and measures, such that users have enough context to interpret the results.

Transparency: We are committed to open source software and open access datasets for validation.

AI Optional: The use of large language models (LLMs) will be restricted to classification of existing text, not evaluation of the quality of practice. The use of LLMs will always be opt-in and transparently declared. We will prioritise non-LLM functions where possible, and limit use to cases where it provides substantial benefits that cannot be realised with other methods, such as regular expressions.

No Automated Evaluation: While our software is designed to assist the evaluation of research by highlighting aspects where practice may be improved, we will never support automated quality assessments, rankings or scoring.

Data Privacy: We will be fully transparent about what data we have access to and how those data will be used, as well as what will be shared with external services. Wherever possible, we will prioritise functions that share minimal data and services that have robust data protection. We will meet all EU data privacy regulations.

Accessibility: We will provide our knowledge, data, code and software to all potential users with minimal barriers to access. We will prioritise access that is not unduly limited by wealth, disability or technical knowledge.

Community Engagement: Our work will reflect and respect the needs of a diverse range of research communities, and benefit their research practices. We are open to critique from the research community.

Sustainability: We are committed to the long-term viability of our work. We will actively protect and future-proof our work through technical maintenance, governance structures and strategic planning to make it a durable and relevant public good.

Fair Recognition and Reward: Contributions to knowledge, data, code, and software will be uniquely and persistently identified, and benefit a diverse range of contributors.

1.4 Metacheck


Metacheck is modular: each module focuses on a specific practice that the scientific community wants to improve and for which automation is viable. This modularity allows the community to extend and customize the tool, developing specialized modules tailored to the standards and best practices of different scientific fields. Metacheck can also serve as an overarching platform that integrates existing tools (e.g., checks for retracted articles or Statcheck).
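One way to picture this modular design is a registry of independent modules, each pairing one practice with one check, so that a new module can be added without touching the others. The sketch below is a hypothetical illustration of that idea: the `Module` class, `REGISTRY`, `run_modules`, and both example checks are assumptions for illustration, not Metacheck's own interface.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Module:
    """Hypothetical module: one practice, one check, one short explanation."""
    name: str
    description: str
    check: Callable[[str], list[str]]  # returns passages or notes that may need attention


def inexact_p_values(text: str) -> list[str]:
    # Flags p-values reported with "<" or ">" rather than exactly.
    return re.findall(r"\bp\s*[<>]\s*\.\d+", text)


def missing_data_statement(text: str) -> list[str]:
    # Flags papers without a (very simply detected) data availability statement.
    found = re.search(r"data (?:are|is) (?:openly |publicly )?available", text, re.IGNORECASE)
    return [] if found else ["No data availability statement found."]


# Community-contributed modules would simply be appended to this list.
REGISTRY = {m.name: m for m in [
    Module("exact_p", "Report p-values exactly.", inexact_p_values),
    Module("data_availability", "State where the data can be accessed.", missing_data_statement),
]}


def run_modules(text: str, names: list[str]) -> dict[str, list[str]]:
    """Run the selected modules on one paper and collect their findings."""
    return {name: REGISTRY[name].check(text) for name in names}


if __name__ == "__main__":
    paper = "The effect was reliable, p < .05. Data are available on the OSF."
    for name, findings in run_modules(paper, ["exact_p", "data_availability"]).items():
        print(f"{name}: {REGISTRY[name].description} -> {findings or 'no issues found'}")
```

Because every module shares the same minimal contract (text in, findings out), modules for different fields, or wrappers around existing tools, can be selected and combined freely.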

The automated checks can be performed as part of an individual or a metascientific workflow. In the individual workflow, an author or editor runs selected modules on a single paper and receives a report that highlights potential areas for improvement, with explanations or links to further resources. In the metascientific workflow, a single module is run on a batch of hundreds of papers, producing tabular data that can be used to address metascientific questions, such as how prevalent a practice is in a specific field. Our philosophy is that the individual workflow can tolerate higher error rates than the metascientific workflow, because the user decides whether to implement each recommendation. For a module to be used in a metascientific workflow, it needs to be shown to have low error rates when run on manually coded ground-truth files.
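The metascientific workflow and its validation step can be sketched as follows: one module runs over a batch of papers to produce one row per paper, and those rows are compared with manually coded ground truth to estimate error rates. The preregistration detector, `run_batch`, `validate`, and the five toy papers are hypothetical illustrations, not real data or Metacheck functions; the toy set deliberately includes a negated sentence to show how a naive pattern produces a false positive.

```python
import re

# Hypothetical module: does the paper mention a preregistration?
PREREG = re.compile(r"\bpre-?regist(?:ered|ration)\b", re.IGNORECASE)

# Toy inputs for illustration only (not real papers).
PAPERS = {
    "p1": "The hypotheses and analysis plan were preregistered on the OSF.",
    "p2": "We report all measures, manipulations, and exclusions.",
    "p3": "The preregistration is available at the link above.",
    "p4": "This exploratory study was not preregistered.",            # negation: false positive
    "p5": "Our analysis plan was registered before data collection.",  # missed: false negative
}
TRUTH = {"p1": True, "p2": False, "p3": True, "p4": False, "p5": True}  # manual coding


def run_batch(papers: dict[str, str]) -> list[dict]:
    """Run one module over many papers, returning one row per paper."""
    return [{"paper_id": pid, "preregistered": bool(PREREG.search(text))}
            for pid, text in papers.items()]


def prevalence(rows: list[dict]) -> float:
    """Proportion of papers the module classifies as preregistered."""
    return sum(row["preregistered"] for row in rows) / len(rows)


def validate(rows: list[dict], truth: dict[str, bool]) -> dict[str, float]:
    """Compare module output with manually coded ground truth."""
    tp = fp = fn = tn = 0
    for row in rows:
        predicted, actual = row["preregistered"], truth[row["paper_id"]]
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return {
        "true_pos": tp, "false_pos": fp, "false_neg": fn, "true_neg": tn,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(rows),
    }


if __name__ == "__main__":
    rows = run_batch(PAPERS)
    print(f"prevalence: {prevalence(rows):.2f}")
    print(validate(rows, TRUTH))
```

Even in this tiny example the module's prevalence estimate depends on its errors, which is why modules intended for batch use need validation against manually coded files first.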

1.5 Future Plans

There are more possible modules than our team can create, and we welcome anyone who wants to collaborate with us on building and validating new modules. Future modules could check whether open data and materials are Findable, Accessible, Interoperable, and Reusable (FAIR) (Wilkinson et al., 2016), perform checks on specific types of articles (e.g., meta-analyses and systematic reviews), check adherence to journal-specific reporting guidelines, or check for mis-citations. We see great potential for new modules; the only limits are your creativity and engineering skills.

We are excited to continue developing Metacheck in close collaboration with the broader scientific community and welcome feedback, suggestions, and contributions. You can explore Metacheck at https://scienceverse.github.io/metacheck/. If you want to contribute new Metacheck modules, validate existing modules in your own scientific discipline, or explore the use of Metacheck for metascientific projects, reach out to Lisa DeBruine or Daniël Lakens.

1.6 Metacheck Team

Metacheck is developed by a collaborative team of researchers, consisting of Lisa DeBruine (developer and maintainer, UofG), Daniël Lakens (developer, TU/e), Cristian Mesquida (postdoctoral researcher, TU/e), Jakub Werner (RA, TU/e), Lavinia Ion (RA, TU/e), Hadeel Khawatmy (RA, TU/e), Levi Baruch (RA, TU/e), Mink Veltman (RA, TU/e), René Bekkers (collaborator and PI of Transparency Check, VUE), and Max Littel (RA, VUE).

Metacheck was initially developed by Lisa and Daniël in 2024 during Lisa’s visiting professorship at the Eindhoven Artificial Intelligence Systems Institute (EAISI).

Metacheck is partly financed by the Dutch Research Council (NWO) via VICI grant VI.C.241.013 to Daniël Lakens, the Thematic Digital Competence Centre Social Sciences & Humanities grant ICT.001.TDCC.018 awarded to René Bekkers and Daniël Lakens, and the Ammodo Science Award 2023 for the Social Sciences awarded to Daniël Lakens.

Aczel, B., Palfi, B., Szollosi, A., Kovacs, M., Szaszi, B., Szecsi, P., Zrubka, M., Gronau, Q. F., Bergh, D. van den, & Wagenmakers, E.-J. (2018). Quantifying support for the null hypothesis in psychology: An empirical investigation. Advances in Methods and Practices in Psychological Science, 1(3), 357–366.

Akker, O. R. van den, Bakker, M., Assen, M. A. van, Pennington, C. R., Verweij, L., Elsherif, M. M., Claesen, A., Gaillard, S. D., Yeung, S. K., Frankenberger, J.-L., et al. (2024). The potential of preregistration in psychology: Assessing preregistration producibility and preregistration-study consistency. Psychological Methods.

Dexter, F., & Shafer, S. L. (2017). Narrative review of statistical reporting checklists, mandatory statistical editing, and rectifying common problems in the reporting of scientific articles. Anesthesia & Analgesia, 124(3), 943–947.

Goos, C., Bakker, M., Wicherts, J. M., & Nuijten, M. B. (2024). Assessing reliable and valid measurement as a prerequisite for informative replications in psychology.

GROBID. (2008–2025). GitHub. https://github.com/kermitt2/grobid

Lakens, D. (2022). Sample Size Justification. Collabra: Psychology, 8(1), 33267. https://doi.org/10.1525/collabra.33267

Sharpe, D. (2013). Why the resistance to statistical innovations? Bridging the communication gap. Psychological Methods, 18(4), 572–582. https://doi.org/10.1037/a0034177

Tendeiro, J. N., Kiers, H. A., Hoekstra, R., Wong, T. K., & Morey, R. D. (2024). Diagnosing the misuse of the Bayes factor in applied research. Advances in Methods and Practices in Psychological Science, 7(1), 25152459231213371.

Thibault, R. T., Zavalis, E. A., Malički, M., & Pedder, H. (2024). An evaluation of reproducibility and errors in published sample size calculations performed using G*Power. medRxiv, 2024–2007.

Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 1–9. https://doi.org/10.1038/sdata.2016.18