13 Open Practices

13.1 What it checks

The open_practices module searches the text of a paper for statements that data, code, materials, or a preregistration are openly available. It returns the sentences it finds, and records for each open-practice type whether a statement was detected. It also flags statements that data or materials are available only on request.

It is important to be precise about what this module does and does not do. It checks whether the paper says something is shared; it does not verify that anything actually was. A sentence such as “All data are available on the Open Science Framework” is reported as a detected data statement even if the link is dead, the repository is empty, or the files are not what the sentence claims.

This module is fully offline: it reads only the manuscript text and makes no network calls.
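To make the distinction between "says it is shared" and "is shared" concrete, here is a toy sketch in base R. It is not the module's implementation: the patterns and the column names `claims_data` and `on_request` are assumptions chosen for illustration only. It flags sentences purely on their wording and never touches a link.

```r
# Toy illustration only: these patterns are assumptions, not the module's own rules.
sentences <- c(
  "All data are available on the Open Science Framework.",
  "Data are available from the corresponding author on request.",
  "Participants completed the task in a quiet room."
)

# A sentence is a candidate if it mentions data together with an availability word.
claims_data <- grepl("\\bdata\\b", sentences, ignore.case = TRUE, perl = TRUE) &
  grepl("\\b(available|shared|deposited)\\b", sentences, ignore.case = TRUE, perl = TRUE)

# Candidates that restrict availability to "on request" or "upon request".
on_request <- claims_data &
  grepl("\\b(on|upon) request\\b", sentences, ignore.case = TRUE, perl = TRUE)

found <- data.frame(text = sentences, claims_data, on_request)
found
```

Nothing here checks whether the Open Science Framework project exists, which is the limitation the real module shares.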

Important: Not yet validated

open_practices has not yet been validated. We have not measured its accuracy against a hand-coded reference set, so the detection rates below should be read as illustrative, not as performance guarantees. We initially used the ODDPub package, but because it was built for the biomedical literature, it missed too many genuine data-sharing statements in the psychology literature we mainly work with. Compared with that older ODDPub-based approach, this module has a lower false-negative rate but a higher false-positive rate. Treat its output as a set of statements to review, not as a verdict. See Module Validation for what validation means in Metacheck. The module is still a work in progress, and we are actively exploring how best to detect open-practice statements. If you would like to contribute, reach out to the team.

13.2 How it differs from repo_check and code_check

Three modules touch on open data and code, and it is easy to confuse them. They answer different questions and work in fundamentally different ways:

| Module | Question it answers | How it works | Network |
|---|---|---|---|
| open_practices | Does the paper claim data/code/materials/preregistration are shared? | Text search of the manuscript for sharing statements | Offline |
| repo_check | What is actually in the linked repositories? | Follows OSF/GitHub/Zenodo links and lists the files found | Live calls |
| code_check | What do the shared code files look like? | Analyses the R/SAS/SPSS/Stata files that repo_check discovered | Live calls |

In short: open_practices reads what the authors wrote, while repo_check and code_check inspect what the authors deposited. A paper can trigger open_practices (it says “data are on OSF”) yet fail repo_check (the OSF project turns out to be empty). The reverse also happens: a repository can hold data and code that the manuscript never mentions in a formal statement. Because open_practices never leaves the text, it is the right tool when you have no internet access, when the linked repository is private or behind a login, or when you only want to know what the paper discloses. Use repo_check and code_check when you want to check the disclosure against the actual contents.

13.3 Running the module

The summary_text and traffic light summarise what was found, and the table holds the matching sentence(s):

paper <- psychsci[[4]]
module_run(paper, "open_practices")

Open Practices Check: Shared data detected.

13.4 A good example and a false positive

A single paper often shows both ends of the module’s behaviour at once. Paper 4 of the psychsci corpus returns three “data” statements:

mo <- module_run(psychsci[[4]], "open_practices")
mo$traffic_light
#> [1] "yellow"
mo$summary_text
#> [1] "Shared data detected."
mo$table[, c("text", "data", "code")] |>
  knitr::kable()
| text | data | code |
|---|---|---|
| Additional supporting information can be found at http://pss .sagepub.com/content/by/supplemental-data | TRUE | FALSE |
| All data have been made publicly available via Open Science Framework and can be accessed at https://osf.io/e2aks/. | TRUE | FALSE |
| The complete Open Practices Disclosure for this article can be found at http://pss.sagepub.com/content/by/supplemental-data. | TRUE | FALSE |

One of these is a good detection:

“All data have been made publicly available via Open Science Framework and can be accessed at https://osf.io/e2aks/.”

This is exactly what the module is meant to catch — a clear, specific statement pointing to an open repository.

The other two are false positives. They are the generic Sage journal footer (“Additional supporting information can be found at http://pss.sagepub.com/content/by/supplemental-data” and “The complete Open Practices Disclosure for this article can be found at …”). This boilerplate appears on a large share of Psychological Science articles regardless of whether any data were actually shared, but the module reads it as a data statement. Across the psychsci corpus this kind of journal boilerplate accounts for a meaningful fraction of all “data” hits, which is one reason the data-detection rate looks high in that corpus specifically.
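If this footer dominates your results, one option is to screen it out before reviewing the matches. The sketch below is a workaround under stated assumptions, not a feature of the module: `tab` is a hand-built stand-in for `mo$table` containing the three sentences reported above, and the `boilerplate` pattern is a guess at what identifies the Sage footer.

```r
# Stand-in for mo$table, using the three sentences reported above.
tab <- data.frame(
  text = c(
    "Additional supporting information can be found at http://pss .sagepub.com/content/by/supplemental-data",
    "All data have been made publicly available via Open Science Framework and can be accessed at https://osf.io/e2aks/.",
    "The complete Open Practices Disclosure for this article can be found at http://pss.sagepub.com/content/by/supplemental-data."
  ),
  data = TRUE,
  code = FALSE
)

# Assumed footer pattern; "\\s*" tolerates the space that PDF extraction
# inserted into the first URL.
boilerplate <- "pss\\s*\\.sagepub\\.com/content/by/supplemental-data"
is_boilerplate <- grepl(boilerplate, tab$text, perl = TRUE)

# Keep only the statements that are not journal boilerplate.
kept <- tab[!is_boilerplate, ]
kept$text
```

A filter like this is specific to one publisher's wording, so it reduces noise in this corpus without making the remaining flags any more verified.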

The detector also occasionally misfires on the type of practice. In other papers, a methods sentence such as “the position data were polynomially interpolated using Qualisys Track Manager” is flagged as code = TRUE, simply because of the vocabulary it contains, even though no code was shared. Again: the flag means “a candidate statement was found”, not “this practice was verified”.

The practical takeaway mirrors the funding and COI modules: read the matched sentence before trusting the flag.

13.5 Running on many papers

The per-paper summary_table records, for each paper, whether a statement of each type was detected:

mo <- module_run(psychsci[1:20], "open_practices")
mo$summary_table[, c("paper_id", "data_open", "code_open", "materials_open", "prereg_open", "on_request")] |>
  knitr::kable()
paper_id data_open code_open materials_open prereg_open on_request
0956797613520608 FALSE FALSE NA NA NA
0956797614522816 FALSE FALSE NA NA NA
0956797614527830 TRUE FALSE FALSE FALSE FALSE
0956797614557697 TRUE FALSE FALSE FALSE FALSE
0956797614560771 FALSE FALSE NA NA NA
0956797614566469 TRUE FALSE FALSE FALSE FALSE
0956797615569001 TRUE FALSE TRUE FALSE FALSE
0956797615569889 TRUE FALSE FALSE FALSE FALSE
0956797615583071 TRUE FALSE TRUE FALSE FALSE
0956797615588467 TRUE FALSE FALSE FALSE FALSE
0956797615603702 FALSE FALSE NA NA NA
0956797615615584 TRUE FALSE FALSE FALSE FALSE
0956797615617779 TRUE FALSE FALSE FALSE FALSE
0956797615620784 TRUE FALSE FALSE FALSE FALSE
0956797615625973 TRUE FALSE FALSE FALSE FALSE
0956797616631990 TRUE FALSE FALSE FALSE FALSE
0956797616634654 TRUE FALSE FALSE FALSE FALSE
0956797616634665 TRUE FALSE FALSE FALSE FALSE
0956797616636631 TRUE FALSE TRUE FALSE FALSE
0956797616647519 TRUE FALSE TRUE FALSE FALSE

Detection rates vary widely and legitimately across journals and eras. In a sample of Psychological Science articles — a journal that adopted open-practice badges early — a large majority mention shared data. Corpora loaded through the paper databases tell a different story: in a sample of Journal of Decision Making articles almost none do, while PLOS Medicine articles fall in between. These differences reflect real variation in disclosure norms, not a malfunction of the module. (They are also a reminder that the module measures statements, so a high rate partly reflects journal boilerplate, as shown above.)
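For the 20 papers tabulated above, a detection rate is simply the share of TRUE values in a column. The sketch below rebuilds two of those columns by hand as a stand-in for `mo$summary_table`, so it runs without the package. Dropping NA rows from the denominator is an assumption about what NA means here (no statement of any type was found), not documented behaviour.

```r
# Stand-in for mo$summary_table, rebuilt from the 20 rows shown above.
n <- 20
data_open <- rep(TRUE, n)
data_open[c(1, 2, 5, 11)] <- FALSE        # papers with no data statement
materials_open <- rep(FALSE, n)
materials_open[c(7, 9, 19, 20)] <- TRUE   # papers with a materials statement
materials_open[!data_open] <- NA          # NA where the table reports NA

summary_table <- data.frame(data_open, materials_open)

# Share of papers with a detected statement; NA rows are excluded per column.
rates <- sapply(summary_table, mean, na.rm = TRUE)
round(rates, 2)
#>      data_open materials_open 
#>           0.80           0.25 
```

These are rates of statements, so the 0.80 for data includes any boilerplate matches among the 16 flagged papers.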

13.6 Interpreting the result

  • A detection (data_open, code_open, etc. is TRUE) means a sharing statement was found — not that the resource exists or is usable. Confirm with repo_check where possible, or by following the link yourself.
  • No detection means no recognisable statement was found. The paper may genuinely share nothing, or it may share something using wording the module does not match, or the relevant sentence may have been mangled during PDF extraction (garbled URLs such as osf .io/ideta are common and can cause both false positives and missed links).
  • An on_request flag means the paper says data or materials are available on request, which is worth noting because such availability is frequently not honoured in practice.
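When a matched sentence contains a URL that extraction has split, it can help to repair it before following the link by hand. The helper below is hypothetical and not part of Metacheck: the function name `repair_split_hosts` and the list of hosts are assumptions, and it only handles the "space before the dot" pattern seen in the examples in this chapter.

```r
# Hypothetical helper, not part of Metacheck: rejoin a host name that PDF
# extraction split just before a dot (e.g. "osf .io" becomes "osf.io").
# The host list is an assumption; extend it for the sites you care about.
repair_split_hosts <- function(x, hosts = c("osf", "pss", "github", "zenodo")) {
  # Match a known host, whitespace, then a dot that is followed by a word character.
  pattern <- paste0("\\b(", paste(hosts, collapse = "|"), ")\\s+\\.(?=\\w)")
  gsub(pattern, "\\1.", x, perl = TRUE)
}

repaired <- repair_split_hosts(c(
  "osf .io/ideta",
  "http://pss .sagepub.com/content/by/supplemental-data"
))
repaired
#> [1] "osf.io/ideta"
#> [2] "http://pss.sagepub.com/content/by/supplemental-data"
```

The lookahead keeps the helper from touching ordinary sentence endings such as "uploaded to zenodo . We then", where the dot is followed by a space.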

13.7 Options

open_practices takes only the paper argument; it has no additional options.