7 Creating Modules

Modules are user-created patterns for checking a paper or set of papers. Metacheck aims to make it possible for anyone to create their own modules. This lets toolbuilders incorporate their automated checks into Metacheck, which increases the probability that those checks will be used. In line with our value of making sure that Metacheck reflects and respects the needs of a diverse range of research communities, and benefits their research practices, allowing anyone to make a module enables research communities to create checks relevant to their field. For example, if you work in a field where mice and rats are studied, you might need to follow reporting guidelines specific to the use of mice and rats. Any member of that community can build a module that checks whether these guidelines are reported.

If you want to learn more about how to build a module, the information below should be helpful. But if you are mainly interested in using Metacheck, we recommend skipping the rest of this chapter.

7.1 How to create a Metacheck module

Module specifications are written in the same format as functions in R packages, using roxygen2 for documentation.

#' Module Title
#'
#' @description
#' A short description of the module.
#'
#' @details
#' This text will show when you use module_info()
#' and in the "How It Works" collapse box after the module report.
#' It can be multiple paragraphs and is styled with markdown.
#'
#' <validation>
#' In a sample of P papers with I instances of {thing you are checking},
#' TP were correctly detected (true positives),
#' FN were missed (false negatives) and
#' FP were incorrectly detected (false positives).
#' Overall, among all instances flagged as {thing},
#' (TP/(TP+FP))% were correct (positive predictive value) and
#' (TP/I)% of true instances were detected (sensitivity).
#' </validation>
#'
#' @keywords general|intro|method|results|discussion|reference
#'
#' @author Author Name (\email{name@email.com})
#'
#' @references
#' # Optional reference to include in reports
#'
#' @import dplyr
#'
#' @param paper a paper object or paperlist object
#' @param ... further arguments (not used)
#'
#' @returns report list
module_name <- function(paper, ...) {
  # see https://www.scienceverse.org/metacheck/articles/creating_modules.html

  # module code ----
  pattern <- "significant"

  # create return items ----

  ## table ----
  # detail your results in a format like the result of text_search()
  # this is stored to use in later modules in a report or pipeline
  table <- text_search(paper, pattern)

  ## summary_table ----
  # must have id column as the id of each paper, one row per paper
  # and further columns to be added to a master summary table
  summary_table <- dplyr::count(table, id, name = "n_significant")

  ## traffic light ----
  # displayed in reports, possible values:
  #   green: no problems detected
  #   yellow: something to check
  #   red: possible problems detected
  #   info: informational only
  #   na: not applicable
  #   fail: check failed
  tl <- if (nrow(table)) "info" else "na"

  ## summary_text ----
  # short text to be displayed at the top of reports
  # may be unique for each possible traffic light
  summary_text_options <- c(
    na = "Not applicable",
    info = "This table is provided for your information",
    red = "This is a potential problem",
    yellow = "There may be a problem",
    green = "No problems found",
    fail = "The check failed, sorry"
  )
  summary_text <- summary_text_options[[tl]]

  ## report ----
  # longer text to be displayed in the module section
  # use quarto / markdown for styling
  # https://quarto.org/docs/authoring/markdown-basics.html
  report_text <- "This table shows all of the sentences where the paper used the word *significant*. "
  report_table <- table[, c("section", "text")]
  further_info <- c(
    "For more opinions on the use of the word *significant*:",
    "* Motulsky, H. (2014). Opinion: Never use the word ‘significant’ in a scientific paper. Advances in regenerative biology, 1(1), 25155. doi: [10.3402/arb.v1.25155](https://doi.org/10.3402/arb.v1.25155)"
    )

  report <- c(
    report_text,
    scroll_table(report_table, colwidths = c(.2, .8)),
    collapse_section(further_info, title = "Further Info")
  )

  # return a list ----
  list(
    table = table,
    summary_table = summary_table,
    na_replace = 0,
    traffic_light = tl,
    summary_text = summary_text,
    report = report
  )
}

7.1.1 Roxygen Documentation

The module file starts with standard function documentation using roxygen2. Every line of roxygen documentation starts with #'.

7.1.2 Title

On the first line, give your module a short title, which will be used as a section header in reports.

#' Module Name

7.1.3 Description

After a blank line, write a one-sentence description, which will be shown in module_list(). You can optionally start it with the @description tag.

#' @description
#' A short description of the module

7.1.4 Details

You can write more detailed help under the tag @details, which will be shown when calling module_help(). This is optional.

#' @details
#' Here is more information about the module to help you use or understand it.
#' 
#' You can skip more lines to break up paragraphs.
#' 
#' * make a list
#' * check it twice

If you have experience writing R functions with roxygen, you can also omit the @description and @details tags and rely on paragraph spacing to distinguish description from details.

7.1.5 Validation

Our values statement lists quality control as a core value. Validation information gets special styling in the report, so it is placed at the end of the details section, starting with <validation> and ending with </validation>.

#' <validation>
#' In a sample of P papers with I instances of {thing you are checking}, 
#' TP were correctly detected (true positives), 
#' FN were missed (false negatives) and 
#' FP were incorrectly detected (false positives). 
#' Overall, among all instances flagged as {thing}, 
#' (TP/(TP+FP))% were correct (positive predictive value) and 
#' (TP/I)% of true instances were detected (sensitivity).
#' </validation>

The example above works well for modules that detect a practice that may happen zero or more times per paper. Communicating validation information clearly and succinctly is challenging, so feel free to use other wordings. We encourage you to also include a link to a repository that contains more detailed information and evidence.
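Filling in the template only requires the counts from your hand-coded sample. The sketch below uses made-up counts purely for illustration. It computes the positive predictive value as TP/(TP+FP) and sensitivity as TP/I, assuming I = TP + FN (every true instance is either detected or missed), and pastes both into the template sentence. The function validation_text() is hypothetical and not part of metacheck.

```r
# Hypothetical helper (not part of metacheck): turn hand-coded validation
# counts into the sentence used in the <validation> template.
validation_text <- function(P, TP, FN, FP, thing) {
  I <- TP + FN                  # assumption: true instances = found + missed
  ppv <- 100 * TP / (TP + FP)   # positive predictive value (%)
  sensitivity <- 100 * TP / I   # share of true instances detected (%)
  sprintf(paste(
    "In a sample of %d papers with %d instances of %s,",
    "%d were correctly detected (true positives),",
    "%d were missed (false negatives) and",
    "%d were incorrectly detected (false positives).",
    "Overall, among all instances flagged as %s,",
    "%.0f%% were correct (positive predictive value) and",
    "%.0f%% of true instances were detected (sensitivity)."
  ), P, I, thing, TP, FN, FP, thing, ppv, sensitivity)
}

# made-up counts, for illustration only
txt <- validation_text(P = 50, TP = 90, FN = 10, FP = 5, thing = "strain names")
cat(strwrap(txt, width = 70), sep = "\n")
```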

7.1.6 Keywords

Choose one section category under which your module will be displayed in reports.

#' @keywords general|intro|method|results|discussion|reference

7.1.7 Author

Include the module authors so they can get credit! Add a new @author tag for each author, and optionally add their email address.

#' @author Lisa DeBruine (\email{debruine@gmail.com})
#' @author Daniel Lakens (\email{lakens@gmail.com})

7.1.8 References

Optionally, include references that you want to make available to users. If you are building a module that uses citable resources, please list them here.

#' @references
#' The Retraction Watch Database [Internet].
#' New York: The Center for Scientific Integrity. 2018.
#' ISSN: 2692-4579. [Cited 2025-05-20].
#' Available from: http://retractiondatabase.org/.

7.1.9 Import

If you are using packages other than metacheck, add each with an @import statement.

#' @import dplyr
#' @import tidyr

Technically, you can then use functions from these packages in your function code without the package name prefix, but it is still best practice to use the package name prefix for all functions, like dplyr::case_when(), and we require this for contributed modules.

7.1.10 Parameters

Document each argument of the function with @param. All Metacheck modules require the first argument to be paper. The last argument can optionally be the dots argument (...). This allows module_run() to pass along any additional arguments, which your code can access by name (e.g., extra_args <- list(...)).

#' @param paper a paper object or paperlist object
#' @param ... further arguments (not used)
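As a sketch of how a module might read an optional argument passed through the dots, the hypothetical module below looks up a threshold argument in list(...) and falls back to a default when it is missing. The module name long_sentences() and the argument threshold are illustrative only, and paper is a plain character vector here so the example runs without metacheck.

```r
# Hypothetical module showing how to read optional arguments from `...`.
# `paper` is a plain character vector here so the sketch runs on its own.
long_sentences <- function(paper, ...) {
  extra_args <- list(...)
  # [[ ]] avoids partial name matching; NULL means the caller did not set it
  threshold <- if (is.null(extra_args[["threshold"]])) 20 else extra_args[["threshold"]]
  n_words <- lengths(strsplit(paper, "\\s+"))
  paper[n_words > threshold]
}

sentences <- c("A short sentence.",
               "This sentence is a little bit longer than the first one.")
long_sentences(sentences)                 # default threshold: nothing returned
long_sentences(sentences, threshold = 5)  # returns the second sentence
```

Using a default inside the function keeps the module runnable on its own, while still letting module_run() pass a different value when one is supplied.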

7.1.11 Returns

It is good practice to explain what your function returns. This is usually the default list with table, summary, traffic light, and report text, but you can edit this. It’s just a human-readable string.

#' @returns a list

7.1.12 Examples

You can add an example of how to use this module with the module_run() function. Choose a paper or list of papers that demonstrates the purpose of the module without taking too long to run.

#' @examples
#' module_run(psychsci, "module_name")

7.2 Function Code

The module function is written like any R package function, with the requirement that the first argument be paper. Set module_name to your module name, which must be a valid R variable name. Your module script should also have the same name, with a .R suffix (e.g., module_name.R).

module_name <- function(paper, ...) {
  # module code ----
  # create return items ----
  ## table ----
  ## summary_table ----
  ## traffic light ----
  ## summary_text ----
  ## report ----
  # return a list ----
}
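Since the module name must be a valid R variable name, one quick check is to compare a candidate name with make.names(), which returns a name unchanged only when it is already syntactically valid and not a reserved word. The helpers is_valid_module_name() and module_file() below are hypothetical conveniences, not metacheck functions.

```r
# Hypothetical helper: is `name` usable as a module (function) name?
# make.names() alters invalid names (e.g. "2nd-check" -> "X2nd.check",
# "if" -> "if."), so an unchanged name is a valid one.
is_valid_module_name <- function(name) {
  make.names(name) == name
}

# the matching script file name, e.g. "strain_check.R"
module_file <- function(name) paste0(name, ".R")

is_valid_module_name(c("strain_check", "2nd-check", "if"))
module_file("strain_check")
```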

You can define helper functions below your main module function, but the first function defined in the script is the one that will be run on the paper object.

A module can technically do anything you want with the paper input, but you will need to follow the template above for your module to work automatically with reports and the metascience workflow.

If you are using your modules to build a report, you need to specify which output corresponds to good practice and which to practice that may need improvement. You do this through the traffic_light and report items.

7.2.1 Code Helpers

7.2.1.1 Progress Bars

If you want to display progress for a long-running module, you can use the pb() function. It is a modified version of progress::progress_bar() that only displays a progress bar if verbose() is TRUE.

steps <- c("beginning", "middle", "end")
progress <- pb(length(steps),
               ":what [:bar] :current/:total :elapsedfull")

for (step in steps) {
  progress$tick(tokens = list(what = step))
  Sys.sleep(2) # stand-in for real work
}

If you don’t know how many steps there will be in a process, you can use the spinner version:

steps <- c("beginning", "middle", "end")
progress <- pb(NA, "(:spin) :what")

for (step in steps) {
  progress$tick(tokens = list(what = step))
  Sys.sleep(2) # stand-in for real work
}

7.2.1.2 Get Previous Outputs

If you run modules in a chain or via the report() function, the output of previously run modules is available to later modules. Always access these outputs with the get_prev_outputs() function, and handle the case where the part of the output you require is missing.

# get p_table from prev_outputs if available
p_table <- get_prev_outputs(module = "all_p_values", 
                            item = "table")

# run that module if not
if (is.null(p_table)) {
  p <- module_run(paper, "all_p_values")
  p_table <- p$table
}

# ... further code using p_table

7.2.2 Table

Most modules will need to structure their output in a table that can be shown in a report. The text_search() call below creates a table with a row for each sentence that contains the word “significant”.

  ## table ----
  # detail your results in a format like the result of text_search()
  # this is stored to use in later modules in a report or pipeline
  table <- text_search(paper, pattern)

You will need to make sure that your module works with both single paper objects and lists of paper objects. The Metacheck functions text_search() and llm() are already vectorised for paper lists.

module_run(psychsci[[1]], "module_name") # 1 paper
module_run(psychsci[1:10], "module_name") # 10 papers

7.2.3 Summary Table

For the metascience workflow, it is useful to create a table with a row for each paper in a list, and some columns that summarise the results. You can use nested tables if you want some of your cells to contain multiple values.

  ## summary_table ----
  # must have id column as the id of each paper, one row per paper
  # and further columns to be added to a master summary table
  summary_table <- dplyr::count(table, id, name = "n_significant")

Your summary table might omit some papers from the whole list because no relevant text was found. You don’t have to add them into your table, as the module_run() function will do that automatically for you. However, you may want the values of your summary variables to be something other than NA for these missing papers. You can set the value of na_replace in the return list (below) to this default value. For example, if you are returning a summary of the count of sentences with the word “significant”, you can replace NAs with 0.

If you are returning more than one summary column and have different replacement values, use a named list.

na_replace <- list(
  n_significant = 0,
  paper_type = "unknown"
)
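To make the na_replace behaviour concrete, here is a base-R sketch of the kind of filling step described above: papers missing from a summary table are added back with NA, and each NA is then replaced with the default for its column from a named list. This is only an illustration of that idea, not module_run()'s actual code; fill_summary() and the paper ids are hypothetical.

```r
# Hypothetical sketch of filling a summary table for papers with no matches.
# `na_replace` is assumed to be a named list: column name -> default value.
fill_summary <- function(summary_table, all_ids, na_replace) {
  # keep one row per paper, including papers the module did not return
  full <- merge(data.frame(id = all_ids, stringsAsFactors = FALSE),
                summary_table, by = "id", all.x = TRUE, sort = FALSE)
  for (col in names(na_replace)) {
    full[[col]][is.na(full[[col]])] <- na_replace[[col]]
  }
  full[match(all_ids, full$id), , drop = FALSE]  # restore the original order
}

summary_table <- data.frame(id = c("p1", "p3"),
                            n_significant = c(2L, 5L),
                            paper_type = c("empirical", "review"),
                            stringsAsFactors = FALSE)
na_replace <- list(n_significant = 0, paper_type = "unknown")

res <- fill_summary(summary_table, c("p1", "p2", "p3"), na_replace)
res
```

Paper p2 had no matching text, so it comes back with n_significant = 0 and paper_type = "unknown" instead of NA.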

7.2.4 Traffic Light

The traffic lights are used in single-paper reports to give a quick visual overview of the module results. There are 6 kinds of traffic lights:

✅️ green: no problems detected;
🔍 yellow: something to check;
⚠️ red: possible problems detected;
ℹ️ info: informational only;
⬜️ na: not applicable;
☠️ fail: check failed

You will need to write some code to determine which traffic light applies in your case. The fail traffic light is applied automatically if your code produces an error. If you don’t include a traffic light but do include a table in the returned list, the following rule is applied:

  ## traffic light ----
  # displayed in reports, possible values: 
  #   green: no problems detected
  #   yellow: something to check
  #   red: possible problems detected
  #   info: informational only
  #   na: not applicable
  #   fail: check failed
  tl <- if (nrow(table)) "info" else "na"
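For a check that distinguishes acceptable from problematic instances, the traffic light usually depends on what was found rather than only on whether the table has rows. A minimal sketch, assuming a hypothetical logical column problem that your own check adds to the results table, might choose among the documented values like this (choose_traffic_light() is not a metacheck function):

```r
# Hypothetical traffic-light rule for a module whose results table has a
# logical `problem` column marking instances that need attention.
choose_traffic_light <- function(table) {
  if (nrow(table) == 0) {
    "na"                          # nothing relevant found: check does not apply
  } else if (any(table$problem)) {
    "red"                         # at least one possible problem
  } else {
    "green"                       # relevant text found, no problems detected
  }
}

results <- data.frame(text = c("p = .03", "p < .05"),
                      problem = c(FALSE, TRUE))
choose_traffic_light(results)                      # "red"
choose_traffic_light(results[0, ])                 # "na"
choose_traffic_light(results[1, , drop = FALSE])   # "green"
```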

7.2.5 Summary Text

The reports begin with a summary of each module, showing the traffic light, the module name, and a short summary of the results. The returned item summary_text provides this text. You will usually want to customise the summary text for the traffic light or other aspects of the results, such as the number of instances of a practice found and the number that might be problematic.

  ## summary_text ----
  # short text to be displayed at the top of reports
  # may be unique for each possible traffic light
  summary_text_options <- c(
    na = "Not applicable",
    info = "This table is provided for your information",
    red = "This is a potential problem",
    yellow = "There may be a problem",
    green = "No problems found",
    fail = "The check failed, sorry"
  )
  summary_text <- summary_text_options[[tl]]
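As the start of this section notes, the summary text often needs the number of instances found. A sketch of a count-aware summary, again assuming a hypothetical logical problem column in the results table, combines sprintf() with the traffic light. A tiny local plural helper is used instead of metacheck's plural() so the example runs with base R alone; make_summary_text() is illustrative, not a metacheck function.

```r
# Sketch: build summary text from the traffic light and the counts.
# Assumes a hypothetical logical `problem` column in the results table.
make_summary_text <- function(table, tl) {
  n <- nrow(table)
  n_problem <- sum(table$problem)
  s <- function(k) if (k == 1) "" else "s"   # simple plural suffix
  switch(tl,
    na = "No relevant text was found.",
    green = sprintf("Found %d instance%s; none look problematic.", n, s(n)),
    red = sprintf("Found %d instance%s; %d may be a problem.",
                  n, s(n), n_problem),
    fail = "The check failed, sorry",
    "This table is provided for your information"  # any other light
  )
}

results <- data.frame(text = c("p = .03", "p < .05", "p = .20"),
                      problem = c(FALSE, TRUE, FALSE))
make_summary_text(results, "red")
#> [1] "Found 3 instances; 1 may be a problem."
```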

7.2.6 Report

Reports need to explain concepts or give resources for further learning. What to include is often specific to the outcome of a check.

The report should be a character vector with one or more items. Use quarto / markdown for styling text, such as adding links or lists.

  ## report ----
  # longer text to be displayed in the module section
  # use quarto / markdown for styling
  # https://quarto.org/docs/authoring/markdown-basics.html
  report_text <- "This table shows all of the sentences where the paper used the word *significant*. "
  report_table <- table[, c("section", "text")]
  further_info <- c(
    "For more opinions on the use of the word *significant*:",
    "* Motulsky, H. (2014). Opinion: Never use the word ‘significant’ in a scientific paper. Advances in regenerative biology, 1(1), 25155. doi: [10.3402/arb.v1.25155](https://doi.org/10.3402/arb.v1.25155)"
    )

  report <- c(
    report_text,
    scroll_table(report_table, colwidths = c(.2, .8)),
    collapse_section(further_info, title = "Further Info")
  )

You can use the helper functions scroll_table() to display tables and collapse_section() to hide longer sections of text or supplemental tables.
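Because the report is a character vector of markdown, parts of it can also be generated from your results with ordinary string functions. The sketch below, using a made-up results table with section and text columns, turns each flagged sentence into a markdown list item.

```r
# Sketch: build part of a report from a (made-up) results table.
results <- data.frame(
  section = c("method", "results"),
  text = c("Mice were C57BL/6J.", "The effect was significant."),
  stringsAsFactors = FALSE
)

report <- c(
  sprintf("Flagged sentences: %d", nrow(results)),
  # one markdown list item per flagged sentence, with the section in bold
  sprintf("* **%s**: %s", results$section, results$text)
)
report
```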

7.2.6.1 Scroll Tables

The function scroll_table() generates the R code chunk needed to display a scrollable table in a quarto document, which is how the reports are created. The chunk embeds the contents of the table, as the output below shows.

table <- data.frame(id = 1:10, letter = LETTERS[1:10])

scroll_table(
  table, 
  colwidths = c(.1, .9), # use NA for auto, numbers <=1 are %, > 1 are px, 
                         # or use characters, e.g., c("3em", NA)
  maxrows = 2,           # if table length <= this, show all rows, no pagination
  column = "body"        # page: spans full page, margin: just right margin
) |> cat()

#> 
#> ```{r}
#> #| echo: false
#> 
#> 
#> # table data --------------------------------------
#> table <- structure(list(id = 1:10, letter = c("A", "B", "C", "D", "E", 
#> "F", "G", "H", "I", "J")), row.names = c(NA, -10L), class = "data.frame")
#> 
#> # display table -----------------------------------
#> metacheck::report_table(table, c(0.1, 0.9), 2, FALSE)
#> ```

7.2.6.2 Collapsible Sections

The function collapse_section() generates the R code chunk needed to hide a section in a collapsible box.

text <- c("This is my first *paragraph*:",
          "* list item 1",
          "* list item 2")

collapse_section(
  text, # a vector of markdown text
  title = "See the full list", # defaults to "Learn More"
  callout = "note", # "tip", "note", "warning", "important", "caution"
  collapse = TRUE # defaults to TRUE (start collapsed)
) |> cat()
#> ::: {.callout-note title="See the full list" collapse="true"}
#> 
#> This is my first *paragraph*:
#> 
#> * list item 1
#> 
#> * list item 2
#> 
#> :::

7.2.6.3 Plurals

Feedback and summary text often needs to refer to the number of times something happened. We found ourselves awkwardly writing text templates like “We found %d inexact p-value(s).” The plural() function is a quick helper that adds an “s” when the number is not 1, or chooses between singular and plural forms you supply. We like the sprintf() function for setting up a template sentence and substituting in numbers or strings.

n <- 0:2

sprintf("We found %d problem%s that %s serious.",
        n, plural(n), plural(n, "is", "are"))
#> [1] "We found 0 problems that are serious."
#> [2] "We found 1 problem that is serious."  
#> [3] "We found 2 problems that are serious."
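Because sprintf() and plural() are both vectorised, one template covers every count at once. To see the logic, here is a minimal base-R stand-in, hypothetically named plural_sketch(), assuming plural() returns the first form when the number is 1 and the second form otherwise. It is not metacheck's actual code.

```r
# Hypothetical stand-in for plural(): pick the singular form when n is 1,
# otherwise the plural form. ifelse() keeps it vectorised over n.
plural_sketch <- function(n, singular = "", plural = "s") {
  ifelse(n == 1, singular, plural)
}

n <- 0:2
x <- sprintf("We found %d problem%s that %s serious.",
             n, plural_sketch(n), plural_sketch(n, "is", "are"))
x
```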

7.2.6.4 HTML Links

Usually, you will use markdown to create linked text, like [text](url). However, this doesn’t work inside a scroll table, so we made the link() helper for creating HTML links. If you just give it a URL, it will remove the “http(s)://” and use the rest of the URL as the linked text.

link("https://scienceverse.org/metacheck")
#> [1] "<a href='https://scienceverse.org/metacheck' target='_blank'>scienceverse.org/metacheck</a>"

More commonly, you will want to specify the linked text. Links created by markdown in the main text of a report are automatically opened in a new window, but you need to specify this for HTML links. Set new_window = FALSE if you don’t want this.

link(url = "https://scienceverse.org/metacheck", 
     text = "MetaCheck", 
     new_window = FALSE)
#> [1] "<a href='https://scienceverse.org/metacheck'>MetaCheck</a>"
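To show what the helper produces, here is a base-R sketch that reproduces the two outputs above: it strips http:// or https:// to make the default link text and adds target='_blank' unless new_window = FALSE. link_sketch() is a hypothetical stand-in, not metacheck's implementation, and does no HTML escaping.

```r
# Hypothetical stand-in for link(), reproducing the outputs shown above.
link_sketch <- function(url, text = NULL, new_window = TRUE) {
  if (is.null(text)) text <- sub("^https?://", "", url)  # default link text
  target <- if (new_window) " target='_blank'" else ""
  sprintf("<a href='%s'%s>%s</a>", url, target, text)
}

a <- link_sketch("https://scienceverse.org/metacheck")
b <- link_sketch("https://scienceverse.org/metacheck", "MetaCheck",
                 new_window = FALSE)
```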

7.2.6.5 Format References

In order to keep references displayed consistently, use the format_ref() function. This function can take a bibentry object (like the values in the ref column of a paper’s bib table), bibtex text, or plain text.

# get refs from a paper
paper <- demopaper()
format_ref(paper$bib$ref[2])

[1] “NULL”

ref <- bibentry(
  bibtype = "Article",
  title = "Improving transparency, falsifiability, and rigor by making hypothesis tests machine-readable",
  author = c(
    person("D.", "Lakens"),
    person(c("L.", "M."), "DeBruine")
  ),
  journal = "Advances in Methods and Practices in Psychological Science",
  year = 2021,
  volume = 4,
  number = 2,
  pages = "2515245920970949",
  doi = "10.1177/2515245920970949"
)
format_ref(ref)

[1] “Lakens D, DeBruine LM (2021). “Improving transparency, falsifiability, and rigor by making hypothesis tests machine-readable.” Advances in Methods and Practices in Psychological Science, 4(2), 2515245920970949. doi:10.1177/2515245920970949.”

You can get a bibentry citation for any installed package with the citation() function.

bib <- citation("metacheck")
format_ref(bib)

[1] “DeBruine L, Mesquida C, Werner J, Lakens D (2026). metacheck: Check Research Outputs for Best Practices. doi:10.5281/zenodo.20704754, R package version 0.1.0, https://scienceverse.org/metacheck.”

The function can also handle references in bibtex format.

bibtex <- "@Article{,
  title = {Improving transparency, falsifiability, and rigor by making hypothesis tests machine-readable},
  author = {D. Lakens and L. M. DeBruine},
  journal = {Advances in Methods and Practices in Psychological Science},
  year = {2021},
  volume = {4},
  number = {2},
  pages = {2515245920970949},
  doi = {10.1177/2515245920970949},
}"
format_ref(bibtex)

If you just add plain text that isn’t a bibentry or in bibtex format, you will usually just get the text back.

format_ref("My weird citation (2025)")

[1] “My weird citation (2025)”

In future versions, we plan to collate all references displayed using this method into an automatically created, linked reference section.

7.2.7 Return

Structure the returned values in a list. The names below are reserved for specific uses in the report and piped workflow, but you can also return other objects for your own purposes.

  # return a list ----
  list(
    table = table,
    summary_table = summary_table,
    na_replace = 0,
    traffic_light = tl,
    summary_text = summary_text,
    report = report
  )
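While developing a module, it can help to sanity-check the returned list before plugging it into a report. The hypothetical helper below (not part of metacheck) checks that the items you expect to return are present, that traffic_light, if given, is one of the six documented values, and that report is a character vector. The required names are a default you should adjust to what your module actually returns, since not every item is mandatory.

```r
# Hypothetical development helper: sanity-check a module's return list.
check_module_output <- function(out,
                                required = c("table", "summary_table",
                                             "traffic_light", "summary_text",
                                             "report")) {
  lights <- c("green", "yellow", "red", "info", "na", "fail")
  missing <- setdiff(required, names(out))
  if (length(missing) > 0) {
    stop("missing items: ", paste(missing, collapse = ", "))
  }
  tl <- out[["traffic_light"]]
  if (!is.null(tl) && !(tl %in% lights)) {
    stop("unknown traffic light: ", tl)
  }
  if (!is.null(out[["report"]]) && !is.character(out[["report"]])) {
    stop("report must be a character vector")
  }
  invisible(TRUE)
}

out <- list(
  table = data.frame(id = "p1", text = "This was significant."),
  summary_table = data.frame(id = "p1", n_significant = 1),
  na_replace = 0,
  traffic_light = "info",
  summary_text = "This table is provided for your information",
  report = "Sentences that use the word *significant*."
)
check_module_output(out)
```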

7.3 A complete worked example

The sections above describe each piece in isolation. Here we put them together into a complete, working module you can run. It is inspired by the ethics_check and coi_check modules, which scan a paper’s sentences for specific terms.

Many fields have reporting guidelines for how something should be named or described. In animal research, mouse and rat strains have a strict nomenclature maintained by MGI and RGD: inbred strains are written in uppercase Roman letters and numbers (C57BL/6J, BALB/c, DBA/2), substrains add a laboratory code after a slash (A/He), and so on. A simple but useful module is one that flags sentences mentioning a recognised strain, so an author or reviewer can check the nomenclature is correct.

We will call it strain_check. Because every module operates on a paper object, we can develop it interactively using text_search() (see Text Search).

7.3.1 Step 1: build and test the pattern

We start from a list of common strain names and turn it into one regular expression. Testing on a sentence we type ourselves with test_paper() confirms the pattern matches what we expect:

strains <- c(
  "C57BL/6[A-Za-z]*", "C57BL/10[A-Za-z]*", "BALB/c[A-Za-z]*",
  "DBA/[12][A-Za-z]*", "C3H/[A-Za-z]+", "CBA/[A-Za-z]+",
  "A/[A-Z][a-z]?", "129[A-Z][0-9A-Za-z]*", "FVB/[A-Za-z]+",
  "NOD/[A-Za-z]+", "SJL/[A-Za-z]+", "Wistar", "Sprague[- ]Dawley"
)
pattern <- paste(strains, collapse = "|")

example <- test_paper(
  "Subjects were male C57BL/6J mice (n = 20) and BALB/c controls.
   We also bred DBA/2 and 129S1 lines, and Sprague-Dawley rats."
)

# return = "match" shows exactly which strings matched
text_search(example, pattern, return = "match", perl = TRUE)$text
#> [1] "C57BL/6J"       "BALB/c"         "DBA/2"          "129S1"         
#> [5] "Sprague-Dawley"
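It is also worth checking what the pattern does not match. Using only base R (grepl() with perl = TRUE, the same option passed to text_search() above), a few hand-written sentences show that the pattern is case-sensitive, so a lowercase c57bl/6 is missed, and that the short A/ alternative can fire on unrelated text such as QA/QC. The sentences are made up for illustration.

```r
# Rebuild the strain pattern from above and test it on a few sentences.
strains <- c(
  "C57BL/6[A-Za-z]*", "C57BL/10[A-Za-z]*", "BALB/c[A-Za-z]*",
  "DBA/[12][A-Za-z]*", "C3H/[A-Za-z]+", "CBA/[A-Za-z]+",
  "A/[A-Z][a-z]?", "129[A-Z][0-9A-Za-z]*", "FVB/[A-Za-z]+",
  "NOD/[A-Za-z]+", "SJL/[A-Za-z]+", "Wistar", "Sprague[- ]Dawley"
)
pattern <- paste(strains, collapse = "|")

sentences <- c(
  "Male C57BL/6J mice were used.",      # true positive
  "We used c57bl/6 mice.",              # lowercase: missed (false negative)
  "Wistar rats were housed in pairs.",  # true positive
  "Data passed QA/QC checks.",          # "A/Q" matches: false positive
  "Participants completed a survey."    # no strain at all
)
hits <- grepl(pattern, sentences, perl = TRUE)
data.frame(sentences, hits)
```

Cases like these are exactly what the hand-coded validation sample should catch before the module is shared.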

7.3.2 Step 2: write the module function

Now we wrap the pattern in a function with the standard signature and return list. The structure mirrors coi_check: search the text, build a per-paper summary_table with dplyr::summarise(.by = ...), set a traffic_light, and assemble a report with the helper functions scroll_table() and plural().

strain_check <- function(paper) {
  strains <- c(
    "C57BL/6[A-Za-z]*", "C57BL/10[A-Za-z]*", "BALB/c[A-Za-z]*",
    "DBA/[12][A-Za-z]*", "C3H/[A-Za-z]+", "CBA/[A-Za-z]+",
    "A/[A-Z][a-z]?", "129[A-Z][0-9A-Za-z]*", "FVB/[A-Za-z]+",
    "NOD/[A-Za-z]+", "SJL/[A-Za-z]+", "Wistar", "Sprague[- ]Dawley"
  )
  pattern <- paste(strains, collapse = "|")

  # table ----
  table <- text_search(paper, pattern, perl = TRUE)

  # summary_table ----
  summary_table <- dplyr::summarise(table,
    strains_mentioned = dplyr::n(),
    .by = paper_id
  )

  # traffic_light ----
  # "na" when no strains are found (the module simply does not apply)
  tl <- if (nrow(table) > 0) "green" else "na"

  # summary_text and report ----
  if (tl == "green") {
    summary_text <- sprintf(
      "Found %d sentence%s mentioning a recognised mouse or rat strain.",
      nrow(table), plural(nrow(table))
    )
    report <- c(
      paste(
        "We detected references to standard mouse or rat strains.",
        "Check that the nomenclature follows the",
        "[MGI/RGD guidelines](https://www.informatics.jax.org/mgihome/nomen/strains.shtml)."
      ),
      scroll_table(table[, c("text", "header")])
    )
  } else {
    summary_text <- "No recognised mouse or rat strain names were detected."
    report <- summary_text
  }

  # return list ----
  list(
    table = table,
    summary_table = summary_table,
    na_replace = 0,
    traffic_light = tl,
    summary_text = summary_text,
    report = report
  )
}

7.3.3 Step 3: run it

Because the function follows the module template, you can call it directly, or register it and run it through module_run(). Calling it on our example paper:

result <- strain_check(example)

result$traffic_light
#> [1] "green"
result$summary_text
#> [1] "Found 1 sentence mentioning a recognised mouse or rat strain."
result$summary_table
#> # A tibble: 1 × 2
#>   paper_id       strains_mentioned
#>   <chr>                      <int>
#> 1 0301ca8c7e70f1                 1

On a paper that does not study animals — such as the bundled demopaper() — no strains are found and the traffic light is "na", meaning the check simply does not apply:

strain_check(demopaper())$traffic_light
#> [1] "na"

7.3.4 Going further

This module only checks whether strain names are present; a more sophisticated version could validate the nomenclature itself (for example, flagging a missing laboratory code, or a lowercase strain symbol that should be uppercase), or use an LLM to judge whether the strain is described with all the detail the guidelines require. The same pattern — search, summarise, judge, report — applies whatever the check.
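As one possible first step towards checking nomenclature, the sketch below flags strain symbols written in the wrong case by matching a pattern twice, once ignoring case and once exactly. A sentence that matches only when case is ignored probably contains a mis-cased symbol, such as BALB/C instead of BALB/c. The short pattern and the helper flag_miscased() are illustrative assumptions, not part of metacheck or the MGI guidelines. It works per sentence, so a sentence that also contains a correctly cased strain will not be flagged.

```r
# Hypothetical sketch: flag sentences whose strain symbol is mis-cased.
flag_miscased <- function(sentences, pattern) {
  any_case <- grepl(pattern, sentences, ignore.case = TRUE, perl = TRUE)
  exact    <- grepl(pattern, sentences, perl = TRUE)
  # matched only when case is ignored, so the casing is probably wrong
  any_case & !exact
}

pattern <- "C57BL/6[A-Za-z]*|BALB/c[A-Za-z]*|DBA/[12][A-Za-z]*"
sentences <- c("We used C57BL/6J mice.",
               "We used c57bl/6j mice.",
               "Controls were BALB/C mice.")
flag_miscased(sentences, pattern)
```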

Once you are happy with a module, save it as a standalone .R file (named to match the function) with the roxygen documentation described earlier, validate it against hand-coded papers, and report the results in the <validation> block described in the Validation section above.