Appendix B — Function reference for new users

This article lists the functions in Metacheck with a short, plain-language explanation of what each one does and a small example. It is meant for people who are starting to use the package and want a map of what is available.

It is organised in two parts:

Public functions — the functions you call directly, grouped by task.
Internal functions — helpers used inside the package, listed for completeness.

Examples that work offline are run when this page is built, so their output is real. Examples that need the internet (looking up DOIs, downloading from repositories, querying an LLM) are shown but not run; they are marked with a comment.

Two example data sets ship with the package and are used throughout:

demopaper() is a single small paper object, good for quick examples.
psychsci is a list of 250 open-access papers from Psychological Science.

paper <- demopaper()

C Public functions

C.1 Getting a paper to work with

Everything in Metacheck operates on a paper object (or a list of them). These functions create or load one.

demopaper() — returns a ready-made example paper, so you can try things without loading a file.

paper <- demopaper()
class(paper)

#> [1] "scivrs_paper" "list"

test_paper(text) — builds a minimal paper from a string of text. Handy for testing a check on a sentence you type yourself.

tp <- test_paper("There was an effect, t(28) = 2.40, p = .02, d = 0.45.")
tp$text$text

#> [1] "There was an effect, t(28) = 2.40, p = .02, d = 0.45."

demofile(ext) — returns the path to a bundled example file of a given type (for trying the import functions).

demofile("pdf")

#> [1] "C:/Users/dlakens/AppData/Local/R/win-library/4.5/metacheck/demos/to_err_is_human.pdf"

read(file_path) — reads a paper from a Metacheck JSON file (or a folder of them) into a paper object.

paper <- read("mypaper.json")

paper(id, ...) — constructs a paper object from components directly. paperlist(...) — combines several papers into one list that the checks can loop over.

pl <- paperlist(demopaper(), test_paper("t(10) = 2.0, d = 0.6."))
length(pl)

#> [1] 2

paper_id(paper) — the paper’s identifier. paper_table(paper, table) — pull one of the paper’s component tables (e.g. "info", "author", "bib"). ref_table(paper) — the reference list as a tidy table.

paper_id(paper)

#> [1] "to_err_is_human"

ref_table(paper)

paper_validate(paper) — checks a paper object against the expected schema. paper_write(paper, file_name) — saves a paper object to a JSON file. fig_image_view(paper, figure_id) — displays a figure image from the paper.

paper_validate(paper)
paper_write(paper, "mypaper.json")

C.2 Importing papers from PDFs and XML

These convert source documents (usually a PDF) into the Metacheck JSON format. Most need an external service (a GROBID server or the bibr backend), so they are shown but not run here.

convert(file_path, save_path) — the main entry point: convert a document to Metacheck JSON, picking a method automatically. convert_grobid(...) / convert_bibr(...) — convert using a specific backend. grobid_to_bibr(xml_path) — turn GROBID TEI XML into the bibr structure. format_bib_authors(authors) — format an author list for a bibliography.

# needs a running GROBID server or the bibr backend
paper <- convert("mypaper.pdf", save_path = "out/")

C.3 Searching the text

search_text(paper, pattern) — find text matching a regular expression. text_search() is an alias for the same function.

search_text(paper, "significant") |> head(3)

Use return = "match" to get just the matched pieces, fixed = TRUE to match literally, and exclude = TRUE to get the rows that don’t match.

search_text(paper, "significant", return = "match")

search_text(paper, "p = 0.005", fixed = TRUE)

nrow(search_text(paper, "significant", exclude = TRUE))

#> [1] 29
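The three options correspond to familiar base R behaviour. The following is a minimal sketch using grepl() and regmatches() on a hypothetical character vector; the sentences and variable names are invented for illustration, and this is not a description of how search_text() is implemented.

```r
# Hypothetical sentences standing in for rows of a paper's text
sentences <- c(
  "The effect was significant, p = 0.005.",
  "The interaction was not significant, p = 0.305.",
  "We also report p = 0x005 as a deliberate typo."
)

# Regular expression: "." matches any character, so the typo matches too
regex_hits <- grepl("p = 0.005", sentences)

# Literal match (comparable to fixed = TRUE): "." only matches a full stop
fixed_hits <- grepl("p = 0.005", sentences, fixed = TRUE)

# Only the matched pieces (comparable to return = "match")
pattern <- "p = [0-9]*\\.[0-9]+"
matched <- unlist(regmatches(sentences, gregexpr(pattern, sentences)))

# Rows that do not match (comparable to exclude = TRUE)
excluded <- sentences[!grepl("significant", sentences)]
```

The regular-expression search picks up the third sentence as well as the first, which is why fixed = TRUE is the safer choice when the search string contains characters such as "." or "(".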

expand_text(results, paper, expand_to) (also text_expand()) — add a column with surrounding context (e.g. the whole sentence or paragraph) to a search/extraction result.

pv <- extract_p_values(paper)
expand_text(pv, paper, expand_to = "sentence")$expanded[1]

#> [1] "On average researchers in the experimental (app) condition made fewer mistakes (M = 9.12) than researchers in the control (checklist) condition (M = 10.9), t(97.7) = 2.9, p = 0.005, d =0.59."
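To make the idea of expanding a match to its sentence concrete, here is a rough base R sketch. The paragraph is hypothetical, the sentence splitter is deliberately naive, and none of this is taken from the package's own code.

```r
# Hypothetical paragraph; only the middle sentence contains the p-value
paragraph <- paste(
  "Researchers were randomly assigned to two conditions.",
  "The groups differed, t(97.7) = 2.9, p = 0.005.",
  "No other comparisons were planned."
)

# Split after sentence-ending punctuation that is followed by whitespace
sentences <- strsplit(paragraph, "(?<=[.!?])\\s+", perl = TRUE)[[1]]

# Keep the sentence that contains the extracted text
hit <- "p = 0.005"
expanded <- sentences[grepl(hit, sentences, fixed = TRUE)]
```

Full stops inside numbers such as 97.7 are not followed by whitespace, so they do not end a sentence here; abbreviations such as "e.g. " would, which is one reason a real implementation needs more care.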

C.4 Extracting statistics and links

extract_eq(paper) — list every “name <comparator> value” equation in the text (test statistics, means, effect sizes, …). This is the canonical place to get statistics; checks should read from here rather than re-scanning the text.

extract_eq(paper) |> head()

extract_p_values(paper) — the p-values, with comparator and numeric value.

extract_p_values(paper)[, c("text", "p_comp", "p_value")]
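As an illustration of what "comparator and numeric value" means, the sketch below pulls p-values out of a sentence with base R regular expressions. The column names mirror those shown above, but the pattern, the example sentence, and the code are assumptions made for this example and will miss formats that the package may handle (for instance "p = ns" or scientific notation).

```r
# Hypothetical sentence with two reported p-values
text <- "The groups differed, t(97.7) = 2.9, p = 0.005, d = 0.59; a follow-up test gave p < .001."

# "p", a comparator, then a decimal number with an optional leading zero
pattern <- "\\bp *([<>=]) *(0?\\.[0-9]+)"

# Full matches first, then the capture groups within each match
hits <- regmatches(text, gregexpr(pattern, text))[[1]]
parts <- regmatches(hits, regexec(pattern, hits))

p_values <- data.frame(
  text = hits,
  p_comp = vapply(parts, function(x) x[2], character(1)),
  p_value = as.numeric(vapply(parts, function(x) x[3], character(1))),
  stringsAsFactors = FALSE
)
p_values
```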

extract_urls(paper) — text rows containing a URL or DOI.

extract_urls(paper)[, c("text_id", "text")] |> head(3)

stats(text) — run statcheck-style consistency checks on reported statistics. json_expand(table, col) — expand a column of JSON strings into columns.

stats(paper)

C.5 Running checks (modules)

Checks in Metacheck are modules. These functions list, describe, and run them.

module_list() — the available modules.

module_list()[, c("name", "title", "section")] |> head()

module_run(paper, module) — run a module; returns a list with table, summary_table, traffic_light, summary_text, and report.

out <- module_run(paper, "all_urls")
out$traffic_light

#> [1] "info"
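The shape of that return value can be pictured with a toy function in base R. Only the five element names and the "info" value come from this page; the function toy_url_module(), what it puts in each element, and the example text are hypothetical and do not describe any real Metacheck module.

```r
# Hypothetical stand-in for a module: lists the URLs found in some text
toy_url_module <- function(text) {
  urls <- unlist(regmatches(text, gregexpr("https?://[^[:space:]]+", text)))
  n <- length(urls)
  list(
    table = data.frame(url = urls, stringsAsFactors = FALSE),  # one row per finding
    summary_table = data.frame(n_urls = n),                    # one row per paper
    traffic_light = "info",                                    # overall status
    summary_text = sprintf("Found %d URL%s.", n, if (n == 1) "" else "s"),
    report = paste(urls, collapse = "\n")                      # text for the report
  )
}

out_toy <- toy_url_module(c("Data: https://osf.io/5tbm9", "No link here."))
out_toy$summary_text
```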

module_help(module) / module_info(module) — documentation and metadata for a module. module_template(module_name) — scaffold a new module file. get_prev_outputs(module, item) — inside a module, read the output of a module that ran earlier.

module_info("all_urls")$title

#> [1] "List All URLs"

C.6 Building a report

report(paper, modules) — run a set of modules and render a full report document. report_module_run(...) — run the modules that make up a report. module_report(module_output) — turn one module’s output into its report section. report_qmd(...) — produce the Quarto source for a module’s section.

report_txt <- module_report(out)
class(report_txt)

#> [1] "character"

report(paper, modules = c("all_urls", "stat_effect_size"),
       output_file = "report.html")

C.7 Report-building helpers

Small helpers modules use to format their report sections.

scroll_table(table) / report_table(table) — render a data frame as an HTML table. collapse_section(text, title) — wrap text in a collapsible block. link(url, text) — make an HTML link. plural(n) — "s" when n != 1, else "" (for grammatical messages). format_ref(bib) — format a citation from a bibentry.

plural(1)

#> [1] ""

plural(2)

#> [1] "s"

link("https://example.org", "example")

#> [1] "<a href='https://example.org' target='_blank'>example</a>"

format_ref(citation("metacheck"))

#> [1] "DeBruine L, Mesquida C, Werner J, Lakens D (2026). <em>metacheck: Check Research Outputs for Best Practices</em>. <a href=\"https://doi.org/10.5281/zenodo.20704754\">doi:10.5281/zenodo.20704754</a>, R package version 0.1.0, <a href=\"https://scienceverse.org/metacheck\">https://scienceverse.org/metacheck</a>."
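Helpers like these are mostly string formatting. As a rough sketch of the behaviour shown above, plural() and link() could be approximated in base R as follows; plural_sketch() and link_sketch() are hypothetical names, and the real functions may accept further arguments.

```r
# "s" unless n is exactly 1
plural_sketch <- function(n) if (n == 1) "" else "s"

# An HTML anchor that opens in a new tab, matching the output format above
link_sketch <- function(url, text = url) {
  sprintf("<a href='%s' target='_blank'>%s</a>", url, text)
}

# Typical use inside a report sentence
n_urls <- 2
message_text <- paste0("Found ", n_urls, " URL", plural_sketch(n_urls), ".")
html <- link_sketch("https://example.org", "example")
```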

C.8 Looking up DOIs and metadata

doi_clean(doi) — normalise a DOI string. doi_valid_format(doi) — does it look like a DOI? doi_resolves(doi) / doi_lookup(doi) — check/look up a DOI online.

doi_clean("https://doi.org/10.1234/ABC")

#> [1] "10.1234/ABC"

doi_valid_format("10.1234/abc")

#> [1] TRUE

doi_lookup("10.1234/abc")   # needs internet
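For readers curious what normalising and format-checking a DOI involves, here is a minimal base R sketch. The function names and the regular expressions are assumptions for illustration; the package's own rules may be stricter or handle more prefixes.

```r
# Strip common wrappers so only the bare DOI remains
doi_clean_sketch <- function(doi) {
  doi <- trimws(doi)
  # drop a resolver prefix such as https://doi.org/ or http://dx.doi.org/
  doi <- sub("^https?://(dx\\.)?doi\\.org/", "", doi, ignore.case = TRUE)
  # drop a leading "doi:" label
  sub("^doi: *", "", doi, ignore.case = TRUE)
}

# A DOI starts with "10.", a 4 to 9 digit registrant code, a slash, and a suffix
doi_format_sketch <- function(doi) {
  grepl("^10\\.[0-9]{4,9}/[^[:space:]]+$", doi)
}

cleaned <- doi_clean_sketch("https://doi.org/10.1234/ABC")
looks_valid <- doi_format_sketch(c("10.1234/abc", "not a doi"))
```

A format check of this kind only says that a string looks like a DOI; whether it resolves still requires an online lookup.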

crossref_doi(doi) / crossref_query(ref) — Crossref metadata lookup. openalex_doi(doi) / openalex_query(title) — OpenAlex lookup. datacite_doi(doi) — DataCite lookup. add_bib_match(paper) — match a paper’s references to online records.

crossref_doi("10.1177/0956797614520714")  # needs internet

C.9 Databases of comments, replications, retractions

pubpeer_comments(doi) — PubPeer comments for a DOI. FLoRA() / FLoRA_update() / FLoRA_date() — the FLoRA replication database (loaded, refreshed, and dated). retractionwatch() (alias rw()) / rw_update() / rw_date() — the Retraction Watch database.

pubpeer_comments("10.1177/0956797614520714")  # needs internet
rw()                                            # downloads the database

C.10 Finding and downloading shared files

Each repository has a *_links() function that finds links in a paper, an *_info() function that fetches metadata, and (where supported) a *_file_download() function.

OSF: osf_links(), osf_info(), osf_file_download(), osf_type(), osf_check_id(), osf_api_check(), osf_delay(), osf_get_all_pages(), osf_preprint_list().

GitHub: github_links(), github_info(), github_repo(), github_files(), github_languages(), github_readme().

Zenodo: zenodo_links(), zenodo_info(), zenodo_file_download().

ResearchBox: rbox_links(), rbox_info(), rbox_file_download().

AsPredicted: aspredicted_links(), aspredicted_info().

Local files: local_files(path) — list files in a local folder.

The *_links() functions work offline on an already-loaded paper:

osf_links(paper)[, c("href", "text_id")]
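Finding links offline is essentially pattern matching over the text. The sketch below shows the idea in base R for OSF links; the pattern assumes five-character OSF identifiers, the sentences are hypothetical, and the column names simply mirror the ones used above rather than reproducing the package's output.

```r
# Hypothetical text rows; the OSF identifier is the one used on this page
text <- c(
  "Data and code are available at https://osf.io/5tbm9.",
  "This sentence has no link.",
  "Materials: osf.io/5tbm9 and https://github.com/scienceverse/metacheck"
)

# Assumed pattern: the osf.io host followed by a five-character identifier
osf_pattern <- "osf\\.io/[a-z0-9]{5}"
found <- regmatches(text, gregexpr(osf_pattern, text, ignore.case = TRUE))

# One row per link, remembering which text row it came from
links <- data.frame(
  text_id = rep(seq_along(text), lengths(found)),
  href = unlist(found),
  stringsAsFactors = FALSE
)
links
```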

The *_info() and download functions need the internet:

osf_info("https://osf.io/5tbm9")          # fetch metadata
osf_file_download("5tbm9", "downloads/")  # download files
github_readme("scienceverse/metacheck")

C.11 Asking a large language model

llm(text, ...) — send text to a language model and get a structured answer. llm_use(...) — set which model/platform to use. llm_model(model) / llm_model_list(platform) — pick or list models. llm_max_calls(n) — cap how many calls a run may make.

llm_use("ollama")                       # choose a backend
llm("Summarise this abstract: ...")     # query the model

C.12 Checking shared code

code_read(file_path) — read a code file. code_parse_r(file_path) — parse R code into a table of calls. code_remove_comments(code_text, lang) — strip comments from code.

code_remove_comments("x <- 1 # set x", lang = "R")

#> [1] "x <- 1 # set x"

C.13 ORCID and contributor info

check_orcid(orcid) — validate an ORCID and fetch the person. get_orcid(family, given) — look up an ORCID by name. orcid_person(orcid) — a person’s ORCID record. credit_roles() — the CRediT contributor-role taxonomy.

credit_roles()

#> [1/con] Conceptualization: Ideas; formulation or evolution of overarching research goals and aims.
#> [2/dat] Data curation: Management activities to annotate (produce metadata), scrub data and maintain research data (including software code, where it is necessary for interpreting the data itself) for initial use and later re-use.
#> [3/ana] Formal analysis: Application of statistical, mathematical, computational, or other formal techniques to analyse or synthesize study data.
#> [4/fun] Funding acquisition: Acquisition of the financial support for the project leading to this publication.
#> [5/inv] Investigation: Conducting a research and investigation process, specifically performing the experiments, or data/evidence collection.
#> [6/met] Methodology: Development or design of methodology; creation of models.
#> [7/adm] Project administration: Management and coordination responsibility for the research activity planning and execution.
#> [8/res] Resources: Provision of study materials, reagents, materials, patients, laboratory samples, animals, instrumentation, computing resources, or other analysis tools.
#> [9/sof] Software: Programming, software development; designing computer programs; implementation of the computer code and supporting algorithms; testing of existing code components.
#> [10/sup] Supervision: Oversight and leadership responsibility for the research activity planning and execution, including mentorship external to the core team.
#> [11/val] Validation: Verification, whether as a part of the activity or separate, of the overall replication/reproducibility of results/experiments and other research outputs.
#> [12/vis] Visualization: Preparation, creation and/or presentation of the published work, specifically visualization/data presentation.
#> [13/dra] Writing - original draft: Preparation, creation and/or presentation of the published work, specifically writing the initial draft (including substantive translation).
#> [14/edi] Writing - review & editing: Preparation, creation and/or presentation of the published work by those from the original research group, specifically critical review, commentary or revision -- including pre- or post-publication stages.

check_orcid("0000-0002-0247-239X")   # needs internet

C.14 File types

filetype(filename) — a coarse type for a filename (e.g. data, code, document). file_category(contents) — categorise file contents.

filetype("analysis.R")

#> analysis.R 
#>     "code"

filetype("data.csv")

#> data.csv 
#>   "data"
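A coarse classification like this can be pictured as a lookup on the file extension. The sketch below is a base R approximation; the extension table, the "other" fallback, and the name filetype_sketch() are assumptions, and the package may recognise many more types.

```r
filetype_sketch <- function(filename) {
  # Lower-case extension without the dot, e.g. "r" for "analysis.R"
  ext <- tolower(tools::file_ext(filename))

  # Hypothetical extension-to-type table
  lookup <- c(
    r = "code", py = "code",
    csv = "data", xlsx = "data",
    pdf = "document", docx = "document"
  )

  type <- unname(lookup[ext])
  type[is.na(type)] <- "other"   # unknown or missing extension

  # Return a named vector, as in the output shown above
  names(type) <- filename
  type
}

filetype_sketch(c("analysis.R", "data.csv", "README"))
```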

C.15 Validation utilities

validate(gt, module) — compare a module’s current output against a coded ground-truth file. accuracy(expected, observed) — accuracy measures (sensitivity, specificity, d-prime, …) from two logical vectors.

accuracy(c(TRUE, TRUE, FALSE, FALSE), c(TRUE, FALSE, FALSE, FALSE))

#> $hits
#> [1] 1
#> 
#> $misses
#> [1] 1
#> 
#> $false_alarms
#> [1] 0
#> 
#> $correct_rejections
#> [1] 2
#> 
#> $accuracy
#> [1] 0.75
#> 
#> $sensitivity
#> [1] 0.5
#> 
#> $specificity
#> [1] 1
#> 
#> $d_prime
#> [1] 0.6744898
#> 
#> $beta
#> [1] 1.255418
#> 
#> attr(,"class")
#> [1] "metacheck_accuracy_measures"
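The measures follow from the four cell counts of a two-by-two table. The sketch below works through the textbook definitions in base R for the same input. The function name is hypothetical, and the 1/(2N) adjustment for rates of exactly 0 or 1 is an assumption, chosen because it reproduces the d_prime and beta values printed above for this input; the package may compute them differently.

```r
accuracy_sketch <- function(expected, observed) {
  hits <- sum(expected & observed)
  misses <- sum(expected & !observed)
  false_alarms <- sum(!expected & observed)
  correct_rejections <- sum(!expected & !observed)

  n_signal <- hits + misses                      # cases where expected is TRUE
  n_noise <- false_alarms + correct_rejections   # cases where expected is FALSE
  hit_rate <- hits / n_signal
  fa_rate <- false_alarms / n_noise

  # Assumed adjustment: keep rates inside [1/(2N), 1 - 1/(2N)] so qnorm() is finite
  clamp <- function(rate, n) min(max(rate, 1 / (2 * n)), 1 - 1 / (2 * n))
  z_hit <- stats::qnorm(clamp(hit_rate, n_signal))
  z_fa <- stats::qnorm(clamp(fa_rate, n_noise))

  list(
    accuracy = (hits + correct_rejections) / length(expected),
    sensitivity = hit_rate,                         # hits / (hits + misses)
    specificity = correct_rejections / n_noise,     # CR / (CR + false alarms)
    d_prime = z_hit - z_fa,
    beta = exp((z_fa^2 - z_hit^2) / 2)
  )
}

res <- accuracy_sketch(c(TRUE, TRUE, FALSE, FALSE), c(TRUE, FALSE, FALSE, FALSE))
str(res)
```

With two signal cases and two noise cases, the false-alarm rate of 0 is adjusted to 1/(2 × 2) = 0.25 before taking z-scores, which is where the finite d-prime comes from.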

C.16 General utilities

email(email) — set the contact email used for polite API access. online(url) — is a URL reachable? verbose(verbose) — get/set whether functions print progress. pb(total) — a progress bar. rep_if(x, y, replace) — replace values equal to y with replace. path_sanitize(path) — make a string safe to use as a file path. message(...) — Metacheck’s message wrapper. logger(), logpath(), lastlog() — write and read run logs.

rep_if(c(1, 2, 3), 2, 0)

#> [1] 1 0 3

path_sanitize("results: study/1?.csv")

#> [1] "results_study/1_.csv"
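One way to picture path sanitising is as a single substitution that replaces every run of unsafe characters with an underscore. The sketch below is a base R approximation that happens to give the same result for the example above; the set of characters treated as safe is an assumption, and path_sanitize_sketch() is a hypothetical name.

```r
# Keep letters, digits, dots, underscores, slashes and hyphens;
# replace each run of anything else with a single underscore
path_sanitize_sketch <- function(path) {
  gsub("[^A-Za-z0-9._/-]+", "_", path)
}

path_sanitize_sketch("results: study/1?.csv")
```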

email("you@example.org")
online("https://osf.io")   # needs internet

D Internal functions

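These are not part of the public interface and may change without notice. They are listed so the documentation is complete. Most begin with a dot, and several of those are exported only for testing; the rest are S3 methods (the print.* methods and [.scivrs_paperlist) and the helper module_find.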

#>  [1] ".aspredicted_info"             ".batch_query"                 
#>  [3] ".bibr_isalive"                 ".bibr_request_scivrs"         
#>  [5] ".bibr_request_selfhosted"      ".bibr_save_result"            
#>  [7] ".bibtype_convert"              ".coerce_bib_authors"          
#>  [9] ".crossref_parse_item"          ".crossref_query_parse"        
#> [11] ".detect_live_data"             ".github_config"               
#> [13] ".grobid_isalive"               ".grobid_to_bibr"              
#> [15] ".is_paper"                     ".is_paper_list"               
#> [17] ".llm_model_list_groq"          ".llm_ollama_native"           
#> [19] ".onAttach"                     ".onLoad"                      
#> [21] ".openalex_add_abstract"        ".osf_file_data"               
#> [23] ".osf_headers"                  ".osf_info"                    
#> [25] ".osf_node_data"                ".osf_parent_project"          
#> [27] ".osf_parse_response"           ".osf_pat_validate"            
#> [29] ".osf_preprint_data"            ".osf_reg_data"                
#> [31] ".osf_user_data"                ".paper_coerce"                
#> [33] ".paper_schema"                 ".papers_cache_dir"            
#> [35] ".papers_release_assets"        ".parse_author_string"         
#> [37] ".process_full_text"            ".rbox_info"                   
#> [39] ".read_bibr"                    ".regcheck_app_dir"            
#> [41] ".regcheck_poll"                ".regcheck_sanitize"           
#> [43] ".regcheck_submit"              ".regcheck_submit_multipart"   
#> [45] ".regcheck_venv_dir"            ".tei_authors"                 
#> [47] ".tei_bib"                      ".tei_text"                    
#> [49] ".tei_url"                      ".tei_xrefs"                   
#> [51] ".unnest_result"                ".xml_find_text"               
#> [53] ".xml_find1_text"               ".xml_read_grobid"             
#> [55] ".xml2bib"                      ".zenodo_id"                   
#> [57] ".zenodo_info"                  "[.scivrs_paperlist"           
#> [59] "module_find"                   "print.metacheck_module_help"  
#> [61] "print.metacheck_module_list"   "print.metacheck_module_output"
#> [63] "print.scivrs_paper"            "print.scivrs_paperlist"

#> 64 internal functions.