20  Repository Check

20.1 What it checks

The repo_check module finds links to research repositories in a paper (OSF, GitHub, ResearchBox, Zenodo), retrieves the list of files in each, and reports what is shared — for example whether there are data and code files, and whether the repository includes a README.
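The first step, finding repository links in the text, can be pictured with a minimal base-R sketch. It assumes the links appear as plain http(s) URLs and uses a hypothetical helper, `find_repo_links()`, with hand-written host patterns. It is not the module's implementation, which may recognise other URL forms.

```r
# Hypothetical helper (not part of the package): extract URLs from text and
# keep those that point to one of the repository hosts named above.
find_repo_links <- function(text) {
  # Pull out anything that looks like an http(s) URL
  urls <- unlist(regmatches(text, gregexpr("https?://[^[:space:]]+", text)))
  # Drop trailing punctuation that often follows a URL in running prose
  urls <- sub("[.,;)]+$", "", urls)
  # Assumed host patterns for the four services
  hosts <- c(osf = "osf\\.io", github = "github\\.com",
             researchbox = "researchbox\\.org", zenodo = "zenodo\\.org")
  # Label each URL with the first host it matches, or NA if none
  host <- vapply(urls, function(u) {
    hit <- names(hosts)[vapply(hosts, grepl, logical(1), x = u, ignore.case = TRUE)]
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1), USE.NAMES = FALSE)
  keep <- !is.na(host)
  data.frame(url = urls[keep], host = host[keep], stringsAsFactors = FALSE)
}

# Example with made-up links; the example.com link is not a repository
find_repo_links("Data: https://osf.io/abcde/. Code: https://github.com/user/repo, see also https://example.com.")
```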

Note

This module makes live network calls to the linked repositories. You need an internet connection to run the code below.

20.2 Running the module

demopaper() links to several repositories.

paper <- demopaper()
mo <- module_run(paper, "repo_check")
mo$traffic_light
#> [1] "yellow"
cat(mo$summary_text)
#> 
#> -  We found 14 files in 3 repositories.
#> -  We found 1 README file and 2 repositories without READMEs.
#> -  We found 1 archive file.

The table lists every file found across all linked repositories:

mo$table[, c("repo_name", "file_name", "file_type", "file_size")] |>
  head(10) |>
  knitr::kable()
|repo_name |file_name                                             |file_type | file_size|
|:---------|:-----------------------------------------------------|:---------|---------:|
|48ncu     |README                                                |readme    |        30|
|48ncu     |papercheck.png                                        |image     |   1564236|
|48ncu     |test_script.R                                         |code      |       183|
|629bx     |bad.R                                                 |code      |       172|
|629bx     |example2.csv                                          |data      |         5|
|629bx     |bad.Rmd                                               |code      |       413|
|629bx     |Archive.zip                                           |archive   |      6410|
|4377      |Code/Study 1.r                                        |code      |       183|
|4377      |Codebook/Study 1 - csv - dataset 1.csv___CODEBOOK.csv |data      |        96|
|4377      |Codebook/Study 1.csv___CODEBOOK.csv                   |data      |        96|
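Because the file table has one row per file, base R's table() can cross-tabulate file types by repository. To run without a network connection, the sketch below copies the repo_name and file_type values from the ten rows shown above into a data frame. These are only the first ten rows, so the counts are partial; with a live result you would pass the columns of mo$table directly.

```r
# repo_name and file_type copied from the ten rows displayed above
files <- data.frame(
  repo_name = c("48ncu", "48ncu", "48ncu", "629bx", "629bx",
                "629bx", "629bx", "4377", "4377", "4377"),
  file_type = c("readme", "image", "code", "code", "data",
                "code", "archive", "code", "data", "data")
)

# Count files of each type per repository (rows = repositories)
counts <- table(files$repo_name, files$file_type)
counts

# With a live result: table(mo$table$repo_name, mo$table$file_type)
```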

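20.3 Running on many papers

module_run() also accepts several papers at once, here the first ten in psychsci. Every linked repository is queried online, so this can take a while. The summary_table then has one row per paper: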

mo <- module_run(psychsci[1:10], "repo_check")
mo$summary_table
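With many papers, the summary table works well as a filter. The hypothetical helper below keeps the papers that link at least one repository but shared no README at all, or that shared at least one archive. It assumes the repo_n, files_readme, and files_zip columns of mo$summary_table. A paper-level count cannot show which repository lacks a README, so this is only a first pass. The one-row input reproduces the demo paper's summary row shown later in this chapter.

```r
# Hypothetical helper (not part of the package): papers worth a closer look.
# Assumes columns repo_n, files_readme and files_zip as in mo$summary_table.
needs_followup <- function(summary_table) {
  flag <- summary_table$repo_n > 0 &
    (summary_table$files_readme == 0 | summary_table$files_zip > 0)
  summary_table[flag %in% TRUE, , drop = FALSE]  # %in% TRUE also drops NA rows
}

# The demo paper's row from the summary table
demo <- data.frame(paper_id = "to_err_is_human", repo_n = 2, files_n = 11,
                   files_data = 6, files_code = 3, files_readme = 0,
                   files_zip = 1)
needs_followup(demo)

# With live results: needs_followup(mo$summary_table)
```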

20.4 What it checks for

For every repository it finds, repo_check builds a unified file list and then assesses three things:

  • A README. Each repository should document its contents. The module looks for a file whose name contains “readme” (or “read me”). A repository without one is flagged, because readers have no map of what the shared files are.
  • Archive (zip) files. Files bundled into a .zip (or similar archive) are flagged, because their contents cannot be inspected, indexed, or reused without downloading and unpacking them. The module reports the archive names and suggests uploading the files individually.
  • What is shared. It counts the data, code, README, and archive files across the repositories linked from each paper (the files_data, files_code, files_readme, and files_zip columns of the per-paper summary table), so you can see at a glance whether data and code were actually shared.
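The README and archive rules can be approximated with regular expressions on file names. This is a minimal sketch with a hypothetical classify_files() helper. The list of archive extensions is an assumption, and the module's own rules (including how it recognises data and code files) may differ.

```r
# Hypothetical classifier (not part of the package) mirroring the README and
# archive rules described above; everything else is labelled "other".
classify_files <- function(file_name) {
  base <- basename(file_name)
  # README: name contains "readme" or "read me", ignoring case
  is_readme <- grepl("read ?me", base, ignore.case = TRUE)
  # Archive: an assumed list of common archive extensions
  is_archive <- grepl("\\.(zip|tar|gz|tgz|7z|rar)$", base, ignore.case = TRUE)
  ifelse(is_readme, "readme", ifelse(is_archive, "archive", "other"))
}

classify_files(c("README", "Read Me.txt", "Archive.zip", "Code/Study 1.r"))
```

A plain substring match is deliberately loose: it also matches names such as "threadme.txt", which is why a real implementation may use stricter rules.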

The traffic light follows from these checks:

  • green — repositories were found, every one has a README, and there are no archive files.
  • yellow — at least one repository is missing a README, or one or more archive files were found.
  • na — no repository links were found in the paper at all.
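The traffic-light rule can be written as a small function. The sketch below uses a hypothetical traffic_light() helper that takes the number of README files in each linked repository plus the total number of archives. It encodes only the three outcomes listed above and is not the module's source.

```r
# Hypothetical encoding (not part of the package) of the traffic-light rules.
# readme_per_repo: number of README files in each linked repository
# n_archives: total number of archive files across those repositories
traffic_light <- function(readme_per_repo, n_archives) {
  if (length(readme_per_repo) == 0) return("na")  # no repository links found
  if (any(readme_per_repo == 0) || n_archives > 0) return("yellow")
  "green"
}

traffic_light(c(1, 0, 0), n_archives = 1)  # 1 README, 2 repositories without, 1 archive
traffic_light(c(1, 2), n_archives = 0)
traffic_light(integer(0), n_archives = 0)
```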

20.5 A clean example and one with problems

The demo paper links to three repositories, and running repo_check on it produces a yellow light, so there are problems to address. The summary text reports exactly what they are:

mo$traffic_light
#> [1] "yellow"
cat(mo$summary_text)
#> 
#> -  We found 14 files in 3 repositories.
#> -  We found 1 README file and 2 repositories without READMEs.
#> -  We found 1 archive file.

Here the issues are two repositories without a README and one archive file whose contents cannot be examined without downloading and unpacking it. The per-paper summary table counts each file type; note that files_zip is 1 (the one archive) and files_readme is 0:

mo$summary_table |>
  knitr::kable()
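|paper_id        | repo_n| files_n| files_data| files_code| files_readme| files_zip|
|:---------------|------:|-------:|----------:|----------:|------------:|---------:|
|to_err_is_human |      2|      11|          6|          3|            0|         1|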

We can see the specific archive file that was flagged in the full file table:

mo$table[mo$table$file_type %in% "archive", c("repo_name", "file_name")] |>
  knitr::kable()
|   |repo_name |file_name   |
|:--|:---------|:-----------|
|7  |629bx     |Archive.zip |

A clean result, by contrast, would have a README in every linked repository (so the summary text reports no repositories without READMEs) and would share files individually rather than as a zip (files_zip of 0), producing a green light with no items to address.
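Because files_readme is a per-paper total, it cannot show which repository is missing its README, but a per-repository check on the file table can. The sketch below assumes a data frame with the repo_name and file_type columns of mo$table, and that README files carry the file type "readme" as in the table earlier in this chapter. repos_without_readme() is a hypothetical helper, and the three-repository input is made up.

```r
# Hypothetical helper (not part of the package): repositories in a file table
# that contain no README. Assumes columns repo_name and file_type.
repos_without_readme <- function(files) {
  # TRUE for each repository that has at least one "readme" file
  has_readme <- tapply(files$file_type %in% "readme", files$repo_name, any)
  names(has_readme)[!has_readme]
}

# Made-up file table: only repository "a" has a README
files <- data.frame(
  repo_name = c("a", "a", "b", "c", "c"),
  file_type = c("readme", "data", "code", "data", "archive")
)
repos_without_readme(files)

# With live results: repos_without_readme(mo$table)
```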

20.6 Options

repo_check accepts arguments for working with local files instead of (or in addition to) online repositories:

# also include files from a local folder
module_run(paper, "repo_check", local_path = "path/to/downloaded/files")

# only check a local folder, skip all online lookups
module_run(paper, "repo_check", local_path = "path/to/files", local_only = TRUE)

local_only = TRUE is useful when you cannot or do not want to contact external services, for example when checking the files of a submission under review that you downloaded as a zip and extracted to a local folder. See the Local Files chapter for details.
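Before pointing local_path at a folder, it can help to see what the folder contains. This base-R sketch lists every file recursively and flags READMEs and archives, mirroring the checks described earlier. The helper inventory_folder() is hypothetical, and its patterns are assumptions rather than the module's rules. The example builds a throwaway folder in tempdir().

```r
# Hypothetical inventory (not part of the package) of a local folder,
# flagging README and archive files by name.
inventory_folder <- function(path) {
  files <- list.files(path, recursive = TRUE)  # paths relative to `path`
  data.frame(
    file_name = files,
    readme = grepl("read ?me", basename(files), ignore.case = TRUE),
    archive = grepl("\\.(zip|tar|gz|tgz|7z|rar)$", files, ignore.case = TRUE),
    stringsAsFactors = FALSE
  )
}

# Build a small throwaway folder to try it on
dir <- file.path(tempdir(), "repo_check_demo")
dir.create(file.path(dir, "Code"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(dir, c("README.md", "data.csv",
                                       "Archive.zip", "Code/analysis.R"))))
inventory_folder(dir)
```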