28 Checking Local Files with repo_check and code_check


28.1 Overview

Metacheck has two modules that work together to check research code:

- repo_check lists all files in a repository and checks for basic organisational quality: is there a README? Are there unextracted zip files? How many code, data, and other files are there?
- code_check reads the actual code files and checks them for four best-practice issues: comments, absolute file paths, grouped library loading, and missing data files.

Normally both modules fetch files automatically from online repositories (OSF, GitHub, ResearchBox, Zenodo) by following links found in a paper. This vignette shows how to point them at a local folder instead — useful when:

- you want to check your own code before sharing it
- you are reviewing a paper whose files were sent as a zip attachment
- the authors used a repository service that Metacheck does not yet support automatically (GitLab, Figshare, Dataverse, and others)

28.2 Before you start

Load the package and note the path format for your operating system:

library(metacheck)

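Operating system	Example path
Windows	`"C:/Users/researcher/Documents/my_study"`
macOS	`"/Users/researcher/Documents/my_study"`
Linux	`"/home/researcher/Documents/my_study"`

On Windows, R accepts either forward slashes (/) or doubled backslashes (\\) as path separators. A single backslash inside an R string starts an escape sequence, so a path pasted straight from File Explorer, such as "C:\Users\researcher", will either cause an error or silently change the path (\n, for example, becomes a newline).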
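If you copy a folder path from Windows File Explorer, it will contain single backslashes. A minimal sketch of two base-R ways to turn it into a path R accepts; the folder is the hypothetical example from the table, and the raw-string syntax r"(...)" requires R 4.0.0 or later:

```r
# Path as pasted from File Explorer (hypothetical folder). The raw string
# r"(...)" keeps the backslashes literally instead of treating them as escapes.
explorer_path <- r"(C:\Users\researcher\Documents\my_study)"

# Option 1: replace every backslash with a forward slash
local_path <- gsub("\\", "/", explorer_path, fixed = TRUE)

# Option 2: build the path from its parts; file.path() joins them with "/"
built_path <- file.path("C:", "Users", "researcher", "Documents", "my_study")

# Both give the same string, ready to pass as local_path
identical(local_path, built_path)

# Confirm the folder exists on this computer before running a module on it
dir.exists(local_path)
```

If dir.exists() returns FALSE, the path usually has a typo or points to a folder on another machine.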

When you provide a local path, the folder is treated as a repository in exactly the same way as an OSF project or GitHub repo. Each folder gets its own row in the summary tables and its own traffic light.

28.3 Part 1: `repo_check` on local files

28.3.1 What `repo_check` checks

For each folder you provide, repo_check will:

- List all files and classify them by type (code, data, archive, readme, …)
- Check for a README: a README file is good practice and helps others understand the contents of a folder
- Flag zip archives: if the folder contains .zip or other archive files, repo_check flags them because their contents cannot be inspected
- Report counts: how many code files, data files, and so on

The traffic light is "green" when a README is present and there are no unextracted archives, and "yellow" otherwise.
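To make the inventory and the traffic-light rule concrete, here is a minimal base-R sketch. It is not metacheck's implementation: classify_file() and traffic_light() are hypothetical helpers, and the extension lists are assumptions chosen for illustration.

```r
# Hypothetical classifier: the extension lists are illustrative assumptions
classify_file <- function(file_name) {
  ext  <- tolower(tools::file_ext(file_name))
  base <- tolower(basename(file_name))
  type <- rep("other", length(file_name))
  type[ext %in% c("zip", "tar", "gz", "7z", "rar")] <- "archive"
  type[ext %in% c("csv", "tsv", "xlsx", "sav", "dta", "rds")] <- "data"
  type[ext %in% c("r", "rmd", "qmd", "sas", "sps", "do")] <- "code"
  # Anything whose name starts with "readme" counts as a README
  type[startsWith(base, "readme")] <- "readme"
  type
}

# Rule described above: green needs a README and no archive files
traffic_light <- function(file_names) {
  types <- classify_file(file_names)
  if (any(types == "readme") && !any(types == "archive")) "green" else "yellow"
}

files <- c("README.md", "analysis.R", "data/raw.csv", "materials.zip")
classify_file(files)
traffic_light(files)
```

In this made-up folder the README is present but materials.zip is an unextracted archive, so the rule gives "yellow".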

28.3.2 Checking a single folder

repo <- module_run(test_paper(), "repo_check",
                   local_path = "C:/projects/reaction_time_study_2024")

# Plain-text summary
repo$summary_text

# Summary counts (one row per folder)
repo$summary_table

# "green" or "yellow"
repo$traffic_light

# Full file list — one row per file, with file_type and file_size columns
repo$table

test_paper() creates a minimal paper object with no repository links. The module pipeline always requires a paper argument, but when you are only checking a local folder there is no manuscript — test_paper() gives you a valid object that causes the online repository search to find nothing, so only the local files are checked.

Note on cloud-sync folders (OneDrive, iCloud, Dropbox): If the folder is stored online and files have not been downloaded locally, the check will be slow because each code file triggers a download when it is read. Before running, right-click the folder in your file browser and choose “Always keep on this device” (OneDrive) or “Download Now” (iCloud) to make all files available locally first.

A typical $summary_text when the folder is missing a README might look like:

We found 8 files in 1 repository.
We found 0 README files and 1 repository without READMEs.

And $summary_table will have columns: repo_n, files_n, files_code, files_data, files_readme, files_zip.

28.3.3 Checking multiple folders

Pass a vector of paths to check several folders at once. Each is treated as a separate repository, so you get one summary row per folder:

paths <- c(
  "C:/projects/study_1_power_analysis",
  "C:/projects/study_2_replication",
  "C:/projects/study_3_meta_analysis"
)

repo <- module_run(test_paper(), "repo_check", local_path = paths)

# Three rows — one per folder
repo$summary_table

# All files from all three folders, with repo_url identifying the source
repo$table

This is a quick way to compare whether several projects all have READMEs and no stray zip files before you submit or share them.
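If all the projects sit in one parent folder, you can build the paths vector instead of typing it. A minimal sketch using base R's list.dirs(); project_folders() is a hypothetical helper, and C:/projects and the study_ prefix are example values:

```r
# Hypothetical helper: immediate subfolders of `parent` whose names start
# with `prefix` (the default "" keeps every subfolder)
project_folders <- function(parent, prefix = "") {
  dirs <- list.dirs(parent, full.names = TRUE, recursive = FALSE)
  dirs[startsWith(basename(dirs), prefix)]
}

# Example values: every "study_..." folder inside C:/projects
paths <- project_folders("C:/projects", prefix = "study_")
paths
```

The resulting paths can be passed to module_run() as local_path exactly as above. list.dirs() returns an empty vector when the parent folder does not exist, so check length(paths) if nothing is reported.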

28.4 Part 2: `code_check` on local files

28.4.1 What `code_check` checks

code_check builds on repo_check: it first runs the file inventory, then reads the code files (R, Rmd, Qmd, SAS, SPSS, Stata), up to file_limit files per folder (20 by default), and checks each one for:

- Comments: is the code explained? A file with no comment lines at all is flagged.
- Absolute paths: paths like C:/myname/files/data.csv only work on one specific computer and will break for anyone else trying to reproduce the analysis.
- Library loading: are all library() / require() calls grouped together near the top of the script, or are they scattered throughout?
- Missing data files: if the script calls read.csv("data.csv") but data.csv is not in the folder, that is flagged.
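The four checks can be approximated in a few lines of base R. This is a simplified sketch for understanding what is measured, not metacheck's implementation: the regular expressions are assumptions, only read.csv() calls are recognised as data loading, and the script and folder contents are made up. The variable names mirror columns of code_check's $table.

```r
# Made-up script, one element per line
script <- c(
  "library(dplyr)",
  "# Load the raw data",
  "dat <- read.csv(\"C:/Users/me/study/data.csv\")",
  "",
  "summary(dat)",
  "library(ggplot2)",
  "ggplot(dat, aes(rt)) + geom_histogram()",
  "trials <- read.csv(\"trials.csv\")"
)
# Made-up contents of the folder the script was found in
folder_files <- c("analysis.R", "data.csv")

code_lines <- trimws(script)
non_blank <- code_lines[code_lines != ""]

# 1. Comments: percentage of non-blank lines that start with "#"
percentage_comment <- 100 * mean(startsWith(non_blank, "#"))

# 2. Absolute paths: a quote followed by a drive letter, "/" or "~"
abs_pattern <- "[\"']([A-Za-z]:[/\\\\]|/|~)"
code_abs_path <- sum(grepl(abs_pattern, non_blank, perl = TRUE))

# 3. Library loading: largest gap in lines between consecutive calls
lib_lines <- grep("^(library|require)\\(", code_lines)
library_max_between <- if (length(lib_lines) > 1) max(diff(lib_lines)) else 0

# 4. Missing data files: file names passed to read.csv() that are not in
#    the folder (only the file name is compared, folders are ignored)
calls <- regmatches(code_lines, regexpr("read\\.csv\\([\"'][^\"']+[\"']", code_lines))
loaded <- basename(sub("^read\\.csv\\([\"']([^\"']+)[\"']$", "\\1", calls))
loaded_files_missing_names <- setdiff(loaded, folder_files)
```

For this made-up script, one of seven non-blank lines is a comment, there is one absolute path, the two library() calls are five lines apart, and trials.csv is reported as missing.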

28.4.2 Checking your own code

You have finished a project and want to review the quality of your scripts before sharing:

result <- module_run(test_paper(), "code_check",
                     local_path = "C:/projects/reaction_time_study_2024")

# Short summary of all findings
result$summary_text

# One row per code file with detailed metrics
result$table

# "green" (no issues), "yellow" (some issues), or "na" (no code files found)
result$traffic_light

# Full formatted report
result$report

Note on cloud-sync folders: If your files are on OneDrive, iCloud, or another cloud-sync service and are not fully downloaded, you will see the message “If folders are stored online, the check might be slow as all files need to be downloaded.” In that case, right-click the folder and choose “Always keep on this device” (OneDrive) or “Download Now” (iCloud) before running the check.

28.4.3 Checking multiple projects

paths <- c(
  "C:/projects/study_1_power_analysis",
  "C:/projects/study_2_replication",
  "C:/projects/study_3_meta_analysis"
)

result <- module_run(test_paper(), "code_check", local_path = paths)

result$summary_text

# Filter to one folder using dplyr
library(dplyr)

result$table |>
  filter(grepl("study_2", repo_url)) |>
  select(file_name, percentage_comment, code_abs_path, loaded_files_missing)

The repo_url column contains the folder path, so you can always identify which folder a code file came from. Each folder has its own independent file_limit (default 20 files).
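To see which folders need the most attention, you can count files with issues per repo_url. A minimal base-R sketch on made-up rows standing in for result$table; treating a file as having an issue when it has no comments, any absolute path, or any missing data file is an assumption for illustration, not necessarily the rule code_check uses for its traffic light:

```r
# Made-up rows standing in for result$table (illustration only)
tbl <- data.frame(
  repo_url = c("C:/projects/study_1_power_analysis",
               "C:/projects/study_2_replication",
               "C:/projects/study_2_replication"),
  file_name = c("power.R", "clean.R", "analysis.R"),
  percentage_comment = c(12, 0, 30),
  code_abs_path = c(0, 2, 0),
  loaded_files_missing = c(0, 0, 1)
)

# Assumed definition: no comments, an absolute path, or a missing data file
tbl$has_issue <- tbl$percentage_comment == 0 |
  tbl$code_abs_path > 0 |
  tbl$loaded_files_missing > 0

# Count code files and files with issues in each folder
files_n  <- tapply(tbl$file_name, tbl$repo_url, length)
issues_n <- tapply(tbl$has_issue, tbl$repo_url, sum)

per_folder <- data.frame(repo_url = names(files_n),
                         files_n  = as.vector(files_n),
                         issues_n = as.vector(issues_n))
per_folder
```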

28.4.4 Checking more than 20 code files

By default code_check analyses at most 20 code files per folder. This prevents accidentally processing hundreds of files in, say, an R package repository. If your folder has more code files than that and you want them all checked, raise the limit or remove it entirely:

# analyse up to 100 code files per folder
result <- module_run(test_paper(), "code_check",
                     local_path = "C:/projects/large_project",
                     file_limit = 100)

# no limit — check every code file found
result <- module_run(test_paper(), "code_check",
                     local_path = "C:/projects/large_project",
                     file_limit = Inf)

The summary text will tell you if the limit was applied: "Only the first 20 files per repository were analysed." If you see this message and want complete results, re-run with a higher file_limit.
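To find out whether a folder will hit the default limit before you run the check, you can count its code files. A minimal sketch; count_code_files() is a hypothetical helper, and its extension list (R, R Markdown, Quarto, SAS, SPSS syntax, Stata do-files) is an assumption that may not match metacheck's own definition of a code file exactly:

```r
# Hypothetical helper: number of files with code-like extensions,
# searching all subfolders
count_code_files <- function(path,
                             exts = c("r", "rmd", "qmd", "sas", "sps", "do")) {
  files <- list.files(path, recursive = TRUE)
  sum(tolower(tools::file_ext(files)) %in% exts)
}

n_code <- count_code_files("C:/projects/large_project")
n_code > 20  # TRUE means the default file_limit would cut the analysis short
```

If the count is above 20, set file_limit to at least that number.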

28.4.5 Peer review — files sent as a zip archive

When reviewing a paper, authors sometimes supply code and data as a zip file attached to the submission. Once you unzip it, you have a local folder:

# After unzipping the submission to a folder:
result <- module_run(test_paper(), "code_check",
                     local_path = "C:/peer_review/garcia_2024_submission")

result$report   # copy this into your review
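The unzipping can also be done in R. A minimal sketch using utils::unzip(); unzip_submission() is a hypothetical helper that prints the archive contents, extracts them into a folder named after the zip file, and returns that folder's path for use as local_path. The file name is a placeholder:

```r
unzip_submission <- function(zip_file) {
  if (!file.exists(zip_file)) stop("Zip file not found: ", zip_file)
  # Extract into a sibling folder, e.g. ".../garcia_2024_submission"
  exdir <- tools::file_path_sans_ext(zip_file)
  # Show what the archive contains before extracting it
  print(utils::unzip(zip_file, list = TRUE)[, c("Name", "Length")])
  utils::unzip(zip_file, exdir = exdir)
  exdir
}

# Placeholder name; replace with the attachment you received
zip_file <- "C:/peer_review/garcia_2024_submission.zip"
if (file.exists(zip_file)) local_path <- unzip_submission(zip_file)
```

Extracting matters because repo_check flags unextracted archives and code_check cannot read scripts inside them.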

28.4.6 Code hosted on an unsupported repository

Metacheck currently retrieves files automatically from OSF, GitHub, ResearchBox, and Zenodo. Many other services exist that are not yet supported. If the paper links to one of these, download the files manually and point code_check at the downloaded folder.

The workflow is the same for all of them:

1. Go to the link in the paper
2. Download the files (usually a “Download all” button, or clone with git)
3. Unzip if needed
4. Run code_check

# Author shared code on GitLab — you downloaded or cloned it locally
result <- module_run(test_paper(), "code_check",
                     local_path = "C:/downloads/lee_2023_gitlab_code")

result$summary_text
result$traffic_light
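When the service offers a direct download link for a zip of the whole project, the download and unzip steps can be scripted. A minimal sketch, assuming you already have such a URL; fetch_archive() is a hypothetical helper, not part of metacheck:

```r
# Hypothetical helper: download a project zip and extract it into dest_dir
fetch_archive <- function(url, dest_dir) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  # Save the zip to a temporary file, not into dest_dir
  zip_file <- tempfile(fileext = ".zip")
  # mode = "wb" keeps the binary zip intact on Windows
  utils::download.file(url, zip_file, mode = "wb")
  utils::unzip(zip_file, exdir = dest_dir)
  unlink(zip_file)
  dest_dir
}
```

Calling it with the archive URL and, for example, "C:/downloads/lee_2023_gitlab_code" as dest_dir returns the folder to pass as local_path. Because the zip is downloaded to a temporary file and deleted afterwards, repo_check will not find a leftover archive in the folder.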

28.5 Part 3: Combining a paper with local files

Sometimes a paper links to an online repository that Metacheck supports, but there are also supplementary files that were shared separately: for example, the main data is on OSF but additional analysis scripts were sent as a zip by the editor. After unzipping those scripts to a folder, you can check both at once:

paper <- read("garcia_2024.pdf")

# code_check runs repo_check internally, which fetches the OSF files AND lists the local folder
result <- module_run(paper, "code_check",
                     local_path = "C:/peer_review/garcia_2024_extra_scripts")

# Online files have a URL in repo_url; local files have the folder path
unique(result$table$repo_url)

repo_check reports on each repository separately, so you can see at a glance whether the README and zip issues are in the online repo or the local folder:

repo <- module_run(paper, "repo_check",
                   local_path = "C:/peer_review/garcia_2024_extra_scripts")

# One row per repository — the OSF project and the local folder
repo$summary_table
repo$traffic_light
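To split the results by where the files came from, the repo_url column is enough. A minimal sketch on made-up values, assuming online repositories always appear as http(s) URLs and local folders as file paths:

```r
# Made-up repo_url values standing in for result$table$repo_url;
# "https://osf.io/abc12" is a placeholder, not a real project
repo_url <- c("https://osf.io/abc12", "https://osf.io/abc12",
              "C:/peer_review/garcia_2024_extra_scripts")

# Label each row by its source
source_type <- ifelse(grepl("^https?://", repo_url), "online", "local")
table(source_type)
```

Applied to the real result$table$repo_url, the same test can be used to keep only the local rows.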

28.6 Part 4: Running `repo_check` and `code_check` as a two-step pipeline

By default, code_check runs repo_check internally and you do not see its output. If you want to inspect the file inventory before the code analysis, or save the repo_check results separately, run them as two steps. When you pass a repo_check result as the first argument to code_check, the file list is reused — the folder is not re-read:

path <- "C:/projects/reaction_time_study_2024"

# Step 1: inventory and organisational checks
repo <- module_run(test_paper(), "repo_check", local_path = path)

repo$summary_text    # how many files, README present?
repo$traffic_light   # green / yellow
repo$table           # full file list with types and sizes

# Step 2: code quality analysis (reuses the file list from step 1)
code <- module_run(repo, "code_check")

code$summary_text    # code-specific summary
code$table           # one row per code file
code$traffic_light

This is especially useful when the folder is on a cloud-sync drive: any slow file downloads happen once (during code_check), not twice.
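To keep the repo_check results for later, save them to disk. A minimal sketch with a made-up stand-in for the repo object (only traffic_light and table are mocked); saveRDS() stores the whole R object, while write.csv() exports just the file list for use outside R:

```r
# Made-up stand-in for the repo_check result from step 1
repo <- list(
  traffic_light = "yellow",
  table = data.frame(
    repo_url  = "C:/projects/reaction_time_study_2024",
    file_name = c("analysis.R", "data.csv"),
    file_type = c("code", "data")
  )
)

# tempdir() keeps this demo self-contained; use your own folder in practice
out_dir <- tempdir()

# Whole result object, for reloading in R later
saveRDS(repo, file.path(out_dir, "repo_check.rds"))

# File inventory only, as a spreadsheet-friendly CSV
write.csv(repo$table, file.path(out_dir, "repo_inventory.csv"), row.names = FALSE)

# Later, or in another session
repo_again <- readRDS(file.path(out_dir, "repo_check.rds"))
```

readRDS() returns an identical R object; whether a reloaded result can be passed straight to code_check as in step 2 is an assumption worth confirming on your own data.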

28.7 Understanding the output

28.7.1 `repo_check` output

Element	What it contains
`$traffic_light`	`"green"` (README present, no zips), `"yellow"` (issues), or `"na"` (no repos)
`$summary_text`	Plain-text summary: file counts, README status, zip files
`$summary_table`	One row per folder: `repo_n`, `files_n`, `files_code`, `files_data`, `files_readme`, `files_zip`
`$table`	One row per file: `repo_url`, `file_name`, `file_url`, `file_location`, `file_size`, `file_type`
`$report`	Formatted report explaining any README or zip issues

28.7.2 `code_check` output

Element	What it contains
`$traffic_light`	`"green"` (no issues), `"yellow"` (issues found), or `"na"` (no code files)
`$summary_text`	Plain-text summary of all four checks across all files
`$summary_table`	Counts: `code_n`, `code_abs_path`, `code_missing_files`, `code_min_comments`
`$table`	One row per code file with detailed metrics (see below)
`$report`	Full formatted report suitable for pasting into a review

Key columns in code_check’s $table:

Column	What it means
`file_name`	Script filename
`repo_url`	Which folder (or online repo) the file came from
`language`	Detected language: R, SAS, SPSS, or Stata
`percentage_comment`	Share of non-blank lines that are comments (0 = no comments at all)
`loaded_files_missing`	Number of data files the script tries to load that are not in the folder
`loaded_files_missing_names`	The names of those missing files
`code_abs_path`	Number of absolute file paths detected
`library_max_between`	Largest gap in lines between consecutive `library()` calls
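These columns can be turned into a per-file to-do list. A minimal sketch on made-up rows standing in for code_check's $table; reasons() is a hypothetical helper, and the 10-line threshold for scattered library() calls is an arbitrary assumption, not a metacheck default:

```r
# Made-up rows standing in for code_check's $table (illustration only)
tbl <- data.frame(
  file_name = c("01_clean.R", "02_models.R", "03_plots.R"),
  percentage_comment = c(0, 25, 10),
  code_abs_path = c(2, 0, 0),
  loaded_files_missing = c(0, 1, 0),
  library_max_between = c(3, 0, 40)
)

# Describe the issues in one row; if() without else yields NULL, which c() drops
reasons <- function(row) {
  out <- c(
    if (row$percentage_comment == 0) "no comments",
    if (row$code_abs_path > 0) paste(row$code_abs_path, "absolute path(s)"),
    if (row$loaded_files_missing > 0) paste(row$loaded_files_missing, "missing data file(s)"),
    if (row$library_max_between > 10) "library() calls scattered"
  )
  if (length(out) == 0) "none" else paste(out, collapse = "; ")
}

tbl$issues <- vapply(seq_len(nrow(tbl)), function(i) reasons(tbl[i, ]), character(1))
tbl[tbl$issues != "none", c("file_name", "issues")]
```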

28.8 Downloading OSF files to check locally

Often the files you want to check live in an OSF project rather than on your own disk. You can download them first, then point repo_check / code_check at the downloaded folder using the local-file options above.

OSF projects can be organised into nested components, each of which can hold its own files. To retrieve all of the files associated with a project by hand, you may need to navigate to several components, download a zip for each, then reorganise and rename the downloaded folders.

The function osf_file_download() does all of this for you, recreating a folder structure based on the component names and downloading all files smaller than max_file_size (defaults to 10 MB) up to a total size of max_download_size (defaults to 100 MB).

osf_file_download(osf_id = "pngda",
                  download_to = ".",
                  max_file_size = 1,
                  max_download_size = 10)

Starting retrieval for pngda
- omitting metacheck.png (1.5MB)
Downloading files [=====================] 24/24 00:00:35

The files are written into a folder named after the OSF id, preserving the project’s nested structure:

list.files("pngda", recursive = TRUE)

#>  [1] "Data/Individual/data-01.csv"                         
#>  [2] "Data/Individual/data-02.csv"                         
#>  [3] "Data/Individual/data-03.csv"                         
#>  [4] "Data/Individual/data-04.csv"                         
#>  [5] "Data/Individual/data-05.csv"                         
#>  [6] "Data/Individual/data-06.csv"                         
#>  [7] "Data/Individual/data-07.csv"                         
#>  [8] "Data/Individual/data-08.csv"                         
#>  [9] "Data/Individual/data-09.csv"                         
#> [10] "Data/Individual/data-10.csv"                         
#> [11] "Data/Individual/data-11.csv"                         
#> [12] "Data/Individual/data-12.csv"                         
#> [13] "Data/Individual/data-13.csv"                         
#> [14] "Data/Individual/data-14.csv"                         
#> [15] "Data/Processed Data/processed-data.csv"              
#> [16] "Data/Raw Data/data.xlsx"                             
#> [17] "Data/Raw Data/nest-1/nest-2/nest-3/nest-4/test-4.txt"
#> [18] "Data/Raw Data/nest-1/nest-2/nest-3/test-3.txt"       
#> [19] "Data/Raw Data/nest-1/nest-2/test-2.txt"              
#> [20] "Data/Raw Data/nest-1/README"                         
#> [21] "Data/Raw Data/nest-1/test-1.txt"                     
#> [22] "Data/Raw Data/README"                                
#> [23] "README"
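Before running the checks, it can help to see what kinds of files were downloaded. A minimal sketch; tally_extensions() is a hypothetical helper that counts files by extension, grouping files without an extension (such as README) as "(none)":

```r
# Hypothetical helper: count files by extension, searching all subfolders
tally_extensions <- function(path) {
  files <- list.files(path, recursive = TRUE)
  ext <- tolower(tools::file_ext(files))
  ext[ext == ""] <- "(none)"
  table(ext)
}

tally_extensions("pngda")
```

For the listing above this gives 15 .csv files, 4 .txt files, one .xlsx file and three files without an extension. Judging by that listing, none of them are code files, so code_check has no scripts to analyse in this particular project.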

Once downloaded, you can run the checks on that folder exactly as shown earlier:

module_run(test_paper(), "code_check", local_path = "pngda", local_only = TRUE)