Converts document files (PDF, DOC, DOCX) to structured JSON using the bibr
extraction service. Supports two backends: the Scienceverse platform
("scivrs"), which uses a job queue with load balancing, and a
self-hosted bibr instance ("selfhosted") for direct API access.
Usage
convert_bibr(
file_path,
save_path = ".",
backend = c("auto", "scivrs", "selfhosted"),
api_key = NULL,
api_url = NULL,
include_figures = FALSE,
start_page = 1,
end_page = Inf,
poll_interval = 2,
timeout = 600
)
Arguments
- file_path
Path to the document file, or a directory of documents
- save_path
Path to a directory in which to save the JSON file
- backend
Which backend to use:
"auto" (default) detects from the available API key, "scivrs" uses the Scienceverse platform, and "selfhosted" uses a direct bibr API instance.
- api_key
API key (scivrs backend only). A Bearer token starting with
sv_, defaults to the SCIVRS_API_KEY env var. Ignored for the "selfhosted" backend, which requires no authentication.
- api_url
Base URL of the API. Defaults to the appropriate URL for the selected backend.
- include_figures
Whether to include base64-encoded figure images in the output (default FALSE)
- start_page
First page of the file to extract (default 1)
- end_page
Last page of the file to extract (default Inf for all pages)
- poll_interval
Seconds between status polls, scivrs backend only (default 2)
- timeout
Maximum seconds to wait for processing, scivrs backend only (default 600)
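As a rough illustration of how poll_interval and timeout might interact on the scivrs backend, here is a minimal sketch of a polling loop. The helper poll_until_done(), the check_status callback and the "done" status value are all hypothetical and are not part of the package; the real job-status protocol may differ.

```r
# Hypothetical helper, for illustration only. `check_status` is a placeholder
# function that returns a status string; "done" is an assumed status value.
poll_until_done <- function(check_status, poll_interval = 2, timeout = 600) {
  deadline <- Sys.time() + timeout
  repeat {
    status <- check_status()
    if (identical(status, "done")) {
      return(invisible(status))
    }
    # Give up once the overall time budget is spent
    if (Sys.time() >= deadline) {
      stop("Timed out after ", timeout, " seconds")
    }
    # Wait before asking again
    Sys.sleep(poll_interval)
  }
}
```

Under these assumptions, a smaller poll_interval notices a finished job sooner at the cost of more status requests, while timeout caps the total wait regardless of how often the status is polled.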
Details
When backend = "auto" (the default), the "scivrs" backend is
used if api_key is provided or the SCIVRS_API_KEY environment
variable is set. Otherwise, "selfhosted" is used (no authentication
required).
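The selection rule described above can be sketched as follows. This is an illustration of the documented behaviour only, not the package's internal code; resolve_backend() is a hypothetical helper name.

```r
# Hypothetical helper, for illustration only; not part of the package API.
resolve_backend <- function(backend = c("auto", "scivrs", "selfhosted"),
                            api_key = NULL) {
  backend <- match.arg(backend)
  # An explicit choice is used as given
  if (backend != "auto") {
    return(backend)
  }
  # Fall back to the environment variable when no key is passed
  if (is.null(api_key)) {
    api_key <- Sys.getenv("SCIVRS_API_KEY", unset = "")
  }
  # Any non-empty key selects the Scienceverse platform
  if (nzchar(api_key)) "scivrs" else "selfhosted"
}
```

With this sketch, resolve_backend() falls through to "selfhosted" only when no key is supplied and SCIVRS_API_KEY is unset or empty.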
Examples
if (FALSE) { # \dontrun{
# Auto-detect backend from environment variables
pdf <- demofile("pdf")
convert_bibr(pdf)
# Explicitly use Scienceverse platform
convert_bibr(pdf, backend = "scivrs")
# Use self-hosted bibr instance
convert_bibr(pdf, backend = "selfhosted")
# Extract specific pages
convert_bibr(pdf, start_page = 1, end_page = 10)
# Directory of papers
dir <- system.file("demo", package = "metacheck")
convert_bibr(dir, save_path = "results/")
} # }
