Expand a JSON column: json_expand (metacheck)

Asking an LLM to return data as structured JSON is useful, but extracting that data can be frustrating, especially when the LLM makes syntax mistakes. This function expands a column of JSON-formatted responses into separate columns. Rows that cannot be parsed are handled gracefully: their expanded columns are left as NA and the 'error' column is set to "parsing error". Where possible, it also converts the expanded columns to appropriate data types.
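
The parse-with-fallback idea described above can be sketched roughly as follows. This is an illustration only, not metacheck's actual implementation: safe_parse is a hypothetical helper, and the jsonlite package is assumed to be installed.

```r
library(jsonlite)

# Hypothetical helper (not part of metacheck): parse one JSON string,
# returning NULL instead of raising an error when the text is not valid JSON.
safe_parse <- function(txt) {
  tryCatch(
    jsonlite::fromJSON(txt),
    error = function(e) NULL
  )
}

answers <- c(
  '{"number": "1", "letter": "A"}',
  "oh no, the LLM misunderstood"
)

# Parse each response independently so one bad row cannot break the rest
parsed <- lapply(answers, safe_parse)

# Flag the rows that could not be parsed; NA means the row parsed cleanly
error <- ifelse(
  vapply(parsed, is.null, logical(1)),
  "parsing error",
  NA_character_
)
```

Parsing row by row, rather than all at once, is what lets a single malformed response be flagged without losing the valid ones.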

Usage

json_expand(table, col = "answer", suffix = c("", ".json"))

Arguments

table: the table with a column to expand
col: the name or index of the column to expand (defaults to "answer" or the first column)
suffix: the suffixes used to disambiguate column names when the extracted columns conflict with names in the table

Value

the table plus the expanded columns

Examples

table <- data.frame(
  paper_id = 1:5,
  answer = c(
    '{"number": "1", "letter": "A", "bool": true}',
    '{"number": "2", "letter": "B", "bool": "FALSE"}',
    '{"number": "3", "letter": "", "bool": null}',
    "oh no, the LLM misunderstood",
    '{"number": "5", "letter": ["E", "F"], "bool": false}'
  )
)

expanded <- json_expand(table, "answer")
expanded
#>   paper_id                                               answer number letter
#> 1        1         {"number": "1", "letter": "A", "bool": true}      1      A
#> 2        2      {"number": "2", "letter": "B", "bool": "FALSE"}      2      B
#> 3        3          {"number": "3", "letter": "", "bool": null}      3       
#> 4        4                         oh no, the LLM misunderstood     NA   <NA>
#> 5        5 {"number": "5", "letter": ["E", "F"], "bool": false}      5    E;F
#>    bool         error
#> 1  TRUE          <NA>
#> 2 FALSE          <NA>
#> 3    NA          <NA>
#> 4    NA parsing error
#> 5 FALSE          <NA>
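
A common follow-up is to separate the rows that failed to parse from those that succeeded, for example to re-query the LLM for the failures. The sketch below uses base R only. So that it runs without metacheck installed, it rebuilds a stand-in for the `expanded` table by hand from the printed output above (omitting the original answer column); the names `failed` and `ok` are illustrative.

```r
# Stand-in for the `expanded` table printed above, rebuilt by hand
# (the original `answer` column is omitted for brevity).
expanded <- data.frame(
  paper_id = 1:5,
  number = c(1, 2, 3, NA, 5),
  letter = c("A", "B", "", NA, "E;F"),
  bool = c(TRUE, FALSE, NA, NA, FALSE),
  error = c(NA, NA, NA, "parsing error", NA)
)

# Rows where parsing failed, e.g. to send those papers back to the LLM
failed <- expanded[!is.na(expanded$error), ]
failed$paper_id

# Rows that parsed cleanly, with the now-uninformative error column dropped
ok <- expanded[is.na(expanded$error), setdiff(names(expanded), "error")]
```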