Asking an LLM to return data in structured JSON format is useful, but extracting the data can be frustrating, especially when the LLM makes syntax mistakes. This function expands a column of JSON-formatted responses into separate columns and handles failures gracefully: if a response cannot be parsed, the 'error' column is set to "parsing error". It also fixes column data types where possible.
Usage
json_expand(table, col = "answer", suffix = c("", ".json"))
Examples
table <- data.frame(
  paper_id = 1:5,
  answer = c(
    '{"number": "1", "letter": "A", "bool": true}',
    '{"number": "2", "letter": "B", "bool": "FALSE"}',
    '{"number": "3", "letter": "", "bool": null}',
    "oh no, the LLM misunderstood",
    '{"number": "5", "letter": ["E", "F"], "bool": false}'
  )
)
expanded <- json_expand(table, "answer")
expanded
#>   paper_id                                               answer number letter
#> 1        1         {"number": "1", "letter": "A", "bool": true}      1      A
#> 2        2      {"number": "2", "letter": "B", "bool": "FALSE"}      2      B
#> 3        3          {"number": "3", "letter": "", "bool": null}      3
#> 4        4                         oh no, the LLM misunderstood     NA   <NA>
#> 5        5 {"number": "5", "letter": ["E", "F"], "bool": false}      5    E;F
#>    bool         error
#> 1  TRUE          <NA>
#> 2 FALSE          <NA>
#> 3    NA          <NA>
#> 4    NA parsing error
#> 5 FALSE          <NA>
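The graceful handling of malformed responses can be illustrated with a minimal sketch. It assumes the jsonlite package is installed, and parse_or_null is a hypothetical helper introduced here for illustration; it is not necessarily how json_expand is implemented internally.

```r
# Hypothetical helper (an assumption, not part of the package):
# parse one response, returning NULL instead of stopping
# when the text is not valid JSON.
parse_or_null <- function(txt) {
  tryCatch(jsonlite::fromJSON(txt), error = function(e) NULL)
}

answers <- c(
  '{"number": "1", "letter": "A", "bool": true}',
  "oh no, the LLM misunderstood"
)

# Parse every response; failures become NULL rather than errors
parsed <- lapply(answers, parse_or_null)

# Flag the rows that failed to parse, mirroring the 'error' column
failed <- vapply(parsed, is.null, logical(1))
error <- ifelse(failed, "parsing error", NA_character_)
```

Wrapping each parse in tryCatch means that one malformed response does not stop the remaining rows from being processed.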
