Database-creation

library(REDCapCAST)

Two different ways to create a data base

REDCapCAST provides two approaches to creating a data dictionary aimed at helping out in two different cases:

  1. Easily create a REDCap data base from an existing data set.

  2. Create a table in Word describing a variables in a data base and use this to create a data base.

In the following I will try to come with a few suggestions on how to use these approaches.

Easy data set to data base workflow

The first iteration of a dataset to data dictionary function is the ds2dd(), which creates a very basic data dictionary with all variables stored as text. This is sufficient for just storing old datasets/spreadsheets securely in REDCap.

d1 <- mtcars |>
  dplyr::mutate(record_id = seq_len(dplyr::n())) |>
  ds2dd() 

d1 |>
  gt::gt()
field_name form_name section_header field_type field_label select_choices_or_calculations field_note text_validation_type_or_show_slider_number text_validation_min text_validation_max identifier branching_logic required_field custom_alignment question_number matrix_group_name matrix_ranking field_annotation
record_id basis NA text record_id NA NA NA NA NA NA NA NA NA NA NA NA NA
mpg basis NA text mpg NA NA NA NA NA NA NA NA NA NA NA NA NA
cyl basis NA text cyl NA NA NA NA NA NA NA NA NA NA NA NA NA
disp basis NA text disp NA NA NA NA NA NA NA NA NA NA NA NA NA
hp basis NA text hp NA NA NA NA NA NA NA NA NA NA NA NA NA
drat basis NA text drat NA NA NA NA NA NA NA NA NA NA NA NA NA
wt basis NA text wt NA NA NA NA NA NA NA NA NA NA NA NA NA
qsec basis NA text qsec NA NA NA NA NA NA NA NA NA NA NA NA NA
vs basis NA text vs NA NA NA NA NA NA NA NA NA NA NA NA NA
am basis NA text am NA NA NA NA NA NA NA NA NA NA NA NA NA
gear basis NA text gear NA NA NA NA NA NA NA NA NA NA NA NA NA
carb basis NA text carb NA NA NA NA NA NA NA NA NA NA NA NA NA

The more advanced ds2dd_detailed() is a natural development. It will try to apply the most common data classes for data validation and will assume that the first column is the id number. It outputs a list with the dataset with modified variable names to comply with REDCap naming conventions and a data dictionary.

The dataset should be correctly formatted for the data dictionary to preserve as much information as possible.

d2 <- REDCapCAST::redcapcast_data |> 
  dplyr::mutate(record_id = seq_len(dplyr::n()),
                region=factor(region)) |>
  dplyr::select(record_id, dplyr::everything()) |>
  (\(.x){
    .x[!grepl("_complete$",names(.x))]
  })() |> 
  (\(.x){
    .x[!grepl("^redcap",names(.x))]
  })() |>  
  ds2dd_detailed() |> 
  purrr::pluck("meta") 

d2 |> 
  gt::gt()
field_name form_name section_header field_type field_label select_choices_or_calculations field_note text_validation_type_or_show_slider_number text_validation_min text_validation_max identifier branching_logic required_field custom_alignment question_number matrix_group_name matrix_ranking field_annotation
record_id NA NA text record_id NA NA NA NA NA NA NA NA NA NA NA NA NA
cpr NA NA text cpr NA NA NA NA NA NA NA NA NA NA NA NA NA
inclusion NA NA text inclusion NA NA date_dmy NA NA NA NA NA NA NA NA NA NA
inclusion_time NA NA text inclusion_time NA NA time_hh_mm_ss NA NA NA NA NA NA NA NA NA NA
dob NA NA text dob NA NA date_dmy NA NA NA NA NA NA NA NA NA NA
age NA NA text age NA NA number NA NA NA NA NA NA NA NA NA NA
age_integer NA NA text age_integer NA NA number NA NA NA NA NA NA NA NA NA NA
sex NA NA text sex NA NA NA NA NA NA NA NA NA NA NA NA NA
cohabitation NA NA text cohabitation NA NA NA NA NA NA NA NA NA NA NA NA NA
hypertension NA NA text hypertension NA NA NA NA NA NA NA NA NA NA NA NA NA
diabetes NA NA text diabetes NA NA NA NA NA NA NA NA NA NA NA NA NA
region NA NA radio region 1, East | 2, North | 3, South NA NA NA NA NA NA NA NA NA NA NA NA
mrs_assessed NA NA text mrs_assessed NA NA NA NA NA NA NA NA NA NA NA NA NA
mrs_date NA NA text mrs_date NA NA date_dmy NA NA NA NA NA NA NA NA NA NA
mrs_score NA NA text mrs_score NA NA number NA NA NA NA NA NA NA NA NA NA
event_datetime NA NA text event_datetime NA NA datetime_dmy NA NA NA NA NA NA NA NA NA NA
event_age NA NA text event_age NA NA NA NA NA NA NA NA NA NA NA NA NA
event_type NA NA text event_type NA NA NA NA NA NA NA NA NA NA NA NA NA

Additional specifications to the DataDictionary can be made manually, or it can be uploaded and modified manually in the graphical user interface on the REDCap server.

Data base from table

…instructions and examples are coming…

Meta data and data upload

Now the DataDictionary can be exported as a spreadsheet and uploaded or it can be uploaded using the REDCapR package (only projects with “Development” status).

Use one of the two approaches below:

Manual upload

write.csv(dd_ls$meta, "datadictionary.csv")

Upload with REDCapR

REDCapR::redcap_metadata_write(
  dd_ls$meta,
  redcap_uri = keyring::key_get("DB_URI"),
  token = keyring::key_get("DB_TOKEN")
)

In the “REDCap R Handbook” more is written on interfacing with REDCap in R using the library(keyring)to store credentials in chapter 1.1.

Step 4 - Data upload

The same two options are available for data upload as meta data upload: manual or through REDCapR.

Only the latter is shown here.

REDCapR::redcap_write(
  dd_ls$data,
  redcap_uri = keyring::key_get("DB_URI"),
  token = keyring::key_get("DB_TOKEN")
)