This vignette walks you through the key workflows for defining and working with ADaM specifications: loading and editing a single domain, assembling a multi-domain study, propagating metadata across domains, applying conditional includes, and producing flat output for downstream tooling.
Each ADaM domain is defined in a YAML file. The simplest case is a subject-level dataset like ADSL, which only has columns:
id: ADSL
label: Subject-Level Analysis Dataset
class: SUBJECT LEVEL ANALYSIS DATASET
structure: One record per subject
keys: USUBJID
population:
  base:
    - domain: DM
      depends: [USUBJID]
      filter: USUBJID != ""
columns:
  - id: STUDYID
    label: Study Identifier
    method: DM.STUDYID
    core: Req
  - id: USUBJID
    label: Unique Subject Identifier
    method: DM.USUBJID
    core: Req
  # ... more columns ...

Top-level keys define the domain identity (id,
label, class, structure,
keys). The population block declares which
source domains supply raw data and what row-level filters apply. The
mighty code generator reads this to build the initial population step;
mighty.metadata stores it but does not execute it.
Each column has an id, label, and
core conformance level: Req (Required),
Cond (Conditionally Required), or Perm
(Permissible). The method field describes how the column is
derived. A DOMAIN.COLUMN pattern (e.g.,
DM.STUDYID) means the column is a predecessor — its
metadata can be inherited from the referenced source via
populate_sparse().
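To make the DOMAIN.COLUMN convention concrete, here is a minimal base R sketch that tests whether a method string has that shape. The helper name is_predecessor() and the regular expression are illustrative assumptions; the rule mighty.metadata actually applies may be stricter or looser.

```r
# Hypothetical helper (not part of mighty.metadata): TRUE when a method
# string looks like DOMAIN.COLUMN, e.g. "DM.STUDYID".
# The exact pattern is an assumption made for illustration.
is_predecessor <- function(method) {
  grepl("^[A-Z][A-Z0-9]*\\.[A-Z][A-Z0-9_]*$", method)
}

methods <- c("DM.STUDYID", "ADSL.TRT01A", "Derived from height and weight")
is_predecessor(methods)
#> [1]  TRUE  TRUE FALSE
```

Free-text methods such as "Derived from height and weight" do not match, so under this sketch they would not be treated as predecessors.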
BDS (Basic Data Structure) domains like ADVS add
parameters and rows:
id: ADVS
label: Vital Signs Analysis Dataset
keys: [USUBJID, PARAMCD, AVISITN]
parameters:
  - id: BMI
    label: Body Mass Index (kg/m^2)
    columns:
      - id: AVAL
        method: Derived from height and weight
rows:
  - id: BASELINE
    method: Add baseline visit as a new row

See vignette("adam-schema") for the full schema
reference.
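As a rough illustration of how this nesting looks once parsed, the sketch below reads the same ADVS fragment with the yaml package and walks into it as a plain list. It assumes the yaml package is installed and says nothing about how mighty_domain() parses files internally.

```r
library(yaml)

# YAML text mirroring the ADVS fragment in this vignette
advs_yaml <- '
id: ADVS
label: Vital Signs Analysis Dataset
keys: [USUBJID, PARAMCD, AVISITN]
parameters:
  - id: BMI
    label: Body Mass Index (kg/m^2)
    columns:
      - id: AVAL
        method: Derived from height and weight
rows:
  - id: BASELINE
    method: Add baseline visit as a new row
'

spec <- yaml.load(advs_yaml)

# Parameter-specific column overrides sit under parameters -> columns
bmi_aval_method <- spec$parameters[[1]]$columns[[1]]$method

# Row definitions are a separate top-level list
row_id <- spec$rows[[1]]$id
```

Parameters and rows are both lists of records keyed by id, which is why the same verbs can apply to columns, parameters, and rows.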
The package provides a consistent set of verbs for columns, parameters, and rows:
| Action | Columns | Parameters | Rows |
|---|---|---|---|
| List | list_columns() | list_parameters() | list_rows() |
| Select | select_column() | select_parameter() | select_row() |
| Add | add_column() | add_parameter() | add_row() |
| Update | update_column() | update_parameter() | update_row() |
| Move | move_column() | move_parameter() | move_row() |
| Remove | remove_columns() | remove_parameters() | remove_rows() |
The remove_* functions accept a character vector to
remove multiple items at once.
Load a domain specification from a YAML file with
mighty_domain(). The file is validated against the ADaM
JSON schema on load.
path <- system.file("examples", "advs.yml", package = "mighty.metadata")
advs <- mighty_domain(path)
advs
#> <mighty.metadata::mighty_domain>
#> ADVS: Vital Signs Analysis Dataset
#> Class: BASIC DATA STRUCTURE
#> Keys: USUBJID, PARAMCD, and AVISITN

Use list_*() functions to see what the specification
contains:
list_columns(advs)
#> [1] "STUDYID" "USUBJID" "SAFFL" "TRTP" "VISITNUM" "AVISITN"
#> [7] "AVISIT" "PARAMCD" "PARAM" "AVAL" "AVALC"
list_parameters(advs)
#> [1] "BMI" "BMIGRP"
list_rows(advs)
#> [1] "BASELINE"

Drill into a specific column with select_column():
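A minimal sketch, assuming select_column() takes the domain and an id argument in the same way select_parameter() does elsewhere in this vignette. The output is not shown because it depends on the bundled example file.

```r
library(mighty.metadata)

path <- system.file("examples", "advs.yml", package = "mighty.metadata")
advs <- mighty_domain(path)

# Assumed signature: select_column(domain, id = ...), mirroring
# select_parameter(advs, id = "BMI") used later in this vignette.
aval <- select_column(advs, id = "AVAL")
str(aval)
```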
Every modification automatically re-validates the domain against the schema. All column functions return the modified domain, so they compose naturally into a pipe chain. Here we add an actual treatment column sourced from ADSL, update a label, and drop an unused column:
advs <- mighty_domain(path) |>
  add_column(
    id = "TRTA",
    label = "Actual Treatment",
    method = "ADSL.TRT01A",
    .pos = 5
  ) |>
  update_column(id = "AVAL", label = "Analysis Value (Numeric)") |>
  remove_columns(id = "AVALC")
list_columns(advs)
#> [1] "STUDYID" "USUBJID" "SAFFL" "TRTP" "TRTA" "VISITNUM"
#> [7] "AVISITN" "AVISIT" "PARAMCD" "PARAM" "AVAL"

Validation runs on every modification and on initial load. You can
also call validate() explicitly at any time:
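A minimal sketch of an explicit check, assuming validate() accepts the domain object as its only required argument. The signature and return value are not shown in this vignette, so treat both as assumptions.

```r
library(mighty.metadata)

path <- system.file("examples", "advs.yml", package = "mighty.metadata")
advs <- mighty_domain(path)

# Assumption: validate(domain) re-runs the schema checks and errors
# if the specification no longer conforms.
validate(advs)
```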
If a modification violates the schema, you get an immediate error. For example, adding a column with a duplicate ID fails:
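One way to observe the failure without stopping a script is to wrap the call in tryCatch(). This sketch reuses only add_column() arguments already shown in this vignette; the label and method values are made up for illustration, and the wording of the error message is not reproduced here.

```r
library(mighty.metadata)

path <- system.file("examples", "advs.yml", package = "mighty.metadata")
advs <- mighty_domain(path)

# AVAL already exists in ADVS, so adding it again is expected to fail.
# Capture the error message instead of aborting.
result <- tryCatch(
  add_column(
    advs,
    id = "AVAL",
    label = "Duplicate of AVAL",   # illustrative value
    method = "Derived"             # illustrative value
  ),
  error = function(e) conditionMessage(e)
)
result
```

Because the failed call never returns a domain, the original advs object is left unchanged.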
Parameters use the same verbs. The key difference is the
columns argument in add_parameter(), which
accepts a nested list of column overrides specific to that
parameter:
select_parameter(advs, id = "BMI") |>
  str()
#> List of 3
#> $ id : chr "BMI"
#> $ label : chr "Body Mass Index (kg/m^2)"
#> $ columns:List of 1
#> ..$ :List of 2
#> .. ..$ id : chr "AVAL"
#> .. ..$ method: chr "Derived from height and weight"

advs <- advs |>
  add_parameter(
    id = "WSTCIR",
    label = "Waist Circumference (cm)",
    columns = list(
      list(id = "AVAL", method = "VS.VSSTRESN")
    )
  )
list_parameters(advs)
#> [1] "BMI" "BMIGRP" "WSTCIR"

Update and remove work as expected:
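A minimal sketch, assuming update_parameter() and remove_parameters() take id and property arguments in the same way as update_column() and remove_columns(). The new label is an illustrative value.

```r
library(mighty.metadata)

path <- system.file("examples", "advs.yml", package = "mighty.metadata")
advs <- mighty_domain(path)

# Assumed to mirror update_column(id = ..., label = ...) and
# remove_columns(id = ...) from the column examples.
advs <- advs |>
  update_parameter(id = "BMI", label = "Body Mass Index") |>
  remove_parameters(id = "BMIGRP")

list_parameters(advs)
```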
Rows follow the same pattern. Inspect a row with
select_row():
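A minimal sketch, assuming select_row() has the same shape as select_parameter(). BASELINE is the row id reported by list_rows() for the bundled ADVS example.

```r
library(mighty.metadata)

path <- system.file("examples", "advs.yml", package = "mighty.metadata")
advs <- mighty_domain(path)

# Assumed signature: select_row(domain, id = ...)
baseline <- select_row(advs, id = "BASELINE")
str(baseline)
```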
Load all domain specifications from a directory with
mighty_study(). The directory can contain
_study.yml (study-level properties) and
_mighty.yml (mighty framework configuration).
study_path <- system.file("examples", package = "mighty.metadata")
study <- mighty_study(study_path)
study
#> <mighty.metadata::mighty_study/list/S7_object>
#> @ mighty: `external_data`
#> @ study: `study_id`
#> $ ADAE: <mighty.metadata::mighty_domain>
#> $ ADSL: <mighty.metadata::mighty_domain>
#> $ ADVS: <mighty.metadata::mighty_domain>

Access individual domains with $. Study-level properties
from _study.yml are stored in @study and
mighty framework configuration from _mighty.yml is stored
in @mighty. The @ operator accesses properties
of S7 objects:
names(study)
#> [1] "ADAE" "ADSL" "ADVS"
str(study@study)
#> List of 1
#> $ study_id: chr "example_study"
str(study@mighty)
#> List of 1
#> $ external_data:List of 3
#> ..$ :List of 2
#> .. ..$ id : chr "DM"
#> .. ..$ keys: chr [1:2] "STUDYID" "USUBJID"
#> ..$ :List of 2
#> .. ..$ id : chr "VS"
#> .. ..$ keys: chr [1:2] "STUDYID" "USUBJID"
#> ..$ :List of 2
#> .. ..$ id : chr "AE"
#> .. ..$ keys: chr [1:2] "STUDYID" "USUBJID"

The _study.yml file provides the study_id.
The _mighty.yml file provides external_data
definitions (source domains and their keys).
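Since @mighty is an ordinary nested list, base R is enough to pull values out of it. The sketch below collects the source domain ids and relies only on the structure printed by str(study@mighty) in this vignette.

```r
library(mighty.metadata)

study_path <- system.file("examples", package = "mighty.metadata")
study <- mighty_study(study_path)

# Each external_data entry is a list with `id` and `keys`;
# extract the id of every declared source domain.
source_ids <- vapply(
  study@mighty$external_data,
  function(x) x$id,
  character(1)
)
source_ids
```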
A “core variable” is an ADSL column that should appear in every consumer domain (ADVS, ADAE, etc.) as a predecessor column — for example, SEX and RACE for subgroup analyses.
Note: core (string: Req/Cond/Perm) records the ADaM
conformance level and is unrelated to “core variable” propagation. The
two fields that control propagation are:

- is_core (Boolean, column-level in ADSL) marks a
  column for propagation to consumer domains.
- usecore (Boolean, domain-level) signals that a domain
  should receive the propagated columns. This is a top-level YAML property
  on the consumer domain, so it is set via list assignment rather than
  update_column().

populate_core() reads these flags and adds the marked
ADSL columns to each consumer domain as predecessor columns.
The bundled examples do not include these fields, so we add them here to demonstrate the workflow:
study$ADSL <- study$ADSL |>
  update_column(id = "SEX", is_core = TRUE) |>
  update_column(id = "RACE", is_core = TRUE)
study$ADAE[["usecore"]] <- TRUE
study <- study |>
  populate_core()
list_columns(study$ADAE)
#> [1] "STUDYID" "USUBJID" "AESEQ" "AETERM" "AEDECOD" "AEBODSYS"
#> [7] "ASTDT" "AENDT" "TRTEMFL" "SEX" "RACE"

SEX and RACE now appear in ADAE as predecessor columns sourced from ADSL.
Columns that reference another domain (e.g.,
method: ADSL.SAFFL) can inherit metadata from the
referenced column. populate_sparse() performs this lookup
across the study, filling in only missing properties.
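The "fill in only missing properties" rule can be sketched in base R as follows. This is a conceptual illustration and not the package's implementation: fill_missing() is a hypothetical helper, and the SAFFL label and core values are made up for the example.

```r
# Hypothetical helper: copy properties from `source` into `target`
# only when `target` does not already define them.
fill_missing <- function(target, source) {
  missing_names <- setdiff(names(source), names(target))
  c(target, source[missing_names])
}

# Illustrative column records (values are assumptions, not package data)
adsl_saffl <- list(id = "SAFFL", label = "Safety Population Flag", core = "Req")
advs_saffl <- list(id = "SAFFL", method = "ADSL.SAFFL")

filled <- fill_missing(advs_saffl, adsl_saffl)
str(filled)
```

Properties already set on the consumer column (here id and method) are kept, while label and core are taken from the referenced ADSL column.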
Here is the full pipeline in one block:
study <- mighty_study(study_path)
# Mark ADSL core variables
study$ADSL <- study$ADSL |>
  update_column(id = "SEX", is_core = TRUE) |>
  update_column(id = "RACE", is_core = TRUE)
# Mark consumer domains
study$ADAE[["usecore"]] <- TRUE
# Run the pipeline
study <- study |>
  populate_core() |>
  populate_sparse()

If your YAML files already include the is_core and
usecore flags, the entire pipeline collapses to a single
call. Passing populate = TRUE is equivalent to calling
populate_core() then populate_sparse() after
loading:
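A minimal sketch of that single call, using the populate argument named above. With the bundled examples, which carry no is_core or usecore flags, the core-propagation step is expected to add nothing.

```r
library(mighty.metadata)

study_path <- system.file("examples", package = "mighty.metadata")

# Load and populate in one step; described above as equivalent to
# mighty_study(study_path) |> populate_core() |> populate_sparse()
study <- mighty_study(study_path, populate = TRUE)
names(study)
```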
Write all domain files, _study.yml, and
_mighty.yml back to disk with
write_config():
out <- withr::local_tempdir()
write_config(study, path = out)
list.files(out)
#> [1] "_mighty.yml" "_study.yml" "adae.yml" "adsl.yml" "advs.yml"

Omit path to write back to the original directory
(study@path).
Pooled specifications can serve multiple studies by using conditional
include fields. Conditions are wrapped in
{braces} and evaluated as R expressions (via
glue::glue_data()) against the study’s @study
values.
study <- mighty_study(study_path)
study$ADVS <- study$ADVS |>
  update_column(
    id = "STUDYID",
    include = "{study_id == 'example_study'}"
  )

When study_id in @study matches, the
condition is TRUE and the column is kept:
resolved <- resolve_includes(study)
list_columns(resolved$ADVS)
#> [1] "STUDYID" "USUBJID" "SAFFL" "TRTP" "VISITNUM" "AVISITN"
#> [7] "AVISIT" "PARAMCD" "PARAM" "AVAL" "AVALC"

Override with a different value and the column is removed:
resolved <- resolve_includes(study, info = list(study_id = "other"))
list_columns(resolved$ADVS)
#> [1] "USUBJID" "SAFFL" "TRTP" "VISITNUM" "AVISITN" "AVISIT"
#> [7] "PARAMCD" "PARAM" "AVAL" "AVALC"

include works on parameters and rows too, not just
columns.
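The condition evaluation can be reproduced outside the package with glue directly. This sketch shows only the evaluation step, assuming the glue package is installed; how resolve_includes() turns the result into a keep-or-drop decision is not shown in this vignette, so the conversion with as.logical() is an assumption.

```r
library(glue)

condition <- "{study_id == 'example_study'}"

# Evaluate the braced R expression against study-level values
keep <- as.logical(as.character(
  glue_data(list(study_id = "example_study"), condition)
))
keep
#> [1] TRUE

# The same condition with an overriding value
keep_other <- as.logical(as.character(
  glue_data(list(study_id = "other"), condition)
))
keep_other
#> [1] FALSE
```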
create_md_col() flattens the study’s column
specifications into a single tibble. This is the format consumed by
downstream mighty tools.
create_md_col(study)
#> # A tibble: 32 × 15
#> table_id table_label order id label origin key is_core core method
#> <chr> <chr> <int> <chr> <chr> <chr> <lgl> <lgl> <chr> <chr>
#> 1 ADAE Adverse Events … 1 STUD… Stud… <NA> FALSE NA Req AE.ST…
#> 2 ADAE Adverse Events … 2 USUB… Uniq… <NA> TRUE NA Req AE.US…
#> 3 ADAE Adverse Events … 3 AESEQ Sequ… <NA> TRUE NA Cond AE.AE…
#> 4 ADAE Adverse Events … 4 AETE… Repo… <NA> FALSE NA Req AE.AE…
#> 5 ADAE Adverse Events … 5 AEDE… Dict… <NA> FALSE NA Cond AE.AE…
#> 6 ADAE Adverse Events … 6 AEBO… Body… <NA> FALSE NA Cond AE.AE…
#> 7 ADAE Adverse Events … 7 ASTDT Anal… <NA> FALSE NA Cond Deriv…
#> 8 ADAE Adverse Events … 8 AENDT Anal… <NA> FALSE NA Cond Deriv…
#> 9 ADAE Adverse Events … 9 TRTE… Trea… <NA> FALSE NA Cond Deriv…
#> 10 ADSL Subject-Level A… 1 STUD… Stud… <NA> FALSE NA Req DM.ST…
#> # ℹ 22 more rows
#> # ℹ 5 more variables: codelist <chr>, format_type <chr>, format_length <int>,
#> # format_display <chr>, comment <chr>

This vignette covered the core workflows:
mighty_domain() for single datasets,
mighty_study() for collections,
populate_core() and populate_sparse() for
metadata propagation, write_config() for saving changes,
resolve_includes() for conditional specifications, and
create_md_col() for flat output.
To learn more:
- vignette("adam-schema") is the domain YAML schema reference
- vignette("study-schema") is the study YAML schema reference
- vignette("mighty-schema") is the mighty YAML schema reference