---
title: "Getting Started with mighty.component"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with mighty.component}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(mighty.component)
```

## What is a mighty component?

A mighty component is a reusable code template for a single, well-defined data transformation step. Components are commonly used to generate ADaM (Analysis Dataset Model) programs, but the concept is general: any parameterized R code snippet that reads a data set, modifies it, and writes it back can be expressed as a component.

Components let you write a data transformation once and reuse it across studies by swapping variable names at render time. Instead of copying and modifying code for each new study, you maintain a single template.

Components combine two ideas:

- **Mustache templating**: placeholders like `{{{ domain }}}` are filled in at render time, so the same logic works across different data sets and variables.
- **Roxygen-like documentation**: tags like `@title`, `@param`, and `@depends` describe what the component does, what it needs, and what it produces.

Think of components as reusable building blocks: each one handles a single derivation or transformation, and you compose several of them to build a complete program.

In the broader mighty ecosystem, `mighty.metadata` provides study-level configuration (via [`mighty_study()`](https://novonordisk-opensource.github.io/mighty.metadata/reference/mighty_study.html) and `_study.yml`) that can drive which components are rendered and with what parameters.

## Anatomy of a component template

Below is a minimal component that doubles a column. Every tag is visible at a glance:

```r
#' @title Double a variable
#' @description
#' Creates a new column that is twice the value of an existing column.
#'
#' @param domain `character` Name of the domain (data frame)
#' @param input `character` Name of the existing column to double
#' @param output `character` Name of the new column to create
#' @type column
#' @origin Derived
#' @depends {{{domain}}} {{{input}}}
#' @outputs {{{output}}}
#' @code
{{{domain}}} <- {{{domain}}} |>
  dplyr::mutate(
    {{{output}}} = 2 * {{{input}}}
  )
```

### Tags reference

| Tag | Purpose |
|-----|---------|
| `@title` | One-line title (required) |
| `@description` | Multi-line description (required) |
| `@param name description` | Declares a Mustache placeholder the user must provide in metadata specifications |
| `@type` | Component type: `column`, `row`, `parameter`, or `internal` |
| `@origin` | CDISC origin (optional): `Assigned`, `Collected`, `Derived`, `Not Available`, `Other`, `Predecessor`, or `Protocol` |
| `@depends domain column` | Declares that the code reads `column` from `domain` (repeat for each) |
| `@outputs variable` | Declares a column the code creates (repeat for each) |
| `@code` | Everything below this tag is executable R code |

### Mustache syntax

Components use [Mustache](https://mustache.github.io), a simple, logic-less templating language. Inside `@code`, `{{{ }}}` are Mustache placeholders, not R syntax. They are text-replaced with concrete values before the code is parsed as R. Rendering is done by the [whisker](https://github.com/edwindj/whisker) R package.

The three patterns used in components are:

- **`{{ variable }}`**: replaced with the HTML-escaped value supplied at render time.
- **`{{{ variable }}}`**: unescaped replacement. Used when the value is literal R code (e.g., `{{{ value }}}` to insert `1`, `"text"`, or an expression).
- **`{{#list}}...{{/list}}`**: repeats its body once for each element of a vector parameter.

Because double braces `{{}}` HTML-escape their value by default (for example, `<` becomes `&lt;`) and mighty renders R code rather than HTML, we recommend using triple braces `{{{}}}`.
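To make the escaping difference concrete, here is a minimal sketch that calls `whisker::whisker.render()` directly on a one-line template. The template text and the `cond` parameter are made up for this illustration and are not part of any shipped component.

```r
library(whisker)

# Hypothetical parameters: `cond` holds an R expression containing
# characters that HTML escaping would rewrite (`>`, `&`, `<`).
data <- list(domain = "ADLB", cond = "AVAL > 0 & BASE < 10")

# Double braces: the value is HTML-escaped before insertion.
escaped <- whisker.render("{{{domain}}} |> dplyr::filter({{cond}})", data)

# Triple braces: the value is inserted verbatim.
unescaped <- whisker.render("{{{domain}}} |> dplyr::filter({{{cond}}})", data)

cat(escaped, "\n")
cat(unescaped, "\n")
```

In the escaped version the comparison operators come out as HTML entities such as `&gt;`, so the result is no longer the R code you intended; the triple-brace version keeps the expression intact.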
See the [Mustache manual](https://mustache.github.io/mustache.5.html) for the full syntax reference.

### Conventions

1. The input data set is always called `{{{ domain }}}`.
2. The code must assign the result back to `{{{ domain }}}`.
3. Use explicit package namespaces (`dplyr::mutate()`, not `mutate()`).
4. Joins must always specify an explicit `by` argument. This is enforced by automatic validation (see [Automatic code validation]).

## Retrieve and inspect a component

List the example components shipped with the package:

```{r list-components}
path <- system.file("examples", package = "mighty.component")
list_components(path)
```

Retrieve one by file path:

```{r get-ady}
ady <- get_component(
  system.file("examples", "ady.mustache", package = "mighty.component")
)
ady
```

Access individual fields through the active bindings:

```{r inspect-fields}
ady$title
ady$type
ady$params
ady$depends
ady$outputs
ady$origin
```

## Render a component

Rendering fills in the Mustache placeholders with concrete values. The `$render()` method takes parameters as named arguments and returns a `mighty_component_rendered` object.

Apart from the implicit-join check described in [Automatic code validation], rendering is purely textual: mighty.component replaces placeholders with the values you supply but does not check whether the resulting code is valid R or whether the referenced columns exist. Runtime correctness is your responsibility; use `get_test_component()` (see [Testing components]) to verify components against real data.

In the output below, every `{{{ }}}` placeholder has been replaced by a concrete name:

```{r render-ady}
ady_rendered <- ady$render(domain = "ADAE", variable = "ASTDY", date = "ASTDT")
ady_rendered
```

A convenience function combines retrieval and rendering in one step. It returns the same rendered component as above.
Note that `get_rendered_component()` takes parameters as a named `list`, unlike `$render()`, which takes `...`:

```{r shortcut}
get_rendered_component(
  system.file("examples", "ady.mustache", package = "mighty.component"),
  list(domain = "ADAE", variable = "ASTDY", date = "ASTDT")
)
```

If you omit a required parameter, you get an informative error:

```{r render-error, error=TRUE}
ady$render(domain = "ADAE")
```

## Evaluate rendered code

Once rendered, call `$eval()` to execute the code. The component code contains an assignment (e.g., `ADAE <- ADAE |> ...`), and by default `$eval()` evaluates that code in the calling environment via `eval(envir = parent.frame())`. This means `$eval()` modifies the domain variable in place; no assignment of the return value is needed. To evaluate somewhere else, pass a different environment via the `envir` argument.

```{r eval-setup}
ADAE <- pharmaverseadam::adae |>
  dplyr::select(USUBJID, ASTDT, TRTSDT)
names(ADAE)
```

The `ASTDY` column does not exist yet. Run the rendered component:

```{r eval-run}
ady_rendered$eval()
names(ADAE)
head(ADAE)
```

If you want to save the rendered code to a script file instead of evaluating it interactively, use `$stream(path)` to append the code to an R file:

```{r stream-example}
script_file <- tempfile(fileext = ".R")
ady_rendered$stream(script_file)
readLines(script_file)
```

## Writing a custom component

You can author your own components as `.mustache` files. Here is a realistic example that derives the ratio of the current value to baseline (`R2BASE`) for a lab parameter. Save the following template to a `.mustache` file:

```r
#' @title Ratio to baseline
#' @description
#' Derives the ratio of the analysis value to the baseline value.
#'
#' @param domain `character` Name of the domain
#' @param variable `character` Name of the new ratio variable
#' @type column
#' @origin Derived
#' @depends {{{domain}}} AVAL
#' @depends {{{domain}}} BASE
#' @outputs {{{variable}}}
#' @code
{{{domain}}} <- {{{domain}}} |>
  dplyr::mutate(
    {{{variable}}} = dplyr::if_else(BASE != 0, AVAL / BASE, NA_real_)
  )
```

```{r custom-write, include = FALSE}
r2base_file <- tempfile(fileext = ".mustache")
writeLines(c(
  "#' @title Ratio to baseline",
  "#' @description",
  "#' Derives the ratio of the analysis value to the baseline value.",
  "#'",
  "#' @param domain `character` Name of the domain",
  "#' @param variable `character` Name of the new ratio variable",
  "#' @type column",
  "#' @origin Derived",
  "#' @depends {{{domain}}} AVAL",
  "#' @depends {{{domain}}} BASE",
  "#' @outputs {{{variable}}}",
  "#' @code",
  "{{{domain}}} <- {{{domain}}} |>",
  "  dplyr::mutate(",
  "    {{{variable}}} = dplyr::if_else(BASE != 0, AVAL / BASE, NA_real_)",
  "  )"
), r2base_file)
```

After saving this template to a `.mustache` file, load, render, and run it:

```{r custom-load}
r2base <- get_component(r2base_file)
r2base
```

```{r custom-render}
r2base_rendered <- r2base$render(
  domain = "ADLB",
  variable = "R2BASE"
)
r2base_rendered$code
```

```{r custom-eval}
ADLB <- pharmaverseadam::adlb |>
  dplyr::filter(PARAMCD == "ALB") |>
  dplyr::select(USUBJID, PARAMCD, AVISIT, AVAL, BASE)
head(ADLB)

r2base_rendered$eval()

ADLB |>
  dplyr::select(USUBJID, PARAMCD, AVISIT, AVAL, BASE, R2BASE) |>
  head()
```

## Automatic code validation

When a component is rendered, the generated code is automatically validated. The package currently checks for **implicit joins**: any `dplyr::left_join()`, `dplyr::inner_join()`, or similar call without an explicit `by` argument triggers an error. This prevents a common source of bugs in clinical programming, where join columns change between studies.
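The vignette does not show how the package implements this check. Purely to illustrate the idea, the sketch below parses a code string with base R and reports any `*_join()` call that has no `by` argument. `find_implicit_joins()` is a hypothetical helper written for this example; it is not part of mighty.component and may differ from what the package actually does.

```r
# Hypothetical helper (not a mighty.component function): return the names
# of join calls in `code` that lack an explicit `by` argument.
find_implicit_joins <- function(code) {
  exprs <- parse(text = code, keep.source = FALSE)
  hits <- character()

  walk <- function(x) {
    if (!is.call(x)) {
      return(invisible(NULL))
    }
    fn <- x[[1]]
    # Resolve the function name for both `left_join()` and `dplyr::left_join()`
    fn_name <- if (is.call(fn) && identical(fn[[1]], as.name("::"))) {
      as.character(fn[[3]])
    } else if (is.name(fn)) {
      as.character(fn)
    } else {
      ""
    }
    if (grepl("_join$", fn_name) && !("by" %in% names(x))) {
      hits <<- c(hits, fn_name)
    }
    # Recurse into every element of the call, skipping empty arguments
    for (i in seq_along(x)) {
      if (!identical(x[[i]], quote(expr = ))) walk(x[[i]])
    }
    invisible(NULL)
  }

  for (e in exprs) walk(e)
  hits
}

bad_code <- "ADAE <- ADAE |>\n  dplyr::left_join(other_data)"
good_code <- "ADAE <- ADAE |>\n  dplyr::left_join(other_data, by = dplyr::join_by(USUBJID))"

find_implicit_joins(bad_code)
find_implicit_joins(good_code)
```

The first call reports the offending `left_join`, while the second returns an empty character vector because `by` is given explicitly. Working on the parsed expression rather than on raw text means the native pipe is already rewritten into an ordinary call before the check runs.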
Here is a component that fails validation:

```r
#' @title Bad join example
#' @description Implicit join that will fail validation.
#'
#' @param domain `character` domain name
#' @type row
#' @depends {{{domain}}} USUBJID
#' @outputs NEWCOL
#' @code
{{{domain}}} <- {{{domain}}} |>
  dplyr::left_join(other_data)
```

```{r validation-fail-setup, include = FALSE}
bad_template <- c(
  "#' @title Bad join example",
  "#' @description Implicit join that will fail validation.",
  "#'",
  "#' @param domain `character` domain name",
  "#' @type row",
  "#' @depends {{{domain}}} USUBJID",
  "#' @outputs NEWCOL",
  "#' @code",
  "{{{domain}}} <- {{{domain}}} |>",
  "  dplyr::left_join(other_data)"
)
bad_file <- tempfile(fileext = ".mustache")
writeLines(bad_template, bad_file)
```

```{r validation-fail, error=TRUE}
get_rendered_component(bad_file, list(domain = "ADAE"))
```

The fix is to specify the join key explicitly:

```r
#' @title Good join example
#' @description Explicit join that passes validation.
#'
#' @param domain `character` domain name
#' @type row
#' @depends {{{domain}}} USUBJID
#' @outputs NEWCOL
#' @code
{{{domain}}} <- {{{domain}}} |>
  dplyr::left_join(other_data, by = dplyr::join_by(USUBJID))
```

```{r validation-pass-setup, include = FALSE}
good_template <- c(
  "#' @title Good join example",
  "#' @description Explicit join that passes validation.",
  "#'",
  "#' @param domain `character` domain name",
  "#' @type row",
  "#' @depends {{{domain}}} USUBJID",
  "#' @outputs NEWCOL",
  "#' @code",
  "{{{domain}}} <- {{{domain}}} |>",
  "  dplyr::left_join(other_data, by = dplyr::join_by(USUBJID))"
)
good_file <- tempfile(fileext = ".mustache")
writeLines(good_template, good_file)
```

```{r validation-pass}
get_rendered_component(good_file, list(domain = "ADAE"))$code
```

## Testing components

`get_test_component()` creates a component that runs in an **isolated R session** with automatic code coverage tracking. This is useful both for interactive exploration and for formal unit tests with `testthat`.
We set `check_coverage = FALSE` here because this code runs inside a vignette, not inside a `test_that()` block. The default (`TRUE`) uses `withr::defer()` to automatically verify coverage when a test finishes; use that default in your actual tests.

```{r test-create}
ady_path <- system.file(
  "examples", "ady.mustache",
  package = "mighty.component"
)

ady_test <- get_test_component(
  component = ady_path,
  params = list(domain = "ADAE", variable = "ASTDY", date = "ASTDT"),
  check_coverage = FALSE # set TRUE in real tests
)
ady_test
```

Assign input data into the isolated session:

```{r test-assign}
ADAE_input <- pharmaverseadam::adae |>
  dplyr::select(USUBJID, ASTDT, TRTSDT)

ady_test$assign("ADAE", ADAE_input)
ady_test$ls()
```

Execute the component and retrieve the result:

```{r test-eval}
ady_test$eval()
ady_test$get("ADAE") |>
  head()
```

Check coverage. Every line of the component code should have been executed:

```{r test-coverage}
# Normal print method
ady_test

# Percent coverage
ady_test$percent_coverage

# Line coverage in a data.frame
ady_test$line_coverage
```

When `check_coverage = TRUE` (the default), coverage is verified automatically, via `withr::defer()`, when the enclosing test finishes. If any line was not executed, an error is raised.

This integrates naturally with `testthat` test files: create the test component inside a `test_that()` block, assign data, evaluate, and assert on the results. Coverage checking then happens automatically when the test finishes.
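As a sketch of that pattern, a test file might look like the following. It reuses only the calls shown earlier in this vignette; the test description and the single expectation are illustrative choices, and the example assumes `testthat`, `dplyr`, and `pharmaverseadam` are installed.

```r
library(testthat)
library(mighty.component)

test_that("ady component creates the requested study-day column", {
  # check_coverage is left at its default (TRUE), so coverage is
  # verified when this test_that() block finishes.
  ady_test <- get_test_component(
    component = system.file(
      "examples", "ady.mustache",
      package = "mighty.component"
    ),
    params = list(domain = "ADAE", variable = "ASTDY", date = "ASTDT")
  )

  # Provide input data to the isolated session
  ADAE_input <- pharmaverseadam::adae |>
    dplyr::select(USUBJID, ASTDT, TRTSDT)
  ady_test$assign("ADAE", ADAE_input)

  # Run the rendered component and fetch the modified domain
  ady_test$eval()
  result <- ady_test$get("ADAE")

  # Illustrative assertion: the output column named in `params` exists
  expect_true("ASTDY" %in% names(result))
})
```

Keeping the component creation inside the `test_that()` block matters here, because the deferred coverage check is tied to the test finishing.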