B.1 R targets package
The R targets package, a set of pipeline implementation and management tools, forms the basis of the Air health Scientific Workflow System. Using targets aids the reproducibility of analyses: it tracks input data, parameters, code and dependencies to determine which steps need to be rerun when a change is detected.
A targets pipeline is structured as a list of targets, each of which has a name and associated code block. The result of each target is saved and can be used in other targets by referring to it by name.
When the pipeline is run, each target is checked for changes to format, metadata and data, and is rerun if a change is detected. If nothing has changed, the target is skipped (its code is not run). Note that if a target changes, all dependent downstream targets are also rerun, because the dependencies are recorded in the target metadata.
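The skip-and-rerun behaviour described above can be sketched as follows. This is a minimal illustration, assuming the targets package is installed; the target names (raw, doubled), the values and the throwaway project directory are hypothetical and not part of the workflow system itself.

```r
library(targets)

# Work in a throwaway directory so the sketch does not touch a real project.
project <- file.path(tempdir(), "targets_rerun_sketch")
dir.create(project, showWarnings = FALSE)
old_wd <- setwd(project)

# Write a two-target pipeline to _targets.R (hypothetical targets).
tar_script({
  list(
    tar_target(raw, c(1, 2, 3)),
    tar_target(doubled, raw * 2)
  )
}, ask = FALSE)

tar_make()                        # first run: both targets are built
outdated_before <- tar_outdated() # nothing should be outdated now
tar_make()                        # second run: both targets are skipped

# Change the upstream target; its downstream dependant becomes outdated too.
tar_script({
  list(
    tar_target(raw, c(1, 2, 3, 4)),
    tar_target(doubled, raw * 2)
  )
}, ask = FALSE)

outdated_after <- tar_outdated()  # names of targets that would be rerun
tar_make()                        # reruns raw, then doubled
result <- tar_read(doubled)       # read the stored result by target name

setwd(old_wd)
```

Here tar_outdated() is used only to inspect which targets would be rerun; calling tar_make() alone is enough to bring the pipeline up to date.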
More complex pipelines can be set up using the branching functionality of targets and the tarchetypes package. See the targets manual and documentation for further information.
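As a rough illustration of branching, the sketch below uses the pattern argument of tar_target() to create one branch per element of an upstream target. It assumes the targets package is installed; the target names (values, squared, total) and the temporary project directory are hypothetical.

```r
library(targets)

# Throwaway project directory (illustrative only).
project <- file.path(tempdir(), "targets_branching_sketch")
dir.create(project, showWarnings = FALSE)
old_wd <- setwd(project)

tar_script({
  list(
    tar_target(values, c(1, 2, 3)),
    # pattern = map(values) creates one branch per element of `values`.
    tar_target(squared, values^2, pattern = map(values)),
    # Referring to `squared` without a pattern combines all of its branches.
    tar_target(total, sum(squared))
  )
}, ask = FALSE)

tar_make()
total <- tar_read(total)

setwd(old_wd)
```

If a single element of values changed, only the corresponding branch of squared (and the downstream total) would be expected to rerun.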
B.1.1 Function-oriented programming
The targets package is designed to be function-oriented: the analysis is expressed by writing functions and calling them from targets. This is in contrast to the style of programming in which a script runs step by step, from top to bottom.
An example of the latter:
x <- 2
y <- 3
z <- x*y
In targets:
do_multiply <- function(a, b){
  a*b
}
list(
tar_target(x, 2),
tar_target(y, 3),
tar_target(z, do_multiply(x, y))
)
While there seems to be little benefit from creating a function for this example (indeed, one could simply write tar_target(z, x*y)), doing so aids code clarity and efficiency in more complex workflows. Clearly named functions describe their intended purpose, with defined inputs (arguments) and outputs (return value). Carefully defined, generalised functions can be reused as needed within or across projects.
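To illustrate reuse, the sketch below defines one generalised function and calls it from two targets. The function rescale_unit(), the target names and the numbers are all made up for illustration, and the function assumes its input contains at least two distinct values.

```r
library(targets)

# Hypothetical helper: rescale a numeric vector to the range [0, 1].
# Input: a numeric vector with at least two distinct values.
# Output: a numeric vector of the same length.
rescale_unit <- function(values) {
  value_range <- range(values, na.rm = TRUE)
  (values - value_range[1]) / (value_range[2] - value_range[1])
}

# The same function serves two different targets.
list(
  tar_target(temperature, c(12, 18, 24)),
  tar_target(humidity, c(40, 55, 70)),
  tar_target(temperature_scaled, rescale_unit(temperature)),
  tar_target(humidity_scaled, rescale_unit(humidity))
)
```

Because the logic lives in one named function, a later change to rescale_unit() is made in a single place, and targets would be expected to rerun both scaled targets that depend on it.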