3 Getting started

3.1 Software

This pipeline was developed on R version 4.5.1 “Great Square Root” and RStudio version 2025.05.1 “Mariposa Orchid”. You will need to have R installed and it is strongly recommended to also install RStudio.
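
To see which R version you have, a quick check in the R console might look like the following. This is a minimal sketch: it only reports whether your installed version is at least the one used in development, and makes no claim about compatibility with other versions. The names dev_version and is_at_least_dev are illustrative.

```r
# Print the installed R version string
print(R.version.string)

# Version the pipeline was developed on, as stated above
dev_version <- "4.5.1"

# TRUE if the installed R is the development version or newer
is_at_least_dev <- getRversion() >= dev_version
if (!is_at_least_dev) {
  message("Installed R is older than ", dev_version,
          ", the version used in development.")
}
```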

3.2 Download the repository code

Go to the GitHub repository for ResPrj_HIA_Indonesia_pm25_mortality_1998_2020. Download the repository by clicking the green < > Code button, then “Download ZIP”. Extract the downloaded zip file to an appropriate location on your computer.

Alternatively, if you are comfortable with using Git, you may clone the repository.

3.3 Download the input data

All datasets are publicly available from the following sources:

Modelled mortality data: Institute for Health Metrics and Evaluation (IHME) via the GBD Results Tool
Modelled PM2.5 exposure data: Atmospheric Composition Analysis Group (ACAG), as part of the SatPM2.5 (satellite-derived PM2.5) dataset (older versions archived)
Geographical boundaries: Database of Global Administrative Areas

Please note the licensing and conditions of use for each dataset before using it in your work.

The specific dataset versions used to develop the pipeline are available to approved researchers in the Cloud CARDAT repository:

IHME data for Southeast Asia, as of 2025-06-03
ACAG SatPM2.5 Global V5.GL.02
GADM v4.0.4

3.3.1 Using the GBD Results Tool

You must register to search and download data from IHME’s GBD Results Tool. To retrieve the required variables for this pipeline, use the following search parameters:

GBD Estimate: Cause of death or injury
Measure: Deaths
Metric: Number, Rate
Cause: All causes
Location: Select all under Southeast Asia (including sub-national areas for Indonesia)
Age: Select all
Sex: Male, Female
Total percentage change: Off
Year: All years from 1998 to 2021
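
Once the download is complete, it may be worth checking that the file matches the search parameters above. The following is a minimal sketch using a toy data frame in place of the real export. The column names (measure_name, location_name, sex_name, cause_name, metric_name, year, val) are an assumption about the export format and should be checked against your own file; the values are made up.

```r
# Toy stand-in for a GBD Results Tool export. In practice, read your own
# download instead, e.g. gbd <- read.csv("path/to/your/download.csv")
gbd <- data.frame(
  measure_name  = "Deaths",
  location_name = c("Indonesia", "Indonesia", "Bali", "Bali"),
  sex_name      = c("Male", "Female", "Male", "Female"),
  cause_name    = "All causes",
  metric_name   = c("Number", "Number", "Rate", "Rate"),
  year          = c(1998, 2021, 1998, 2021),
  val           = c(1, 2, 3, 4)  # made-up values
)

# Compare the contents against the search parameters listed above.
# Note: years_in_range only checks the bounds, not that every year is present.
checks <- c(
  measure_is_deaths = all(gbd$measure_name == "Deaths"),
  cause_is_all      = all(gbd$cause_name == "All causes"),
  metrics_present   = setequal(unique(gbd$metric_name), c("Number", "Rate")),
  sexes_present     = setequal(unique(gbd$sex_name), c("Male", "Female")),
  years_in_range    = all(gbd$year >= 1998 & gbd$year <= 2021)
)
print(checks)
```

Any FALSE entry suggests that the corresponding search parameter differed from the list above, or that the assumed column names do not match your file.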

3.4 Run the pipeline

3.4.1 Open the project

Open up the R project (ResPrj_Indonesia_HIA_pm25_mortality_1998_2020.Rproj) in RStudio. You should be able to see the project name in the upper-right corner of the RStudio window. If you see “Project: (None)” instead, you have not opened the project file correctly and the targets code will not work.
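
A quick way to confirm from the Console that the project is open is to check that the working directory contains the pipeline files. This is a minimal sketch; it assumes that _targets.R, config.R and main.R all sit in the project root, and the names expected_files and found are illustrative.

```r
# Files this documentation refers to, assumed to live in the project root
expected_files <- c("_targets.R", "config.R", "main.R")

# Check each file relative to the current working directory
found <- file.exists(expected_files)
names(found) <- expected_files

print(getwd())
print(found)

if (!all(found)) {
  message("Some pipeline files were not found; ",
          "the project may not be open in RStudio.")
}
```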

3.4.2 Install the required packages

RStudio should prompt you to install any required packages that are not already installed.

If you are not prompted, you can install a package by running, for example, install.packages("targets") in the Console. The pipeline also uses tarchetypes, sf, terra, data.table and tmap; all of these can be installed at once with:

install.packages(c("targets", "tarchetypes", "sf", "terra", "data.table", "tmap"))

The iomlifetR package is also required to calculate HIA life tables and other metrics, but it is not available through CRAN. It can be installed from GitHub by running the following (this requires the devtools package):

devtools::install_github("richardbroome2002/iomlifetR", build_vignettes = TRUE)
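
To see which of these packages are still missing before installing anything, a check along the following lines may help. This is a minimal sketch: it only reports what is missing and leaves the install commands commented out. The names cran_pkgs, missing_pkgs and has_iomlifetR are illustrative.

```r
# CRAN packages used by the pipeline, as listed above
cran_pkgs <- c("targets", "tarchetypes", "sf", "terra", "data.table", "tmap")

# Names of all packages currently installed in your libraries
installed <- rownames(installed.packages())

# CRAN packages that are not yet installed
missing_pkgs <- setdiff(cran_pkgs, installed)
if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install only what is missing
} else {
  message("All CRAN packages are already installed.")
}

# iomlifetR comes from GitHub, so it is checked separately
has_iomlifetR <- "iomlifetR" %in% installed
if (!has_iomlifetR) {
  message("iomlifetR is not installed; use the install_github() command above.")
}
```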


3.4.3 Set the inputs

The pipeline is defined in the _targets.R file, while global objects are defined in config.R. Helper code designed to run interactively (line-by-line) is provided in main.R and can be used to examine, run and visualise the pipeline and its outputs.

You must define the paths to the input files in config.R. All variables

indir.geography and infile.geography (GADM data),
indir.mort and infile.mort (IHME data), and
indir.pm25 and infile.pm25 (ACAG data)

must be provided and point to the corresponding input data files you have downloaded.

The mapping of province names is included in this repository under the metadata/ directory. Ensure the infile.locname_map variable is correctly defined for this mapping file.
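
As an illustration, the input definitions in config.R might look something like the sketch below, followed by a check that each file can be found. All directory and file names here are hypothetical placeholders, not the names used in the repository. The sketch also assumes that each input is a single file located at file.path(indir, infile); check config.R itself for how the variables are actually combined.

```r
# Hypothetical placeholder values: replace with your own locations
indir.geography  <- "path/to/gadm"
infile.geography <- "gadm_file.gpkg"
indir.mort       <- "path/to/ihme"
infile.mort      <- "ihme_file.csv"
indir.pm25       <- "path/to/acag"
infile.pm25      <- "acag_file.nc"

# Assumption: each input file lives at file.path(indir, infile)
input_paths <- c(
  geography = file.path(indir.geography, infile.geography),
  mortality = file.path(indir.mort, infile.mort),
  pm25      = file.path(indir.pm25, infile.pm25)
)

# Report which input files can be found
status <- data.frame(path = input_paths, exists = file.exists(input_paths))
print(status)
```

Any row with exists = FALSE points to a path that needs correcting before the pipeline is run. The same check can be extended to infile.locname_map.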

3.4.4 Run targets

Open the main.R file. This code should be run interactively, line by line. You can run code one line at a time in RStudio by clicking the “Run” button at the top-right of the code panel (known as the Source panel). Alternatively, you can use the shortcut Ctrl + Enter on Windows or Cmd + Return on macOS.

Starting from the top, run the code and examine the output.

library(targets): Loads the targets package.
tar_manifest() and tar_visnetwork(targets_only = T): Check that the targets of the pipeline have been defined correctly and that their dependencies are linked correctly.
tar_make(): Runs all outdated targets. Progress is shown in the Console.
tar_read(data_attributable_number): Reads the output of the specified target (this may be a data.frame, a file path, etc.).
browseURL(tar_read(report)[1]): Opens the rendered report in a browser.
tar_meta(): Shows the target metadata, including warnings and errors, run time in seconds, the time each target was last run, and the size of each target.
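
If a run fails or takes longer than expected, the metadata can be narrowed down. The following is a minimal sketch that is not part of main.R. It uses tar_outdated() to list what tar_make() would still run, and the fields and complete_only arguments of tar_meta() to show only targets with a recorded error or warning. The two guards are there so that the code does nothing when run outside the project or before the first run.

```r
library(targets)

# Guard: targets expects _targets.R in the working directory
if (file.exists("_targets.R")) {
  # Names of targets that tar_make() would (re)run
  print(tar_outdated())

  # Guard: metadata only exists once the pipeline has been run at least once
  if (tar_exist_meta()) {
    # Only targets with a recorded error
    print(tar_meta(fields = error, complete_only = TRUE))
    # Only targets with recorded warnings
    print(tar_meta(fields = warnings, complete_only = TRUE))
  }
}
```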

You can also open the output files from the pipeline, which are located in the data_derived/ (output data files), figs_and_tabs/ (figures and tables) and report/ (rendered R Markdown documents) directories.
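
To list what the pipeline has written so far, something like the following can be run from the Console. This is a minimal sketch; it assumes the working directory is the project root, and the names output_dirs and outputs are illustrative.

```r
# Output directories named above, relative to the project root
output_dirs <- c("data_derived", "figs_and_tabs", "report")

# List files in each directory; a missing directory yields an empty result
outputs <- lapply(output_dirs, list.files, recursive = TRUE)
names(outputs) <- output_dirs

# Number of files found in each directory
print(lengths(outputs))
```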