2 Metadata

These datasets consist of daily PM2.5 predictions across Australia from the revised random forest model of the Bushfire Smoke project, V1.3. V1.3 corrects the input predictors of V1.2 and supersedes it.

Data are provided in NetCDF format in the data_derived/ directory of each dataset folder. The rasters are on a 5km grid covering Australia (mainland and Tasmania) in the GDA94 / Australian Albers projection (EPSG:3577). Variables consist of:

  • daily estimated total PM2.5 (µg/m3) from the random forest model
  • STL decomposition components of total PM2.5
  • daily firesmoke flags indicating bushfire events (original 2001-2020 dataset only)

Further description of the methodology can be found in the section Methodology.

2.1 Component datasets

2.1.1 Bushfire_specific_PM25_Aus_2001_2020_v1_3

This is the original dataset as described in the published paper and covers days from 2001-01-01 to 2020-06-30. The random forest model was trained on observed daily data from regulatory monitors and then used to predict over the 5km grid. The total PM2.5 predictions were then broken down into trend, seasonal and remainder components for each pixel through STL decomposition over 2001-2019 (see Methodology - STL decomposition); 2020 was excluded because of the exceptional bushfire events of 2019-2020. STL components for 2020 were filled by assuming that the daily seasonal and trend components of 2020 are identical to those of 2019, so the 2020 remainder component is the difference between total PM2.5 (2020) and seasonal + trend (2019).
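The infill step described above reduces to a per-pixel subtraction. The following is a minimal sketch of that arithmetic only, not the project's code: the function name and the sample values are hypothetical, and it assumes the borrowed seasonal and trend series have already been aligned day by day with the year being filled (how the calendars were aligned is not stated here).

```python
def infill_remainder(total, seasonal_ref, trend_ref):
    """Remainder for an infilled year at one pixel.

    total        -- daily total PM2.5 for the year being filled (e.g. 2020)
    seasonal_ref -- daily seasonal component borrowed from the reference year
    trend_ref    -- daily trend component borrowed from the reference year

    The three sequences must already be aligned to the same days
    (an assumption of this sketch).
    """
    if not (len(total) == len(seasonal_ref) == len(trend_ref)):
        raise ValueError("inputs must be aligned to the same days")
    # remainder = total - (seasonal + trend)
    return [t - (s + r) for t, s, r in zip(total, seasonal_ref, trend_ref)]


# Made-up values (µg/m3) for three days at a single pixel.
total_2020 = [12.0, 30.5, 8.0]
seasonal_2019 = [1.5, 1.0, 0.5]
trend_2019 = [6.0, 6.0, 6.0]

remainder_2020 = infill_remainder(total_2020, seasonal_2019, trend_2019)
```

By construction, seasonal + trend + remainder still sums to the total PM2.5 for every infilled day, so any unusually high concentration in the filled year falls entirely into the remainder.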

Statistical thresholds for extreme PM2.5 were calculated for each pixel: the 95th percentile of predicted PM2.5, and the standard deviation of the trimmed remainder (values above the 99th percentile excluded).
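As an illustration, the two thresholds for a single pixel could be computed along the following lines. This is a sketch under stated assumptions rather than the project's implementation: the function names are hypothetical, the percentile uses linear interpolation between order statistics, the trim keeps values equal to the 99th percentile, and the population standard deviation is used. The actual choices may differ.

```python
import statistics


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def pixel_thresholds(pm25_pred, remainder):
    """Return (95th percentile of pm25_pred, SD of the trimmed remainder)."""
    p95 = percentile(pm25_pred, 95)
    cutoff = percentile(remainder, 99)
    # Assumption: values equal to the cutoff are kept, values above are dropped.
    trimmed = [r for r in remainder if r <= cutoff]
    # Assumption: population SD; a sample SD would use statistics.stdev.
    return p95, statistics.pstdev(trimmed)


# Made-up daily series for one pixel (µg/m3).
pm25_pred = [float(i) for i in range(101)]
remainder = [-1.0] * 50 + [1.0] * 50 + [50.0]  # one extreme day

p95, sd = pixel_thresholds(pm25_pred, remainder)

# Per-day binary flags derived from the two thresholds.
smoke_p95 = [int(v > p95) for v in pm25_pred]
trimmed_smoke_2sd = [int(r > 2 * sd) for r in remainder]
```

Trimming before taking the standard deviation keeps a handful of extreme days from inflating the threshold they are later compared against.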

Binary flags for identification of bushfire events were then produced from the statistical thresholds and other external data sources (see Table 2.2 for full listing of flags):

  • active fire MODIS product
  • PM2.5 dust from MERRA-2 reanalysis
  • dust AOD from CAMS reanalysis
  • temperature from Bureau of Meteorology’s AWAP grids

Daily predicted PM2.5, STL components and binary flags are stored as one NetCDF file per year in the data_derived/ directory. The calculated statistical thresholds are stored in data_derived_raw_flags/, also as NetCDFs. Note that each threshold is a single layer (no time component) because both were calculated across all years 2001-2020.

2.1.2 Bushfire_specific_PM25_Aus_2020_2023_v1_3

The 2020-2023 update uses the original random forest model (trained on 2001-2020 data) to predict daily PM2.5 from 2020-01-01 to 2023-12-31. STL decomposition was performed with 2021-2023 data only; 2020 was omitted, as in the original dataset. The 2020 STL components were infilled by assuming the same trend and seasonal components as 2021 and taking the difference between total PM2.5 (2020) and seasonal + trend (2021) as the remainder component.

Consequently, the STL components for the period where the two datasets overlap (2020-01-01 to 2020-06-30) are not identical. There may also be a visible discontinuity in STL components between the original 2001-2020 dataset and the 2020-2023 update.

Both predicted PM2.5 and STL components are available in the data_derived/ folder as NetCDF files by year. Statistical thresholds and other flags were not produced.

2.2 Data dictionary

Table 2.1: Description of non-firesmoke flag variables available for both 2001-2020 and 2020-2023 datasets.
Variable              Description
pm25_pred             Total predicted PM2.5 (µg/m3)
seasonal              Seasonal component of PM2.5 STL decomposition (µg/m3)
trend                 Trend component of PM2.5 STL decomposition (µg/m3)
remainder             Remainder component of PM2.5 STL decomposition (µg/m3)
extrapolated          Flag indicating whether raster values for flag variables were spatially extrapolated (1 if True, 0 if False)
prediction_out_range  Flag indicating whether the pm25_pred value was beyond the range of PM2.5 used to train the model (1 if True, 0 if False)
predictor_out_range   Number of predictors with a value beyond the range used to train the model

Table 2.2: Description of firesmoke flags for V1.3, produced for 2001-2020. Timepoints are daily from 2001 to 2020.
Flag                    Description
dust_cams_p50           1 if CAMS AOD dust for pixel-timepoint > 50th percentile of CAMS AOD dust (all pixels and timepoints), otherwise 0
dust_cams_p75           1 if CAMS AOD dust for pixel-timepoint > 75th percentile of CAMS AOD dust (all pixels and timepoints), otherwise 0
dust_cams_p95           1 if CAMS AOD dust for pixel-timepoint > 95th percentile of CAMS AOD dust (all pixels and timepoints), otherwise 0
dust_merra_2_p50        1 if MERRA-2 PM2.5 dust for pixel-timepoint > 50th percentile of MERRA-2 PM2.5 dust (all pixels and timepoints), otherwise 0
dust_merra_2_p75        1 if MERRA-2 PM2.5 dust for pixel-timepoint > 75th percentile of MERRA-2 PM2.5 dust (all pixels and timepoints), otherwise 0
dust_merra_2_p95        1 if MERRA-2 PM2.5 dust for pixel-timepoint > 95th percentile of MERRA-2 PM2.5 dust (all pixels and timepoints), otherwise 0
smoke_p95_v1_3          1 if daily predicted PM2.5 for pixel-timepoint > 95th percentile of daily PM2.5 of pixel (all timepoints), otherwise 0
trimmed_smoke_2SD_v1_3  1 if remainder for pixel-timepoint > 2 standard deviations (SD) of the trimmed remainder (values above the 99th percentile excluded) of pixel (all timepoints), otherwise 0
whs_18degreeC           1 if mean daily temperature for pixel-timepoint < 18°C, otherwise 0
whs_15degreeC           1 if mean daily temperature for pixel-timepoint < 15°C, otherwise 0
whs_12degreeC           1 if mean daily temperature for pixel-timepoint < 12°C, otherwise 0
active_fires_10000      1 if active fires present for pixel-timepoint within 10km buffer, otherwise 0
active_fires_25000      1 if active fires present for pixel-timepoint within 25km buffer, otherwise 0
active_fires_50000      1 if active fires present for pixel-timepoint within 50km buffer, otherwise 0
active_fires_100000     1 if active fires present for pixel-timepoint within 100km buffer, otherwise 0
active_fires_500000     1 if active fires present for pixel-timepoint within 500km buffer, otherwise 0
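
The buffer logic behind the active_fires_* flags can be sketched as a distance test in projected coordinates. The sketch below is illustrative only: the function name and sample coordinates are hypothetical, and it assumes that distance is measured from the pixel centre as straight-line distance in EPSG:3577 metres, and that a fire exactly on the buffer edge counts as inside. None of these details are confirmed here.

```python
import math

# Buffer radii in metres, matching the suffixes of the active_fires_* flags.
BUFFERS_M = (10_000, 25_000, 50_000, 100_000, 500_000)


def active_fire_flags(pixel_xy, fire_points, buffers_m=BUFFERS_M):
    """Return {flag_name: 0 or 1} for one pixel on one day.

    pixel_xy    -- (x, y) of the pixel centre in projected metres (assumed)
    fire_points -- iterable of (x, y) active fire detections for that day
    """
    px, py = pixel_xy
    # Distance to the nearest detection; infinity when there are none.
    nearest = min(
        (math.hypot(fx - px, fy - py) for fx, fy in fire_points),
        default=math.inf,
    )
    return {f"active_fires_{b}": int(nearest <= b) for b in buffers_m}


# Made-up example: one detection roughly 42 km from the pixel centre.
flags = active_fire_flags((0.0, 0.0), [(30_000.0, 30_000.0)])
no_fire_flags = active_fire_flags((0.0, 0.0), [])
```

Because the buffers are nested, a flag set at one radius implies that every larger-radius flag is also set for the same pixel and day.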