High-resolution modeling and projection of heat-related mortality in Germany under climate change

Data sourcesMortality

We acquired all-cause mortality counts from the Federal Office of Statistics of Germany28,29. We used three statistics as the targets for our training process: Daily mortality by state and gender; weekly mortality by state, gender, and broad age groups (0–65, 65–75, 75–85, 85+); as well as weekly mortality data for Germany as a whole, categorized by gender and detailed age group (0–30, five-year age groups from 30–35 to 90–95, 95+). Ethics approval was not required for the use of this data because sufficient anonymization is achieved through aggregation. The data are aggregated at the district level, ensuring that individual identities cannot be traced or identified, in compliance with data protection regulations.

Population

We obtained population data information for each district (total of 400) as of December 31st annually from the regional database of Germany (population estimate30). We selected data from the year 2011 onward because the counting methods changed that year. The population of each district was categorized by gender and age group (five-year age groups from 0–5 to 90–95, 95+). We interpolated the population data to estimate daily population figures.

The projection of demographic changes until 2070 is available from the German Federal Statistical Office31. To estimate the district-level population, we assumed that the geographical distribution of each age group remains the same as in 2021.

Temperature

We utilized three sources to collect past and present temperature data for Germany. First, we acquired gridded temperature data from the Copernicus European Regional ReAnalysis (CERRA), covering the years 2011 to 202032. The data were used to compute 6-hourly average temperatures for each district in Germany. Second, we used daily minimum, average, and maximum temperatures at a spatial resolution of 1 × 1 kilometer provided by Helmholtz Munich33. This dataset was obtained under a Data Transfer Agreement between our research institution and Helmholtz Munich, ensuring secure and authorized access to the data. We noticed some missing values in the dataset from Helmholtz Munich and supplemented it with data from the CERRA dataset while considering the mean difference between the two datasets. We also noticed the irregularity of the mean temperature in year 2013 from Helmholtz Munich, which are much lower than other years. However, we retained the data as-is for training purposes in this paper, but excluded it from metric evaluation. Data beyond 2020 was excluded from training and validation due to the impact of COVID-19 on mortality.

The above mentioned datasets are available on a yearly basis and are not suitable for real-time estimation of the heat-related risk. To address this issue, we utilized temperature station data from the Deutscher Wetterdienst (DWD)34. The station data are available on a daily basis. We gathered information from 537 weather stations. These stations have been measuring the average daily air temperature 2 meters above the ground since 2010 and have continued until at least 2022. The location of the stations is illustrated in Supplementary Fig. 1. In order to unify the spatial coverage in accordance with the reanalysis data, we trained an attention model to map the daily average temperature from the DWD stations to the district-level average temperature.

Geodata

The geographic information including the boundary and reference point of each district in Germany was provided by Federal Agency for Cartography and Geodesy (BKG). The boundary of each district was used to calculate the temperature in each district. The reference points were used to initiate the coordinate of each district during the training of the temperature attention model.

Climate projections

We obtained climate projections from the EC-Earth3 model for three different shared socio-economic pathways (SSPs) as defined by the Intergovernmental Panel on Climate Change (IPCC): SSP126, SSP245, and SSP37035. The dataset includes daily mean temperatures spanning from 2015 to 2100, with a spatial resolution of 100 km. The projection data of each scenario is a climate ensemble of 50 members. To map the temperature from the grid to the district level, we obtained daily average temperature with the ERA5 daily statistics calculator provided by the Copernicus Climate Data Store36 with a grid of 0.5° × 0.5° for training a down-scaling model. We interpolated the ERA5 data to match the grid of the EC-Earth3 model and used the interpolated values to train a machine learning model for climate downscaling.

Machine learning model for mortality predictions

Our objective is to predict the number of heat-related excess death cases in each district for each age group and sex. The estimation of all-cause mortality operates as follows:

$${E}_{{{{\rm{district}}}},{{{\rm{age}}}},{{{\rm{sex}}}}}({{{\rm{mort}}}})={{{{\rm{baseline}}}}}_{{{{\rm{age}}}},{{{\rm{sex}}}}}\times {{{{\rm{population}}}}}_{{{{\rm{district}}}},{{{\rm{age}}}},{{{\rm{sex}}}}}\times {f}_{{{{\rm{age}}}},{{{\rm{sex}}}}}({T}_{t}).$$

(1)

We used temperature data (Tt) of each district at time t as input for our model to obtain the result fage, sex(Tt). The function f is used to describe the temperature dependent variability of temperature-related risks. The model accounted for the lag effects of temperature by using different convolutional kernel size. By multiplying fage, sex(Tt) with the baseline death rate and population data, we obtained the final results, comprising daily death cases for each district, categorized by gender and age groups. These results were subsequently aggregated and compared to the registered death cases to calculate the model’s loss, which was used to train the model.

More specifically, we predicted multiplying factors for the baseline mortality rate using temperature as input. We assumed a uniform baseline mortality rate for each gender and age group across all districts within Germany, with the baseline depending solely on the time. Multiplying these factors with the baseline provided the predicted daily mortality rates for each district. Unlike conventional statistical models, we did not explicitly account for seasonal trends, as these were implicitly encapsulated within the temperature variable. Along with the population data for each district, we predicted the daily death cases for each district according to genders and age groups. We aggregated the predicted daily death cases to align with the temporal and spatial scope of the data from the Federal Office of Statistics of Germany. We employed the Poisson loss function to calculate the loss between the aggregated prediction and the registered death cases and utilize this loss to train our model. If correction factors were applied for different days of the week, the correction was performed before aggregating the predicted death cases.

Throughout our analysis, we consistently partitioned the data into training and validation sets, maintaining an 80:20 ratio with data from 2011 to 2018 used as training data and data from 2019–2020 as validation data. We evaluated two distinct network architectures to fulfill our research objectives.

Linear model

In the linear model, we employed a fully connected network with ReLU (rectified linear unit) as the activation function. To investigate the temperature-lag-mortality relationship, we incorporated a 1-dimensional convolutional layer with kernel sizelas the initial layer within the neural network architecture to capture the lag effect of the temperature on the mortality.

Exponential model

The exponential model employed a shallow network to extend the statistical model and permitted the summation of exponential terms, a feature absent in GAMs and DLMs. The structural principle of the network can be described by the following function:

$$f(T)=g\left(\exp \left({g}_{1}(T)\right.\right.,\ldots ,\exp ({g}_{d}(T)),\quad T=[{T}_{0},{T}_{1},\ldots ,{T}_{l}].$$

(2)

Here, T is the input temperature vector, l represents the number of lag days under consideration, and d denotes the network’s hidden dimension. The function g, g1, ⋯  , gd embodies the affine transformations executed by the network. The scaling factor within the function g is non-negative. The resulting output, denoted as f, is the multiplying factor of the baseline and is then used to calculate the predicted death cases for each district, classified by gender and age group.

The performance of a single trained model depends strongly on the initial value. There can be significant differences among models trained with different initial values. Therefore, we trained multiple instances of the temperature-mortality models and used the average output of the ensembles as the final result to reduce prediction variance.

Machine learning model for district level temperature estimation

We used an attention-like model to interpolate the data from the DWD stations or climate projections. After this interpolation step, we obtained the data that serves as the input for our temperature-mortality model. In the attention-like model, each district was assigned a trainable geographical position Xi = (loni, lati), where i identifies different districts. Similarly, a geographical position Yj = (lonj, latj) was assigned to each DWD station or grid point in the climate projections according to their location. The daily average temperature for a district, denoted as Ti, was calculated as follows

$${T}_{i}={b}_{i}+{\sum}_{j}{w}_{ij}{T}_{j},\,{{{\rm{where}}}}\,{w}_{ij}=\frac{{e}^{-{\alpha }_{i}d({X}_{i},{Y}_{j})}}{{\sum }_{k}{e}^{-{\alpha }_{i}d({X}_{i},{Y}_{k})}}.$$

(3)

Here, bi is the trainable bias, Tj is the temperature at the corresponding DWD station or grid point in climate projections, wij is the attention of district i on station or point j, and d calculate the distance between Xi and Yj, and αi is a trainable district specific scaling factor.

The temperatures acquired this way capture the local weather exactly (RMSE compared to the Helmholtz Munich dataset, 0.23 °C for training dataset, 0.25 °C for validation dataset) and include the impact of the heat island effect implicitly. The district level temperatures can then be used as input for our model to provide accurate mortality estimation.

Statistics and Reproducibility

To assess the stability of the machine learning models, multiple instances of each setup were trained with different random initial values. The number of instances for each model setup is listed in the corresponding Supplementary Tables 2–7. Performance was characterized using mean square errors and coefficients of determination (R2-values). For the final setup, 20 instances were trained, and the average of these 20 instances is reported as the final result.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Leave a Reply