6  Model comparison

We constructed the demographic models incrementally, starting from a simple intercept model and progressively adding plot random effects, competition, and climate covariates. Although the intercept-only model is the most basic form, we discarded it and used the intercept model with plot random effects as the baseline (null) model against which more complex models are compared. We verified the convergence of all model forms (described in Chapter 4), and comprehensive diagnostic details are available at https://github.com/willvieira/TreesDemography.

Our primary objective is to select the model that has learned the most from the data. We used complementary metrics to quantify the information gained by adding complexity to each demographic model. One intuitive metric is the reduction in the residual variance of the likelihood and in the variance associated with plot random effects: the greater the reduction, the greater the information gain from the added complexity. The remaining metrics are all derived from the idea of increasing predictive accuracy. Although our focus is inference, measuring predictive power is crucial for quantifying the additional information gained from including new covariates. The first two are classic measures of predictive accuracy: the mean squared error (MSE) and the pseudo \(R^2\), both based on the linear relationship between observed and predicted demographic outputs. Finally, we used leave-one-out cross-validation (LOO-CV), which uses the sampled data to estimate the model’s out-of-sample predictive accuracy (Vehtari, Gelman, and Gabry 2017). LOO-CV allows us to assess how well each model describes the observed data and to compare competing models to determine which has learned the most from the data.

Parameter variance

This section describes how the variance attributed to plot random effects changes with increasing model complexity. As covariates are introduced, part of the variance in demographic rates initially attributed to the random effects is expected to shift towards the covariate fixed effects. Therefore, the larger the reduction in the variance associated with plot random effects, the more important the role of covariates in explaining demographic rates. Figure 6.1 shows how \(\sigma_{plot}\) changes with increasing model complexity for the growth, survival, and recruitment vital rates.

Figure 6.1: Boxplots show the change in the posterior distribution of the parameter \(\sigma_{plot}\) across the 31 tree species among the competing models. For each of the growth, survival, and recruitment vital rates, the simplest model (plot random effects only) increases in complexity with the addition of fixed size, competition, and climate covariates. Each colored dot represents the mean of a species’ posterior distribution.

Model predictive accuracy

We evaluated the predictive accuracy of the growth and recruitment demographic rates using the pseudo \(R^2\) and MSE metrics derived from comparing observed and predicted values. Higher \(R^2\) and lower MSE indicate better overall model accuracy. Figures 6.2 and 6.3 compare the growth and recruitment models using \(R^2\) and MSE, respectively.
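Both metrics can be computed directly from vectors of observed and predicted values. The sketch below is a minimal illustration (not the project’s actual code), assuming the pseudo \(R^2\) is defined as the squared Pearson correlation between observed and predicted values, one common way to summarize their linear relationship:

```python
import numpy as np

def mse(obs, pred):
    """Mean squared error between observed and predicted values."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean((obs - pred) ** 2))

def pseudo_r2(obs, pred):
    """Pseudo R^2 as the squared Pearson correlation between observed
    and predicted values (an assumed definition based on their linear
    relationship)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    r = np.corrcoef(obs, pred)[0, 1]
    return float(r ** 2)
```

Under this definition, a model whose predictions are a perfect linear rescaling of the observations reaches \(R^2 = 1\) even when its MSE is nonzero, which is why the two metrics are complementary.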

Figure 6.2: Posterior distribution of pseudo \(R^2\) across the 31 tree species among the competing models. For the growth and recruitment vital rates, the simplest model (plot random effects only) increases in complexity with the addition of fixed competition and climate covariates. Each colored dot represents the mean of a species’ posterior distribution.

Figure 6.3: Posterior distribution of Mean Squared Error (MSE) across the 31 tree species as models become more complex.

For the survival model, we used three complementary metrics to assess predictions. While the accuracy of classification models is often evaluated as the fraction of correct predictions, this measure can be misleading for unbalanced datasets such as mortality, where death events are rare. To address this issue, we calculated sensitivity, the percentage of dead trees correctly identified as dead (true positives), and specificity, the percentage of live trees correctly identified as alive (true negatives). Combining sensitivity and specificity yields a corrected accuracy that accounts for the unequal frequency of positive and negative events (Figure 6.4).
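As a minimal illustration of these three metrics (not the project’s actual code), the sketch below computes them from binary observed and predicted outcomes, assuming the corrected accuracy is the balanced accuracy, i.e. the mean of sensitivity and specificity:

```python
import numpy as np

def survival_metrics(dead_obs, dead_pred):
    """Classification metrics for a binary mortality outcome.
    `dead_obs` and `dead_pred` are boolean arrays (True = dead)."""
    dead_obs = np.asarray(dead_obs, bool)
    dead_pred = np.asarray(dead_pred, bool)
    tp = np.sum(dead_obs & dead_pred)    # dead trees predicted dead
    tn = np.sum(~dead_obs & ~dead_pred)  # live trees predicted alive
    sensitivity = tp / np.sum(dead_obs)  # true positive rate
    specificity = tn / np.sum(~dead_obs) # true negative rate
    # Assumed "corrected" (balanced) accuracy: mean of the two rates,
    # unaffected by how rare death events are in the dataset.
    corrected = (sensitivity + specificity) / 2
    return float(sensitivity), float(specificity), float(corrected)
```

Because sensitivity and specificity are each computed within their own class, a model that trivially predicts every tree as alive scores a corrected accuracy of only 0.5 despite a high raw fraction of correct predictions.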

Figure 6.4: Comparing the posterior distribution of sensitivity, specificity, and accuracy across the 31 tree species between the competing models. Each colored dot represents the species’ average posterior distribution.

Leave-one-out cross-validation

Finally, we evaluated the competing models using LOO-CV, in which models are compared based on the difference in expected log pointwise predictive density (ELPD_diff). With multiple models, the difference is computed relative to the model with the highest ELPD (Vehtari, Gelman, and Gabry 2017). Consequently, the model with an ELPD_diff of zero is defined as the best model, and the performance of the other models is assessed by their deviation from this reference model in pointwise predictive cross-validation. Given the large number of observations in the dataset, we approximated LOO-CV for each species using PSIS-LOO with subsampling, drawing one-fifth of the total number of observations.
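Given pointwise ELPD contributions for each model (e.g. from PSIS-LOO), ELPD_diff and its standard error can be computed as in the sketch below. This is an illustration rather than the project’s actual code; `elpd_compare` is a hypothetical helper, and the standard error is assumed to follow the paired formula of Vehtari, Gelman, and Gabry (2017), \(\sqrt{n \, \mathrm{Var}(d_i)}\) over the pointwise differences \(d_i\) to the reference model:

```python
import numpy as np

def elpd_compare(pointwise):
    """Compare models from their pointwise ELPD contributions.
    `pointwise` maps model name -> array of per-observation ELPD values.
    Returns, per model, the difference to the best model (elpd_diff)
    and its standard error (sd_diff)."""
    totals = {m: float(np.sum(v)) for m, v in pointwise.items()}
    best = max(totals, key=totals.get)  # reference model: elpd_diff = 0
    out = {}
    for m, v in pointwise.items():
        # paired pointwise differences against the reference model
        d = np.asarray(v, float) - np.asarray(pointwise[best], float)
        out[m] = (float(d.sum()), float(np.sqrt(d.size * np.var(d))))
    return out
```

Because the standard error is computed on paired pointwise differences rather than on each model’s total ELPD separately, it reflects only the uncertainty in the comparison itself.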

Figure 6.5: Boxplots compare the competing models using LOO-CV, based on the difference in expected log pointwise predictive density (ELPD_diff) across the 31 tree species. The sd_diff is the standard error of the ELPD difference between each model and the reference model (the model with ELPD_diff equal to zero).

Size effect in survival

We initially incorporated a size effect into the survival models because of the structured-population approach. However, the effect of size on mortality probability was generally weak and variable among species, with no clear pattern of increasing mortality probability with larger individual size. All models that included the size effect performed worse than the null model, which contained only plot random effects (Figure 6.6).

Figure 6.6: Boxplots compare the competing models using LOO-CV, based on the difference in expected log pointwise predictive density (ELPD_diff) across the 31 tree species. The sd_diff is the standard error of the ELPD difference between each model and the reference model (the model with ELPD_diff equal to zero).

Conclusion

Our analysis revealed that incorporating competition into the growth, survival, and recruitment models was more effective than climate variables at capturing individual-level information. The parameter \(\sigma_{plot}\), interpreted as spatial heterogeneity, was lowest for growth, followed by recruitment and survival. As the models became more complex with the inclusion of covariates, recruitment exhibited the largest reduction in spatial variance, followed by growth, with no clear pattern for survival.

Regarding predictive performance, competition contributed more than climate variables to the overall predictive capacity (\(R^2\), MSE, and corrected accuracy) of the growth and survival models. Although recruitment showed the largest reduction in \(\sigma_{plot}\), the covariates had minimal impact on its prediction accuracy.

Finally, LOO-CV shows a clear trend: the complete model featuring plot random effects, competition, and climate covariates outperformed the other competing models. Furthermore, the absolute value of the ELPD shows that the growth model gained the most information from including covariates, followed by the recruitment and survival models. Consequently, we selected the complete model with plot random effects, competition, and climate covariates for further analysis.