Prediction-oriented Model Comparison

Prediction-oriented Model Comparisons using Information Criteria

Comparing alternative models that describe a phenomenon of interest is a crucial step in scientific theory development. Traditionally, such model comparisons in PLS-SEM were based on explained-variance metrics (e.g., R², Adjusted R², and GoF) that do not adequately account for model complexity and are thus prone to fitting noise along with the signal in the data (Sharma et al. 2019). For example, increasing model complexity by incorporating more structural paths and variables will inevitably increase the R² value, regardless of the theoretical validity of the added variables. Furthermore, path significances (p-values) offer no objective basis for judging which model is best in the PLS-SEM framework and are not recommended for selecting a model among alternatives. In fact, relying on path significances to compare and select models can induce publication bias via “p-value hacking” (Sharma et al. 2019, p. 381).

Recent developments in PLS-SEM have introduced information-theoretic model selection criteria that trade off model fit against model complexity (Sharma et al. 2019; Sharma et al. 2020). These studies found that the Bayesian Information Criterion (BIC) and the Geweke-Meese criterion (GM) are particularly useful for selecting correctly specified models with low prediction error. These criteria (along with others, such as the Akaike Information Criterion, or AIC) have been fully incorporated into SmartPLS. When the goal is to compare several models and select a single correctly specified model with low prediction error, the researcher can select the model with the lowest BIC and GM values.
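As a rough illustration of how such criteria trade off fit and complexity, the following Python sketch computes AIC and BIC for a set of competing models using one common regression-based formulation (sum of squared errors of the target endogenous construct, sample size n, and number of estimated parameters k) and picks the model with the lowest BIC. All model names and numbers are invented for the example, and the GM criterion is omitted because its exact formulation varies across sources; in practice, SmartPLS reports these criteria directly, so the sketch only shows the underlying logic.

```python
import numpy as np

def aic_bic(sse: float, n: int, k: int) -> tuple[float, float]:
    # One common regression-based formulation (illustrative only): both
    # criteria combine the residual error of the target endogenous construct
    # with a penalty that grows with the number of estimated parameters.
    aic = n * np.log(sse / n) + 2 * k
    bic = n * np.log(sse / n) + k * np.log(n)
    return aic, bic

# Hypothetical competing structural models of the same endogenous construct:
# (SSE of the construct's structural equation, sample size n, parameters k).
models = {
    "Model A": (112.4, 200, 4),
    "Model B": (109.8, 200, 6),
    "Model C": (108.9, 200, 9),
}

scores = {name: aic_bic(sse, n, k) for name, (sse, n, k) in models.items()}
best = min(scores, key=lambda name: scores[name][1])  # lowest BIC

for name, (aic, bic) in scores.items():
    print(f"{name}: AIC = {aic:.1f}, BIC = {bic:.1f}")
print("Selected model (lowest BIC):", best)
```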

Model-averaging using Information Criteria

In certain practical situations, the differences in the model selection criteria values may be too small (e.g., less than 2) for a single superior model to emerge, leading to model selection uncertainty. In such cases, rather than relying on the predictions of a single model, the researcher can create model-averaged predictions based on Akaike weights, which capture the relative evidence in favor of each model. Danks et al. (2020) find that AIC-based Akaike weights are particularly useful for creating model-averaged predictions under conditions of model selection uncertainty, and show that AIC-based model-averaged predictions generally outperform the predictions of individual models.
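A minimal sketch of this idea, assuming three competing models with hypothetical AIC values and hypothetical out-of-sample predictions: each model's Akaike weight is computed from its AIC difference to the best model, and the model-averaged prediction is the weight-weighted sum of the individual models' predictions.

```python
import numpy as np

# Hypothetical AIC values for three competing models whose differences are
# small, so that no single superior model emerges (numbers are invented).
aic = np.array([241.3, 241.9, 243.0])

# Akaike weights: relative evidence in favor of each model,
# w_i = exp(-0.5 * delta_i) / sum_j exp(-0.5 * delta_j),
# where delta_i is model i's AIC minus the smallest AIC.
delta = aic - aic.min()
weights = np.exp(-0.5 * delta) / np.exp(-0.5 * delta).sum()

# Hypothetical out-of-sample predictions for two cases (rows),
# one column per competing model.
preds = np.array([
    [3.1, 3.0, 3.3],
    [4.2, 4.4, 4.1],
])

# Model-averaged prediction: combine the individual models' predictions
# using the Akaike weights.
averaged = preds @ weights

print("Akaike weights:", np.round(weights, 3))
print("Model-averaged predictions:", np.round(averaged, 3))
```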

Summary

Information-theoretic model selection criteria are useful for comparing alternative models to select a correctly specified model with low prediction error and, in cases of model selection uncertainty, for creating model-averaged predictions. SmartPLS 3 fully incorporates a broad range of model selection criteria that PLS-SEM researchers can use. R users can utilize the semplsic or model selection uncertainty packages. An Excel spreadsheet is also available for calculating the model selection criteria.
