PLSpredict

Abstract

Partial least squares (PLS) has been introduced as a “causal-predictive” approach to structural equation modeling (SEM), designed to overcome the apparent dichotomy between explanation and prediction. However, while researchers using PLS-SEM routinely stress the predictive nature of their analyses, model evaluation relies exclusively on metrics designed to assess the path model’s explanatory power. Recent research has proposed PLSpredict, a holdout sample-based procedure that generates case-level predictions on an item or a construct level (Shmueli et al., 2016; Shmueli et al., 2019).

PLSpredict parameters in PLS-SEM

Before initiating the PLSpredict procedure, all measurement models must meet the relevant quality criteria (Hair et al., 2019). When running PLSpredict, researchers need to make a series of choices. Most importantly, the key target construct in the PLS path model, for which researchers want to assess the model’s predictive relevance, usually has a reflectively specified measurement model to support the prediction of its items, even though PLS-SEM technically also allows assessing the prediction of a target construct’s formatively specified items. Relevant choices when using PLSpredict include:

  • Number of folds (k): Use ten folds (i.e., k = 10), but ensure that the training sample in a single fold still meets the model’s minimum sample size requirements. If it does not, choose a higher value for k, which leaves more observations in each training sample.
  • Number of repetitions (r): Use ten repetitions (i.e., r = 10) when the aim is to predict a new observation using the average of predictions from multiple estimated models. In addition, setting r to 10 generally offers a good trade-off between increase in precision and runtime. Alternatively, use one repetition (i.e., r = 1) when the predictions should be based on a single model.
  • Prediction statistic: Select the mean absolute error (MAE) or the root mean squared error (RMSE) to quantify the degree of prediction error.

The SmartPLS 3 software fully supports running a PLSpredict analysis with these parameters.
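
The fold and repetition choices above can also be illustrated outside of SmartPLS. The following Python sketch is not the SmartPLS implementation; it only shows, under simple assumptions, how k folds and r repetitions partition the sample, how large the smallest training sample is for a given k, and how the RMSE and MAE are computed from prediction errors. All function names are hypothetical and introduced for illustration only.

```python
import math
import random


def min_training_size(n_obs, k):
    """Smallest training sample across k folds (the largest holdout fold is removed)."""
    largest_fold = math.ceil(n_obs / k)
    return n_obs - largest_fold


def repeated_kfold_indices(n_obs, k=10, r=10, seed=0):
    """Yield (repetition, fold, train_idx, holdout_idx) for r repetitions of k-fold splitting.

    Hypothetical helper: each repetition reshuffles the cases, and every case
    lands in exactly one holdout fold per repetition.
    """
    rng = random.Random(seed)
    for rep in range(r):
        order = list(range(n_obs))
        rng.shuffle(order)
        for fold in range(k):
            holdout = order[fold::k]          # every k-th shuffled case
            holdout_set = set(holdout)
            train = [i for i in order if i not in holdout_set]
            yield rep, fold, train, holdout


def rmse(errors):
    """Root mean squared error of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    """Mean absolute error of a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)
```

For example, `min_training_size(95, 10)` shows how many cases remain for model estimation when 95 observations are split into ten folds; if that number falls below the model’s minimum sample size, a higher k can be tried. With r > 1, each case receives r holdout predictions, which can then be averaged.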

PLSpredict analysis steps and results interpretation

  • Assessment of a model’s predictive power should primarily rely on one key target construct.
  • To assess the degree of prediction error, use the RMSE, which is typically the default. If the prediction error distribution is highly non-symmetric, the MAE is the more appropriate prediction statistic.
  • Examine each indicator’s Q²predict value from the PLS-SEM analysis. A negative Q²predict value indicates that the model lacks predictive power.
  • Compare each indicator’s RMSE (or MAE) value from the PLS-SEM analysis with the corresponding value from a naive linear regression model (LM) benchmark. Check if the PLS-SEM analysis (compared to the LM) yields lower prediction errors in terms of RMSE (or MAE) for all (high predictive power), the majority (medium predictive power), the minority (low predictive power), or none of the indicators (lack of predictive power).
  • Examine the distribution of the prediction errors. PLS-SEM-based residuals should be normally distributed; a left-tailed distribution indicates over-prediction, a right-tailed distribution indicates under-prediction. Also compare the distributions of the prediction errors from PLS-SEM with those from LM. The distributions should correspond closely.
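
The interpretation steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure’s reference implementation. It assumes that Q²predict is computed as one minus the ratio of the model’s squared holdout prediction errors to the squared errors of a naive benchmark that predicts the training-sample mean of the indicator. The function names are hypothetical, and treating an exact tie (PLS-SEM better for exactly half of the indicators) as medium predictive power is an assumption the text above does not address.

```python
def q2_predict(actual, predicted, training_mean):
    """Q²predict for one indicator (assumed definition, see lead-in).

    Values above zero mean the model predicts the holdout cases better than
    the naive training-mean benchmark; negative values mean it predicts worse.
    """
    sse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sse_naive = sum((a - training_mean) ** 2 for a in actual)
    return 1.0 - sse_model / sse_naive


def predictive_power(pls_errors, lm_errors):
    """Classify predictive power from per-indicator RMSE (or MAE) values.

    pls_errors, lm_errors: dicts mapping indicator name -> error statistic.
    """
    n = len(pls_errors)
    lower = sum(1 for ind in pls_errors if pls_errors[ind] < lm_errors[ind])
    if lower == n:
        return "high"
    if lower == 0:
        return "lack"
    if 2 * lower >= n:  # assumption: an exact tie counts as medium
        return "medium"
    return "low"


# Illustrative values only, not results from any study.
pls = {"y1": 0.50, "y2": 0.60, "y3": 0.70}
lm = {"y1": 0.60, "y2": 0.70, "y3": 0.65}
label = predictive_power(pls, lm)  # PLS-SEM is lower for y1 and y2 only
```

The classification is only meaningful for indicators whose Q²predict value is not negative, so that check should come first in practice.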

Summary

PLSpredict enables researchers to address long-standing calls for a stronger focus on predictive model assessment, most notably a model’s out-of-sample predictive power. Beyond helping researchers reap the benefits of PLS-SEM’s predictive capabilities, the procedure also offers value in other contexts, such as validation in scale development and index construction studies.

References
