Cross-validated Predictive Ability Test (CVPAT)

Abstract

Recent debates in social science and management research have highlighted the critical need for out-of-sample predictive assessments of models that simultaneously offer theoretical explanation of the phenomena under study (Shmueli and Koppius, 2011; Hofman et al., 2017; Yarkoni and Westfall, 2017). PLS-SEM is considered well-suited to bridge the gap between explanatory and predictive model assessments (Jöreskog and Wold, 1982; Chin, 1995). However, PLS-SEM analyses have thus far focused on in-sample explanatory assessments (e.g., explained variance, model fit, and path significances) while ignoring out-of-sample prediction-oriented assessments due to a lack of suitable statistical tools (Shmueli et al., 2016; Shmueli et al., 2019). In particular, PLS-SEM has lacked a statistical inference test to assess whether a proposed or alternative theoretical model offers significantly better out-of-sample predictive power than a benchmark or an established model.

In a critical new development, Liengaard et al. (2020) introduced the cross-validated predictive ability test (CVPAT), which allows pairwise comparisons of the out-of-sample predictive power of theoretically motivated competing models at a pre-specified statistical significance level (e.g., α = 0.05). Their simulation study shows that CVPAT exhibits high power with sample sizes of 250 or more, and when the measurement model loadings are 0.8 or higher. With smaller sample sizes, researchers can still expect to achieve satisfactory power when the structural model R² values are 0.4 or higher. However, CVPAT power is generally below the recommended threshold of 0.8 when sample sizes are 100 or smaller. A stepwise procedure is presented to guide researchers when comparing theoretically motivated models for their out-of-sample predictive abilities.

Stepwise procedure

Step 1

The purpose of the CVPAT procedure is to investigate whether a proposed or alternative model (AM) provides significantly better out-of-sample predictive ability than a benchmark or theoretically established model (EM) at a pre-specified significance level. Because PLS-SEM focuses on providing theoretical explanation, the models selected for comparison in Step 1 must be based on valid theoretical reasoning and should not be purely empirically motivated (Sharma et al., 2019). The EM therefore represents the current state of theoretical knowledge, while the AM represents a new model proposed by the researcher.

Step 2

To conduct a fair model comparison, the researcher should pay attention to the following issues:

  1. The sample should be representative of the population under study (Hair et al., 2017).
  2. The sample should not a priori favor one model over another (Cooper and Richardson, 1986).
  3. The constructs should have the same measurement model (Mode A or B) in both the EM and the AM.
  4. The settings for both the EM and AM must be the same (e.g., missing data treatments and PLS algorithm settings).
  5. Both the EM and AM should meet the established model evaluation criteria (Hair et al., 2017).

Step 3

The researcher compares the out-of-sample predictive performance of the EM and the AM by using CVPAT.
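
The comparison in this step can be sketched as a paired test on per-observation losses. The Python example below is a minimal illustration, not the implementation of Liengaard et al. (2020) or of the R package. It assumes that cross-validation has already produced one out-of-sample loss per observation for each model (for example, a squared prediction error), computed on the same observations and folds. The function name `cvpat_statistic`, the toy loss values, and the normal approximation used for the p-value are assumptions made for illustration.

```python
import math
import statistics


def cvpat_statistic(loss_em, loss_am):
    """Paired one-sided test on the average loss difference (EM minus AM).

    loss_em, loss_am: per-observation out-of-sample losses of the
    established model (EM) and the alternative model (AM), assumed to
    come from the same cross-validation folds.
    Returns (mean loss difference, test statistic, one-sided p-value).
    """
    if len(loss_em) != len(loss_am):
        raise ValueError("losses must be paired per observation")
    n = len(loss_em)
    if n < 2:
        raise ValueError("at least two observations are required")

    # A positive difference means the EM predicted that observation worse.
    diffs = [em - am for em, am in zip(loss_em, loss_am)]
    mean_diff = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("loss differences have zero variance")

    t_stat = mean_diff / (sd / math.sqrt(n))

    # ASSUMPTION: normal approximation to the reference distribution,
    # reasonable only for large n; a t distribution with n - 1 degrees
    # of freedom would be the usual choice for a paired test.
    p_value = 1.0 - statistics.NormalDist().cdf(t_stat)
    return mean_diff, t_stat, p_value


# Hypothetical per-observation losses, for illustration only.
loss_em = [1.2, 0.9, 1.5, 1.1, 1.3, 0.8, 1.4, 1.0]
loss_am = [1.0, 0.8, 1.1, 1.2, 0.9, 0.7, 1.0, 1.1]

mean_diff, t_stat, p_value = cvpat_statistic(loss_em, loss_am)
print(f"mean loss difference: {mean_diff:.3f}")
print(f"test statistic: {t_stat:.3f}, one-sided p-value: {p_value:.4f}")
```

The test is one-sided because the question is only whether the EM's loss is higher than the AM's. With only eight toy observations the normal approximation is crude; the example is meant to show the mechanics, not to support an inference.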

Step 4

In accordance with null hypothesis testing practice, the EM represents the null hypothesis. Hence, only when the EM has a significantly higher loss than the AM does the researcher have substantial grounds to reject the EM in favor of the AM. Conversely, if the EM's loss is not significantly higher than the AM's, the researcher should retain the EM.
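
The decision rule in this step can be written down compactly. The Python sketch below is illustrative only. The function name `cvpat_decision` and its parameters are hypothetical, and it assumes the analyst already has the average loss difference (EM loss minus AM loss) and a one-sided p-value from the test.

```python
def cvpat_decision(mean_loss_diff, p_value, alpha=0.05):
    """Return the Step 4 decision for a CVPAT comparison.

    mean_loss_diff: average of (EM loss - AM loss) across observations.
    p_value: one-sided p-value for the hypothesis that the EM's loss
             is higher than the AM's.
    alpha: pre-specified significance level.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")

    # The EM is the null model: it is rejected only when its loss is
    # significantly higher than the AM's.
    if mean_loss_diff > 0 and p_value < alpha:
        return "reject EM in favor of AM"
    return "retain EM"


# Hypothetical inputs, for illustration only.
print(cvpat_decision(mean_loss_diff=0.175, p_value=0.01))
print(cvpat_decision(mean_loss_diff=0.175, p_value=0.20))
```

The rule is deliberately asymmetric: a non-significant result, or an AM that predicts worse than the EM, both lead to retaining the EM.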

Summary

PLS-SEM has gained widespread popularity in social science and management fields due to its causal-predictive stance (Hair et al., 2019; Sarstedt et al., 2017). Despite this, it has lacked tools to reliably assess and compare the out-of-sample predictive abilities of models while retaining its explanatory strengths. Liengaard et al. (2020) fill this critical need by introducing CVPAT, which can help researchers create robust theories and policies by incorporating predictive model assessments in their studies. SmartPLS may incorporate CVPAT in a user-friendly format for PLS-SEM analysts in the future. In any case, users of the statistical software R can apply CVPAT with the CVPAT package and its technical instructions.

References

  • Chin, W. W. (1995). Partial Least Squares is to LISREL as Principal Components Analysis is to Common Factor Analysis. Technology Studies, 2(2), 315-319.
  • Cooper, W. H. and Richardson, A. J. (1986). Unfair Comparisons. Journal of Applied Psychology, 71(2), 179-184.
  • Hair, J. F., Hult, G. T. M., Ringle, C. M., and Sarstedt, M. (2017). A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) (2nd ed.). Thousand Oaks, CA: Sage.
  • Hair, J. F., Risher, J. J., Sarstedt, M., and Ringle, C. M. (2019). When to Use and How to Report the Results of PLS-SEM. European Business Review, 31(1), 2-24.
  • Hofman, J. M., Sharma, A., and Watts, D. J. (2017). Prediction and Explanation in Social Systems. Science, 355(6324), 486-488.
  • Jöreskog, K. G. and Wold, H. O. A. (1982). The ML and PLS Techniques for Modeling with Latent Variables: Historical and Comparative Aspects. In H. O. A. Wold and K. G. Jöreskog (Eds.), Systems Under Indirect Observation: Part I. Amsterdam: North-Holland, 263-270.
  • Liengaard, B., Sharma, P. N., Hult, T., Jensen, M., Sarstedt, M., Hair, J., and Ringle, C. M. (2020). Prediction: Coveted, Yet Forsaken? Introducing a Cross-Validated Predictive Ability Test in Partial Least Squares Path Modeling. Decision Sciences, forthcoming.
  • Sarstedt, M., Ringle, C. M., and Hair, J. F. (2017). Partial Least Squares Structural Equation Modeling. In C. Homburg, M. Klarmann, and A. Vomberg (Eds.), Handbook of Market Research. Cham: Springer, 1-40.
  • Shmueli, G. and Koppius, O. R. (2011). Predictive Analytics in Information Systems Research. MIS Quarterly, 35(3), 553-572.
  • Shmueli, G., Ray, S., Velasquez Estrada, J. M., and Chatla, S. B. (2016). The Elephant in the Room: Evaluating the Predictive Performance of PLS Models. Journal of Business Research, 69(10), 4552-4564.
  • Shmueli, G., Sarstedt, M., Hair, J. F., Cheah, J., Ting, H., Vaithilingam, S., and Ringle, C. M. (2019). Predictive Model Assessment in PLS-SEM: Guidelines for Using PLSpredict. European Journal of Marketing, 53(11), 2322-2347.
  • Sharma, P. N., Sarstedt, M., Shmueli, G., Kim, K. H., and Thiele, K. O. (2019). PLS-based Model Selection: The Role of Alternative Explanations in Information Systems Research. Journal of the Association for Information Systems, 20(4).
  • Yarkoni, T. and Westfall, J. (2017). Choosing Prediction Over Explanation in Psychology: Lessons from Machine Learning. Perspectives on Psychological Science, 12(6), 1100-1122.
