PLS Predict
Abstract
The PLS predict algorithm was developed by Shmueli et al. (2016). The method uses training and holdout samples to generate and evaluate predictions from PLS path model estimations.
Description
The research by Shmueli et al. (2016) proposes a set of procedures for prediction with PLS path models and the evaluation of their predictive performance. These procedures are combined in the PLSpredict package (https://github.com/ISS-Analytics/plspredict) for the statistical software R. They allow generating different out-of-sample and in-sample predictions (e.g., case-wise and average predictions), which facilitate the evaluation of the predictive performance when analyzing new data that was not used to estimate the PLS path model. The analysis serves as a diagnostic for possible overfitting of the PLS path model to the training data.
Based on the procedures suggested by Shmueli et al. (2016), the current PLS predict algorithm implementation in the SmartPLS software allows researchers to obtain k-fold cross-validated prediction errors and prediction error summary statistics (e.g., RMSE and MAPE) to assess the predictive performance of their PLS path model.
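To make the summary statistics concrete, the following is a minimal sketch of how RMSE and MAPE are computed from cross-validated prediction errors. This is an illustration of the standard formulas, not SmartPLS code; the function names are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared prediction error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no true value is zero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)
```

Lower values of both statistics indicate better predictive performance; RMSE penalizes large errors more heavily, while MAPE expresses the error relative to the magnitude of the observed values.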
Additional procedures and extensions are under development and may become part of future SmartPLS releases.
PLS Predict Settings in SmartPLS
Number of Folds
Default: 10
In k-fold cross-validation the algorithm splits the full dataset into k equally sized subsets of data. The algorithm then predicts each fold (holdout sample) with the remaining k-1 subsets, which, in combination, become the training sample. For example, when k equals 10 (i.e., 10 folds), a dataset of 200 observations will be split into 10 subsets with 20 observations per subset. The algorithm then predicts each of the ten folds once, using the nine remaining subsets as the training sample.
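The splitting scheme described above can be sketched as follows. This is a generic illustration of k-fold partitioning, not the SmartPLS implementation; `kfold_indices` is a hypothetical helper, and the model-fitting step is only indicated by a comment.

```python
import numpy as np

def kfold_indices(n_obs, k=10, seed=None):
    """Randomly assign n_obs observations to k (approximately) equally
    sized folds. Returns a list of index arrays, one per fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_obs)
    return np.array_split(idx, k)

# Example: 200 observations and 10 folds give 10 subsets of 20 observations.
folds = kfold_indices(200, k=10, seed=0)
for i, holdout in enumerate(folds):
    training = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # Estimate the PLS path model on `training`, then predict `holdout`.
```

Each observation appears in exactly one holdout fold, so every case receives exactly one out-of-sample prediction per cross-validation run.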
Number of Repetitions
Default: 10
The number of repetitions indicates how often the PLS predict algorithm runs the k-fold cross-validation on random splits of the full dataset into k folds.
Traditionally, cross-validation uses only one random split into k folds. However, a single random split can make the predictions strongly dependent on this random assignment of data (observations) to the k folds. Due to the random partition of the data, executions of the algorithm at different points in time may vary in their predictive performance results (e.g., RMSE, MAPE, etc.).
Repeating the k-fold cross-validation with different random data partitions and computing the average across the repetitions ensures a more stable estimate of the predictive performance of the PLS path model.
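The repetition-and-averaging idea can be sketched as follows. This is an illustrative outline under stated assumptions, not SmartPLS code: `predict_fold` is a hypothetical stand-in for re-estimating the PLS path model on the training folds and predicting the holdout fold, and RMSE is used as the example statistic.

```python
import numpy as np

def repeated_cv_rmse(y, predict_fold, k=10, repetitions=10, seed=0):
    """Run k-fold cross-validation `repetitions` times, each with a
    different random fold assignment, and average the per-run RMSE.
    `predict_fold(train_idx, test_idx)` must return predictions for
    the observations in `test_idx` (hypothetical model interface)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    run_rmse = []
    for _ in range(repetitions):
        idx = rng.permutation(n)           # new random partition each repetition
        folds = np.array_split(idx, k)
        errors = np.empty(n)
        for i, test_idx in enumerate(folds):
            train_idx = np.concatenate(
                [f for j, f in enumerate(folds) if j != i])
            errors[test_idx] = y[test_idx] - predict_fold(train_idx, test_idx)
        run_rmse.append(np.sqrt(np.mean(errors ** 2)))
    return float(np.mean(run_rmse))        # average across repetitions
```

Averaging over repetitions smooths out the variability caused by any single random assignment of observations to folds, which is why repeated runs yield more stable performance estimates.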
References

Evermann, J., & Tate, M. 2016. Assessing the Predictive Performance of Structural Equation Model Estimators, Journal of Business Research, 69(10): 4565-4582.
Shmueli, G., Ray, S., Velasquez Estrada, J. M., & Chatla, S. B. 2016. The Elephant in the Room: Evaluating the Predictive Performance of PLS Models, Journal of Business Research, 69(10): 4552-4564.