Blindfolding

Abstract

Blindfolding is a sample re-use technique. It allows calculating Stone-Geisser's Q² value (Stone, 1974; Geisser, 1974), which represents an evaluation criterion for the cross-validated predictive relevance of the PLS path model.

Description

Besides evaluating the magnitude of the R² values as a criterion of predictive accuracy, researchers may also want to examine Stone-Geisser’s Q² value (Stone, 1974; Geisser, 1974) as a criterion of predictive relevance. The Q² value of latent variables in the PLS path model is obtained by using the blindfolding procedure.

Blindfolding is a sample re-use technique that systematically deletes data points and predicts their original values. For this purpose, the procedure requires an omission distance D. A value for the omission distance D between 5 and 12 is recommended in the literature (e.g., Hair et al., 2017). An omission distance of seven (D=7) implies that every seventh data point of a latent variable's indicators will be eliminated in a single blindfolding round. Since the blindfolding procedure has to omit and predict every data point of the indicators used in the measurement model of the selected latent variable, an omission distance of D=7 results in seven blindfolding rounds. Hence, the number of blindfolding rounds always equals the omission distance.
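The omission pattern can be sketched in a few lines of Python. This is an illustrative sketch only, not SmartPLS code: the function name `omitted_positions` and the 0-based numbering of data points are assumptions made for the example.

```python
def omitted_positions(n_points, d, round_no):
    """Return the 0-based positions omitted in one blindfolding round.

    n_points: number of data points (hypothetical flat numbering)
    d:        omission distance D
    round_no: blindfolding round, from 1 to d
    """
    if d < 2:
        raise ValueError("omission distance must be at least 2")
    if not 1 <= round_no <= d:
        raise ValueError("round_no must lie between 1 and d")
    # Round r starts at the r-th data point and then skips ahead by d.
    return list(range(round_no - 1, n_points, d))


if __name__ == "__main__":
    n_points, d = 20, 7
    for r in range(1, d + 1):
        print(r, omitted_positions(n_points, d, r))
```

With 20 data points and D=7, round 1 omits positions 0, 7 and 14, and across the seven rounds every position is omitted exactly once.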

In the first blindfolding round, the procedure starts with the first data point and omits every D-th data point of a latent variable's indicators. Then, the procedure estimates the PLS path model by using the remaining data points. The omitted data represent missing values and are treated accordingly (e.g., by mean value replacement or pairwise deletion). The PLS-SEM results are then used to predict the omitted data points. The difference between an omitted data point and its predicted value is the prediction error. The sum of squared prediction errors is used to calculate the Q² value. Blindfolding is an iterative process. In the second blindfolding round, the algorithm starts with the second data point, omits every D-th data point and continues as described before. After D blindfolding rounds, every data point has been omitted and predicted.
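The round-by-round logic can be sketched as follows. This is a minimal sketch under stated assumptions, not the SmartPLS implementation: it handles a single column of values, the `predict` callable is a hypothetical stand-in for the PLS path model estimation and prediction step, and Q² is computed with the commonly used formulation Q² = 1 - SSE/SSO, where SSO is taken here as the squared error of a naive prediction by the mean of the remaining data points. The text above only states that the sum of squared prediction errors enters the calculation.

```python
from statistics import mean


def blindfold_q2(values, d, predict):
    """Sketch of blindfolding for one column of data points.

    values:  list of floats, in the order the data points are counted
    d:       omission distance D (also the number of rounds)
    predict: hypothetical stand-in for the model step; called as
             predict(kept, position) with kept = {position: value}
    """
    if d < 2:
        raise ValueError("omission distance must be at least 2")
    sse = 0.0  # sum of squared prediction errors
    sso = 0.0  # squared errors of the naive mean prediction (assumption)
    for start in range(d):  # one pass per blindfolding round
        omitted = list(range(start, len(values), d))
        omitted_set = set(omitted)
        kept = {i: v for i, v in enumerate(values) if i not in omitted_set}
        kept_mean = mean(kept.values())
        for i in omitted:
            sse += (values[i] - predict(kept, i)) ** 2
            sso += (values[i] - kept_mean) ** 2
    # Note: fails with a division by zero if all values are identical.
    return 1.0 - sse / sso


if __name__ == "__main__":
    data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
    # A predictor that only returns the mean of the kept data gives Q² = 0.
    print(blindfold_q2(data, 7, lambda kept, i: mean(kept.values())))
```

Two sanity checks follow from the formula: a predictor that does no better than the mean of the remaining data yields Q² = 0, and a predictor that reproduces every omitted value exactly yields Q² = 1.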

When PLS-SEM exhibits predictive relevance, it accurately predicts the data points of indicators. A Q² value larger than zero for a certain endogenous latent variable indicates that the PLS path model has predictive relevance for this construct. For detailed explanations of the blindfolding procedure, see Hair et al. (2017).

Blindfolding Settings in SmartPLS

Default value of the omission distance: 7

The systematic pattern of data point elimination and prediction in the blindfolding procedure depends on the omission distance (D). The user must select a value for D when running the blindfolding procedure. Suggested values of D are between 5 and 12.

An omission distance of five, for example, implies that every fifth data point of the target construct's indicators is eliminated in a single blindfolding round. Since the blindfolding procedure has to omit and predict every data point of the indicators used in the measurement model of a certain latent variable, it comprises five blindfolding rounds. Hence, the number of blindfolding rounds always equals the omission distance D.

The omission distance must be chosen so that the number of observations in the data set divided by the omission distance D is not an integer. If the division results in an integer, the procedure deletes full observations (i.e., entire rows of the data set), so the number of observations used per blindfolding round is smaller than the number of observations in the original data set. The goal of the blindfolding procedure, however, is to use all observations for prediction and, thus, not to delete entire observations in a blindfolding round. For example, with 100 observations, D=5 and D=10 are unsuitable, whereas D=7 is acceptable.
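The divisibility rule can be checked before running the procedure. The sketch below is illustrative and not part of SmartPLS: both function names are hypothetical, and `fully_omitted_rows` assumes that data points are counted down each indicator column in turn, which is the ordering under which the whole-row effect described above arises. The ordering SmartPLS uses internally is not stated here.

```python
def valid_omission_distances(n_obs, low=5, high=12):
    """Return the values of D in [low, high] for which n_obs / D is not an integer."""
    return [d for d in range(low, high + 1) if n_obs % d != 0]


def fully_omitted_rows(n_obs, n_indicators, d, round_no):
    """Return the 0-based rows whose indicator values are all omitted in one round.

    Assumes column-by-column counting of data points (see lead-in).
    """
    total = n_obs * n_indicators
    hits = {}
    for pos in range(round_no - 1, total, d):
        row = pos % n_obs  # row index under column-by-column counting
        hits[row] = hits.get(row, 0) + 1
    return sorted(row for row, count in hits.items() if count == n_indicators)


if __name__ == "__main__":
    print(valid_omission_distances(100))
    print(fully_omitted_rows(10, 3, 5, 1))  # 10 / 5 is an integer
    print(fully_omitted_rows(10, 3, 7, 1))  # 10 / 7 is not an integer
```

For 100 observations, the candidates between 5 and 12 reduce to 6, 7, 8, 9, 11 and 12. In the small example with 10 observations and 3 indicators, D=5 removes rows 0 and 5 entirely in the first round, whereas D=7 removes no complete row.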

Links

References

  • Geisser, S. (1974). A Predictive Approach to the Random Effects Model, Biometrika, 61(1): 101-107.

  • Hair, J. F., Hult, G. T. M., Ringle, C. M., and Sarstedt, M. (2017). A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM), 2nd Ed., Sage: Thousand Oaks.

  • Stone, M. (1974). Cross-Validatory Choice and Assessment of Statistical Predictions, Journal of the Royal Statistical Society: Series B, 36(2): 111-147.
