
Necessary Condition Analysis (NCA)

Abstract

The necessary condition analysis (NCA) is a data analysis technique for identifying necessary (but not sufficient) conditions in data sets. It complements traditional regression-based data analysis techniques, including partial least squares structural equation modeling (PLS-SEM), as well as methods such as qualitative comparative analysis (QCA).

Brief Description

Originally developed by Dul (2016), the necessary condition analysis (NCA) is a widely adopted data analysis technique that enables the identification of necessary conditions in data sets (Bokrantz & Dul, 2023; Dul, 2020; Dul, 2025). The NCA focuses on identifying necessary conditions (i.e., those that must be present for a particular outcome to occur). This stands in contrast to sufficiency logic, which emphasizes conditions that should be present to produce the outcome but are not strictly required (e.g., coefficients obtained from a regression analysis or a path model). In necessity logic, a condition is considered necessary if the outcome cannot happen without it. In other words, if the necessary condition is missing, the outcome will definitely not occur. However, having the necessary condition alone does not guarantee the outcome; other factors may also need to be in place (Dul, 2024; Dul et al., 2023). Thus, necessary conditions represent "bottlenecks" or prerequisites that set the upper boundary for the outcome: no outcome beyond a certain level is possible unless the necessary condition is met. In contrast, sufficiency logic deals with conditions that, when present, should lead to the outcome. A sufficient condition means that the presence of this condition guarantees the outcome, although the outcome may also occur via other pathways where that specific condition is absent. To put it simply:
  • Necessary (must-have) conditions: Without these, the outcome cannot occur. They are prerequisites or non-negotiable requirements.
  • Sufficient (should-have) conditions: With these, the outcome is guaranteed to occur, but they are not mandatory because the outcome might still happen through alternative means.
NCA’s unique contribution lies in systematically uncovering these must-have conditions, which traditional sufficiency-based approaches (like regression or fsQCA) may overlook because they focus primarily on whether certain factors increase the likelihood or guarantee the outcome, rather than whether the outcome is impossible without them. By identifying and quantifying such necessary conditions, NCA provides critical insights into the constraints and minimal requirements governing an outcome.
The NCA can be used for regression models and partial least squares structural equation modeling (PLS-SEM), as explained in detail by Richter et al. (2020) and Richter et al. (2023a) -- see also Hair et al. (2024) and Richter et al. (2023b). SmartPLS fully supports the NCA for regression models (i.e., via the Necessary condition analysis (NCA) algorithm). By using the unstandardized latent variable scores or the latent variable scores on a scale from 0 to 100 as obtained by an importance-performance map analysis (IPMA), it is possible to run the NCA in PLS-SEM for the partial regression models of the structural model. Note: The combined IPMA and NCA (cIPMA) represents a recent extension of the use of PLS-SEM results for an NCA. For further details on the cIPMA, see Hauff et al. (2024) and the SmartPLS software tutorial by Sarstedt et al. (2024).
The figure below shows a model example for an NCA; for further explanations see the NCA chapter in Hair et al. (2024) and the case study on the NCA in PLS-SEM using SmartPLS:
[Figure: Example model for an NCA]
Instead of analyzing the average relationships between dependent and independent variables, NCA aims to reveal areas in scatter plots of dependent and independent variables that may indicate the presence of a necessary condition. While ordinary least squares (OLS) regression-based techniques establish a linear function in the center of the relevant data points, NCA determines a ceiling line on top of the data. The figure below shows two default ceiling lines: (1) The ceiling envelopment - free disposal hull (CE-FDH) line, which is a non-decreasing step-wise linear line (step function); and (2) the ceiling regression - free disposal hull (CR-FDH) line, which is a simple linear regression line through the CE-FDH line.
[Figure: Scatter plot with the CE-FDH and CR-FDH ceiling lines]
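To illustrate the ceiling-line logic, the following minimal Python sketch identifies the CE-FDH peers (the upper-left corner observations), evaluates the resulting step-wise ceiling, and fits an OLS line through the peers in the spirit of the CR-FDH. The function names are illustrative and not part of SmartPLS or the NCA software; the sketch simplifies details of the original procedure (Dul, 2016).

```python
import numpy as np

def ce_fdh_peers(x, y):
    """CE-FDH peers: observations that raise the running maximum of Y
    when moving from low to high X (the upper-left envelope points)."""
    order = np.argsort(x, kind="stable")
    xs, ys = np.asarray(x, float)[order], np.asarray(y, float)[order]
    peers, running_max = [], -np.inf
    for xi, yi in zip(xs, ys):
        if yi > running_max:        # a new, higher ceiling level starts here
            running_max = yi
            peers.append((xi, yi))
    return np.array(peers)

def ceiling_ce_fdh(x_value, peers):
    """Step-wise (non-decreasing) CE-FDH ceiling value at x_value."""
    reached = peers[peers[:, 0] <= x_value]
    return reached[:, 1].max() if len(reached) else np.nan

def ceiling_cr_fdh(peers):
    """Simple OLS line fitted through the CE-FDH peers (CR-FDH-style);
    returns (slope, intercept)."""
    slope, intercept = np.polyfit(peers[:, 0], peers[:, 1], 1)
    return slope, intercept
```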
The ceiling line separates the space with observations from the space without observations. The larger the empty space, the larger the constraint that X puts on Y. The ceiling line also indicates the minimum level of X that is required to obtain a certain level of Y. This NCA outcome differs from the interpretation of linear regression, where an increase of X leads, on average, to an increase of Y. Alternatively, the bottleneck table presents the ceiling line results in tabular form (see figure below). The first column of the table shows the outcome, whereas the subsequent column(s) represent the condition(s) that must be satisfied to achieve the outcome. The results for both the outcome and the condition(s) may refer to actual values, percentage values of the range, or percentiles.
[Figure: Bottleneck table]
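A bottleneck table can be derived directly from the ceiling line. The sketch below, which builds on the hypothetical ce_fdh_peers helper above, reports the minimum raw X value implied by the CE-FDH ceiling for each outcome level expressed as a percentage of Y's observed range; it is a simplified illustration, not the SmartPLS routine.

```python
def bottleneck_table(x, y, steps=10):
    """Minimum X required (per CE-FDH ceiling) for outcome levels
    from 0% to 100% of Y's observed range, in 100/steps increments."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    peers = ce_fdh_peers(x, y)              # hypothetical helper sketched above
    y_min, y_max = y.min(), y.max()
    rows = []
    for i in range(steps + 1):
        level = 100.0 * i / steps
        y_target = y_min + (y_max - y_min) * level / 100.0
        feasible = peers[peers[:, 1] >= y_target]
        x_required = feasible[:, 0].min()   # smallest X whose ceiling reaches y_target
        # If x_required equals the scope minimum, X imposes no constraint ("NN") at this level.
        rows.append((level, x_required if x_required > x.min() else None))
    return rows
```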
Two key NCA parameters are the ceiling accuracy and the necessity effect size d. The ceiling accuracy represents the number of observations that are on or below the ceiling line, divided by the total number of observations and multiplied by 100. While the accuracy of the CE-FDH ceiling line is by definition 100%, the accuracy of other lines, for instance the CR-FDH, can be less than 100%. There is no specific rule regarding the acceptable level of accuracy. However, a comparison of the estimated accuracy with a benchmark value (e.g., 95%) can help to assess the quality of the solution generated. The necessity effect size d and its statistical significance indicate whether a variable or construct is a necessary condition (Dul et al., 2020). d is calculated by dividing the 'empty' space (called the ceiling zone) by the entire area that can contain observations (called the scope). Thus, by definition, 0 ≤ d ≤ 1. Dul (2016) suggested that 0 < d < 0.1 can be characterized as a small effect, 0.1 ≤ d < 0.3 as a medium effect, 0.3 ≤ d < 0.5 as a large effect, and d ≥ 0.5 as a very large effect. However, the absolute magnitude of d is only indicative of the substantive significance, that is, the meaningfulness of the effect size from a practical perspective. Therefore, NCA also enables researchers to evaluate the statistical significance of the necessity effect size by means of a permutation test, which should also be considered when deciding about a necessity hypothesis (Dul, 2020). SmartPLS supports the permutation-based significance testing of the effect sizes d (i.e., for regression models via the NCA permutation algorithm).
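For intuition, the following sketch computes the necessity effect size d (ceiling zone divided by scope) for the CE-FDH ceiling using an empirical scope, and the ceiling accuracy for a straight ceiling line such as the CR-FDH. Both functions are illustrative approximations that reuse the hypothetical ce_fdh_peers helper above; they are not the SmartPLS implementation.

```python
def nca_effect_size_ce_fdh(x, y):
    """d = ceiling zone / scope, using the CE-FDH ceiling and the
    empirically observed minima and maxima as the scope."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    peers = ce_fdh_peers(x, y)                   # hypothetical helper sketched above
    x_min, x_max, y_min, y_max = x.min(), x.max(), y.min(), y.max()
    scope = (x_max - x_min) * (y_max - y_min)
    # Empty area above the step-wise ceiling, summed segment by segment.
    xs = np.append(peers[:, 0], x_max)
    ceiling_zone = sum((xs[i + 1] - xs[i]) * (y_max - peers[i, 1])
                       for i in range(len(peers)))
    return ceiling_zone / scope if scope > 0 else float("nan")

def ceiling_accuracy(x, y, slope, intercept):
    """Percentage of observations on or below a straight ceiling line
    (e.g., the CR-FDH line)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return 100.0 * np.mean(y <= slope * x + intercept)
```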

NCA Settings in SmartPLS

The Necessary condition analysis (NCA) algorithm for regression models in SmartPLS only requires a single parameter setting: the Number of steps for bottleneck tables. This setting specifies the number of steps into which the dependent variable is divided for NCA's bottleneck tables. The default value is 10, which divides the displayed results into 10% steps from 0% to 100%. However, you may also choose a more detailed result display. For example, a value of 20 divides the results into 5% steps from 0% to 100%.
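As a usage illustration with the hypothetical bottleneck_table sketch above (x_scores and y_scores stand for any condition and outcome scores), increasing the number of steps refines the reported outcome levels:

```python
table_10pct = bottleneck_table(x_scores, y_scores, steps=10)  # 0%, 10%, ..., 100%
table_5pct  = bottleneck_table(x_scores, y_scores, steps=20)  # 0%, 5%, ..., 100%
```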
Moreover, when executing the NCA, make sure that the scale of your indicators is as expected. For example, an interval scale from 1 to 7 should have a minimum value of 1 and a maximum value of 7 (this may not be the case if, for instance, no respondent selected the value 1). This is important to ensure the accuracy of the results.

NCA Permutation Settings in SmartPLS

The NCA permutation algorithm for regression models in SmartPLS requires settings that are described in the following.

Subsamples

The number of permutations determines how often the original data set is randomly permuted to approximate the non-parametric reference distribution for the null hypothesis.
To ensure stability of the results, the number of permutations should be sufficiently large. For a quick initial assessment, one may choose a smaller number of permutation subsamples (e.g., 1,000). For obtaining the final results, however, one should use a large number of permutations (e.g., at least 5,000).
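The following sketch illustrates the permutation logic under stated assumptions: the X values are reshuffled relative to Y to mimic the absence of a necessity relationship, the effect size d is recomputed for each permutation, and the p-value is the share of permuted effect sizes at least as large as the observed one. It relies on the hypothetical nca_effect_size_ce_fdh helper sketched earlier and may differ in detail from the SmartPLS implementation (cf. Dul et al., 2020).

```python
def nca_permutation_p(x, y, n_permutations=5000, seed=None):
    """Approximate permutation p-value for the necessity effect size d."""
    rng = np.random.default_rng(seed)            # None: random seed; int: fixed seed
    x, y = np.asarray(x, float), np.asarray(y, float)
    d_observed = nca_effect_size_ce_fdh(x, y)    # hypothetical helper sketched above
    exceed = 0
    for _ in range(n_permutations):
        x_permuted = rng.permutation(x)          # break any X-Y dependence
        if nca_effect_size_ce_fdh(x_permuted, y) >= d_observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)   # common add-one convention
```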

Do parallel processing

If selected, the permutation algorithm will be performed on multiple processors (if your computer offers more than one core). As each subsample can be calculated individually, the subsamples can be computed in parallel mode. Using parallel computing will reduce the computation time.
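Because each permutation run is independent, the work can be split across processes. The sketch below shows one way to do this with Python's standard library, reusing the hypothetical nca_effect_size_ce_fdh helper from above; it is an illustration of the idea, not the SmartPLS implementation.

```python
from concurrent.futures import ProcessPoolExecutor

def _count_exceedances(args):
    """Worker: run one chunk of permutations with its own seeded generator."""
    x, y, d_observed, n_chunk, seed = args
    rng = np.random.default_rng(seed)
    return sum(
        nca_effect_size_ce_fdh(rng.permutation(x), y) >= d_observed
        for _ in range(n_chunk)
    )

def nca_permutation_p_parallel(x, y, n_permutations=5000, n_workers=4, seed=42):
    """Split the permutation runs across processes; each chunk gets an
    independent sub-seed so the result stays reproducible."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    d_observed = nca_effect_size_ce_fdh(x, y)
    chunk = n_permutations // n_workers
    seeds = np.random.SeedSequence(seed).spawn(n_workers)
    jobs = [(x, y, d_observed, chunk, s) for s in seeds]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        exceed = sum(pool.map(_count_exceedances, jobs))
    return (exceed + 1) / (n_workers * chunk + 1)
```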

Significance level

Specifies the significance level of the test statistic.

Random number generator

The algorithm randomly generates subsamples from the original data set, which requires a seed. You have the option to choose between a random seed and a fixed seed.
The random seed produces different random numbers, and therefore different results, every time the algorithm is executed (this was the default and only option in SmartPLS 3).
The fixed seed uses a pre-specified seed that is the same for every execution of the algorithm. Thus, it produces the same results if the same number of subsamples is drawn. It thereby addresses concerns about the replicability of research findings.
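As a small illustration of the two options (using NumPy's generator as a stand-in for the internal random number generator; the seed value 42 is arbitrary):

```python
import numpy as np

rng_random = np.random.default_rng()    # random seed: different permutations, and thus
                                        # slightly different results, on every run
rng_fixed  = np.random.default_rng(42)  # fixed seed: identical permutations and results
                                        # across runs with the same settings
```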

References

Cite correctly

Please always cite the use of SmartPLS!

Ringle, Christian M., Wende, Sven, & Becker, Jan-Michael. (2024). SmartPLS 4. Bönningstedt: SmartPLS. Retrieved from https://www.smartpls.com