Multicollinearity in Zero Intercept Regression: They Are Not Who We Thought They Were

Asked about the identity of the members of the opposing team after being defeated by them, a famous football coach once said, “They are…who we thought they were.” Unfortunately, this is not the case when it comes to multicollinearity in zero intercept linear regression. It is not what we think it is, and the consequences of ignoring the distinction between it and its nonzero intercept counterpart can be devastating.

This presentation addresses the issue, viewed in the context of both conventional wisdom (“multicollinearity is correlation among the regressors… can be checked with a correlation matrix”) and our earlier paper on the same general topic (Multicollinearity: Coping with the Persistent Beast, 2007 DoDCAS). As we stated in that paper, high correlation among regressors is sufficient, but not necessary, for multicollinearity to occur in standard regression, and true multicollinearity is revealed through variance inflation factors (VIFs).
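The "sufficient, but not necessary" point for standard regression can be illustrated with a small simulation. What follows is a minimal sketch on synthetic data, not the calculation from the 2007 paper. The names `standard_vifs`, `base`, and `combo` are hypothetical, and the sketch relies on the textbook identity that, for a regression with an intercept, the VIFs are the diagonal entries of the inverse of the regressors' correlation matrix. Nine independent regressors plus a tenth that is nearly their sum give modest pairwise correlations but very large VIFs.

```python
import numpy as np


def standard_vifs(X):
    """Standard (with-intercept) VIFs for the columns of X.

    Uses the identity VIF_j = [R^{-1}]_jj, where R is the correlation
    matrix of the regressors. Hypothetical helper, for illustration only.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Synthetic data (an assumption of this sketch, not data from the paper).
rng = np.random.default_rng(0)
n, k = 500, 9
base = rng.normal(size=(n, k))                             # nine independent regressors
combo = base.sum(axis=1) + rng.normal(scale=0.1, size=n)   # nearly their sum
X = np.column_stack([base, combo])

# Largest absolute off-diagonal correlation: what a correlation matrix check sees.
corr = np.corrcoef(X, rowvar=False)
max_pairwise = np.max(np.abs(corr - np.eye(k + 1)))

vifs = standard_vifs(X)

print(f"largest pairwise |correlation|: {max_pairwise:.2f}")
print(f"smallest VIF: {vifs.min():.1f}, largest VIF: {vifs.max():.1f}")
```

In this setup no pair of regressors is strongly correlated, yet every VIF is far above the commonly cited rule-of-thumb threshold of 10, so an inspection of the correlation matrix alone would miss the problem.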

Here, we show that zero intercept regression presents the opposite problem: high correlation among regressors is necessary, but not sufficient, for multicollinearity to occur. While VIFs are again the best measure, the standard formula for calculating VIFs does not apply in zero intercept regression. The consequences of incorrectly using the standard formula (which implicitly assumes the existence of a nonzero intercept term) are enormous. VIFs can be overstated by 1,000% or more, potentially leading analysts to needlessly drop explanatory variables from cost estimating relationships, rework regressions that needn’t be reworked, and worry about a problem that in fact doesn’t exist.

We present a revised VIF formula that works in all cases (zero intercept or traditional) and show that, in the traditional case, the two formulas are equivalent. Unfortunately, a major regression-based cost estimating software tool does not use the revised VIF formula, and dramatically overstates zero-intercept multicollinearity statistics as a result. We give examples of this problem and offer techniques to adjust the tool’s output.


Kevin Cincotta
