R² vs. r²
Cost estimating relationships (CERs) with multiplicative error assumptions are commonly used in cost analysis. Consequently, appropriate statistical measures are needed to evaluate the quality of multiplicative error CERs, such as the Minimum-Unbiased-Percentage Error (MUPE) and Minimum-Percentage Error under the Zero-Percentage Bias (ZMPE) CERs.
Generalized R-squared (GRSQ, also denoted by the symbol r²) is commonly used to measure the quality of a nonlinear CER. GRSQ is defined as the square of Pearson’s correlation coefficient between the actual observations and the CER-predicted values (see Reference 6). Many statistical analysts believe GRSQ is an appropriate analog for measuring the proportion of variation explained by a nonlinear CER (see Reference 5), including the MUPE and ZMPE CERs; some even use it to judge the appropriateness of a CER’s shape.
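The GRSQ definition above can be illustrated with a short Python sketch. This is only a minimal illustration of the stated definition (the squared Pearson correlation between actuals and predictions); the function names `pearson_r` and `grsq` and the sample numbers are hypothetical and do not come from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def grsq(actuals, predictions):
    """GRSQ: square of Pearson's r between actuals and CER predictions."""
    return pearson_r(actuals, predictions) ** 2


# Hypothetical data: every prediction is far from its actual value,
# yet the predictions are an exact linear function of the actuals.
actuals = [1.0, 2.0, 3.0, 4.0]
predictions = [7.0, 9.0, 11.0, 13.0]
print(grsq(actuals, predictions))  # approximately 1
```

In this made-up example GRSQ is essentially 1 although no prediction is close to its actual, because the statistic responds only to linear association between the two series and not to their agreement in level or scale.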
The adjusted R² in unit space is a frequently used alternative measure of CER quality. This statistic translates the error sum of squares (SSE) from the absolute scale to the relative scale, and it is used to measure how well the CER-predicted costs match the actual data.
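As a rough sketch, the unit-space adjusted R² can be computed as follows. The code assumes the traditional textbook formula, 1 - [SSE/(n - p)] / [SST/(n - 1)], where SST is the total sum of squares about the mean of the actuals and p counts all estimated coefficients; the function name `adjusted_r2` and the parameter `n_params` are hypothetical, and the exact form used in the paper may differ.

```python
def adjusted_r2(actuals, predictions, n_params):
    """Adjusted R-squared in unit space (traditional formula, assumed).

    n_params is the number of estimated coefficients, intercept included.
    """
    n = len(actuals)
    if n != len(predictions):
        raise ValueError("actuals and predictions must have equal length")
    if n - n_params <= 0:
        raise ValueError("not enough observations for the degrees of freedom")
    mean_actual = sum(actuals) / n
    # Error sum of squares: squared gaps between actuals and predictions.
    sse = sum((y - f) ** 2 for y, f in zip(actuals, predictions))
    # Total sum of squares: squared gaps between actuals and their mean.
    sst = sum((y - mean_actual) ** 2 for y in actuals)
    if sst == 0:
        raise ValueError("actuals are constant; the ratio is undefined")
    return 1.0 - (sse / (n - n_params)) / (sst / (n - 1))


# Hypothetical data: predicting the mean for every point gives SSE == SST,
# so with one estimated parameter the statistic is zero.
actuals = [1.0, 2.0, 3.0, 4.0, 5.0]
print(adjusted_r2(actuals, [3.0] * 5, n_params=1))
```

Dividing SSE by SST is what moves the error from an absolute scale to a relative one; the degrees-of-freedom terms penalize CERs that use more coefficients to achieve the same fit.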
Over the years, academic concerns have been raised about the relevance of adjusted R² and Pearson’s r². For example, some insist that adjusted R², calculated by the traditional formula, has no value as a metric except for ordinary least squares (OLS); others argue that Pearson’s r² does not measure how well the estimates match the database actuals for nonlinear CERs. This paper discusses these concerns and examines the properties of these statistics, along with the pros and cons of using each for CER development. In addition, this paper proposes (1) a modified adjusted R² for evaluating MUPE CERs and (2) a modified GRSQ to correct for degrees of freedom.
Dr. Hu was educated at National Taiwan University (B.S., Mathematics) and the University of California, Santa Barbara (M.S., Mathematics, and Ph.D., Statistics). She is a Chief Statistician at Tecolote Research, Inc. She joined Tecolote in 1984 and has served as a company expert in all statistical matters. She has over 12 years of experience in Unmanned Space Vehicle Cost Model (USCM) CER development and the related database. She also has 20 years of experience in designing, developing, modifying, and integrating statistical software packages for fitting various types of regression equations, learning curves, cost risk analysis, and other PC-based models.