Testing for the Significance of Cost Drivers Using Bootstrap Sampling
Most regression-based statistical modeling aims to derive, from historical data, an algebraic expression for estimating a quantity of interest, such as cost. Such an expression is characterized by one or more “fit parameters,” which are typically coefficients or exponents. Once these fit parameters are determined, the logical next step is to assess their accuracy by a process called “statistical inference.” Each fit parameter of a cost-estimating relationship (CER) is directly linked to a candidate cost driver. The inference process may lead to some fit parameters being judged “significant” and others “not significant,” where “significance” refers to the extent to which the associated cost driver affects the cost estimates made using the CER. Cost drivers associated with fit parameters judged not significant can (and should) be removed from the algebraic expression for the CER without sacrificing any estimating capability. The ability to remove them is valuable because fewer fit parameters mean more degrees of freedom, which can be important when only a small number of data points is available.
Because of the strict mathematical assumptions underlying ordinary least squares (OLS) linear regression, explicit formulas for fit parameters, significance tests, and confidence bounds can be established. However, if any of the OLS assumptions does not hold, inferences based on those formulas are unreliable. This situation has led cost analysts to recognize the need for general-error regression. Until now, analysts have been concerned only with the quality of the estimating process itself and have neglected to assess the significance of the fit parameters and cost drivers. Recently, however, researchers have begun to recognize this shortcoming and to explore ways of filling the gap. In a paper entitled “A Distribution-Free Measure of the Significance of CER Regression Fit Parameters Established Using General Error Regression Methods” (Journal of Cost Analysis and Parametrics, Vol. 2, Issue 1, Summer 2009, pages 7-22), T.P. Anderson proposed some “heuristic” techniques for assessing the significance of the fit parameters.
Anderson calls his method heuristic because it is not derived by strict mathematical calculation. We instead apply bootstrap sampling, a method that can approximate the mathematics of statistical inference procedures. It was recently applied to prediction bounds for CERs derived by general-error regression, another situation in which explicit formulas exist for OLS CERs but not for general-error CERs. (See the briefing by S.A. Book, “Prediction Bounds for General-Error-Regression CERs,” 39th Annual DoDCAS, Williamsburg VA, 14-17 February.) The bootstrap is a data-based method that requires neither OLS-like distributional assumptions nor explicit formulas. In this paper, we construct approximate bootstrap-based statistical tests to assess the significance of CER fit parameters and their associated cost drivers, so that conclusions can be drawn about which candidate cost drivers are significant and which are not.
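To make the idea concrete, the following is a minimal sketch of one common bootstrap approach, not necessarily the procedure developed in this paper. It resamples the data points with replacement, refits the CER on each resample, and forms a percentile interval for each fit parameter; a parameter whose interval excludes zero is treated as significant. Several things here are assumptions made for illustration: the linear CER form, the use of an ordinary least-squares fit as a stand-in for a general-error regression routine, the function names fit_cer and bootstrap_significance, and the synthetic data, in which the second candidate driver has no effect on cost by construction.

```python
import numpy as np


def fit_cer(X, y):
    """Fit y = b0 + b1*x1 + ... by least squares.

    Hypothetical stand-in: any CER fitting routine (for example a
    general-error regression) could be substituted here.
    """
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def bootstrap_significance(X, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for each fit parameter."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty((n_boot, X.shape[1] + 1))
    for b in range(n_boot):
        # Resample whole data points (driver values and cost) with replacement.
        idx = rng.integers(0, n, size=n)
        estimates[b] = fit_cer(X[idx], y[idx])
    lower = np.percentile(estimates, 100 * alpha / 2, axis=0)
    upper = np.percentile(estimates, 100 * (1 - alpha / 2), axis=0)
    # A parameter is judged significant if its interval excludes zero.
    significant = (lower > 0) | (upper < 0)
    return lower, upper, significant


# Synthetic illustration (assumed data, not from the paper):
# cost depends on driver x1 only; x2 is an irrelevant candidate driver.
rng = np.random.default_rng(42)
n_points = 40
x1 = rng.uniform(10, 100, n_points)
x2 = rng.uniform(10, 100, n_points)
cost = 50 + 3 * x1 + rng.normal(0, 5, n_points)
X = np.column_stack([x1, x2])

lower, upper, significant = bootstrap_significance(X, cost)
for name, lo, hi, sig in zip(["intercept", "x1", "x2"], lower, upper, significant):
    print(f"{name}: [{lo:.3f}, {hi:.3f}] significant={sig}")
```

Resampling whole data points, rather than residuals, avoids assuming any particular error distribution, which is in keeping with the distribution-free motivation described above. The percentile interval is only one of several bootstrap interval constructions and was chosen here for brevity.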
Daniel I. Feldman is a Cost Analyst at MCR, LLC. Since joining MCR in early September 2005, he has worked on developing new techniques for deriving CERs and for examining their significance. He has also worked on launch and space vehicle modeling and trade-study analysis. Mr. Feldman earned his B.S. in mathematics, with a concentration in statistics, at the University of California, Irvine, in June 2005, and his M.S. in applied statistics at California State University, Long Beach. He wrote his master’s thesis on the subject of bootstrap sampling.