Improving CER building:
Getting rid of the R² tyranny – Building a CER with the Median
The author's constant aim is to extract the maximum, for prediction purposes, from the data available to the estimator (assuming their reliability has already been checked). This paper is a contribution to this effort. We will start with a classification of cost estimating techniques, showing that parametric estimating, for which a clear definition will be given, is just one technique among others which deserve more consideration. Nevertheless, this paper deals with parametric studies, because parametric estimating is still the most widely used technique. To find the coefficients of the function that will represent the available data (and replace them in the prediction process), the estimator needs a “metric”, which here means the way the distance between two points of the data space is defined. Many metrics could be proposed, but to be of practical use a metric must satisfy a few properties, which will be briefly recalled. A very important concept for the metric user is its breakdown point; this concept will be used to introduce a new metric.
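The breakdown idea can already be seen in the simplest case, estimating one central value before any CER is involved. The mean minimizes the sum of squared deviations, the median minimizes the sum of absolute deviations. In the minimal sketch below (hypothetical cost figures, Python standard library only; the helper name `corrupt_last` is illustrative), replacing a single point by a gross error drags the mean as far as we like while the median does not move, which is what a breakdown point of 0% versus 50% means in practice.

```python
from statistics import mean, median

def corrupt_last(data, bad_value):
    """Return a copy of data whose last point is replaced by a gross error."""
    return list(data[:-1]) + [bad_value]

# Hypothetical costs of five comparable items (illustrative values only)
costs = [10.0, 11.0, 12.0, 13.0, 14.0]

for bad in (14.0, 140.0, 1400.0):
    sample = corrupt_last(costs, bad)
    # The mean follows the bad point; the median stays put
    print(f"last point = {bad:7.1f}  mean = {mean(sample):7.2f}  median = {median(sample):5.2f}")
```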
As everybody knows, the most common metric (in fact, practically the only one used) is the square of the difference between coordinates. This metric was introduced by Carl Friedrich Gauss, for reasons which will be briefly recalled. For the cost estimator, it has several drawbacks.
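For a CER with a single explanatory variable, written here (as an assumed notation, not the paper's) as $y \approx a + b\,x$ and fitted to $n$ data points $(x_i, y_i)$ with sample means $\bar x$ and $\bar y$, the two metrics lead to two different criteria. With Gauss's squared metric the minimizer has a well-known closed form; with absolute values no such formula exists in general.

```latex
\begin{aligned}
&\text{Least squares (Gauss):} && \min_{a,b} \sum_{i=1}^{n} \left(y_i - a - b\,x_i\right)^2
  \;\Longrightarrow\;
  \hat b = \frac{\sum_{i=1}^{n} (x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n} (x_i-\bar x)^2},
  \qquad \hat a = \bar y - \hat b\,\bar x \\
&\text{Absolute deviations (``median''):} && \min_{a,b} \sum_{i=1}^{n} \left\lvert y_i - a - b\,x_i \right\rvert
  \qquad \text{(no closed-form solution)}
\end{aligned}
```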
For this reason I investigated another, more natural metric: the absolute value of the difference between coordinates. This is generally called the “median”, although the term can be misleading: the median is rather intuitive in non-parametric studies, but using it with several variables is another story. Computations with absolute values are certainly more complex, but a computer can handle them, even when several variables are taken into account. The results of these computations will be given. Do not expect, however, the coefficients of the function to be given by formulae; this is not possible with absolute values. For any given data set, the values of the coefficients are computed by iteration.
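The paper does not say which iterative algorithm is used. One common choice, assumed here, is iteratively reweighted least squares (IRLS): each pass solves a weighted least-squares problem with weights $1/|r_i|$ computed from the previous residuals, so that at convergence $\sum w_i r_i^2 \approx \sum |r_i|$. The sketch below (Python with NumPy; the function name `lad_fit`, its tuning parameters and the data are hypothetical) fits $y \approx a + b_1 x_1 + \dots + b_p x_p$ and handles several variables as easily as one. Linear programming is the other standard route and returns the exact optimum.

```python
import numpy as np

def lad_fit(X, y, n_iter=200, eps=1e-8, tol=1e-10):
    """Least absolute deviations fit of y ~ a + X @ b by IRLS.

    Returns the coefficient vector [a, b_1, ..., b_p].
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # one explanatory variable given as a 1-D array
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # column of ones for the constant a

    # Start from the ordinary least-squares solution
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - A @ beta
        # Weights 1/|r_i| make sum(w_i * r_i**2) approximate sum(|r_i|);
        # eps avoids division by zero when a point is fitted exactly
        w = 1.0 / np.maximum(np.abs(r), eps)
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Hypothetical data: exactly y = 2 + 3x, except one gross error at x = 6
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([5.0, 8.0, 11.0, 14.0, 17.0, 60.0])

print("LAD [a, b]:", lad_fit(x, y))
print("OLS [a, b]:", np.polyfit(x, y, 1)[::-1])  # polyfit gives [b, a]; reversed
```

On these data the absolute-deviation fit recovers the line through the five consistent points, while the least-squares slope is pulled strongly toward the single bad point.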
The most important result of using the “median” is that the sum of the deviations (more exactly, of their absolute values) is always smaller than, or at worst equal to, the sum obtained with ordinary least squares. Consequently, one can expect more precise estimates, which is what we are looking for! We will observe that R² (generally used to express the quality of a CER) is completely irrelevant with the median: forget about it! To compare several formulae computed with the median, we had to use a different algorithm.
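The inequality on the sum of absolute deviations holds by construction, since the “median” fit is precisely the one that minimizes that sum; symmetrically, least squares always wins on the sum of squares. The sketch below (Python with NumPy; hypothetical data and function names) checks both inequalities for a one-variable CER. To stay independent of any iterative solver it uses an exact brute-force method suited only to small samples: when the data contain at least two distinct $x$ values, some optimal absolute-deviation line passes through two of the data points, so trying every pair and keeping the best is enough.

```python
from itertools import combinations
import numpy as np

def sum_abs_dev(a, b, x, y):
    """Sum of |deviations| of the data from the line y = a + b*x."""
    return float(np.sum(np.abs(y - (a + b * x))))

def sum_sq_dev(a, b, x, y):
    """Sum of squared deviations of the data from the line y = a + b*x."""
    return float(np.sum((y - (a + b * x)) ** 2))

def lad_line_exhaustive(x, y):
    """Exact LAD fit of y = a + b*x by trying every line through two points.

    O(n^3): only meant for small samples with at least two distinct x values.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        s = sum_abs_dev(a, b, x, y)
        if best is None or s < best[0]:
            best = (s, a, b)
    return best[1], best[2]

# Hypothetical CER data (illustrative only), with one large value at the end
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([12.0, 15.0, 21.0, 22.0, 30.0, 31.0, 36.0, 70.0])

b_ols, a_ols = np.polyfit(x, y, 1)  # polyfit returns the slope first
a_lad, b_lad = lad_line_exhaustive(x, y)

for name, a, b in (("OLS", a_ols, b_ols), ("LAD", a_lad, b_lad)):
    print(f"{name}: a = {a:6.2f}, b = {b:5.2f}, "
          f"sum|dev| = {sum_abs_dev(a, b, x, y):6.2f}, "
          f"sum dev^2 = {sum_sq_dev(a, b, x, y):7.2f}")
```

Each method wins on its own criterion; which criterion matters to the cost estimator is exactly the question raised in the last word of this abstract.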
Under some hypotheses about the deviations (hypotheses which, incidentally, are much more natural for the absolute values of the deviations than for their squares), one can compute the accuracy of these coefficients and, consequently, the confidence interval of any estimate.
A last word: are you really interested in minimizing the squares of the deviations or the deviations themselves? For the author the answer is obvious.
Pierre Foussier is the founder and manager of 3f, a French company dedicated to improving cost estimating tools. He develops models and model generators, trains users and helps them progress.
He previously worked for major French organizations in the aeronautics and space industry, first as an engineer, then, for more than 30 years, as a cost estimator. Eventually he had to assess cost estimates prepared by industry or by project managers.
He is a co-founder of the French association AFITEP, a non-profit organization dedicated to the progress of cost estimating, planning and, more generally, project management. He is also the founder of XPAR, a non-profit organization promoting the parametric method; XPAR later merged with AFITEP.
He holds a degree in mathematics, a degree in theoretical physics and a third in business administration. He also graduated as an engineer from the French “Ecole Supérieure d’Electricité”, a school dedicated to training engineers in electronics for industry.
He published the book “From Product Description to Cost: A Practical Approach”, whose purpose was to give young engineers the basics of cost estimating based on scientific analogy.