Evaluating Cost Relationships with Nonparametric Statistics
Parametric statistics are often used to analyze data, evaluate hypotheses, and determine the significance of a given set of inputs. Most commonly, cost estimators use the following parametric testing measures for significance: Pearson’s product-moment correlation, z-tests, t-tests, and Analysis of Variance (ANOVA) tests.
While each of these tests suits a different scenario, each also depends on the validity of a fixed set of assumptions. Though the assumptions differ by test, all are rooted in the general idea that the sample is adequately large and that the data follows a particular probability distribution with estimable parameters.
When these assumptions are met, parametric tests are relatively simple to calculate and generate conclusions and estimates with high levels of accuracy. However, if the assumptions are violated or the data is qualitative, these tests yield invalid results that should be viewed with skepticism. The accuracy of a parametric statistical test ultimately depends on the validity of its underlying assumptions.
Cost estimators rarely operate in a best-case scenario, especially with regard to data acquisition. In practice, the data obtained frequently contains substantial gaps and does not follow a given distribution with estimable parameters. Cost estimates are also not exclusively quantitative. In these instances, the estimator can look to non-parametric statistics to evaluate relationships and determine significance.
Non-parametric statistics are useful for studying nominal and ordinal datasets that do not follow a particular distribution. Data inputs for non-parametric tests can be qualitative, since dummy variables allow quantitative regression analyses to be performed on them. Non-parametric tests are generally considered more robust, though less powerful, than their parametric counterparts, and may present realistic alternatives to estimating exclusively with parametric tests.
The non-parametric tests and measures addressed in this paper include Spearman’s rank correlation coefficient, Mann-Whitney tests, Wilcoxon Signed-Rank tests, and non-parametric Chi-Squared tests. These tests vary in purpose, from measuring the correlation between two variables that are related but not linearly so, to testing the difference between existing and hypothetical samples that follow different distributions.
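As a minimal sketch of the first measure listed above, the following Python computes Spearman’s rank correlation as Pearson’s correlation applied to ranks, with tied values given the average of their ranks. The function names and the quantity and unit-cost figures are hypothetical and chosen only for illustration; they are not drawn from the paper. On data that falls steadily but not along a straight line, the rank-based coefficient reports a perfect monotone relationship, while Pearson’s coefficient on the raw values is noticeably weaker.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: unit cost declining with quantity along a curve.
quantity = [1, 2, 4, 8, 16, 32, 64]
unit_cost = [1000 * q ** -0.5 for q in quantity]

print("Pearson: ", round(pearson(quantity, unit_cost), 3))
print("Spearman:", round(spearman(quantity, unit_cost), 3))
```

In practice an estimator would more likely call an established statistics library, which also reports a p-value; the hand-rolled version is shown only to make the ranking step explicit.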
This paper examines the deficiencies of parametric tests and emphasizes the advantages of non-parametric statistics and the tests listed above. It provides practical examples in which a non-parametric test can establish statistical significance where the corresponding parametric test would be invalid.
Kalman & Company, Inc.
Caleb Fleming is a cost analyst supporting various DoD clients. Mr. Fleming is well-versed in the development of life-cycle cost analysis, including the collection and analysis of data, generation of estimate assumptions and methodologies, and creation of cost estimating relationships (CERs). Mr. Fleming is experienced in utilizing parametric and nonparametric statistics in analyses and has functionally applied such concepts, including those addressed in the subsequent paper, to aid in the development of life-cycle cost estimates (LCCEs). Mr. Fleming has supported numerous clients within the Marine Corps Systems Command (MCSC) and the Joint Program Executive Office for Chemical and Biological Defense (JPEO-CBD). His programmatic experience extends from tactical wheeled vehicles to counter-fire radars and chemical and biological detection devices. Mr. Fleming was awarded a bachelor's degree in Economics from Virginia Tech in 2011. Additionally, he minored in Statistics and was the winner of several national writing awards.