Cost Analysis using Random Forest Prediction
Models and Methods Track
What do you do when your cost data is not well suited for linear regression analysis? As an alternative, we experimented with Random Forests, a computationally intensive learning algorithm that does not assume a parametric model. A Random Forest is a collection of decision trees. Trees can capture non-linear relationships and interactions among predictors. Individual decision trees are interpretable but are not necessarily good predictors. By suitably fitting a collection of trees and averaging their predictions, Random Forests achieve results that compete with the best machine learning predictors.
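The averaging idea can be illustrated with a minimal sketch. It is not the model used in this work: it fits one-split trees (stumps) on bootstrap samples of a made-up, single-predictor dataset and averages their predictions. With only one predictor, the random selection of predictors that distinguishes Random Forests from plain bagging does not come into play. The function names, the toy data, and the choice of 50 trees are all assumptions made for illustration.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on a single numeric predictor."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # All x values identical: fall back to predicting the overall mean.
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Fit each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def predict(trees, x):
    """Ensemble prediction: the average of the individual tree predictions."""
    return mean(tree(x) for tree in trees)

# Hypothetical toy data with a non-linear relationship (not from the paper).
xs = list(range(10))
ys = [x * x for x in xs]
trees = fit_bagged_stumps(xs, ys)
low, high = predict(trees, 0), predict(trees, 9)
```

A single stump can only produce two distinct predicted values, whereas the average over many stumps fitted to different bootstrap samples gives a smoother, step-wise approximation of the curve.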
Because our dataset has a small number of observations relative to the number of mostly categorical predictors, creating a design matrix for linear regression results in more columns than rows. This forces a strategy of selecting variables and artificially converting categories to numeric values. An alternative is to consider adaptive nonparametric procedures such as Support Vector Machines and Random Forests. We chose Random Forests, which can handle small datasets with a large number of predictors, as well as a mix of categorical and numerical data. We used a model selection strategy, evaluated the goodness of fit using R-squared, and assessed performance on future data sets using prediction error and prediction intervals.
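To see how a design matrix can end up wider than it is tall, consider a small sketch using invented data. The predictor names and levels below are hypothetical and are not taken from our dataset. For simplicity the sketch creates one indicator column per category level, without dropping a reference level or adding an intercept.

```python
# Hypothetical toy data: 4 projects described by 3 categorical predictors.
rows = [
    {"domain": "space", "fidelity": "high", "site": "A"},
    {"domain": "air", "fidelity": "low", "site": "B"},
    {"domain": "sea", "fidelity": "medium", "site": "C"},
    {"domain": "land", "fidelity": "high", "site": "D"},
]

def one_hot_columns(records):
    """Return the sorted (predictor, level) pairs that indicator coding creates."""
    return sorted({(name, value) for rec in records for name, value in rec.items()})

def design_matrix(records):
    """Build a 0/1 matrix with one column per (predictor, level) pair."""
    cols = one_hot_columns(records)
    return [[1 if rec.get(name) == value else 0 for name, value in cols]
            for rec in records]

columns = one_hot_columns(rows)
X = design_matrix(rows)
n_rows, n_cols = len(X), len(columns)
print(f"{n_rows} rows, {n_cols} columns")
```

Here three predictors expand into 4 + 3 + 4 = 11 indicator columns for only 4 observations, so an ordinary least squares fit is not uniquely determined without first dropping or combining variables.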
This paper will discuss how we applied Random Forests to estimate costs of simulation-based experimentation projects. First, we will discuss the dataset and the challenges that led us to explore Random Forests. Next, we will introduce the concept of Random Forests, followed by a discussion of our approach to estimating the costs of these types of projects. We will discuss data preparation, model development and validation, and evaluation of results. We will close with management and end-user perceptions of and experiences with the model and its results.
The Boeing Company
Karen Mourikas is an Operations and Systems Analyst at The Boeing Company in Huntington Beach, CA. Currently, she supports the Strategic Development and Experimentation (SD&E) group in Phantom Works, focusing on Experimentation. In the Experimentation group, Karen manages Space Situational Awareness experiments, from initial development through execution and analysis, in the Space Experimentation & Analysis Center (SEAC) in Seal Beach. Previously, Karen worked as an Affordability Analyst in the Systems Engineering Affordability group. In that role, she worked on various programs estimating Life Cycle Costs, analyzing Cost Uncertainty, integrating Cost Risk with Risk Management processes, and performing Cost Effectiveness Trades. Combining Experimentation and Affordability, Karen proposed and manages the development of the simulation-based experimentation estimating toolkit. Prior to Boeing, Karen worked as an analyst in a statistical organization in Boston, as a software programmer at Boston University, and as a computer consultant at Arthur Andersen in Madrid, Spain. She also spent some time teaching college math classes in Southern California. Karen holds a BA in Mathematics and Computer Science from Connecticut College, and two MS degrees from the University of Southern California, one in Applied Mathematics and the other in Operations Research Engineering. On the personal side, Karen loves to travel and has spent time living and working in Spain and in Scotland.
The Boeing Company
Denise graduated from Cal Poly Pomona with a BS in Statistics and an MS in Pure Math. She taught college math before entering the aerospace field. Denise has been employed with The Boeing Company since 2004. Her expertise is in Parametric Estimating and Affordability/Cost Effectiveness trades. She specializes in software estimating and cost-risk analysis.
Under Boeing’s Research and Technology division, Denise works on various programs and special studies. Though she has worked on a multitude of platforms, her main systems experience is in satellites, space ground systems, and related user equipment.
Denise leads Affordability Enterprise training activities including course development and instruction throughout the company. She is an active member of ISPA.
The Boeing Company
James Schimert received his Ph.D. in Statistics from University of Washington, Seattle, Washington in 1992.
He is an Advanced Computing Technologist in the Network Systems Technology group, Boeing Research and Technology, The Boeing Company, Bellevue, Washington. His research interests include statistical algorithms in data mining and machine learning, including tree ensembles, in a variety of applications, among them predicting the cost of simulation-based experiments. Previously, he worked in the Research group at Insightful Corp., where he served as a consultant and as principal investigator on NIH and NSF SBIR-funded research efforts to develop commercial software for advanced statistical computing.