A Methodology for Multivariate Regression on Large Datasets

Large datasets can present a formidable challenge to the analyst. Properly analyzing a long list of potential cost drivers can require so many regression computations and scatter plots that good relationships become difficult to discover. A methodical approach keeps the analysis organized and ensures that nothing is overlooked. This presentation puts forth a standardized way to analyze independent variables in regression and a VBA application that automates this approach. To facilitate comparison, the method specifies the order in which combinations of drivers are regressed, uses summary sheets to compare regression equations side by side, and makes use of Excel’s conditional formatting. Additionally, to help uncover information that is obscured in the data, a dynamic scatter plot allows the analyst to quickly view large numbers of scatter plots displaying up to five variables at a time.


Matt Pitlyk
Booz Allen Hamilton
Matt Pitlyk is a cost analyst at Booz Allen Hamilton. His experience includes developing CERs for space systems, aircraft, and software development for various agencies, including the Air Force Cost Analysis Agency. His work centers on developing and evaluating regression equations, often employing VBA within Excel. He has also been a contributor to the Cost Estimating Body of Knowledge.
He graduated with an M.A. in Mathematics from St. Louis University in 2009. During the following year, he taught several math courses at St. Louis University, including courses in statistics. He joined TASC, Inc. in July 2010 as a cost analyst and moved to Booz Allen Hamilton in April 2011.