Detecting Anomalous Cost Data in an Integrated Data Warehouse
Data quality is always a key issue when using historical data to predict future costs. Often a small error in the historical data can lead to much more significant errors in predicted costs. The problem becomes more acute if data is integrated from several sources, the source systems have evolved, the source systems were not designed to support analysis, or extensive business rules are required to allocate or assign attributes to the data. This paper discusses systematic techniques, used by the Air Force Total Ownership Cost (AFTOC) program, to find anomalous data. The AFTOC program is operated by the Air Force Cost Analysis Agency. AFTOC integrates data from a dozen sources to provide estimates of operating and support costs for weapon systems and provides this integrated view to a variety of users. AFTOC is the primary source of Air Force weapon system data for operating and support cost analysis.
We integrate data from financial, budget, supply, fuel, personnel, maintenance, depot, and ammunition systems. We try to provide the highest quality data and warn our consumers when there are known problems that are impractical to correct. This paper focuses on the part of that quality effort that reviews the data for inconsistencies. We consider two types of inconsistency: disagreement between data from different sources, and irregularities within a time series of data. We know the data is not perfect, so our primary concern is not to find individual errors but to identify significant anomalies in patterns of data that could lead to specious analysis and bad decisions. The data anomalies may be due to bad source data, errors in processing, or real changes in Air Force operations.
Over the last few years, we have developed a comprehensive approach to detecting these anomalies. First, we devise a model that should be supported by the data (e.g., annual costs for replacement parts, adjusted for inflation, should be proportional to the hours flown). Next, the warehouse is divided into subsets, each large enough to form a meaningful statistical sample (e.g., the annual costs for each type of aircraft at each base in each major cost category). Then, we define a quality metric to measure how well each subset fits the model (e.g., residuals from a linear regression, or the correlation of flying hours and fuel use). Finally, we analyze in detail the subsets with the worst metrics.
This approach has provided several benefits to the AFTOC program. The quality of our data has improved, since we have found and fixed many issues due to incomplete source feeds, changes in source feeds, and processing mistakes. We have warned our users of issues that they should understand before they use our data. Perhaps most importantly, we have improved our understanding of the strengths and weaknesses of our data.
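The four steps described above (model, subsets, quality metric, review of the worst subsets) can be illustrated with a minimal sketch. It is not the AFTOC implementation. The record layout, the function names, the choice of relative root-mean-square residual as the metric, and all data values are assumptions made for illustration. The model is the one given as an example in the text: inflation-adjusted parts cost proportional to flying hours.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical records: (aircraft_type, base, fiscal_year, flying_hours, parts_cost).
# Costs are assumed to be already adjusted for inflation. All values are invented.
RECORDS = [
    ("F-16", "Base A", 2005, 1000.0, 2.00e6),
    ("F-16", "Base A", 2006, 1100.0, 2.21e6),
    ("F-16", "Base A", 2007, 900.0, 1.79e6),
    ("F-16", "Base A", 2008, 1050.0, 2.10e6),
    ("F-16", "Base B", 2005, 800.0, 1.60e6),
    ("F-16", "Base B", 2006, 850.0, 1.70e6),
    ("F-16", "Base B", 2007, 820.0, 0.80e6),  # suspiciously low, e.g. an incomplete feed
    ("F-16", "Base B", 2008, 830.0, 1.66e6),
]


def proportional_fit(hours, costs):
    """Step 1 (model): least-squares fit of cost = rate * hours."""
    rate = sum(h * c for h, c in zip(hours, costs)) / sum(h * h for h in hours)
    residuals = [c - rate * h for h, c in zip(hours, costs)]
    return rate, residuals


def quality_metric(hours, costs):
    """Step 3 (metric): RMS residual divided by mean cost. Larger means a worse fit."""
    _, residuals = proportional_fit(hours, costs)
    rms = sqrt(sum(r * r for r in residuals) / len(residuals))
    return rms / (sum(costs) / len(costs))


def rank_subsets(records, min_points=3):
    """Steps 2 and 4: group by (aircraft, base), score each group, worst first."""
    subsets = defaultdict(list)
    for aircraft, base, _year, hours, cost in records:
        subsets[(aircraft, base)].append((hours, cost))

    scored = []
    for key, points in subsets.items():
        if len(points) < min_points:
            continue  # too few observations to judge the fit
        hours = [p[0] for p in points]
        costs = [p[1] for p in points]
        scored.append((quality_metric(hours, costs), key))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for metric, key in rank_subsets(RECORDS):
        print(f"{key}: relative RMS residual = {metric:.3f}")
```

With this invented data, the Base B subset ranks first because its 2007 cost is far below what the flying hours would predict. A high score only flags the subset for detailed review. As the text notes, the cause could be bad source data, a processing error, or a real change in operations.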
Northrop Grumman Information Systems
Mr. Brown has 36 years of experience in information systems, including software development, data analysis, and line/program management. The early part of his career focused on the development of error analysis software to support accuracy modeling for the Polaris, Poseidon and Trident weapon systems. This software implemented models of the guidance and control systems that captured the propagation of errors throughout the system and the ability of the sensor systems to estimate and correct these errors.
Mr. Brown has managed the corporate data center and helped various clients plan and manage their information system resources. These clients include the Navy Strategic Systems Programs, the Air Force Flight Test Center (Edwards AFB), and the City of Chicago. He also managed the Telecommunications Information Management System project supporting the Federal Aviation Administration.
Over the last 9 years, Mr. Brown has led the Northrop Grumman effort in support of the Air Force Total Ownership Cost (AFTOC) program. This program is operated by the Air Force Cost Analysis Agency and supported by Battelle and Northrop Grumman. AFTOC is a decision support system for cost analysts based on the integration of several Air Force data systems. Northrop Grumman is responsible for end-user access to the data and for analysis of the cost data for consistency. Mr. Brown has applied the signal processing and error estimation techniques from the early part of his career to the cost data in AFTOC.