Balancing truth error and manual processing in the PDQ system

Title: Balancing truth error and manual processing in the PDQ system
Author: Huang, Douglass
Abstract: Production Data Quality (PDQ) is a specialized pattern classifier whose main purpose is to independently assess the data quality of a production classifier. It accomplishes this by producing a high-quality Truth from the source input, and then using the Truth to identify errors in the production classifier's output data. Previous studies have shown close agreement between PDQ processing outcomes and a particular mathematical model of the system. In this study we describe and analyze an expanded model that addresses the potential tradeoff between Truth error and manual processing in PDQ, with the aim of informing operational decisions about precision and efficiency. Using statistical data from the 2010 Census PDQ system as input, we examine the predictions of the new model in order to understand their potential usefulness. The outcomes show strong agreement between two methods for estimating the Projected Truth error rate, supporting the validity of both methods as well as the existing static model. In addition, the new Projector model gives tight bounds on the projected manual processing rate and reveals a characteristic relationship between Projected Truth error and projected manual processing. We explore a practical application of this model for tuning PDQ, and we find an opportunity to achieve a 60% efficiency increase for the selected sample while maintaining an acceptable degree of precision.
Record URI: http://hdl.handle.net/1850/14474
Date: 2011

Files in this item

Files Size Format
DHuangThesis8-25-2011.pdf 6.974Mb PDF
