Balancing truth error and manual processing in the PDQ system

dc.contributor.advisor Gaborski, Roger en_US
dc.contributor.author Huang, Douglass
dc.date.accessioned 2011-12-06T20:46:50Z
dc.date.available 2011-12-06T20:46:50Z
dc.date.issued 2011
dc.description.abstract Production Data Quality (PDQ) is a specialized pattern classifier whose main purpose is to independently assess the data quality of a production classifier. It accomplishes this by producing a high-quality Truth from the source input and then using the Truth to identify errors in the production classifier's output data. Previous studies have shown close agreement between PDQ processing outcomes and a particular mathematical model of the system. In this study we describe and analyze an expanded model that addresses the potential tradeoff between Truth error and manual processing in PDQ, with an eye towards informing operational decisions about precision and efficiency. Using statistical data from the 2010 Census PDQ system as input, we examine the predictions of the new model in order to understand their potential usefulness. The outcomes show strong agreement between two methods for estimating the Projected Truth error rate, supporting the validity of both methods as well as the existing static model. In addition, the new Projector model gives tight bounds on the projected manual processing rate and reveals a characteristic relationship between Projected Truth error and projected manual processing. We explore a practical application of this model for tuning PDQ, and we find an opportunity to achieve a 60% efficiency increase for the selected sample while maintaining an acceptable degree of precision.
dc.language.iso en_US
dc.relation RIT Scholars content from RIT Digital Media Library has moved to RIT Scholar Works; please update your feeds and links.
dc.subject Data capture en_US
dc.subject Data quality en_US
dc.subject Efficiency en_US
dc.subject Forms processing en_US
dc.subject Independent random variables en_US
dc.subject Truth en_US
dc.subject.lcc QA76.76.T48 B35 2011
dc.subject.lcsh Computer software--Testing
dc.subject.lcsh Computer software--Reliability
dc.subject.lcsh Debugging in computer science
dc.subject.lcsh Classification--Data processing
dc.subject.lcsh Pattern recognition systems
dc.title Balancing truth error and manual processing in the PDQ system
dc.type Thesis
dc.description.college B. Thomas Golisano College of Computing and Information Sciences
dc.description.department Department of Computer Science

Files in this item

Files Size Format View
DHuangThesis8-25-2011.pdf 6.974Mb PDF View/Open
