On Estimating the Size and Confidence of a Statistical Audit
Working Paper No.: 54
Date Published: 2008-11-30
Javed A. Aslam, Northeastern University
Raluca A. Popa, Massachusetts Institute of Technology
Ronald L. Rivest, Massachusetts Institute of Technology
We consider the problem of statistical sampling for auditing elections, and we develop a remarkably simple and easily calculated upper bound for the sample size necessary for determining with probability at least c whether a given set of n objects contains b or more "bad" objects. While the size of the optimal sample drawn without replacement can be determined with a computer program, our goal is to derive a highly accurate and simple formula that can be used by election officials equipped with only a simple calculator. We actually develop several formulae, but the one we recommend for use in practice is:

$$U_3(n, b, c) = \left\lceil \left(n - \frac{b - 1}{2}\right) \cdot \left(1 - (1 - c)^{1/b}\right) \right\rceil = \left\lceil \left(n - \frac{b - 1}{2}\right) \cdot \left(1 - \exp\left(\frac{\ln(1 - c)}{b}\right)\right) \right\rceil$$

As a practical matter, this formula is essentially exact: we prove that it is never too small, and empirical testing for many representative values of n ≤ 10,000, b ≤ n/2, and c ≤ 0.99 never finds it more than one too large. Theoretically, we show that for all n and b this formula never exceeds the optimal sample size by more than 3 for c ≤ 0.9975, and by more than (−ln(1 − c))/2 for general c.
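To make the comparison between the recommended bound and the optimal sample size concrete, the following is a minimal Python sketch, not the authors' program. It assumes the formula reads U3(n, b, c) = ceil((n − (b − 1)/2) · (1 − (1 − c)^(1/b))), and it assumes the optimal size is the smallest sample, drawn without replacement, whose chance of missing all of exactly b bad objects is at most 1 − c. The function names `u3` and `optimal_sample_size` are illustrative, and floating-point rounding may matter for inputs that land exactly on a boundary.

```python
import math


def u3(n, b, c):
    """Recommended upper bound: ceil((n - (b-1)/2) * (1 - (1-c)**(1/b)))."""
    return math.ceil((n - (b - 1) / 2) * (1 - math.exp(math.log(1 - c) / b)))


def optimal_sample_size(n, b, c):
    """Smallest u such that a sample of u objects drawn without replacement
    from n objects, b of which are bad, contains at least one bad object
    with probability at least c (assumed definition of "optimal")."""
    miss = 1.0  # probability that a sample of size u contains no bad object
    for u in range(n - b + 1):
        if miss <= 1 - c:
            return u
        # Extend the sample by one object: it must also be a good one.
        miss *= (n - b - u) / (n - u)
    # A sample larger than n - b always contains a bad object.
    return n - b + 1


if __name__ == "__main__":
    n, b, c = 100, 10, 0.95
    print("U3 bound:    ", u3(n, b, c))
    print("optimal size:", optimal_sample_size(n, b, c))
```

The loop uses the fact that the probability of missing every bad object in a sample of size u is the product of (n − b − i)/(n − i) for i from 0 to u − 1, so the exact answer can be found in at most n − b steps, while `u3` needs only a single logarithm and exponential.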