Conservative statistical post-election audits
There are many sources of error in counting votes: the apparent winner might not be the rightful winner. Hand tallies of the votes in a random sample of precincts can be used to test the hypothesis that a full manual recount would find a different ou…
Authors: Philip B. Stark
The Annals of Applied Statistics 2008, Vol. 2, No. 2, 550–581. DOI: 10.1214/08-AOAS161. © Institute of Mathematical Statistics, 2008.

Philip B. Stark, University of California, Berkeley

There are many sources of error in counting votes: the apparent winner might not be the rightful winner. Hand tallies of the votes in a random sample of precincts can be used to test the hypothesis that a full manual recount would find a different outcome. This paper develops a conservative sequential test based on the vote-counting errors found in a hand tally of a simple or stratified random sample of precincts. The procedure includes a natural escalation: if the hypothesis that the apparent outcome is incorrect is not rejected at stage $s$, more precincts are audited. Eventually, either the hypothesis is rejected (and the apparent outcome is confirmed) or all precincts have been audited and the true outcome is known. The test uses a priori bounds on the overstatement of the margin that could result from error in each precinct. Such bounds can be derived from the reported counts in each precinct and upper bounds on the number of votes cast in each precinct. The test allows errors in different precincts to be treated differently to reflect voting technology or precinct sizes. It is not optimal, but it is conservative: the chance of erroneously confirming the outcome of a contest if a full manual recount would show a different outcome is no larger than the nominal significance level. The approach also gives a conservative $P$-value for the hypothesis that a full manual recount would find a different outcome, given the errors found in a fixed size sample. This is illustrated with two contests from November, 2006: the U.S.
Senate race in Minnesota and a school board race for the Sausalito Marin City School District in California, a small contest in which voters could vote for up to three candidates.

Received October 2007; revised March 2008.
Key words and phrases. Hypothesis test, sequential test, auditing, elections.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Statistics, 2008, Vol. 2, No. 2, 550–581. This reprint differs from the original in pagination and typographic detail.

1. Introduction. Votes can be miscounted because of human error (by voters or election workers), hardware or software "bugs" or deliberate fraud. Post-election audits (manual tallies of votes in individual precincts) are intended to detect miscount, especially miscount large enough to alter the outcome of the election.[1] To the best of my knowledge, eighteen states require or allow post-election audits [National Association of Secretaries of State (2007) and Verified Voting Foundation (2007)]. California is one. Since 1965, California Elections Code has required a hand count of the ballots in a random sample of 1% of the precincts in each county, plus one precinct for each contest not represented in the 1% sample.[2]

[1] Post-election audits can also reveal process problems, programming errors, equipment malfunctions and other issues that should be addressed even if they do not change the outcome. And audits deter fraud. See Norden et al. (2007) and Jefferson et al. (2007). For more on election monitoring, see Bjornlund (2004). An alternative approach to detecting error and deterring fraud is the "quick count," which monitors the counting process at a random set of polling stations or precincts. See Estok, Nevitte and Cowan (2002). An advantage of quick counts is that they can monitor the process, not just the outcome. A disadvantage is that poll workers and potential fraudsters can know which precincts or polling places are being monitored before the counts are official. The U.S. Government Accountability Office has published many reports on the accuracy and reliability of voting systems and election outcomes [e.g., Elections: Federal efforts to improve security and reliability of electronic voting systems are under way, but key activities need to be completed (2005), Elections: The nation's evolving election system as reflected in the November 2004 general election (2006) and Hite (2007)].

[2] See, for example, California Elections Code § 15360.

A post-election audit of 1% of precincts is a reasonable check for gross error and malfunction. However, to provide high confidence[3] that a full manual recount would confirm the apparent outcome requires auditing a number of precincts that depends on the number of precincts in the contest, the number of ballots cast in each precinct, the apparent margin of victory and the discrepancies the audit finds. No flat percentage, short of 100%, gives high confidence in all circumstances.[4]

[3] The meaning of "confidence" in the election audit community differs from its meaning in statistics. The "confidence" that the apparent outcome is correct is 100% minus the $P$-value of the hypothesis that the apparent outcome differs from the outcome a full manual recount would find.

[4] Some audit laws, such as California's 1% law, use the same precinct sampling fraction for every contest in an election. The amount of error required to make the apparent outcome of a contest wrong depends on the margin in the contest. The probability distribution of the miscount an audit uncovers in a contest depends on how the sample is drawn and the sample size, and also on the number of precincts in the contest and the number of ballots and miscounted ballots in each contest in each precinct. And the amount of error required to make one of the losing candidates appear to be the winner depends on the margin in the contest. Thus, the decision of whether to confirm an election outcome depends on variables that are specific to a single contest. The method developed here addresses one contest at a time.

In August 2007, California Secretary of State Debra Bowen de-certified and conditionally re-certified electronic voting machines in California. One condition of re-certification is that elections be audited using a sample size that depends on "the apparent margin of victory, the number of precincts, the number of ballots cast in each precinct, and a desired confidence level that the winner of the election has been called correctly."[5] The method presented here solves that problem. I am not aware of any other method that does.

[5] See www.sos.ca.gov/elections/elections_vsr.htm.

New Jersey recently passed a bill that requires post-election audits of randomly selected precincts, "to ensure with at least 99% statistical power that for each federal, gubernatorial or other Statewide election held in the State, a 100% manual recount of the voter-verifiable paper records would not alter the electoral outcome reported by the audit. For each election held for State office, other than Governor and Lieutenant Governor, and for county and municipal elections held in 100 or more election districts (the procedure will) ensure with at least 90% statistical power that a 100% manual recount of the voter-verifiable paper records would not alter the electoral outcome reported by the audit."[6] Again, the method presented here is the only one I am aware of that meets this requirement.

The U.S. House of Representatives is considering a bill, H.R.
811, The Voter Confidence and Increased Accessibility Act of 2007 (Holt),[7] which requires post-election audits of federal elections. The sampling percentage depends on the apparent margin of victory. Because the sampling percentage does not take into account precinct sizes, the number of precincts in a contest or the errors uncovered during the audit, it does not guarantee any particular level of confidence that the apparent outcome agrees with the outcome a full manual recount would find.

[6] www.njleg.state.nj.us/2006/Bills/AL07/349_.PDF.

[7] See holt.house.gov/HR_811.shtml.

The Massachusetts legislature is also considering a bill that would require post-election audits of 5% of precincts, H671. The bill demands a complete recount if the discrepancy between the manual count and the reported vote exceeds certain thresholds. Like H.R. 811, H671 requires sampling a percentage of precincts that does not depend on the number of precincts in the contest, so it does not guarantee any particular level of confidence that the correct candidate was named the winner, unless a full recount is triggered.

Minnesota has an audit law (SF 2743) that requires audits of elections for President, governor, U.S. Senator and U.S. Representative. The sample size in each county is related to the number of registered voters in the county, rather than the number of precincts in the county. The sampling percentage the law requires does not take into account the number of precincts in the contest or the margin, but it has provisions for increasing the sample size if discrepancies are found; large discrepancies can trigger a recount of a county or an entire congressional district. Like the bills mentioned above, the Minnesota audit law does not guarantee any particular level of confidence that the outcome of the election is correct. See also Section 5.2.

Previous papers on the statistics of post-election audits [e.g., Saltman (1975), McCarthy et al. (2008), Dopp and Stenger (2006) and Rivest (2006)] in essence have concentrated on the question, "if there is enough error overall to change the outcome of an election, how large a random sample of precincts must be drawn to have chance at least $1 - \alpha$ of finding at least one error?"[8] If fewer precincts than that are audited, we will not have $1 - \alpha$ confidence that the outcome of the election is correct, even if the audit finds no errors. That is because there are ways of distributing enough miscount to spoil the election that have chance greater than $\alpha$ of being missed entirely by the sample.

[8] The computations in those papers assume that the precincts to be hand-tallied are a random sample without replacement drawn from all the precincts in the contest. However, in California, the precincts for audit are not chosen that way. Rather, 1% of the precincts in each county are chosen at random (additional precincts are chosen, not necessarily at random, if contests are missed by the sample). This is a stratified random sample of precincts, not a simple random sample of precincts.

If the sample is at least as large as these methods prescribe, and the manual tally finds no error, we are done: either the apparent winner is the true winner or an event with probability less than $\alpha$ occurred (or one of the assumptions of the method is wrong). But if the sample contains any miscount, however small, these approaches do not tell us how reliable the election outcome is, nor whether to confirm the outcome. The rules are incomplete.

Manual tallies routinely turn up small miscounts. What should we do then? Recount the entire contest by hand? Audit more precincts? If so, how many? What if the expanded audit finds more miscount? When do we stop? How do we decide whether the outcome is in doubt?

An audit procedure is incomplete unless it always either (i) confirms the outcome of the election or (ii) demands a full recount. And it should have an error rate that can be quantified in a reasonable way. For example, a procedure might come with a mathematical guarantee that if it confirms the outcome of the election, either the outcome is the same that a full manual recount would find, or an event with probability no greater than $\alpha$ occurred.[9]

[9] See Section 6.5 for other possibilities.

Deciding whether to confirm the outcome of a contest can be viewed as testing a null hypothesis. The null hypothesis for election audits can be chosen in more than one way. For example, the null hypothesis could be "the outcome is right" or "the outcome is wrong." In the Neyman/Pearson paradigm, the chance of a type I error, the error of rejecting the null hypothesis when it is true, is controlled to be at most $\alpha$, the significance level.[10] The risk of incorrectly rejecting the null hypothesis when it is true is primary.

[10] One can try to find the level-$\alpha$ test that maximizes the power, the chance of rejecting the null hypothesis when a particular alternative is true.

In election auditing, the primary risk is that of confirming an outcome that is wrong. Failing to confirm an outcome that is correct, on the basis of an initial audit sample, could lead to additional auditing, but that economic risk seems less serious than the risk of awarding the contest to the wrong candidate. We want the audit to provide strong evidence that the contest came out right, not just to fail to find evidence that the contest came out wrong. Hence, it makes sense to choose the null hypothesis to be that the outcome is wrong, and to devise a test that has probability at most $\alpha$ of incorrectly rejecting that hypothesis.
If we reject the hypothesis that the outcome is wrong, we conclude that the apparent outcome is the outcome a full manual recount would find. If not, we count more votes. Eventually, either we confirm the outcome or we have recounted all the ballots by hand.

This paper constructs a conservative sequential test of the hypothesis that the apparent outcome is not the outcome a full manual recount would find. The test terminates either with the declaration that the apparent outcome is correct or with a full recount. The chance is at most $\alpha$ that the procedure declares that the outcome is correct if the outcome is not the outcome a full manual recount would find. The procedure also gives a $P$-value for the hypothesis that the outcome is incorrect: a number $P$ such that, given the errors observed in the sample, either a full manual recount would find the same outcome or an event that had probability no greater than $P$ occurred.

In the approach developed here, an audit can confirm the outcome of a contest, but only a full manual recount can invalidate the outcome. So, there is a positive probability that a declaration that the election outcome is correct is mistaken, but a declaration that the outcome is incorrect is as certain as a full manual recount can be. The approach automatically leads to a full recount if the outcome of the contest is not validated by a manual tally of some sufficiently large random sample of precincts.

There are many ad hoc choices in the method below, and the approach is not the most powerful: with a different method, it might be possible to get the same confidence by auditing fewer precincts [see, e.g., Stark (2008b)]. The choices were made to simplify the exposition and implementation: methods need to be transparent to be adopted as part of the election process and to inspire public confidence.
For example, an approach that required numerical optimization to maximize $P$-values for a likelihood ratio test statistic over sets of nuisance parameters might be more efficient, but because of its complexity would likely meet resistance from elections officials and voting rights groups. In contrast, the most esoteric calculation required for the method presented here is $\binom{N}{n}$. It could be implemented in a spreadsheet program, which is perhaps a good design criterion for software to be used by jurisdictional users at all levels of government.

The main point of this paper is not the method itself; rather, the method is an existence proof showing that it is possible to get conservative statistical measures of confidence in election outcomes from post-election audit results using simple computations.

Table 1. Notation

$C$: number of counties with at least one precinct in the contest.
$\mathcal{C}$: the integers $\{1, \ldots, C\}$.
$N \equiv \sum_{c \in \mathcal{C}} N_c$: number of precincts in the contest.
$\mathcal{N}$: the integers $\{1, \ldots, N\}$.
$N_c$: number of precincts in the contest in county $c$.
$\mathcal{J}^{\star}_n$: a simple random sample of $n$ elements of $\mathcal{N}$.
$\mathcal{J}^{\diamond}_n$: a random sample with replacement of $n$ elements of $\mathcal{N}$.
$b_p$: reported voting opportunities in precinct $p$, $f$ times the number of ballots reported in precinct $p$, including undervoted and invalid ballots.
$B_c$: reported voting opportunities in county $c$.
$B \equiv \sum_{p \in \mathcal{N}} b_p = \sum_{c \in \mathcal{C}} B_c$: reported voting opportunities in the contest.
$K$: number of candidates and pseudo-candidates in the contest, after pooling. See Section 3.1.
$\mathcal{K}$: the integers $\{1, \ldots, K\}$.
$\mathcal{K}_w$: the indices of the $f$ candidates who are apparent winners.
$\mathcal{K}_\ell$: the indices of the $K - f$ candidates who are apparent losers.
$a_{kp}$: actual vote for (pseudo-)candidate $k$ in precinct $p$.
$A_k \equiv \sum_{p \in \mathcal{N}} a_{kp}$: actual total vote for (pseudo-)candidate $k$.
$r_p$: upper bound on $\sum_{k \in \mathcal{K}} a_{kp}$, the actual total vote in precinct $p$.
$v_{kp}$: reported vote for (pseudo-)candidate $k$ in precinct $p$.
$V_k \equiv \sum_{p \in \mathcal{N}} v_{kp}$: total vote reported for (pseudo-)candidate $k$.
$M$: overall apparent margin in votes, that is, reported votes for the apparent winner(s) with fewest reported votes, minus reported votes for the apparent loser(s) with the most reported votes: $M = \bigwedge_{k \in \mathcal{K}_w} V_k - \bigvee_{k \in \mathcal{K}_\ell} V_k$.
$e_p \equiv \sum_{k \in \mathcal{K}_w} (v_{kp} - a_{kp})_+ + \sum_{k \in \mathcal{K}_\ell} (a_{kp} - v_{kp})_+$: maximum by which error in precinct $p$ could increase $M$.
$u_p$: a priori upper bound on $e_p$. See Section 3.2.
$E = \sum_{p \in \mathcal{N}} e_p$: maximum by which error in all precincts could increase $M$.
$w_p(\cdot)$: a monotonic weight function for error in precinct $p$. See Section 3.3.
$w_p^{-1}(\cdot)$: the inverse of $w_p$: $w_p^{-1}(t) \equiv \sup \{ z : w_p(z) \le t \}$.

2. Assumptions and notation. Table 1 sets out the notation. All variables refer to a single contest of the form "vote for up to $f$ candidates." Each ballot has $f$ voting opportunities for the contest; there are $f$ apparent winners of the contest. A ballot with votes for more than $f$ candidates is overvoted. Overvotes are invalid: they do not count as votes for any candidate.[11] A ballot with votes for fewer than $f$ candidates is undervoted. The number of undervotes on such a ballot is $f$ minus the number of votes.

The analysis uses the following assumptions:

1. All kinds of error are possible in the machine counts: there can be errors in the number of valid votes for each candidate, undervotes and invalid votes. Ballots can be overlooked entirely. Ballots that do not exist can be counted.
2. The truth is whatever the hand tally shows. (When the hand count does not match the machine count, the hand count is typically repeated until the counters are confident that the problem is with the machine count.
Hand counts are subject to error, but they are the gold standard.)
3. Precincts are selected at random for post-election audit.

Among the apparent losers, any candidate with at least as many reported votes as the rest is an "apparent runner-up." The apparent margin $M$ is the difference between the number of votes reported for the apparent winner(s) with the fewest reported votes and the number of votes reported for an apparent runner-up. If more than $f$ candidates have at least as many reported votes as the apparent top $f$ candidates, $M = 0$: the contest is apparently tied for the last winning place. More precisely, let $V_k$ be the total number of votes reported for candidate $k$, $k = 1, \ldots, K$. Let $(V_{(k)})_{k=1}^K$ be the votes $(V_k)_{k=1}^K$ in rank order, so that $V_{(1)} \ge V_{(2)} \ge \cdots \ge V_{(K)}$. Then $M = V_{(f)} - V_{(f+1)}$. If $\#\{k : V_k \ge V_{(f)}\} > f$, $M = 0$ and the contest is a tie.

As discussed in Section 3.1, some subsets of apparent losers (and undervotes and invalid ballots) can be pooled to form a smaller number of "pseudo-candidates." Pooling can reduce the sample size needed to confirm the election. After pooling, there remain $K$ candidates and pseudo-candidates, numbered 1 through $K$.

3. Testing the election outcome. The approach to testing whether the apparent election outcome is wrong is as follows:

1. Select a test statistic.[12]
2. Select a sampling design and an increasing sequence of sample sizes $(n_s)$.[13] Select a corresponding sequence of significance levels $(\alpha_s)$ that give a level-$\alpha$ test overall.[14]
3. Set $s = 1$. Set the initial sample to be the empty set.
4. Augment the current sample by a random sample so that it contains $n_s$ precincts in all.
5. Tally the votes in the new precincts by hand.
6. Calculate the test statistic and the maximum $P$-value for the test statistic over all ways of allocating error among the precincts that would result in a different election outcome.
7. If the maximum $P$-value is less than $\alpha_s$, confirm the apparent outcome. Otherwise, increment $s$ and return to step 4, unless all $N$ precincts have now been hand tallied. If all precincts have been hand tallied, confirm the outcome the hand tally shows.

[11] Some states have "voter intent" laws: the people conducting the hand tally try to determine what the voter intended, even if a machine could not. So, for example, a ballot that had a mark for George Washington and also had George Washington as a write-in candidate would be an overvote according to the machine, but a human might infer that the voter intended to vote for George Washington. This paper assumes that rules are in place for determining whether that is a valid vote.

[12] In principle, the choice could be optimized to maximize power against some alternatives. In practice, the method must be transparent, easy for the public to understand, easy for elections officials to implement, and easy to verify or replicate. Here I use the maximum of functions of the amount by which error in each precinct in the sample could have inflated the margin, after pooling subsets of losers as described in Section 3.1. This leads to simple probability calculations. See Section 3.3.

[13] We might increase the sample size by a fixed number of precincts at each stage, such as $\lceil 0.02N \rceil$. Or we might increment the sample by the smallest number of precincts such that, if the test statistic did not increase from its current value, we would confirm the outcome. The only requirement is that $n_{s+1} - n_s \ge 1$.

[14] For example, $\alpha_s \equiv \alpha / 2^s$, $s = 1, \ldots$. Alternatively, if the sequence of sample sizes $(n_s)$ guarantees that by stage $S$ all $N$ precincts will be in the sample, we could take $\alpha_s = \alpha / S$. These choices just use Bonferroni's inequality; one could do better using methods from sequential analysis. See Section 6.4.

3.1. Marginal notes.

Example 1. Consider a winner-take-all ($f = 1$) contest with $K = 2$ candidates. The reported vote for the apparent winner is $V_1 = 1{,}000$ votes, and the reported vote for the apparent loser is $V_2 = 500$ votes. The margin is $M = 1{,}000 - 500 = 500$ votes.

It is possible that both candidates actually had 750 votes and the apparent margin was produced by miscounting 250 ballots with votes for the apparent loser as votes for the apparent winner: if the apparent winner's vote total was high by 250 and the apparent loser's vote total was low by 250, that could have turned a tie into the apparent margin. Alternatively, if 500 apparent undervotes were miscounted as votes for the apparent winner, that could have turned a tie into the apparent margin. Or if 500 ballots with votes for the apparent loser had been overlooked on election day, that could have turned a tie into the apparent margin. Or if 500 ballots with votes for the apparent winner had been double-counted on election day, that could have turned a tie into the apparent margin. If 100 votes for the apparent loser had been miscounted as undervotes, 100 had been miscounted as votes for the apparent winner, 100 votes for the apparent loser had been overlooked entirely, and 100 nonexistent ballots had been counted as votes for the apparent winner, that could have turned a tie into the apparent margin. (Net, the reported vote totals would have been off by 200 for the apparent winner and 300 for the apparent loser, 500 votes in all.)
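The arithmetic of that last mixed scenario can be checked with a short tally. This is only an illustrative sketch: the variable names are assumptions introduced here, and the counts are those of Example 1.

```python
# Example 1, last scenario: four kinds of error that together hide a tie.
# All names are illustrative; the counts come from the example.
reported_winner, reported_loser = 1000, 500

loser_as_undervote = 100   # loser votes miscounted as undervotes
loser_as_winner = 100      # loser votes miscounted as winner votes
loser_overlooked = 100     # loser ballots never counted
phantom_for_winner = 100   # nonexistent ballots counted for the winner

# Overcount of the apparent winner and undercount of the apparent loser.
winner_overcount = loser_as_winner + phantom_for_winner
loser_undercount = loser_as_undervote + loser_as_winner + loser_overlooked

# Undo the errors to recover what a hand count would show.
actual_winner = reported_winner - winner_overcount
actual_loser = reported_loser + loser_undercount

margin = reported_winner - reported_loser
print(winner_overcount, loser_undercount)              # 200 300
print(actual_winner, actual_loser)                     # 800 800
print(winner_overcount + loser_undercount == margin)   # True
```

The overcount and undercount together exactly equal the apparent margin, which is why this combination of errors is just enough to conceal a tie.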
But if the overcount for the apparent winner plus the undercount for the apparent loser is less than 500 votes in all, the apparent winner must be the true winner.

More generally, suppose there are $K$ candidates in all, undervotes, invalid ballots and overlooked ballots. (A negative number of ballots could be overlooked, corresponding to overcounting real ballots or counting nonexistent ballots.) An error that increases the count for any of the apparent winners by 1 vote increases the apparent margin by at most 1 vote. An error that decreases the count for any of the apparent losers by 1 vote increases the apparent margin by at most 1 vote. Conversely, errors that decrease the count for any apparent winner or that increase the count for any apparent loser might decrease the apparent margin, but cannot increase the apparent margin.

Miscounting a vote for an apparent loser as a vote for an apparent winner could inflate the apparent margin by as much as 2 votes (or possibly 0 or 1). Miscounting an undervote as a vote for one of the apparent winners could increase the apparent margin by as much as 1 vote. Overlooking a valid vote for one of the losers could increase the apparent margin by as much as 1 vote. Errors in the number of undervotes or invalid ballots do not by themselves affect the margin. In summary, the amount by which error could have artificially inflated the apparent margin is at most the total overcount for all the apparent winners, plus the total undercount for all the apparent losers.

Let $v_{kp}$ be the reported number of votes for candidate $k$ in precinct $p$, $V_k = \sum_{p \in \mathcal{N}} v_{kp}$ be the total number of reported votes for candidate $k$, $a_{kp}$ be the actual number of votes for candidate $k$ in precinct $p$, and $A_k = \sum_{p \in \mathcal{N}} a_{kp}$ be the actual total number of votes for candidate $k$.
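Let $\mathcal{K}_w$ denote the indices of the candidates who are apparent overall winners of the race (so $\#\mathcal{K}_w = f$) and let $\mathcal{K}_\ell$ denote the indices of the candidates who are apparent losers. For real $z$, define $z_+ \equiv z \vee 0$. The potential margin overstatement in precinct $p$ is
\[
e_p \equiv \sum_{k \in \mathcal{K}_w} (v_{kp} - a_{kp})_+ + \sum_{k \in \mathcal{K}_\ell} (a_{kp} - v_{kp})_+. \tag{1}
\]
The total potential margin overstatement is $E \equiv \sum_{p \in \mathcal{N}} e_p$. The net potential margin overstatement in the election is
\[
\mathcal{E} \equiv \sum_{k \in \mathcal{K}_w} (V_k - A_k)_+ + \sum_{k \in \mathcal{K}_\ell} (A_k - V_k)_+. \tag{2}
\]
We know that
\[
M = \bigwedge_{k \in \mathcal{K}_w} V_k - \bigvee_{k \in \mathcal{K}_\ell} V_k. \tag{3}
\]
Thus,
\[
\begin{aligned}
\bigwedge_{k \in \mathcal{K}_w} A_k - \bigvee_{k \in \mathcal{K}_\ell} A_k
&\ge \Biggl( \bigwedge_{k \in \mathcal{K}_w} V_k - \bigvee_{k \in \mathcal{K}_w} (V_k - A_k)_+ \Biggr) - \Biggl( \bigvee_{k \in \mathcal{K}_\ell} V_k + \bigvee_{k \in \mathcal{K}_\ell} (A_k - V_k)_+ \Biggr) \\
&\ge \Biggl( \bigwedge_{k \in \mathcal{K}_w} V_k - \sum_{k \in \mathcal{K}_w} (V_k - A_k)_+ \Biggr) - \Biggl( \bigvee_{k \in \mathcal{K}_\ell} V_k + \sum_{k \in \mathcal{K}_\ell} (A_k - V_k)_+ \Biggr) \\
&= M - \sum_{k \in \mathcal{K}_w} (V_k - A_k)_+ - \sum_{k \in \mathcal{K}_\ell} (A_k - V_k)_+ \\
&= M - \mathcal{E}.
\end{aligned} \tag{4}
\]
So, the apparent set of winners must be the true set of winners if
\[
\mathcal{E} < M. \tag{5}
\]
By the triangle inequality,
\[
\begin{aligned}
\mathcal{E} &\equiv \sum_{k \in \mathcal{K}_w} (V_k - A_k)_+ + \sum_{k \in \mathcal{K}_\ell} (A_k - V_k)_+ \\
&= \sum_{k \in \mathcal{K}_w} \Biggl( \sum_{p \in \mathcal{N}} (v_{kp} - a_{kp}) \Biggr)_+ + \sum_{k \in \mathcal{K}_\ell} \Biggl( \sum_{p \in \mathcal{N}} (a_{kp} - v_{kp}) \Biggr)_+ \\
&\le \sum_{k \in \mathcal{K}_w} \sum_{p \in \mathcal{N}} (v_{kp} - a_{kp})_+ + \sum_{k \in \mathcal{K}_\ell} \sum_{p \in \mathcal{N}} (a_{kp} - v_{kp})_+ \\
&= \sum_{p \in \mathcal{N}} \Biggl( \sum_{k \in \mathcal{K}_w} (v_{kp} - a_{kp})_+ + \sum_{k \in \mathcal{K}_\ell} (a_{kp} - v_{kp})_+ \Biggr) \\
&= \sum_{p \in \mathcal{N}} e_p \equiv E.
\end{aligned} \tag{6}
\]
Hence, the apparent outcome must be the same that a full manual recount would show if $E < M$. Our test is based on this condition. For a sharper sufficient condition, see Stark (2008b).

Example 2. Consider a winner-take-all ($f = 1$) contest with $K = 4$ candidates. The reported vote totals are $V_1 = 800$ votes, $V_2 = 500$ votes, $V_3 = 150$ votes, and $V_4 = 50$ votes. The margin is $M = 800 - 500 = 300$ votes.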
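The margin and the overstatement quantities in equations (1), (2) and (6) need nothing beyond sums and maxima. The sketch below is a hedged illustration: the function names and the split of the Example 2 totals into two precincts, along with the "actual" counts, are hypothetical and chosen only so that the reported totals match the example.

```python
from typing import Dict, List, Set

Counts = Dict[int, int]  # candidate index -> number of votes

def apparent_margin(totals: List[int], f: int) -> int:
    """M = V_(f) - V_(f+1); a tie for the last winning place gives 0."""
    ranked = sorted(totals, reverse=True)
    return ranked[f - 1] - ranked[f]

def overstatement(reported: Counts, actual: Counts, winners: Set[int]) -> int:
    """Winners' overcount plus losers' undercount, as in equations (1) and (2)."""
    total = 0
    for k, v in reported.items():
        a = actual[k]
        total += max(v - a, 0) if k in winners else max(a - v, 0)
    return total

def sum_counts(precincts: List[Counts]) -> Counts:
    """Add per-precinct counts to get contest-wide totals."""
    out: Counts = {}
    for counts in precincts:
        for k, n in counts.items():
            out[k] = out.get(k, 0) + n
    return out

# Hypothetical precinct-level data; reported totals are 800, 500, 150, 50.
reported = [{1: 450, 2: 250, 3: 100, 4: 20}, {1: 350, 2: 250, 3: 50, 4: 30}]
actual = [{1: 440, 2: 260, 3: 100, 4: 20}, {1: 355, 2: 248, 3: 50, 4: 30}]
winners = {1}

V, A = sum_counts(reported), sum_counts(actual)
M = apparent_margin(list(V.values()), f=1)
E_total = sum(overstatement(v, a, winners) for v, a in zip(reported, actual))
E_net = overstatement(V, A, winners)  # same formula applied to contest totals

print(M, E_total, E_net)       # 300 20 13
print(E_net <= E_total < M)    # True
```

In this made-up data the errors in the second precinct partly offset those in the first, so the net overstatement (13) is smaller than the total (20), as inequality (6) requires.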
The reported winner might not be the real winner if 150 votes for candidate 2 had been miscounted as votes for candidate 1, producing a net potential margin overstatement in the election of 300 votes; then candidates 1 and 2 might have been tied. Candidate 3 could not have been the winner unless the net potential margin overstatement in the election is more than 650 votes, and candidate 4 could not have been the winner unless the net potential margin overstatement in the election is more than 750 votes. The apparent winner must be the true winner if $E < M$.

Example 3. What if, in Example 2, we pretend that candidates 3 and 4 are a single "pseudo-candidate" with 150 + 50 = 200 reported votes? Then there are $K = 3$ (pseudo-)candidates, with $V_1 = 800$ votes, $V_2 = 500$ votes, and $V_3 = 200$ votes. Candidate 1 must be the true winner if the net potential margin overstatement in the election for candidate 1, candidate 2 and pseudo-candidate 3 is less than $M = 300$ votes. If pseudo-candidate 3 could not have been the winner, then neither the original candidate 3 nor the original candidate 4 could have been the winner, because the pseudo-candidate gets all the votes for both of them: at least as many votes as either gets separately. The apparent winner must be the true winner if $E < M$, with $E$ measured for the three pseudo-candidates who remain after pooling candidates 3 and 4.

Pooling candidates 3 and 4 into a single pseudo-candidate tends to result in a more powerful test, because $e_p$, the potential margin overstatement in precinct $p$, then ignores errors that do not change the number of votes for the pseudo-candidate, such as counting a vote for candidate 3 as a vote for candidate 4 or vice versa. Such errors cannot suffice to change the outcome of the election.
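The effect of pooling on $e_p$ can be seen in a single hypothetical precinct in which two votes for candidate 4 were tallied for candidate 3. The sketch below is illustrative only: the helper names, the pseudo-candidate label and the precinct counts are assumptions, not data from the paper.

```python
def overstatement(reported, actual, winners):
    """Equation (1): winners' overcount plus losers' undercount in one precinct."""
    total = 0
    for k, v in reported.items():
        a = actual[k]
        total += max(v - a, 0) if k in winners else max(a - v, 0)
    return total

def pool(counts, group, label):
    """Merge the candidates in `group` into one pseudo-candidate called `label`."""
    pooled = {k: n for k, n in counts.items() if k not in group}
    pooled[label] = sum(counts[k] for k in group)
    return pooled

# Hypothetical precinct: 2 votes for candidate 4 were tallied for candidate 3.
reported = {1: 80, 2: 50, 3: 15, 4: 5}
actual = {1: 80, 2: 50, 3: 13, 4: 7}
winners = {1}

e_unpooled = overstatement(reported, actual, winners)
e_pooled = overstatement(
    pool(reported, {3, 4}, "3+4"), pool(actual, {3, 4}, "3+4"), winners
)
print(e_unpooled, e_pooled)  # 2 0
```

Without pooling, the undercount for candidate 4 contributes 2 to $e_p$ even though it cannot help candidate 4 win; after pooling, the swap leaves the pseudo-candidate's count unchanged and contributes nothing.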
For the outcome to be wrong, in addition to errors that redistribute votes among the candidates who are pooled together, it is necessary that $E \ge M$.

If we pool candidates 2 and 3 into a single pseudo-candidate with 500 + 150 = 650 votes, the margin between the apparent winner and that pseudo-candidate is only 150 votes. Provided the total potential margin overstatement measured for candidate 1, the pseudo-candidate and candidate 4 is less than 150 votes, candidate 1 must be the real winner. The sufficient condition for the outcome to be right has changed: we need $E < 150 < M$. In effect, we need to test using a smaller margin, the margin between the winner and the pseudo-candidate (the runner-up after pooling). That could result in a less powerful test, so we will avoid it. We cannot pool candidates whose total vote is greater than or equal to the vote for any of the apparent winners, because then the outcome of the contest could be wrong even if $E = 0$.

Example 4. Suppose that the contest allows votes for up to two out of three candidates, so $f = 2$ and $K = 3$. Suppose that a full hand recount would show that candidate 1 got 1,000 votes, candidate 2 got 500 votes, candidate 3 got 500 votes, there were 250 undervotes and there were 250 overvoted ballots. Let $M \le 500$. Miscounting $M/2$ of the votes for candidate 3 as votes for candidate 2 would produce an apparent margin of $M$ between them, with net potential margin overstatement in the election of $M$. Miscounting $M$ of the overvoted ballots as one vote each for candidate 1 and candidate 2 would produce an apparent margin of $M$ votes between candidates 2 and 3, with net potential margin overstatement in the election of $2M$.
Failing to count $M$ of the votes cast for candidate 3 would produce an apparent margin of $M$ votes between candidates 2 and 3, with a net potential margin overstatement in the election of $M$. The outcome must be correct if $E < M$.

Example 5. Finally, consider an example with $f = 2$, undervotes and overvotes and pooling. There are four candidates on the ballot, plus two write-in candidates. The reported votes are as follows: 500 votes for the apparent overall winner; 400 votes for the apparent second-place winner; 300 votes for the apparent runner-up (the loser with the most votes); 100 votes for the apparent fourth-place candidate listed on the ballot; 5 votes for each of the two write-ins; 50 undervotes and 50 invalid ballots. The margin is $M = 400 - 300 = 100$ votes. If we pool the fourth-place candidate, the write-ins, the undervotes and $f$ times the overvotes into a single pseudo-candidate, that pseudo-candidate would have $100 + 5 + 5 + 50 + 2 \times 50 = 260$ votes, fewer than the runner-up. So, we can take $K = 4$ candidates, corresponding to the apparent overall winner, the apparent second winner, the runner-up and the pseudo-candidate. The outcome of the election cannot be wrong unless the net potential margin overstatement in the election, measured for those four (pseudo-)candidates, is at least $M = 100$ votes. The total potential margin overstatement $E$ would have to be greater than 140 votes for the pseudo-candidate to be one of the winners, and if the pseudo-candidate is not a winner, neither the apparent fourth-place candidate nor any of the write-ins could be winners. Hence, the outcome must be correct if $E < M$.

We shall adopt the following rule for pooling:

Pooling rule. Pool the losers into groups so that no group has more votes than the runner-up, but the group with the fewest votes has as many votes as possible.
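The arithmetic of Example 5 can be checked mechanically. The following Python sketch is illustrative only: the helper name `pooled_margin` and its argument names are invented for this example and do not come from the paper; only the vote counts are taken from Example 5. It checks a single proposed pooling against the runner-up; it does not search for the best pooling.

```python
def pooled_margin(winners, runner_up, pooled, f):
    """Check one proposed pooling of losers into a single pseudo-candidate.

    winners   : reported votes for the f apparent winners
    runner_up : reported votes for the apparent runner-up
    pooled    : reported votes for the items merged into the pseudo-candidate
    Returns (pseudo-candidate votes, margin M, overstatement needed for the
    pseudo-candidate to tie the weakest apparent winner).
    """
    assert len(winners) == f
    pseudo = sum(pooled)
    # The pooling rule forbids a group with more votes than the runner-up.
    if pseudo > runner_up:
        raise ValueError("pooled group has more votes than the runner-up")
    weakest_winner = min(winners)
    margin = weakest_winner - runner_up
    return pseudo, margin, weakest_winner - pseudo


f = 2
pseudo, M, gap = pooled_margin(
    winners=[500, 400],
    runner_up=300,
    # fourth-place candidate, two write-ins, undervotes, f times the invalid ballots
    pooled=[100, 5, 5, 50, f * 50],
    f=f,
)
print(pseudo, M, gap)  # 260 100 140
```

The three printed values are the pseudo-candidate total, the margin $M$, and the overstatement beyond which the pseudo-candidate could be a winner, matching the figures quoted in Example 5.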
Other pooling rules make sense too, for example, "pool the losers into as few groups as possible such that no group has more votes than the runner-up." Any such pooling rule ignores many errors that by themselves cannot affect the outcome of the contest, but still the apparent winners must be the true winners if the total potential margin overstatement $E < M$. If a pseudo-candidate cannot be the winner, then neither can any of the real candidates who were pooled to form the pseudo-candidate. The value of $K$ is the number of candidates and pseudo-candidates that remain after pooling. It is not necessary to pool (the test developed below is conservative even without pooling), but pooling yields a more powerful test. However, compare with Stark (2008b).

3.2. Bounding the potential margin overstatement in each precinct.

If the potential margin overstatement in individual precincts can be large compared to the margin, it will take a large sample to provide compelling evidence that $E < M$, because an outcome-changing error could hide in a small number of precincts. Miscount that could affect the outcome of an election is easier to detect if it must be spread over many precincts.

By how much can error in precinct $p$ inflate the apparent margin? We need an upper bound $u_p$ for the potential margin overstatement $e_p$ in precinct $p$. The smaller the values $u = (u_p)_{p=1}^N$ are, the larger the number of precincts that must be "tainted" to have $E > M$, and so the easier it is to detect an election-altering amount of error. If any number of ballots could be overlooked or overcounted on election day, there is no finite bound $u_p$ for $e_p$.

Some studies assume that if the discrepancy in any precinct exceeds, say, 40% of the votes reported in precinct $p$ [Saltman (1975) and McCarthy et al.
(2008)] or 40% of the ballots reported in precinct $p$, including undervotes and invalid ballots [Dopp and Stenger (2006)], that would be detected even without an audit. (If votes on 20% of the ballots had been "flipped" to an apparent winner from an apparent loser, that would produce a potential margin overstatement $e_p$ of 40% of the ballots.) That is, the studies take $u_p = 0.4 b_p$. This could be reasonable in some circumstances, but it is hard to justify.

Suppose we know a number $r_p \ge 0$ so that the actual total vote satisfies $\sum_{k \in \mathcal{K}} a_{kp} \le r_p$. For example, in precincts that use optical scan ballots, the total number of votes can be no larger than $f$ times the number of ballots delivered to the precinct, so that could serve as $r_p$. The number of votes cast in a precinct can be no larger than $f$ times the number of voters registered in a precinct, including same-day registrations (if the jurisdiction allows them), so that could serve as $r_p$. A count of signatures in a precinct pollbook, times $f$, might provide a value for $r_p$, although occasionally someone might vote without signing in. In some jurisdictions, elections officials check the number of voted, spoiled and unvoted ballots in every precinct against the number of ballots sent to and returned from the precinct. The number of voted ballots according to such an "accounting of ballots," times $f$, could serve as $r_p$.

If $\sum_{k \in \mathcal{K}} a_{kp} \le r_p$, it is impossible for $e_p$ to exceed

$$e_p^+(r_p) \equiv \max_{x \in \mathbb{R}^K :\, x \ge 0,\ \sum_{k \in \mathcal{K}} x_k \le r_p} \left\{ \sum_{k \in \mathcal{K}_w} (v_{kp} - x_k)_+ + \sum_{k \in \mathcal{K}_\ell} (x_k - v_{kp})_+ \right\} = r_p + \sum_{k \in \mathcal{K}_w} v_{kp} - \bigwedge_{k \in \mathcal{K}_\ell} v_{kp}. \tag{7}$$

These bounds suppose that every one of the $r_p$ possible valid votes in precinct $p$ might in fact have been a vote for the apparent loser $k \in \mathcal{K}_\ell$ with the fewest reported votes in precinct $p$.
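As a concrete check of the closed form in equation (7), the sketch below evaluates it for precinct 3001 of the Sausalito Marin City contest, using the reported counts from Table 2 in Section 5.1 and taking $r_p = b_p$ as Table 3 does. The function name `error_bound` is an illustrative choice, not notation from the paper.

```python
def error_bound(r_p, winner_votes, loser_votes):
    """A priori bound e_p^+(r_p) from equation (7):
    r_p + (sum of apparent winners' reported votes)
        - (fewest reported votes among the apparent losers)."""
    return r_p + sum(winner_votes) - min(loser_votes)


# Precinct 3001 (Table 2), with r_p = b_p = 2004 voting opportunities.
winners_3001 = [296, 309, 283]  # Thornton, Hoyt, Trotter
undervotes, stratigos, romanowsky, write_ins = 780, 271, 60, 5

# Write-in candidates pooled only with each other:
bound_write_ins = error_bound(
    2004, winners_3001, [undervotes, stratigos, romanowsky, write_ins]
)
# Write-ins pooled with Romanowsky as well:
bound_with_romanowsky = error_bound(
    2004, winners_3001, [undervotes, stratigos, romanowsky + write_ins]
)
print(bound_write_ins, bound_with_romanowsky)  # 2887 2827
```

Both values agree with the first row of Table 3, and the second is smaller because pooling raises the minimum loser count in the precinct.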
Let $e^+(r)$ denote the $N$-vector with components $(e^+(r))_p = e_p^+(r_p)$, $p \in \mathcal{N}$. Note that if some apparent loser $k \in \mathcal{K}_\ell$ gets no votes in precinct $p$, $e_p^+(r_p)$ takes its maximum possible value, $r_p + \sum_{k \in \mathcal{K}_w} v_{kp}$; $e_p^+(r_p)$ gets smaller as the minimum number of votes any apparent loser gets in precinct $p$ gets bigger. The pooling rule in Section 3.1 tends to make $\bigwedge_{k \in \mathcal{K}_\ell} v_{kp}$ larger than it would be without pooling. This is another way pooling helps, especially in contests with write-in candidates, because often there are many precincts in which some write-in candidate receives no votes.

Henceforth, $u$ will be a vector of upper bounds for $e$. Whether $u$ is $e^+(r)$, $0.4b$ or some other bound does not matter for the rest of the mathematical development.

3.3. The test statistic.

For any $x \in \mathbb{R}^N$ and $\mathcal{J} \subset \mathcal{N}$, define

$$\bigvee_{\mathcal{J}} x \equiv \bigvee_{p \in \mathcal{J}} x_p \tag{8}$$

and

$$\sum_{\mathcal{J}} x \equiv \sum_{p \in \mathcal{J}} x_p. \tag{9}$$

For $x, y \in \mathbb{R}^N$, define $x \wedge y$ to be the vector with components

$$(x \wedge y)_p = x_p \wedge y_p, \quad p \in \mathcal{N}, \tag{10}$$

and $x \vee y$ to be the vector with components

$$(x \vee y)_p = x_p \vee y_p, \quad p \in \mathcal{N}. \tag{11}$$

Fix a set of monotonically increasing functions $w = (w_p(\cdot))_{p=1}^N$ and for $x \in \mathbb{R}^N$, define $w(x) \equiv (w_p(x_p))_{p=1}^N$. Let $\mathcal{J}^\star_n$ be a simple random sample of size $n$ from $\mathcal{N}$. The hypothesis test is based on the test statistic

$$\bigvee_{\mathcal{J}^\star_n} w(e). \tag{12}$$

The functions $w = (w_p(\cdot))$ quantify our relative tolerance for errors in different precincts $p \in \mathcal{N}$. All choices of $w$ yield conservative tests, so $w$ can be chosen at will. For example, we might choose $w_p(z) = z$. Then every error that could increase the apparent margin gets the same weight. Or we might choose $w_p(z) = z/b_p$; then $\bigvee_{\mathcal{J}^\star_n} w(e)$ is the maximum potential margin overstatement relative to the reported number of voting opportunities in the precinct.
We might choose $w_p(z) = z/u_p$; then $\bigvee_{\mathcal{J}^\star_n} w(e)$ is the maximum potential overstatement of the margin as a fraction of the bound on the margin overstatement in the precinct. Or we might pick $w_p(z)$ to reflect the accuracy of the voting technology. For example, we might be less tolerant of error in precincts with direct-recording electronic (DRE) machines than we are of error in precincts with optically scanned ballots.$^{15}$ Then we might pick $w_p(\cdot)$ to grow more rapidly for DRE precincts than for precincts that use optically scanned ballots. Because post-election audits often find a miscounted vote or two, even in precincts with very few votes, weight functions of the following form can be desirable:

$$w_p(z) = (z - m)_+ / b_p, \tag{13}$$

with $m$ on the order of 2 or 3. This function ignores potential margin overstatements of up to $m$ votes per precinct, and penalizes larger potential margin overstatements in inverse proportion to the size of the precinct (here size is the reported number of voting opportunities). That prevents an error in scanning a single ballot in a small precinct from making the test statistic large, but takes into account the fact that we expect more discrepancies in larger precincts, all other things being equal.

3.4. Tail probabilities for the sample maximum.

This section shows how to find $P$-values for the hypothesis $E \ge M$ using the test statistic $\bigvee_{\mathcal{J}^\star_n} w(e)$. We have a vector $u = (u_p)_{p=1}^N > 0$ of upper bounds on the errors $(e_p)_{p=1}^N$ and a vector of monotonically increasing functions $w = (w_p)_{p=1}^N$. If the total potential margin overstatement $E = \sum_{\mathcal{N}} e$ is big, if $e \le u$, and if the sample is big enough, it is unlikely that $\bigvee_{\mathcal{J}^\star_n} w(e)$ will be small.
So, if the observed value of $\bigvee_{\mathcal{J}^\star_n} w(e)$ is "small enough," that is evidence that $E = \sum_{\mathcal{N}} e < M$, that is, evidence that a full recount would find the same outcome. This section makes the idea precise.

$^{15}$ If a DRE is working correctly, it should record every vote perfectly. In contrast, if a voter does not use an appropriate pen or pencil to fill in an optically scanned ballot, makes a stray mark on the ballot or does not fill in the bubble perfectly, or if the scanner is miscalibrated, the optical scan could reasonably differ from a human's inference about the voter's intent [Jefferson et al. (2007)].

Let $t \in \mathbb{R}$. Let the sample size $n < N$ be fixed. Define

$$\mathcal{X} = \mathcal{X}(u, M) \equiv \left\{ x \in \mathbb{R}^N : x \le u \text{ and } \sum_{\mathcal{N}} x \ge M \right\}. \tag{14}$$

The set $\mathcal{X}$ contains all ways of distributing potential margin overstatements across precincts that satisfy the a priori bound $e \le u$ and the null hypothesis $E \ge M$. To reject the hypothesis $E = \sum_{\mathcal{N}} e \ge M$ when we observe that $\bigvee_{\mathcal{J}^\star_n} w(e) = t$, we need to know that $P\{\bigvee_{\mathcal{J}^\star_n} w(x) \le t\}$ is small for all $x \in \mathcal{X}$. Hence, we seek

$$\pi^\star(t) = \pi^\star(t; n, u, w, M) \equiv \max_{x \in \mathcal{X}(u, M)} P_x \left\{ \bigvee_{\mathcal{J}^\star_n} w(x) \le t \right\}. \tag{15}$$

The related quantity

$$\pi^\diamond(t) = \pi^\diamond(t; n, u, w, M) \equiv \max_{x \in \mathcal{X}(u, M)} P_x \left\{ \bigvee_{\mathcal{J}^\diamond_n} w(x) \le t \right\}, \tag{16}$$

where $\mathcal{J}^\diamond_n$ is a random sample of size $n$ with replacement from $\mathcal{N}$, is useful to bound the $P$-value when the data come from a stratified sample.

The individual components of $e$ are nuisance parameters: the null hypothesis involves only their sum, $E = \sum_{\mathcal{N}} e$, but the precinct-level potential margin overstatements $\{e_p\}$ affect the probability distribution of $\bigvee_{\mathcal{J}^\star_n} w(e)$, the test statistic.

Claim 1. Let $w^{-1}(t) \equiv (w_p^{-1}(t))_{p=1}^N$. Let $\mathcal{J}^-_k$ be the set of indices of the $k$ smallest components of $u - w^{-1}(t)$.
Let $q = q(t, u, w, M)$ be the largest integer for which

$$\sum_{\mathcal{J}^-_q} u \wedge w^{-1}(t) + \sum_{\mathcal{N} \setminus \mathcal{J}^-_q} u \ge M \tag{17}$$

or $q = 0$ if there is no such integer. Then

$$\pi^\star(t; n, u, w, M) = \begin{cases} 0, & q < n, \\ \binom{q}{n} \big/ \binom{N}{n}, & q \ge n, \end{cases} \tag{18}$$

and

$$\pi^\diamond(t; n, u, w, M) = (q/N)^n. \tag{19}$$

Claim 1 is proved in Appendix A.1. The following algorithm finds $q$ iteratively:

1. Set $\mathcal{J} = \mathcal{N}$.
2. If $\mathcal{J} = \emptyset$ or $\sum_{\mathcal{J}} u \wedge w^{-1}(t) + \sum_{\mathcal{N} \setminus \mathcal{J}} u \ge M$, $q = \#\mathcal{J}$.
3. Otherwise, let $p \in \mathcal{J}$ attain $[u_p - (u_p \wedge w_p^{-1}(t))] = \bigvee_{\mathcal{J}} [u - (u \wedge w^{-1}(t))]$. (Ties can be broken arbitrarily.) Remove $p$ from $\mathcal{J}$ and return to step 2.

4. Putting it together.

4.1. Testing using a simple random sample of precincts.

Suppose that the precincts for audit will be drawn as a simple random sample. To use the present method, do the following:

1. Select an overall significance level $\alpha$ and a sequence $(\alpha_s)$ so that sequential tests at significance levels $\alpha_1, \alpha_2, \ldots$, give an overall significance level no larger than $\alpha$. For example, we might take $\alpha_s \equiv \alpha / 2^s$, $s = 1, 2, \ldots$.
2. Group apparent losing candidates using the pooling rule in Section 3.1.
3. Set the error bounds $u = e^+$.
4. Select a vector of monotonically increasing functions $w = (w_p(\cdot))_{p=1}^N$. For example, $w_p(z) = z$, $w_p(z) = z/b_p$ or $w_p(z) = (z - 2)_+ / b_p$.
5. Compute the apparent margin $M$.
6. Select an initial sample size $n_1$ and a rule for selecting $n_s$ when the hypothesis $E \ge M$ is not rejected at stage $s - 1$.$^{16}$ The only requirement is that $n_1 \ge 0$ and $n_s - n_{s-1} \ge 1$.
7. Set $s = 1$, $n_0 = 0$ and $\mathcal{J}_0 = \emptyset$.
8. Draw a random sample $\mathcal{J}^\star_{n_s - n_{s-1}}$ of size $n_s - n_{s-1}$ from $\mathcal{N} \setminus \mathcal{J}_{s-1}$. Set $\mathcal{J}_s = \mathcal{J}_{s-1} \cup \mathcal{J}^\star_{n_s - n_{s-1}}$. Calculate $\bigvee_{\mathcal{J}_s} w(e)$.
9. If $\pi^\star(\bigvee_{\mathcal{J}_s} w(e); n_s, u, w, M) \le \alpha_s$, confirm the outcome and stop. Otherwise, increment $s$.
10. If $n_s < N$, return to step 8 (so that $s$ is not reset).
Otherwise, audit any precincts not yet in the sample. Confirm the outcome if the outcome was correct.

If there is not a clear set of $f$ winners (if $M = 0$), this will always escalate to a full manual tally.

$^{16}$ Section 4.3 discusses selecting $n_1$. See footnote 12 for approaches to selecting $n_s$.

4.2. Testing using stratified random samples of precincts.

Under current California law, each county draws its own random sample of 1% of precincts, at a minimum, for post-election audits. (Each county audits at least one precinct for each contest, and fractions are rounded up. Some counties voluntarily audit even larger samples.) Similarly, under Minnesota law, each county draws its own random sample of 2, 3 or 4 precincts for audit, depending on the number of registered voters in the county. The samples in different counties are drawn independently. Thus, for contests that cross county lines, the sample of precincts is a stratified random sample, not a simple random sample. This section presents two ways to combine independent audits of different counties conservatively. Both have merits and shortcomings.

Suppose there are $C$ counties with precincts in the contest. Let $\mathcal{C} \equiv \{1, \ldots, C\}$. Let $E_c$ be the total potential margin overstatement in county $c \in \mathcal{C}$, so $E = \sum_{c \in \mathcal{C}} E_c$. Let $N_c$ be the number of precincts in the contest in county $c \in \mathcal{C}$, so $N = \sum_{c \in \mathcal{C}} N_c$. Let $B_c$ be the number of voting opportunities in the contest in county $c \in \mathcal{C}$, so $B = \sum_{c \in \mathcal{C}} B_c$.

4.2.1. Bounds from proportional sampling with replacement.

Fix $n_s > 0$. Let

$$n_{cs} \equiv \lceil n_s N_c / N \rceil. \tag{20}$$

(Note $\sum_{c \in \mathcal{C}} n_{cs} \equiv n'_s \ge n_s$.)

Claim 2. Suppose $n_{cs}$ precincts are drawn at random without replacement from county $c$, independently for all $c \in \mathcal{C}$.
If there are $k$ precincts among the $N$ in the contest for which $w_p(e_p) \le t$, the chance that none of the precincts in any of the $C$ samples has $w_p(e_p) > t$ is at most $(k/N)^{n_s}$.

This is proved in Appendix A.2.

Essentially, for finding at least one precinct with $w_p(e_p) > t$, stratified sampling without replacement is more effective than stratified sampling with replacement, which is at least as effective as unstratified sampling with replacement if the stratum sample sizes are $\{n_{cs}\}$. So if we draw a sample of size $n_{cs}$ [equation (20)] from county $c$, independently for each $c \in \mathcal{C}$, $\pi^\diamond$ is an upper bound on the maximum $P$-value. This approach computes probabilities as if the sample were drawn with replacement from the entire population of $N$ precincts in the contest, but allocates the sample in proportion to the number of precincts in each county. (Fractions are rounded up, so the actual sample size could be up to $C$ precincts larger than the sample size $n_s$ used in the probability calculations.) If $N$ is large relative to the overall margin, this method leads to a sample size that is not much larger than required if the sample were a simple random sample from all $N$ precincts in the contest. Each county does a "fair share" of the auditing: the number of precincts a county audits is proportional to the number of precincts in the contest in that county, but for roundoff.

However, if the null hypothesis is not rejected at stage $s$, the sample will need to be expanded in every county in the contest (but for roundoff). Moreover, whether such an expansion is needed depends on the audit results from all counties in the contest, so county audit schedules are interdependent.
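The quantities used so far ($q$ from the iterative algorithm of Section 3.4, the bounds $\pi^\star$ and $\pi^\diamond$ of Claim 1, and the proportional allocation of equation (20)) can be sketched in a few lines of Python. This is a minimal sketch, not the author's code: the function names are illustrative, the Sausalito numbers are the $b_p$ column of Table 2 with $u = \lceil 0.4 b \rceil$ and $w_p(z) = z/b_p$ as in Section 5.1, and the three-county allocation is a made-up example.

```python
from math import comb


def ceil_div(a, b):
    """Integer ceiling of a / b, avoiding floating-point rounding."""
    return -(-a // b)


def find_q(t, u, w_inv, M):
    """Iterative algorithm of Section 3.4 for q(t, u, w, M).

    u     : per-precinct error bounds u_p
    w_inv : per-precinct callables, w_inv[p](t) = w_p^{-1}(t)
    """
    capped = [min(u_p, inv(t)) for u_p, inv in zip(u, w_inv)]
    slack = [u_p - c_p for u_p, c_p in zip(u, capped)]
    J = set(range(len(u)))
    # Sum over J of u ^ w^{-1}(t), plus sum over N \ J of u (J starts as N).
    total = sum(capped)
    while J and total < M:
        p = max(J, key=lambda i: slack[i])  # ties broken arbitrarily
        J.remove(p)
        total += slack[p]
    return len(J)


def pi_star(q, N, n):
    """Equation (18): sampling without replacement."""
    return comb(q, n) / comb(N, n) if q >= n else 0.0


def pi_diamond(q, N, n):
    """Equation (19): sampling with replacement."""
    return (q / N) ** n


def proportional_allocation(n_s, precincts_per_county):
    """Equation (20): n_cs = ceil(n_s * N_c / N) for each county."""
    N = sum(precincts_per_county)
    return [ceil_div(n_s * N_c, N) for N_c in precincts_per_county]


# Sausalito Marin City contest: voting opportunities b_p per precinct (Table 2).
b = [2004, 2130, 1698, 1824, 1740, 1749, 1422, 1122, 1311]
u = [ceil_div(2 * b_p, 5) for b_p in b]          # ceil(0.4 * b_p)
w_inv = [lambda t, b_p=b_p: t * b_p for b_p in b]  # inverse of w_p(z) = z / b_p

q = find_q(0.002, u, w_inv, M=86)
print(q, round(pi_star(q, len(b), 1), 3))  # 8 0.889

# Hypothetical three-county contest with 33, 33 and 34 precincts, n_s = 10.
print(proportional_allocation(10, [33, 33, 34]))  # [4, 4, 4]
```

With these inputs a single precinct can absorb the whole 86-vote margin, so $q = 8$ and the bound for a one-precinct sample is $8/9$, the value that appears in equation (27). The allocation example shows the rounding effect: twelve precincts are audited although $n_s = 10$ is used in the probability calculation.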
In contrast, the approach in the next subsection typically requires auditing more precincts, but the audits in different counties are logistically independent: whether the audit in a given county needs to be expanded depends on the audit results in that county alone.

Stanislevic (2006) makes a claim that implies that the probability that none of the precincts in the stratified sample has $w_p(e_p) > t$ is at most $\binom{k}{n_s} / \binom{N}{n_s}$, that is, stratification using sample sizes $\{n_{cs}\}$ can only help. If that conjecture were true, one could calculate the maximum $P$-value using $\pi^\star$ instead of $\pi^\diamond$, and the sample size would be at most $C$ precincts larger than that required for a simple random sample from the $N$ precincts in the contest. The conjecture is false, but seems to be "almost true."$^{17}$

Note that this approach can be used to find a conservative $P$-value for any set of sample sizes $n_{cs}$ by pretending that the overall sample size corresponds to the smallest sampling fraction $n_{cs}/N_c$; that is, that the data came from a sample of size $n_s = \lfloor N \bigwedge_{c \in \mathcal{C}} (n_{cs}/N_c) \rfloor$ drawn with replacement from the population of $N$ precincts. If the sampling fractions vary widely by county, this can be extremely conservative. See Section 5.2 for an illustration.

$^{17}$ Stanislevic [personal communication (2007)] notes that there are counterexamples, but his numerical experiments suggest that increasing $n$ by one restores the inequality. Moreover, he claims that the inequality fails only when the counties have equal size and $k$ and $n_s$ are divisible by $C$, and that when the inequality fails,

$$\frac{\binom{k}{n}}{\binom{N}{n}} < \left( \frac{\binom{k/C}{n/C}}{\binom{N/C}{n/C}} \right)^C. \tag{21}$$

That is, taint is hardest to detect when the counties are the same size and have the same number of tainted precincts. Here is an example: Take $N = 100$, $n_s = 80$, $k = 98$, $C = 2$, $N_1 = N_2 = 50$, $n_{1s} = n_{2s} = \lceil 80/2 \rceil = 40$, $k_1 = k_2 = 49$ (i.e., one heavily tainted precinct in each county). Then the chance a simple random sample of size 80 from the 100 precincts contains neither of the two heavily tainted precincts is $\binom{98}{80} / \binom{100}{80} = 3.8\%$, but the chance that a stratified random sample that draws 40 precincts from each of the two counties without replacement contains neither of the two heavily tainted precincts is $\left( \binom{49}{40} / \binom{50}{40} \right)^2 = 4\%$. In this case, the chance of finding a heavily tainted precinct is less for the stratified random sample than for the simple random sample: stratification can hurt. (If both heavily tainted precincts are in the same county, stratification helps.) If $n_s$ is increased to 81 so that $n_{cs} = 41$ precincts are drawn from each county, then stratification helps. The situation with stratification is rather delicate.

4.2.2. Bounds from independent tests in every county.

Claim 3. There must be at least one county $c \in \mathcal{C}$ for which $E_c / B_c \ge E / B$.

This is proved in Appendix A.3.

Suppose we test in each county at significance level $\alpha$ whether $E_c \ge M B_c / B$. Let $R_c$ be the event that the test in county $c$ rejects the hypothesis $E_c \ge M B_c / B$. Then, if in at least one county $E_c \ge M B_c / B$,

$$\Pr\left( \bigcap_{c \in \mathcal{C}} R_c \right) \le \bigwedge_{c \in \mathcal{C}} \Pr(R_c) \le \alpha \tag{22}$$

(the probability of an intersection of events is no greater than the smallest of the event probabilities). Thus, if the total error $E$ across precincts is $M$ or greater, the chance that we conclude at significance level $\alpha$ that $E_c < M B_c / B$ in every one of the $C$ counties is at most $\alpha$ overall, and typically rather less.$^{18}$

This approach can be quite conservative. When the counties all contain many precincts in the contest and the margin is large, the overall sample size will tend to be about $C$ times larger than would be required if the sample were drawn without stratification. This wastes resources.
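To get a feel for the cost, the sketch below compares the two sample sizes in a purely hypothetical contest that is not taken from the paper: $C = 5$ counties of 200 precincts each, the same bound $u_p = 100$ in every precinct, equal numbers of voting opportunities per county (so $M B_c / B = M / C$), overall margin $M = 5{,}000$, and an audit that finds no potential margin overstatement ($t = 0$ with $w_p(z) = z$). Under those assumptions $q$ in Claim 1 reduces to the number of precincts minus $\lceil M / u_p \rceil$. All names are illustrative.

```python
from math import comb


def q_uniform(num_precincts, bound, margin):
    """q from Claim 1 when every u_p equals `bound`, w_p(z) = z and t = 0:
    at least ceil(margin / bound) precincts must be tainted."""
    tainted = -(-margin // bound)  # integer ceiling
    return max(num_precincts - tainted, 0)


def smallest_sample(q, num_precincts, alpha):
    """Smallest n with C(q, n) / C(N, n) < alpha, or None if q == N."""
    for n in range(1, num_precincts + 1):
        # math.comb(q, n) is 0 once n > q, so the bound drops to 0 there.
        if comb(q, n) / comb(num_precincts, n) < alpha:
            return n
    return None


alpha = 0.01
C, N_c, u_p, M = 5, 200, 100, 5000   # hypothetical contest
N = C * N_c

# One simple random sample from all N precincts, tested against the margin M.
n_srs = smallest_sample(q_uniform(N, u_p, M), N, alpha)

# Independent tests in every county, each against the threshold M * B_c / B = M / C.
n_county = smallest_sample(q_uniform(N_c, u_p, M // C), N_c, alpha)

print("simple random sample:", n_srs)
print("independent county tests, total:", C * n_county)
```

In this hypothetical the total for the independent county tests comes out at roughly four times the unstratified sample size, which is in the spirit of "about $C$ times larger"; the exact ratio depends on the made-up inputs.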
However, the approach has some logistical advantages. The apparent margin depends on results in every county involved in the contest, so there must be communication among counties before the audit can begin. But, unlike the previous method, errors detected in one county do not require any other county to increase its sample size, and the audit process does not require cooperation or communication among counties.

4.3. "Fault-tolerant" initial sample size.

The procedure can start with any initial sample size $n_1 \ge 0$. However, if the initial sample size $n_1$ is too small, we will not be able to reject the hypothesis $E \ge M$ on the basis of the initial sample even if it shows no miscount whatsoever. Audit samples often show small miscounts.

We can determine an initial sample size $n_1$ so that we can confirm the outcome without expanding the sample, provided the potential margin overstatement found in the initial sample is sufficiently small. For example, suppose we would like to be able to confirm the outcome as long as the test statistic evaluated for the initial sample is no greater than $t_1$. If we choose

$$n_1 = \arg\min_{n > 0} \{ n : \pi^\star(t_1, n, u, w, M) < \alpha_1 \}, \tag{23}$$

then if $\bigvee_{\mathcal{J}^\star_{n_1}} w(e) \le t_1$, we can confirm the outcome without expanding the sample. Section 5.1 gives an example of this calculation.

If we are drawing a stratified random sample, we need to find an initial sample size $n_{1c}$ for each county $c$. For the approach to stratification in Section 4.2.1, we can take $n_{1c} = \lceil n_1 N_c / N \rceil$, with

$$n_1 = \arg\min_{n > 0} \{ n : \pi^\diamond(t_1, n, u, w, M) < \alpha_1 \}. \tag{24}$$

$^{18}$ Dopp and Stenger have asserted that to audit contests that span more than one county, one should set the sample size using the smaller of the county or state margin [Dopp and Stenger (2006)].
To the best of my knowledge, they have not investigated the effect that has on confidence in the outcome of the election, and gave no proof that it results in a conservative test. This proof shows that if one uses the overall margin, scaled by the number of ballots voted in the contest in the county in question, the result is conservative.

For the approach to stratification in Section 4.2.2, the calculation is more complex. Let $u_c$ denote the vector of precinct error bounds for county $c$ and let $w_c$ denote the vector of precinct weight functions for county $c$. If the initial sample size for county $c$ is chosen to be

$$n_{1c} = \arg\min_{n > 0} \{ n : \pi^\star(t_1, n, u_c, w_c, \lfloor M B_c / B \rfloor) < \alpha_1 \}, \tag{25}$$

we will not have to expand the audit in county $c$, provided the test statistic for the initial sample is no greater than $t_1$.

5. Examples.

This section gives examples of calculating $P$-values for the hypothesis that the apparent outcome of an election is wrong. It does not give examples of expanding the sample size sequentially: data required for those computations are not available.

5.1. November 2006 Sausalito Marin City school board race.

The November 2006 school board race for the Sausalito Marin City School District in Marin County, California involved nine precincts. Voters could vote for three of five candidates or a write-in. Table 2 lists vote totals by precinct for each candidate. Absentee and polling-place votes were combined.

The winning candidate with the fewest votes was Mark Trotter, with 2022 votes. The losing candidate with the most votes was George Stratigos, with 1936 votes. The margin between the two was 2022 − 1936 = 86 votes, an extremely narrow margin of 0.57% of the 15,000 possible votes.
If 43 votes for Stratigos had been awarded erroneously to Trotter, that would have sufficed to change a tie (1979 votes each) into a win for Trotter, with a net potential margin overstatement in the election of 86. Any other change to the set of winners would have required a larger potential margin overstatement. Thus, if we can reject the hypothesis that the total potential margin overstatement is greater than or equal to 86 votes, we can conclude that the outcome of the election was correct.

Every unexercised opportunity to vote counts as an undervote. In this example, a ballot can contribute up to three undervotes: the number of undervotes on a ballot is 3 minus the number of candidates voted for, provided the number voted for is no greater than three. If a voter marked the ballot for more than three candidates, the ballot contributes overvotes.

We shall take $w_p(z) = z/b_p$, so that the test statistic is the maximum potential margin overstatement as a fraction of the voting opportunities in each precinct in the sample.

Note that the number of votes for write-ins plus the number of votes for Peter C. Romanowsky is less than the number of votes for the runner-up, George T. Stratigos, but the number of undervotes plus three times the number of invalid ballots is greater than the number of votes for Stratigos. Therefore, write-ins can be pooled with each other and with Romanowsky, but undervotes and invalid ballots are treated as a separate candidate, as described in Section 3.1.

Table 2. Vote totals by precinct for the November 2006 Sausalito Marin City School Board race. Voters could vote for up to three candidates. The number of undervotes is three times the number of ballots, minus the total number of votes for candidates, ignoring ballots showing votes for more than three candidates (overvoted ballots). Column 9, "Votes," is the total number of voting opportunities, three times the number of ballots. There were 5,000 ballots, including two overvoted ballots, one in precinct 3104 and one in precinct 3601. The post-election audit examined all the ballots in precinct 3107 and found a discrepancy of one vote. The discrepancy was due to operator error; re-scanning the ballots eliminated the discrepancy. [E. Ginnold, Registrar of Voters, Marin County, California, personal communication (2007).] Data courtesy of E. Ginnold and M. Briones.

| Precinct | Undervotes + 3 × overvotes | Thornton | Hoyt | Trotter | Stratigos | Romanowsky | Write-ins | Votes |
|---|---|---|---|---|---|---|---|---|
| 3001 | 780 | 296 | 309 | 283 | 271 | 60 | 5 | 2004 |
| 3002 | 920 | 311 | 287 | 274 | 291 | 44 | 3 | 2130 |
| 3104 | 699 | 238 | 244 | 240 | 225 | 48 | 4 | 1698 |
| 3105 | 765 | 270 | 262 | 240 | 228 | 56 | 3 | 1824 |
| 3106 | 668 | 239 | 267 | 294 | 209 | 58 | 5 | 1740 |
| 3107 | 732 | 251 | 260 | 236 | 214 | 53 | 3 | 1749 |
| 3600 | 582 | 235 | 233 | 129 | 186 | 51 | 6 | 1422 |
| 3601 | 367 | 234 | 178 | 126 | 170 | 40 | 7 | 1122 |
| 3602 | 610 | 160 | 155 | 200 | 142 | 39 | 5 | 1311 |
| Total | 6123 | 2234 | 2195 | 2022 | 1936 | 449 | 41 | 15,000 |

Table 3 gives three potential margin overstatement bounds: the a priori bounds $e^+$ based on pooling write-in candidates only, $e^+$ based on pooling write-in candidates and Romanowsky, and $\lceil 0.4b \rceil$, 40% of the votes, including undervotes and three times the overvotes, rounded up to the next integer. For all three bounds, any of the nine precincts could harbor enough miscount to change the apparent outcome of the election.

Table 3. Three possible bounds on the potential margin overstatement in each precinct. The bound $e^+(b)$ is defined in equation (7). Column 2 pools the write-in candidates in computing $e^+$. Column 3 pools the write-in candidates and Peter C. Romanowsky in computing $e^+$, which leads to smaller bounds on the error; see Section 3.1. The bound $\lceil 0.4b \rceil$ is 40% of the reported voting opportunities in the precinct, rounded up to the next integer. This is analogous to the maximum within-precinct error bounds used by Saltman (1975), Dopp and Stenger (2006) and McCarthy et al. (2008).

| Precinct | $e^+(b)$, write-ins pooled | $e^+(b)$, write-ins & Romanowsky pooled | $\lceil 0.4b \rceil$ |
|---|---|---|---|
| 3001 | 2887 | 2827 | 802 |
| 3002 | 2999 | 2955 | 852 |
| 3104 | 2416 | 2368 | 680 |
| 3105 | 2593 | 2537 | 730 |
| 3106 | 2535 | 2477 | 696 |
| 3107 | 2493 | 2440 | 700 |
| 3600 | 2013 | 1962 | 569 |
| 3601 | 1653 | 1613 | 449 |
| 3602 | 1821 | 1782 | 525 |

Suppose we want to design an initial sample size so that, provided the maximum potential margin overstatement in any precinct in the sample is no more than 0.2% of the votes reported in that precinct (including undervotes and three times the overvotes), we would reject the hypothesis that the wrong set of winners was named at significance level 0.01 (we would confirm the outcome at "confidence level" 99%). That corresponds to rejecting the hypothesis when $\bigvee_{\mathcal{J}^\star_n} w(e) \le 0.002$. Note that $0.002 \times 15{,}000 = 30 < 86$, so at least one precinct must have more than this background level of error (0.2%) for the outcome of the election to be wrong. Thus,

$$\pi^\star(0.002, n, u, w, 86) = \binom{8}{n} \Big/ \binom{9}{n}. \tag{26}$$

Enough miscount to change the outcome could lurk in a single precinct. Suppose that just one precinct had miscount, and that the miscount was enough to change the outcome of the election. Then even if we audited 8 of the 9 precincts at random, there is a one-in-nine chance that we would fail to audit that precinct. So, to have 99% confidence in the outcome of this race if the observed potential margin overstatement were at most 0.2% of the votes (including undervotes and overvotes) in any precinct, we would have to audit every precinct. That is bad news, but since the margin is only 0.57% of the possible votes, it is not surprising.

In fact, one precinct was audited (precinct 3107) and it was found to contain one error. We shall presume that this error favored one of the apparent winners. The number of votes in precinct 3107 is 1749, so this corresponds to a test statistic value $\bigvee_{\mathcal{J}^\star_1} w(e) = 1/1749 = 0.00057$. On the basis of this audit, the maximum $P$-value of the hypothesis that the wrong set of three candidates was declared the winner is

$$\pi^\star(0.00057, 1, u, w, 86) = \binom{8}{1} \Big/ \binom{9}{1} = 88.9\%. \tag{27}$$

So, even if there were enough miscount in the aggregate to cause the apparent set of winners to differ from the true set of winners, the chance that an audit of one precinct would show $w_p(e_p) \le 0.00057$ could be as large as 88.9%, depending on how the miscount is distributed across precincts.

Conversely, what would we have to believe about the error for these audit data to yield a $P$-value of 1% or less? For a random sample of just one of the nine precincts to have at most a 1% chance of having $\bigvee_{\mathcal{J}^\star_1} w(e) \le 0.00057$, all nine precincts would have to have $w_p(e_p) > 0.00057$. For an election-altering discrepancy to require $w_p(e_p) > 0.00057$ in every precinct corresponds to a bound $u = 0.0057b$. Unless we believe that a potential margin overstatement of more than 0.57% of the votes is either impossible or certain to be detected without an audit, we could not possibly get 99% confidence in the outcome of this race by auditing only one precinct.

5.2. November 2006 Minnesota U.S. Senate race.

This section examines the November 2006 Senate race in Minnesota.
Minnesota has 87 counties with a total of 4,123 precincts, of which 202 were audited after the election. Table 4 lists the vote totals for the race. The winner was Amy Klobuchar and the runner-up was Mark Kennedy. The statewide margin of victory was 443,196 votes for 2,217,818 voters, 20.0% of voters (not of cast votes).[19] The audit of this election is discussed by Halvorson and Wolff (2007).

Minnesota elections law S.F. 2743 (2006) requires auditing a random sample of precincts in each county, with a sample size that depends on the voting population in the county: counties with fewer than 50,000 registered voters must audit at least two precincts; counties with between 50,000 and 100,000 registered voters must audit at least three; and counties with more than 100,000 registered voters must audit at least four precincts. At least one of the precincts audited in each county must have 150 or more votes cast. Hennepin County audited eight precincts instead of the four required. (It still had the smallest sampling fraction.) Several other counties also audited more than the minimum required.

Table 4. Summary of 2006 U.S. Senate race in Minnesota.

| Voters | Undervotes & invalid ballots | Fitzgerald (Indep) | Kennedy (Repub) | Klobuchar (Democ/Farm/Labor) | Cavlan (Green) | Powers (Constit) | Write-ins |
|---|---|---|---|---|---|---|---|
| 2,217,818 | 15,099 | 71,194 | 835,653 | 1,278,849 | 10,714 | 5,408 | 901 |

[19] Data in this section come from www.sos.state.mn.us/docs/2006 General Results.XLS, electionresults.sos.state.mn.us/20061107/ElecRslts.asp?M=S&Races=0102 and www.sos.state.mn.us/home/index.asp?page=544.

Precincts audited had from 2 to 2,393 ballots cast.[20] The largest value of $e_p$ was 2; the largest value of $e_p/b_p$ was 0.67%. The total observed discrepancy was 62 votes, about 0.065% of ballots cast in the audited precincts, including undervotes and invalid votes. The total observed potential margin overstatement was 25 votes, about 0.026% of ballots.

The audit shows a different number of ballots from that reported in 13 of the audited precincts: "ballot accounting" apparently had not been done. Ten of the differences were one ballot each. In two precincts,[21] the number of ballots was off by three. Most of the discrepancies in vote totals seem to have been caused by jams in the optical scanner or by ballots fed through the scanner twice. The observed discrepancies in $b_p$ are not large enough to affect the error bounds $e^+$ or $0.4b$ by much, but they show that $b_p$ is not an inviolable upper bound on $a_p$, and there might be larger discrepancies in the precincts not sampled.

Under Minnesota law, auditors can interpret voter intent, even if the ballot is not marked properly.[22] In one precinct,[23] three machine-unreadable ballots originally tallied as undervotes were interpreted by the auditors as votes for Amy Klobuchar. The precinct had only 96 voters, so a three-vote error is a large percentage of $b_p$, although in this case the error does not contribute to $e_p$ because it favors Klobuchar, the winner. This illustrates why taking $w_p(z) = z/b_p$ is perhaps too sensitive to occasional errors, and $w_p(z) = z$ or $w_p(z) = (z - m)_+/b_p$ might be preferable.

We will calculate P-values for the hypothesis that a full manual recount would not find that Amy Klobuchar is the winner, under a variety of assumptions. Because the Minnesota law links sample sizes to the number of registered voters in each county rather than to the number of precincts in each county and never requires more than 4 precincts per county, the sampling fraction of precincts varies widely from county to county.
The minimum sampling fraction in the 2006 audit was 1.9% and the maximum was 23.8%. Two-thirds of the counties had precinct sampling fractions between 4% and 9%. Only one had a sampling fraction below 2%: the largest county, Hennepin. The overall sampling fraction was 4.9% of precincts.

Reported undervotes, overvotes and votes for all other candidates total less than the vote reported for runner-up Mark Kennedy, so they can all be pooled into one pseudo-candidate as described in Section 3.1. Thus, we have $K = 3$ pseudo-candidates, $f = 1$, $N = 4{,}123$, $B = 2{,}217{,}818$, $M = 443{,}196$. We will consider two upper bounds on the precinct-level miscount, $u = e^+(b)$ and $u = 0.4b$, and three functions for weighting the precinct-level potential margin overstatements, $w_p(z) = z$, $w_p(z) = z/b_p$ and $w_p(z) = (z - 2)_+/b_p$. The approach to dealing with stratification in Section 4.2.2 leads to very large P-values in this example: over 27% for all six combinations of $u$ and $w_p$.

Table 5. The smallest number of precincts in Minnesota as a whole that must have $w_p(e_p) > W_w^J(e)$ for the outcome of the election to differ from the outcome a full manual recount would show, where $J$ is the set of indices of precincts actually sampled. Here $W_w^J(e)$ is the observed value of the test statistic for the 202 precincts in the sample. Values are given for three choices of the weight functions $w_p$ and two bounds $u$ on the amount of error each precinct can hold.

| Bound | $w_p(z) = z$ | $w_p(z) = z/b_p$ | $w_p(z) = (z - 2)_+/b_p$ |
|---|---|---|---|
| $u = e^+(b)$ | 130 | 128 | 130 |
| $u = 0.4b$ | 721 | 720 | 721 |

[20] Mean 471, median 272, IQR 505.

[21] Spring Lake Park Precinct 3 and Orono Precinct 2.

[22] However, discrepancies caused by machine-unreadable ballots do not trigger an escalation of the audit.

[23] Lee Township, Norman County.
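To make the three weight functions concrete, the following minimal sketch evaluates each of them on a small set of made-up audit results. The precinct data, the class `AuditedPrecinct` and the function names are assumptions introduced only for illustration (they are not the Minnesota audit data), and treating the test statistic as the largest weighted overstatement among the sampled precincts is our reading of the notation $W_w^J(e)$.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class AuditedPrecinct:
    """One sampled precinct; the values used below are hypothetical."""
    b: int  # reported ballots b_p
    e: int  # observed potential margin overstatement e_p (nonnegative)


def w_count(z: float, b: int) -> float:
    """w_p(z) = z: every overstated vote has the same weight."""
    return float(z)


def w_rate(z: float, b: int) -> float:
    """w_p(z) = z / b_p: errors in larger precincts get lower weight."""
    return z / b


def w_rate_after_two(z: float, b: int, m: int = 2) -> float:
    """w_p(z) = (z - m)_+ / b_p: ignore the first m overstated votes."""
    return max(z - m, 0) / b


def max_weighted_overstatement(
    sample: Sequence[AuditedPrecinct],
    w: Callable[[float, int], float],
) -> float:
    """Largest weighted overstatement among the sampled precincts
    (assumed form of the test statistic)."""
    return max(w(p.e, p.b) for p in sample)


# Hypothetical sample of four audited precincts.
sample = [
    AuditedPrecinct(b=300, e=2),
    AuditedPrecinct(b=96, e=0),
    AuditedPrecinct(b=1500, e=1),
    AuditedPrecinct(b=800, e=3),
]

for name, w in [("z", w_count), ("z/b", w_rate), ("(z-2)+/b", w_rate_after_two)]:
    print(f"w_p(z) = {name:9s} -> statistic = {max_weighted_overstatement(sample, w):.5f}")
```

In this made-up sample, the precinct that determines the statistic changes with the weighting: the raw count picks out the precinct with three overstated votes, while the rate picks out the small precinct with two. That is the sensitivity to occasional errors in small precincts that motivates the thresholded variant.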
We can get a very conservative P-value by pretending that the sample was drawn with replacement from the entire population of precincts, but that only 1.9% of the precincts (78) were sampled; this is an application of the bound in Section 4.2.1. Table 5 shows the lower bounds on the number of precincts statewide that would have to have potential margin overstatements greater than $w_p^{-1}(W_w^J(e))$ in order to have $E \ge M$ (here $J$ are the indices of the 202 precincts in the actual sample). Table 6 gives the corresponding P-values. It also gives P-values using the same observed discrepancies, but pretending that the sample of 202 precincts was drawn in two other ways: as a stratified sample with sample size proportional to the number of precincts in each county, using the bound derived in Section 4.2.1, or as a simple random sample of 202 precincts. Had the 202 precincts been drawn in either of those ways, the P-values would be much smaller than the bound derived for the sampling scheme Minnesota actually used.

Table 6 shows that the audit data would allow us to reject the hypothesis that a full recount would find a different winner at significance level 10%, for all three choices of test statistics and for either error bound. Stark (2008b) finds P-values about half as large using a sharper measure of discrepancy. For the error bound $u = 0.4b$, we could reject the hypothesis at significance level 1%. If the data had come from a simple random sample from the state as a whole, or if the sample size in each county had been proportional to the number of precincts in the county, we would have been able to reject the hypothesis that the apparent outcome differs from the outcome a full manual recount would show at significance level 1%.
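The P-value bounds discussed here can be checked with a few lines of arithmetic. The sketch below assumes, following Appendix A.1, that if at least `n_tainted` of the `N` precincts must exceed the observed test statistic, then with $q = N -$ `n_tainted` the P-value is at most $(q/N)^n$ for $n$ draws with replacement and $\binom{q}{n}/\binom{N}{n}$ for a simple random sample of size $n$. The function names are ours; the counts 130 and 721 are taken from Table 5 for $w_p(z) = z$.

```python
from math import comb


def pvalue_with_replacement(N: int, n_tainted: int, n: int) -> float:
    """Bound (q/N)**n for n draws with replacement, where q = N - n_tainted
    is the largest number of precincts that could show a weighted
    overstatement at or below the observed test statistic."""
    q = N - n_tainted
    return (q / N) ** n


def pvalue_without_replacement(N: int, n_tainted: int, n: int) -> float:
    """C(q, n) / C(N, n): chance that a simple random sample of n precincts
    contains none of the n_tainted precincts."""
    q = N - n_tainted
    # math.comb returns 0 when n > q, so the ratio is 0.0 in that case.
    return comb(q, n) / comb(N, n)


N = 4123          # precincts in Minnesota
n_small = 78      # 1.9% of precincts, treated as drawn with replacement
n_full = 202      # precincts actually audited

for label, n_tainted in [("u = e+(b)", 130), ("u = 0.4b", 721)]:
    p_wr = pvalue_with_replacement(N, n_tainted, n_small)
    p_srs = pvalue_without_replacement(N, n_tainted, n_full)
    print(f"{label}: with replacement (n=78) {p_wr:.3e}, "
          f"simple random sample (n=202) {p_srs:.3e}")

# Nine-precinct contest of Section 5.1: one audited precinct, and possibly
# a single precinct holding all of the outcome-changing miscount.
print(f"nine precincts, n=1: {pvalue_without_replacement(9, 1, 1):.3f}")
```

With these inputs the with-replacement bounds come out near 8.2% and 0.00003%, and the nine-precinct case gives 8/9, or about 88.9%, in line with equation (27).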
Table 6. P-values for the hypothesis that a full manual recount would show that Amy Klobuchar did not win the Senate race, under different assumptions about how the sample was drawn and the potential margin overstatement in each precinct [upper bounds $u = e^+(b)$ and $u = 0.4b$], and different choices of the weighting of errors in each precinct. The first row is for precinct-level weight function $w_p(z) = z$: each error has the same weight. The second row is for $w_p(z) = z/b_p$: errors in larger precincts have lower weight. The third row is for $w_p(z) = (z - 2)_+/b_p$: that test statistic ignores the first two potential margin overstatements in each precinct; after the first two, potential margin overstatements in larger precincts have lower weight. Columns 2 and 3 are very conservative upper bounds derived by treating the sample as if it were a smaller sample of 1.9% of the precincts in each county (78 precincts in all, rather than 202). Columns 4 and 5 pretend that the data came from a stratified random sample of 202 precincts in which the number of precincts drawn from each county is proportional to the number of precincts in the county. Columns 6 and 7 pretend that the data came from a simple random sample from all the precincts in the state. Only the results in columns 2 and 3 apply to the auditing scheme Minnesota actually used.

| Weight function | 1.9% sample w/ replacement, $u = e^+(b)$ | 1.9% sample w/ replacement, $u = 0.4b$ | Proportional sample, $u = e^+(b)$ | Proportional sample, $u = 0.4b$ | Sample w/o replacement, $u = e^+(b)$ | Sample w/o replacement, $u = 0.4b$ |
|---|---|---|---|---|---|---|
| $w_p(z) = z$ | 8.2% | 0.00003% | 0.15% | $1.4 \times 10^{-15}$% | 0.13% | $4.6 \times 10^{-16}$% |
| $w_p(z) = z/b_p$ | 8.5% | 0.00003% | 0.17% | $1.5 \times 10^{-13}$% | 0.15% | $4.9 \times 10^{-16}$% |
| $w_p(z) = (z - 2)_+/b_p$ | 8.2% | 0.00003% | 0.15% | $1.4 \times 10^{-15}$% | 0.13% | $4.6 \times 10^{-16}$% |

6. Discussion.

6.1. P-values.
As illustrated in Section 5, the method can also find the maximum P-value of the hypothesis that $E \ge M$ and hence of the hypothesis that the election outcome is incorrect given discrepancy data from a particular sampling design. The maximum P-value is $\pi^\star(W_w^{J_1}(e); n_1, u, w, M)$, where $J_1$ is the initial random sample, of size $n_1$. This expression applies only to the initial sample. If the approach is used sequentially, the P-values need to be adjusted to take that into account.

6.2. Two-position contests requiring super-majority.

The bound $e^+$ on potential margin overstatement can be sharpened easily for contests such as ballot measures or propositions that have only two positions and that require more than a simple majority to pass. For example, suppose that a contest allows only "yes" or "no" votes, and requires a 2/3 majority of "yes" votes to pass. Suppose that, according to the reported totals, the measure passed. Let "yes" be candidate $k = 1$ and "no" be candidate $k = 2$. The effective apparent margin is the margin above 2/3 of the total vote:

$$
M = \left\lfloor V_1 - \tfrac{2}{3}(V_1 + V_2) \right\rfloor. \tag{28}
$$

An error that increases $V_1$ by one vote increases $V_1 - \tfrac{2}{3}(V_1 + V_2)$ by only 1/3 of a vote. An error that decreases $V_2$ by one vote increases $V_1 - \tfrac{2}{3}(V_1 + V_2)$ by 2/3 of a vote. Within each precinct, error could have inflated the effective apparent margin over 2/3 by no more than

$$
\left\lceil \left(v_{1p} - \tfrac{2}{3}(v_{1p} + v_{2p})\right) + \tfrac{2}{3} r_p \right\rceil = \left\lceil \tfrac{2}{3}\left(r_p + v_{1p}/2 - v_{2p}\right) \right\rceil. \tag{29}
$$

These are smaller upper bounds $u$ for $e$ than $e^+$ are, but still rigorous.

6.3. Why not use the sample sum or sample mean?

Using the discrepancy of the totals across the precincts in the sample as the test statistic instead of calculating the discrepancy separately for each precinct would have advantages.
For example, it would allow errors that hurt a particular candidate to cancel errors favoring that candidate in a different precinct, which might allow us to reject the hypothesis that the wrong candidate was named the winner using smaller samples. However, it is far more difficult to calculate tail probabilities for the discrepancy of the totals. In particular, it is not true that the most difficult-to-detect election-altering taint concentrates as much miscount as possible in as few precincts as possible, precisely because cancellations can occur.

6.4. Improving the power.

The approach presented here is conservative: the chance that it declares the outcome to be correct when the outcome is not correct is at most $\alpha$. However, other approaches could do the same thing using smaller audit samples; that is, they could have more power for the same significance level. The elements of the approach with the most room for improvement are these:

1. The test statistic, pooling and aggregation of the miscount. There are sharper necessary conditions and measures of discrepancy; see, for example, Stark (2008b). The functions $\{w_p\}_{p \in \mathcal{N}}$ could be optimized against various alternatives. One could construct a more powerful test using likelihood ratios or the sample sum, as described in Section 6.3. However, these improvements in power come at a cost of far more complex probability calculations and a loss of transparency to jurisdictional users. Numerical optimization would appear to be necessary to calculate P-values.
2. Stratification. The approaches to dealing with stratification for contests that cross county lines are conservative but not sharp. Better inequalities would allow smaller samples to be used.
3. Thresholds for sequential tests. The inequalities used to set the significance levels in the sequential tests could be improved.
4. Sample design. If we were at liberty to choose the sampling design, a different approach (such as sampling with probability proportional to $u_p$, the upper bound on the potential margin overstatement) might permit smaller samples.[24]

Ideas from sequential analysis [Siegmund (1985) and Wald (2004)] could certainly help improve the thresholds for sequential testing.

6.5. Alternative approaches.

One could also take a Bayesian approach to the problem: given a prior probability distribution, one could compute posterior odds that the election named the right winner given the audit data, and confirm the outcome if those odds were, say, 100 to 1 or greater. This approach requires prior probability distributions for the number of votes for each candidate and for the potential margin overstatement.

The false discovery rate [Benjamini and Hochberg (1995)] gives another perspective: rather than insist that the chance of confirming an outcome that is incorrect be no larger than $\alpha$, we could require the expected fraction of confirmed election outcomes that are confirmed in error to be no larger than $\alpha$.

A rather different approach is to combine a base rate of random sampling with "targeted" sampling, where candidates or other interested parties select some precincts for audit by any means they choose [Norden et al. (2007) and Jefferson et al. (2007)]. Computing a P-value for this approach would require an ad hoc model for the efficacy of "educated guesses" in finding miscounted precincts, but the method could increase public confidence in the election outcome.

Any of these approaches is incomplete without rules for expanding the audit if the precincts in the targeted sample show material miscount, culminating either in confirming the outcome or in a full recount.

7. Conclusions.
Post-election audits can be used to confirm election outcomes or show that a full manual recount is needed. The election outcome is confirmed if, on the assumption that the election outcome is incorrect, the probability is large that the sample would have contained larger potential margin overstatements than it did contain. If that probability is not sufficiently large, the sample size needs to be increased. Eventually, either the sample includes every precinct (there has been a complete manual recount), or there is compelling statistical evidence that the election outcome is correct.

[24] See, for example, Aslam, Popa and Rivest (2007) and Stark (2008a). Current and pending audit laws do not contemplate sampling designs other than simple or stratified random samples. If sampling with probability proportional to $u_p$ were allowed, it would bring election auditing much closer to work in financial auditing, where monetary unit sampling is often used [Panel on Nonstandard Mixtures of Distributions (1989)]. However, if precincts have differing probabilities of selection, so do ballots, which might raise legal issues of differential enfranchisement.

Confirming an election outcome statistically requires upper bounds on the potential margin overstatement in each precinct. Such upper bounds can be calculated from upper bounds on the total number of votes in each precinct. Upper bounds on the number of votes could in turn come from the number of registered voters, from the number of ballots issued to precincts, from precinct pollbooks or from "ballot accounting." Alternatively, one might use ad hoc bounds on the potential margin overstatement, such as 40% of the number of reported ballots in the precinct.
The results are sensitive to the bounds, so ad hoc choices need to be justified and tested empirically in every election.

Combining a base rate of sampling (such as California's 1% law) with rules for increasing the sample size for contests where the outcome is in doubt (given the margin, the number of ballots cast in each precinct and the miscount observed in the initial sample) is a statistically sound and potentially practical way[25] to use post-election audits to decide whether to confirm the outcome. The base rate of sampling provides a broad check for gross errors; increasing the sample size for close contests and contests where the audit reveals potential margin overstatements can guarantee any desired level of confidence in the outcome. In states where election regulations do not contemplate increasing the size of an initial audit, the approach outlined here can be used to calculate the confidence that each election outcome is correct,[26] given the size of the sample, the margin, the reported votes in each precinct and the potential margin overstatements observed in the sample.

[25] The method was tested in practice in Marin County, California, to audit Measure A on the 5 February 2008 ballot to attain 75% confidence that a full manual count would match the apparent outcome.

[26] As mentioned above, "confidence" that the outcome is correct is taken to mean 100% minus the P-value of the hypothesis that the outcome is incorrect; this is not a standard statistical definition of "confidence."

APPENDIX

A.1. Proof of Claim 1.

Both $\mathbb{P}_x\{W_w^{J_n^\star}(x) \le t\}$ and $\mathbb{P}_x\{W_w^{J_n^\diamond}(x) \le t\}$ are monotonic in $\#\{p : x_p \le w_p^{-1}(t)\}$. Hence, $\pi^\star(t)$ and $\pi^\diamond(t)$ are attained by the element $x^-$ of $\mathcal{X}$ with the fewest components greater than the corresponding components of $w^{-1}(t)$. To maximize $\#\{p : x_p \le w_p^{-1}(t)\}$ while keeping $E = \sum_{p \in \mathcal{N}} x_p \ge M$ and $x \le u$, set $x_p = u_p$ for those components $p$ for which $u_p - w_p^{-1}(t)$ is largest, and set the remaining components of $x$ to whichever is smaller, $u_p$ or $w_p^{-1}(t)$. Thus, for some $k$, $x^-$ is of the form

$$
x_p^- =
\begin{cases}
(u \wedge w^{-1}(t))_p, & p \in J_k^-, \\
u_p, & p \notin J_k^-.
\end{cases}
\tag{30}
$$

The value of $k$ that gives $x^-$ is the largest possible value for which $E \ge M$, namely, $q$ [defined in equation (17)]. The chance that $W_w^{J_n^\star}(x^-) \le t$ is the chance that $J_n^\star$ consists of $n$ of the $q$ components of $x^-$ that are less than the corresponding components of $w^{-1}(t)$, as equation (18) asserts. Similarly, the chance that $W_w^{J_n^\diamond}(x^-) \le t$ is the chance that $J_n^\diamond$ includes only components of $x^-$ that are less than the corresponding components of $w^{-1}(t)$. There are $q$ such components, so the chance is $(q/N)^n$, as claimed.

A.2. Proof of Claim 2.

Among the $N$ precincts in the contest, $k$ have $w_p(e_p) \le t$. We divide the $N$ precincts into $C$ strata. In stratum $c$, there are $N_c$ precincts of which $k_c$ precincts have $w_p(e_p) \le t$, and $\sum_{c \in \mathcal{C}} k_c = k$. We draw $n_{cs} = \lceil n_s N_c / N \rceil$ precincts at random without replacement from county $c$. Let $n_s' = \sum_{c \in \mathcal{C}} n_{cs} \ge n_s$. Let $S_c$ be the number of precincts in the sample from county $c$ for which $w_p(e_p) > t$. Then $S_c$ has the hypergeometric distribution with parameters $N_c$, $N_c - k_c$ and $n_{cs}$, and $\{S_c\}_{c \in \mathcal{C}}$ are independent. Moreover,

$$
\mathbb{P}\{S_c = 0\} = \frac{\binom{k_c}{n_{cs}}}{\binom{N_c}{n_{cs}}} \le (k_c / N_c)^{n_{cs}}. \tag{31}
$$

This follows from the fact that $\frac{x}{y} > \frac{x-1}{y-1}$ when $x < y$ and $y > 1$. Because the samples from different strata are independent,

$$
\mathbb{P}\left\{ \sum_{c \in \mathcal{C}} S_c = 0 \right\} \le \prod_{c \in \mathcal{C}} (k_c / N_c)^{n_{cs}}. \tag{32}
$$

Since $n_{cs} \ge n_s N_c / N$, and $n_s = \sum_{c \in \mathcal{C}} n_s N_c / N$,

$$
\prod_{c \in \mathcal{C}} (k_c / N_c)^{n_{cs}}
\le \prod_{c \in \mathcal{C}} (k_c / N_c)^{n_s N_c / N}
\le \left( \frac{1}{n_s} \sum_{c \in \mathcal{C}} (n_s N_c / N)(k_c / N_c) \right)^{n_s}
= (k / N)^{n_s}. \tag{33}
$$

The second step is an application of the arithmetic mean-geometric mean inequality. Hoeffding (1956), Theorem 4, proves something rather more general.

Inequality (33) shows that if we draw a stratified sample of precincts with $n_{cs}$ precincts from county $c$, $c \in \mathcal{C}$, but compute the maximum P-value as if we were sampling with replacement from the entire population of $N$ precincts (i.e., if we use $\pi^\diamond$ as the bound on the P-value), we get a conservative test.

A.3. Proof of Claim 3.

Claim 3 just asserts that either every element of a list is equal to the mean of the list, or there is at least one element greater than the mean:

$$
\frac{E}{B} = \frac{\sum_{c \in \mathcal{C}} E_c}{B}
= \frac{\sum_{c \in \mathcal{C}} B_c (E_c / B_c)}{B}
= \sum_{c \in \mathcal{C}} \frac{B_c}{B} \cdot \frac{E_c}{B_c}
\le \left( \sum_{c \in \mathcal{C}} \frac{B_c}{B} \right) \times \bigvee_{c \in \mathcal{C}} \frac{|E_c|}{B_c}
= \frac{B}{B} \times \bigvee_{c \in \mathcal{C}} \frac{|E_c|}{B_c}
= \bigvee_{c \in \mathcal{C}} \frac{|E_c|}{B_c}. \tag{34}
$$

The antepenultimate step follows from Hölder's inequality. So, if the total potential margin overstatement $E$ across counties is $M$ or more, there must be at least one county $c$ for which $E_c \ge M B_c / B$.

Acknowledgments. I am grateful to Vittorio Addona, Kim Alexander, Alessandra Baniel-Stark, Kathy Dopp, Stephen Fienberg, David Freedman, Joe Hall, Mark Halvorson, David Jefferson, Mark Lindeman, John McCarthy, Jasjeet Sekhon, Howard Stanislevic, David Wagner and an anonymous referee for helpful conversations and comments on an earlier draft, and to Elaine Ginnold and Melvin Briones for data.

REFERENCES

Aslam, J. A., Popa, R. A. and Rivest, R. L. (2007). On auditing elections when precincts have different sizes. Available at people.csail.mit.edu/rivest/AslamPopaRivest-OnAuditingElectionsWhenPrecinctsHaveDifferentSizes.pdf.

Benjamini, Y. and Hochberg, Y. (1995).
Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300. MR1325392

Bjornlund, E. C. (2004). Beyond Free and Fair: Monitoring Elections and Building Democracy. Woodrow Wilson Center Press, Washington, DC.

Dopp, K. and Stenger, F. (2006). The election integrity audit. Available at uscountvotes.org/ucvInfo/release/ElectionIntegrityAudit-release.pdf.

Elections: Federal efforts to improve security and reliability of electronic voting systems are under way, but key activities need to be completed (2005). Technical Report GAO-05-956, U.S. Government Accountability Office, Washington, DC.

Elections: The nation's evolving election system as reflected in the November 2004 general election (2006). Technical Report GAO-06-450, U.S. Government Accountability Office, Washington, DC.

Estok, M., Nevitte, N. and Cowan, G. (2002). The Quick Count and Election Observation. National Democratic Institute for International Affairs, Washington, DC.

Halvorson, M. and Wolff, L. (2007). Report and analysis of the 2006 post-election audit of Minnesota's voting systems. Available at ceimn.org/files/CEIMNAuditReport2006.pdf.

Hite, R. C. (2007). Elections: All levels of government are needed to address electronic voting system challenges. Technical Report GAO-07-714T, U.S. Government Accountability Office, Washington, DC.

Hoeffding, W. (1956). On the distribution of the number of successes in independent trials. Ann. Math. Statist. 27 713–721. MR0080391

Jefferson, D., Alexander, K., Ginnold, E., Lehmkuhl, A., Midstokke, K. and Stark, P. B. (2007). Post election audit standards report: evaluation of audit sampling models and options for strengthening California's manual count. Available at www.sos.ca.gov/elections/peas/final peaswg report.pdf.
McCarthy, J., Stanislevic, H., Lindeman, M., Ash, A., Addona, V. and Batcher, M. (2008). Percentage based vs. statistical-power-based vote tabulation auditing. The American Statistician 62 11–16.

Norden, L., Burstein, A., Hall, J. L. and Chen, M. (2007). Post-election audits: Restoring trust in elections. Technical report, Brennan Center for Justice, New York University and Samuelson Law, Technology & Public Policy Clinic at University of California, Berkeley School of Law (Boalt Hall), New York.

National Association of Secretaries of State (2007). Post election audit procedures by state. Available at nass.org/index.php?option=com docman&task=doc download&gid=54.

Panel on Nonstandard Mixtures of Distributions (1989). Statistical models and analysis in auditing: Panel on nonstandard mixtures of distributions. Statist. Sci. 4 2–33.

Rivest, R. L. (2006). On estimating the size of a statistical audit. Available at people.csail.mit.edu/rivest/Rivest-OnEstimatingTheSizeOfAStatisticalAudit.pdf.

Saltman, R. G. (1975). Effective use of computing technology in vote-tallying. Technical Report NBSIR 75-687, National Bureau of Standards, Washington, DC.

Siegmund, D. (1985). Sequential Analysis: Tests and Confidence Intervals. Springer, New York. MR0799155

Stanislevic, H. (2006). Random auditing of e-voting systems: How much is enough? Available at www.votetrustusa.org/pdfs/VTTF/EVEPAuditing.pdf.

Stark, P. B. (2008a). Election audits by sampling with probability proportional to an error bound: Dealing with discrepancies. Available at statistics.berkeley.edu/~stark/Preprints/ppebwrwd08.pdf.

Stark, P. B. (2008b). A sharper discrepancy measure for post-election audits. Ann. Appl. Statist. To appear.

Verified Voting Foundation (2007). Manual audit requirements. Available at www.verifiedvoting.org/downloads/stateaudits1007.pdf.

Wald, A. (2004).
Sequential Analysis. Dover, Mineola, NY.

Department of Statistics
University of California
Berkeley, California 94720-3860
USA
E-mail: stark@stat.berkeley.edu