Analyzing EPH

By Bruce Schneier: Jameson Quinn and I analyzed the E Pluribus Hugo (EPH) voting system, proposed as a replacement for the current Approval Voting system for the Hugo nominations ballot. (This is an academic paper; the Hugo administrators will be publishing their own analysis, more targeted to the WSFS Business Meeting, in the coming weeks.) We analyzed EPH with both actual and simulated voting data, and this is what we found.

If EPH had been used last year in the 2015 Hugo nominations process, then…

The number of slate nominees would have been reduced by 1 in 6 categories, and by 2 in 2 categories, leaving no category without at least one non-slate nominee.

That doesn’t seem like very much. A reasonable question to ask is why it doesn’t reduce the number more. The answer is simply that the slate was powerful last year.

The data demonstrates the power of the Puppies. The category Best Novelette provides a good example. This category had 1044 voters, distributed over 149 different works with 3 or more votes. Of these voters, around 300 (29%) voted for more Puppy-slate works than non-Puppy ones, and about half of those (14%) voted for only Puppy-slate works. These numbers are also roughly typical. The other 71% of the ballots included under 3% with votes for any Puppy work (this is relatively low, but not anomalously so, compared to other categories).

Despite being a majority, the non-Puppy voters spread their votes more thinly; only 24% of them voted for any of the top 5 non-Puppy works. This meant that 4 of the 5 nominees would have been from the Puppy slate under SDV-LPE or SDV.

(SDV-LPE stands for “Single Divisible Vote – Least Popular Elimination,” the academic name for this voting system. SDV is “Single Divisible Vote,” a long-standing and well-understood voting system.)

To further explore this, we took the actual 2014 Hugo nominations data from Loncon 3 and created a fake slate, then analyzed how it affected the outcome at different percentages of the vote totals:

In Figure 1, we assume perfectly correlated bloc voters. They vote in lockstep (with minimal exceptions to prevent ties), and their five nominations are completely disjoint from the other nominations. As you can see, both SDV-LPE and SDV reduce the power of the bloc voters considerably. Under AV, the voting bloc reliably nominates 3 candidates when they make up 10.5% of the voters, 4 candidates when they make up 12.5%, and 5 when they make up 19%. Under SDV-LPE, they need to be 26% of voters to reliably nominate 3 candidates, 36.5% to reliably nominate 4, and 54% to reliably nominate 5….
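As a rough illustration of how a lockstep bloc of a given size could be added to a set of real ballots, here is a minimal Python sketch. It assumes the percentages above are the bloc's share of all ballots, bloc included. The names `bloc_size` and `add_lockstep_bloc` are hypothetical and not taken from the paper, and the minimal exceptions used to prevent ties are not modeled.

```python
def bloc_size(n_other: int, share: float) -> int:
    """Bloc ballots needed so the bloc is `share` of ALL ballots.

    Solves n_bloc / (n_bloc + n_other) = share, rounded to the
    nearest whole ballot. Assumption: share counts the bloc itself.
    """
    if not 0 <= share < 1:
        raise ValueError("share must be in [0, 1)")
    return round(share * n_other / (1 - share))


def add_lockstep_bloc(ballots, slate, share):
    """Return the real ballots plus identical bloc ballots for `slate`.

    Every bloc ballot lists exactly the slate works, so the bloc is
    perfectly correlated and disjoint from the other nominations
    (provided the slate names do not occur in `ballots`).
    """
    n_bloc = bloc_size(len(ballots), share)
    return list(ballots) + [list(slate) for _ in range(n_bloc)]
```

For example, 80 real ballots need 20 added bloc ballots for the bloc to make up 20% of the combined electorate.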

Figure 2 simulates a more realistic voting bloc. We sample the actual behavior of the bloc voters in the 2015 Hugo nominations election, and add them to the actual 2014 nominations data. For the purposes of this simulation, we define bloc voters as people who voted for more Puppy candidates than non-Puppy candidates. In this case, the actual bloc voters did not vote in lockstep: some voted for a few members of the slate, and some combined slate nominations with non-slate nominations. For the purposes of the simulation, when they voted for the nth most popular non-Puppy candidate in 2015, we imputed that into a vote for the nth most popular non-Puppy candidate in 2014. In this case, SDV-LPE and SDV reduce the power of those voting blocs even further. Under AV, the voting bloc reliably nominates 3 candidates with 14% of the voters, 4 candidates with 17% of the voters, and 5 with 39%. Under SDV-LPE, they need to make up 27.5% to nominate 3 candidates, 38% to nominate 4, and 69.5% to nominate 5….
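The bloc definition and the rank-based imputation described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code used for the paper: the function names are hypothetical, ballots are assumed to be lists of distinct work titles, ties in popularity are broken alphabetically, slate works are carried over unchanged, and a vote with no counterpart of the same rank in 2014 is dropped.

```python
from collections import Counter


def is_bloc_ballot(ballot, slate):
    """True if the ballot names more slate works than non-slate works."""
    on_slate = sum(1 for work in ballot if work in slate)
    return on_slate > len(ballot) - on_slate


def rank_non_slate(ballots, slate):
    """Non-slate works ordered from most to least nominated.

    Assumption: ties are broken alphabetically for determinism.
    """
    counts = Counter(w for b in ballots for w in set(b) if w not in slate)
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def impute(ballot, slate, rank_2015, rank_2014):
    """Map a 2015 bloc ballot onto the 2014 field.

    A vote for the nth most popular non-slate work of 2015 becomes a
    vote for the nth most popular non-slate work of 2014.
    """
    out = []
    for work in ballot:
        if work in slate:
            out.append(work)            # slate votes are kept as they are
        else:
            n = rank_2015.index(work)
            if n < len(rank_2014):      # assumption: drop unmatched ranks
                out.append(rank_2014[n])
    return out
```

The imputed bloc ballots would then be appended to the real 2014 ballots before running each voting system.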

The upshot of all this is that EPH cannot save the Hugos from slate voting. It reduces the power of slates by about one candidate. To reduce the power of slates further, it needs to be augmented with increased voting by non-slate voters.

There is one further change in the voting system that we could make, and we discuss it in the paper. This is a modification of EPH, but would — for the slate percentages we’ve been seeing — reduce their power by about one additional candidate. So if a slate would get 5 candidates under the current system and 4 under SDV-LPE (aka EPH), it would get 3 under what we’ve called SDV-LPE-SL. Yes, we know it’s another change that would require another vote and another year to ratify. Yes, we know we should have proposed this last year. But we had to work with the actual data before optimizing that particular parameter.

Basically, we use a system of weighting divisible votes named after the French mathematician André Sainte-Laguë, who introduced it in France in 1910. In EPH, your single vote is divided among the surviving nominees. So if you have two nominees who have not yet been eliminated, each gets half of your vote. If three of your nominees have not yet been eliminated, each gets 1/3 of your vote. And so on. The Sainte-Laguë system has larger divisors. If you have two nominees who have not yet been eliminated, each gets 1/3 of your vote. If three of your nominees have not yet been eliminated, each gets 1/5 of your vote. Each of four gets 1/7; each of five gets 1/9. This may sound arbitrary, but there’s well over a hundred years of voting theory supporting these weights and the results are still proportional.
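To make the two weighting rules concrete, here is a small Python sketch of the per-nominee share and of a single point tally. It covers only the vote-splitting step described above, not the elimination rounds. The names `share` and `tally` are hypothetical, and the divisor formula 2k - 1 is inferred from the sequence 1/3, 1/5, 1/7, 1/9 given in the text.

```python
from fractions import Fraction


def share(surviving: int, rule: str = "eph") -> Fraction:
    """Part of one ballot credited to each of its surviving nominees.

    "eph": divisor k       (1, 1/2, 1/3, 1/4, 1/5)
    "sl":  divisor 2k - 1  (1, 1/3, 1/5, 1/7, 1/9), the Sainte-Lague weights
    """
    if surviving < 1:
        raise ValueError("a ballot needs at least one surviving nominee")
    if rule == "eph":
        return Fraction(1, surviving)
    if rule == "sl":
        return Fraction(1, 2 * surviving - 1)
    raise ValueError(f"unknown rule: {rule}")


def tally(ballots, eliminated=frozenset(), rule="eph"):
    """Points per work for one round, ignoring eliminated works."""
    points = {}
    for ballot in ballots:
        # Drop duplicates and eliminated works before splitting the vote.
        alive = [w for w in dict.fromkeys(ballot) if w not in eliminated]
        for work in alive:
            points[work] = points.get(work, 0) + share(len(alive), rule)
    return points
```

Under these weights a ballot with five surviving nominees contributes 5 × 1/9 = 5/9 of a point in total, against a full point under EPH, which is how ballots that list many surviving works together, as lockstep slate ballots do, are weighed down further.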

Implementing SDV-LPE-SL using the actual 2015 Hugo data:

SDV-LPE-SL comes even closer to giving slate voters a proportional share, with 7 fewer slate nominees overall, and only 1 category without a choice between at least 2 non-slate nominees.

For the perfectly correlated voting bloc simulation:

Under SDV-LPE, they need to be 26% of voters to reliably nominate 3 candidates, 36.5% to reliably nominate 4, and 54% to reliably nominate 5. Under SDV-LPE-SL, they need to be 35% for 3, 49% for 4, and 66% for 5.

And for the more realistic voting bloc simulation:

Under SDV-LPE-SL, they need 36% for 3, 49% for 4, and over 70% for 5.

That’s a big difference.

Here’s our paper. It’s academic, so it refers to the voting system by its academic name. It spends a lot of time discussing the motivation behind the new voting system, and puts it in context with other voting systems. Then it describes and analyzes both SDV-LPE and SDV-LPE-SL.

356 thoughts on “Analyzing EPH”

  1. JJ. “Apparently you missed the testing which was done last year on the real-life 1983 nomination data.”

    Ooops, you’re right. Comes of not following things closely. (We’re very much on the periphery of fandom, with most of us having been to only several Worldcons in nearly 40 years, the first being 1979 when it was over here in Europe, so most years we are out of it.)

    Anyway, thanks for correcting.

    🙂

    “it’s not cool to come into the middle of the discussion and throw around absolutist statements when you don’t know all the backstory.”

    Right, so everyone has to know everything before they can make a contribution, and that’s ‘cool’.

    🙁

    Brian Z “If I’m catching SF2 Concatenation’s drift, they want to say that the goal of the Hugos is for fans to come together organically to choose the most excellent work.

    The more successful the fans are, the more the pursuit of the goal of the Hugos looks like a slate and the more EPH works against the very behavior that the Hugos exist to reward.”

    Exactly.

    JH. A few of us (hence ‘we’) discussed this a couple of times and I’m just doing the best to summarise our conclusions. (Nothing royal.)

    Hampus E. Yup, if you can make Barcelona in November then great (they are having a social gathering the evening of the day before as well as the day after the Eurocon convention so folk can meet and do touristy things). A few of us will be there including a couple of us who have discussed this Hugo Puppy business as we see it (albeit in an uncool incomplete way from afar on the peripheries of Worldcon fandom).

    Laura “I think SF2 Concatenation is actually saying that’s the problem. EPH can’t tell the difference between an organic clump and a slate, but it’s decreasing the power of both.”

    Absolutely, it’s something that needs to be checked. Testing and checking against null hypotheses is at the heart of science. And if things work then we can have greater confidence in the new system and if they don’t then we’ll need to take the problem into account. That’s all.

    Andrew M. “As JJ et al say, it was tested on real data, those for 1983…”

    Which is great and useful and necessary. That test — being on data from a pre-Puppy year — will help (or rather helped) inform as to whether EPH affects a potential organic slate against more random nominating (in which nominators tend to nominate just one title in common with many others).

    The new test (reported in the main post above) on real 2015 data tests EPH’s impact on puppies, and we have seen that EPH is not as effective as might have been hoped. But could it also be that it has an impact on an organic, virtual slate (many fans homing in on the same excellence) in conjunction with another real (puppy) slate?

    The 1983 EPH test was on puppy-free data. If a virtual, organic slate exists (which we can’t know without doing a multivariate similarity analysis [though there is a simpler dirty way of doing this as a rough check]), then the 1983 test helps demonstrate that EPH does not seem to affect either no-slate nominating or, alternatively, nominating by a single unwitting (organic) slate. However it does not answer the question (and this is a personal view as we have not as a group discussed this discussion thread) as to whether EPH serves to work against both a puppy slate and, should it mathematically exist, an organic virtual slate (albeit proportionally), so letting in works next in line further down the list. (NB: The 1983 test was on no-Puppy data alone.) Also, as previously said, this could possibly explain why EPH does not seem as effective as was hoped.

    Laura: “I don’t want any small group to dominate the ballot. Not a puppy slate. Or a small organic clump. If something is popular across fandom, then the little clumps will overlap on those and they will push through to the ballot with EPH.”

    Much sympathy. Alas, emergent behaviour is emergent behaviour and that’s life. The mathematical properties of systems are blind to personal likes.

    As pointed out earlier in this discussion thread, there does seem to be a homing in on what a lot of people call ‘excellence’ and this is not just in Hugo nominating (with its long tail of many works getting few nominations) but also looking at the number of commentators that each year do best-of-SF-work lists. You can see that some works are common to a number of these lists and that each year a number of these end up on the Hugo shortlist. Which is why (hate to repeat what I said earlier) this year we did a list of links to these commentators so folk could see for themselves.
    http://www.concatenation.org/news/news1~16.html#others_best

    If something is popular across fandom, then the little clumps will overlap on those and they will push through to the ballot with EPH

    Agreed, let’s hope so, but also let’s make sure that EPH, or whatever we end up with, is the best way to do this.

    Mike G “SF Concatenation: A slate is not a mere list, it is accompanied by a call for action and is voted in response to that call.

    It is nonsense to label as a “virtual slate” the convergence of choices that voters have made independently.”

    You are absolutely correct (in one, nonetheless very real, sense), which is why I hope I explained at some length that we (our group) had been discussing the mathematical properties of Hugo nominators’ ballots. If a sufficient number of Hugo nominators (as per the best-of lists linked to a few paragraphs above) have the same works on their ballots then this in effect (mathematically speaking) means the ballots have similar properties. This is irrespective of a call for action or whatever. So ‘yes’, you are quite right: a virtual (or organic, as some call it above) slate is not a real slate in the dictionary sense you mean, but it may, in the mathematical sense, have slate-like qualities, and so we need to test for this.

    Phew. Had not expected to get sucked into this, had hoped just to signal to a couple of points we made on our site a year ago and earlier this year before the Bruce Schneier and Jameson Quinn new analysis reported in the main post above.

    Much appreciation to Bruce and Jameson. They have put a lot of work in.

    Hope some of the passing thoughts from Concat are possibly helpful. It is a complex subject and there are many nuances. Please forgive me if I bow out now and don’t contribute further here. As said, a few of us from SF2Concat, including a couple who were part of a small group that has discussed this, will be at the Barcelona Eurocon and if any of you are there then hope we get time for a beer or whatever.

    So long.

  2. @SF2 Concatenation

    Just in case you pop back in.

    Here’s the way I understand it:

    A very disciplined slate like the Rabid Puppies creates a bloc of ballots which are almost completely the same. They do not care who wins. They just want to cause trouble. They have no reason to actually pick any of their own choices. They just vote their slate.

    The Sad Puppies in previous years had a less disciplined slate. They actually are fans who probably put some of their own choices and then filled out the rest of their ballots with their slate. So their ballots overlap on many but not all choices. Moreover, where they overlap, it is probably for choices that many of them don’t individually think are Hugo-worthy.

    Then you have some small groups who are big fans of something particular like Doctor Who. They might fill out all 5 slots of the BDP:SF category with episodes of their favorite show. Naturally, since they are not intentionally matching their ballots to one another, they do not agree on which 5 episodes are the best. There may be certain episodes more of them converge on. So their ballots will overlap more in that one category than your average Hugo nominator’s, but not as much as those following a slate.

    Then you have everyone else. They’re reading, discussing, and swapping recommendations. Yes, certain works/creators are getting more attention. But individual ballots are still going to overlap the least in this group.

    EPH will have the greatest effect on the most cohesive group. So it has the greatest effect on a group like the Rabid Puppies, then a group like the Sad Puppies (in previous years), then a situation like a small group of Doctor Who Superfans. Finally, it will have the least effect on everyone else.

    This is why I like it. It has a greater effect where it’s most needed (although not as much as we hoped). It would still make a better ballot, more reflective of general fandom, even in a year without griefers.

  3. @laura
    The Sad Puppies in previous years had a less disciplined slate. They actually are fans who probably put some of their own choices and then filled out the rest of their ballots with their slate. So their ballots overlap on many but not all choices. Moreover, where they overlap, it is probably for choices that many of them don’t individually think are Hugo-worthy.

    I can make a case for the opposite, I think.

    People who pay attention to the SP recommendations would converge on the works that would normally get the most votes anyway — the “good” works. Then, to fill out their ballots, they would randomly select from the remaining SP recommendations, thus diffusing them somewhat relative to the former works (although not as much as they would be diffused if they came from the long tail of works normally nominated by organic voters).

    I say this not to be intentionally contrary, but to say that different assumptions about how groups act lead to different results, and that’s why analysis of data is so useful: it helps us figure out how people really act, instead of how we’d guess they’d act.

  4. @SF Concatenation 2:

    @JJ:

    “it’s not cool to come into the middle of the discussion and throw around absolutist statements when you don’t know all the backstory.”

    Right, so everyone has to know everything before they can make a contribution, and that’s ‘cool’.

    Clearly it is not the case that “everyone has to know everything before they can make a contribution.” Where you went wrong, and you most assuredly went wrong, was to combine not knowing everything with speaking as if you did know everything. Specifically, saying, “Nobody tested this on a non-slate year” when, in fact, someone had. Had you said, “Did anyone test this on a non-slate year? That would be an important step,” nobody would’ve had any problems.

    Now, you did one other thing wrong too, which was to recast JJ’s very specific criticism in terms much more favorable to yourself that don’t comport with what JJ actually said, right in front of everybody. That’s rude and self-pitying and, on top of everything else, ineffective. You owe JJ an apology and yourself some introspection time.

  5. @Bill

    I would characterize Sad Puppies 4 as a recommendation list (not a slate), and I’d agree with your assessment of what may have happened there. My previous comment was in regard to Sad Puppies 3. Admittedly, I don’t really know how they decided to nominate what they did. The point I was trying to make was that their slate seemed to have been adhered to less than the Rabid Puppies.

  6. Pingback: To Say Nothing of the Dogs; or, How We Confound the Hugos’ Third Slump (Hugo voting proposal discussion 5) | File 770
