Analyzing EPH

By Bruce Schneier: Jameson Quinn and I analyzed the E Pluribus Hugo (EPH) voting system, proposed as a replacement for the current Approval Voting system for the Hugo nominations ballot. (This is an academic paper; the Hugo administrators will be publishing their own analysis, more targeted to the WSFS Business Meeting, in the coming weeks.) We analyzed EPH with both actual and simulated voting data, and this is what we found.

If EPH had been used last year in the 2015 Hugo nominations process, then…

The number of slate nominees would have been reduced by 1 in 6 categories, and by 2 in 2 categories, leaving no category without at least one non-slate nominee.

That doesn’t seem like very much. A reasonable question is why EPH doesn’t reduce the number further. The answer is simply that the slate was powerful last year.

The data demonstrates the power of the Puppies. The category Best Novelette provides a good example. This category had 1044 voters, distributed over 149 different works with 3 or more votes. Of these voters, around 300 (29%) voted for more Puppy-slate works than non-Puppy ones, and about half of those (14%) voted for only Puppy-slate works. These numbers are also roughly typical. The other 71% of the ballots included under 3% with votes for any Puppy work (this is relatively low, but not anomalously so, compared to other categories).

Despite being a majority, the non-Puppy voters spread their votes more thinly; only 24% of them voted for any of the top 5 non-Puppy works. This meant that 4 of the 5 nominees would have been from the Puppy slate under SDV-LPE or SDV.

(SDV-LPE stands for “Single Divisible Vote – Least Popular Elimination,” the academic name for this voting system. SDV is “Single Divisible Vote,” a long-standing and well-understood voting system.)

To further explore this, we took the actual 2014 Hugo nominations data from Loncon 3 and created a fake slate, then analyzed how it affected the outcome at different percentages of the vote totals:

In Figure 1, we assume perfectly correlated bloc voters. They vote in lockstep (with minimal exceptions to prevent ties), and their five nominations are completely disjoint from the other nominations. As you can see, both SDV-LPE and SDV reduce the power of the bloc voters considerably. Under AV, the voting bloc reliably nominates 3 candidates when they make up 10.5% of the voters, 4 candidates when they make up 12.5%, and 5 when they make up 19%. Under SDV-LPE, they need to be 26% of voters to reliably nominate 3 candidates, 36.5% to reliably nominate 4, and 54% to reliably nominate 5….

Figure 2 simulates a more realistic voting bloc. We sample the actual behavior of the bloc voters in the 2015 Hugo nominations election, and add them to the actual 2014 nominations data. For the purposes of this simulation, we define bloc voters as people who voted for more Puppy candidates than non-Puppy candidates. In this case, the actual bloc voters did not vote in lockstep: some voted for a few members of the slate, and some combined slate nominations with non-slate nominations. For the purposes of the simulation, when they voted for the nth most popular non-Puppy candidate in 2015, we imputed that into a vote for the nth most popular non-Puppy candidate in 2014. In this case, SDV-LPE and SDV reduce the power of those voting blocs even further. Under AV, the voting bloc reliably nominates 3 candidates with 14% of the voters, 4 candidates with 17% of the voters, and 5 with 39%. Under SDV-LPE, they need to make up 27.5% to nominate 3 candidates, 38% to nominate 4, and 69.5% to nominate 5….

The upshot of all this is that EPH cannot save the Hugos from slate voting. It reduces the power of slates by about one candidate. To reduce the power of slates further, it needs to be augmented with increased voting by non-slate voters.

There is one further change in the voting system that we could make, and we discuss it in the paper. This is a modification of EPH, but would — for the slate percentages we’ve been seeing — reduce their power by about one additional candidate. So if a slate would get 5 candidates under the current system and 4 under SDV-LPE (aka EPH), it would get 3 under what we’ve called SDV-LPE-SL. Yes, we know it’s another change that would require another vote and another year to ratify. Yes, we know we should have proposed this last year. But we had to work with the actual data before optimizing that particular parameter.

Basically, we use a system of weighting divisible votes named after the French mathematician André Sainte-Laguë, who introduced it in France in 1910. In EPH, your single vote is divided among the surviving nominees. So if you have two nominees who have not yet been eliminated, each gets half of your vote. If three of your nominees have not yet been eliminated, each gets 1/3 of your vote. And so on. The Sainte-Laguë system has larger divisors. If you have two nominees who have not yet been eliminated, each gets 1/3 of your vote. If three of your nominees have not yet been eliminated, each gets 1/5 of your vote. Each of four gets 1/7; each of five gets 1/9. This may sound arbitrary, but there’s well over a hundred years of voting theory supporting these weights and the results are still proportional.
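
The two weighting rules described above can be sketched in a few lines of Python. This is only an illustration of the divisors given in this post, not the code used for the paper; the function names `ballot_weight` and `tally_points` and the toy ballots are hypothetical, and the elimination step of SDV-LPE is deliberately left out.

```python
from fractions import Fraction


def ballot_weight(surviving: int, sainte_lague: bool = False) -> Fraction:
    """Share of one ballot credited to each of its surviving nominees.

    EPH divides the ballot by the number of surviving nominees (1, 2, 3, ...).
    The Sainte-Lague variant uses the odd divisors 1, 3, 5, 7, 9 instead.
    """
    if surviving < 1:
        raise ValueError("a ballot needs at least one surviving nominee")
    divisor = 2 * surviving - 1 if sainte_lague else surviving
    return Fraction(1, divisor)


def tally_points(ballots, eliminated=frozenset(), sainte_lague=False):
    """Sum the per-ballot shares for every nominee not yet eliminated."""
    points = {}
    for ballot in ballots:
        surviving = [work for work in ballot if work not in eliminated]
        if not surviving:
            continue  # every nominee on this ballot is already out
        share = ballot_weight(len(surviving), sainte_lague)
        for work in surviving:
            points[work] = points.get(work, Fraction(0)) + share
    return points


# Hypothetical toy data: one ballot names A and B, another names only A.
toy_ballots = [{"A", "B"}, {"A"}]
eph_points = tally_points(toy_ballots)
sl_points = tally_points(toy_ballots, sainte_lague=True)
```

With these toy ballots, A collects 1/2 + 1 = 3/2 points under the EPH weights and 1/3 + 1 = 4/3 under the Sainte-Laguë weights, while B drops from 1/2 to 1/3. A ballot that names a single work keeps its full weight under both rules, which is why the larger divisors cost voters who nominate many works together more than voters whose nominations are spread out.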

Implementing SDV-LPE-SL using the actual 2015 Hugo data:

SDV-LPE-SL comes even closer to giving slate voters a proportional share, with 7 fewer slate nominees overall, and only 1 category without a choice between at least 2 non-slate nominees.

For the perfectly correlated voting bloc simulation:

Under SDV-LPE, they need to be 26% of voters to reliably nominate 3 candidates, 36.5% to reliably nominate 4, and 54% to reliably nominate 5. Under SDV-LPE-SL, they need to be 35% for 3, 49% for 4, and 66% for 5.

And for the more realistic voting bloc simulation:

Under SDV-LPE-SL, they need 36% for 3, 49% for 4, and over 70% for 5.

That’s a big difference.

Here’s our paper. It’s academic, so it refers to the voting system by its academic name. It spends a lot of time discussing the motivation behind the new voting system, and puts it in context with other voting systems. Then it describes and analyzes both SDV-LPE and SDV-LPE-SL.

356 thoughts on “Analyzing EPH”

  1. @Jim Henley: If it were my party and a guest stuck his fingers in the food and everyone’s faces, claimed it was his party, and threatened to burn my house down if his demands weren’t met, I’d call the police. I wouldn’t fool around. But, to stretch the analogy, it’s not just my party. I was chosen by the party’s members to host the event that year. Next year, it will be someone else, somewhere else. So, how can I make sure Salad Fingers doesn’t come back, in a way that’s as portable and dispassionate as possible, for a party that’s justly suspicious of the host having that power?

    3SV + EPH+ is a mechanical and a social solution. With a filter in the second and third steps, unqualified works aren’t likely to make it through to an award, regardless of origin. Doing this places the burden of exclusion on the party’s membership, in a way that has general utility. It also gives “no” a weight that individual and/or administrative rejection doesn’t have.

    I see cases where it could break down, like:

    1) Multiple competing slates.
    2) See: “McBoatface, Boaty”

    I’m not against administrative exclusion. Given how difficult it is to make changes to the rules, and the lack of an overt mechanism for dealing with bad actors, a jury/panel with the power to make decisions about the award’s content, and subject to the same two-step process as changes to the constitution, might be a good tool to have in the kit. This would satisfy the desire to deal with emergencies as they come up, while limiting potential for abuse.

    –And, off to the new thread.

  2. @Hampus Eckerman: but you seem to be ok with a second nomination round. What’s the substantive difference? And it’s really the complete opposite of a slate – it’s informing nominators that five of these fifteen works will be the finalists (which is true whether the longlist is published or not), and asking them which ones they prefer. It gets a lot more people involved in picking the finalists, which is a good thing whether there’s a slate or not. Last year, the fifth most popular non-slate short story got just 41 nominations out of 1174 ballots; surely a higher percentage of nominators should endorse a work to get it on the final ballot?

  3. @Mokoto: I support 3SV. As I’ve said in so many words, I view 3SV as giving fandom the tools to make social solutions to the social problem.

  4. the little mole:

    “@Hampus Eckerman: but you seem to be ok with a second nomination round. What’s the substantive difference?”

    The substantive difference is that both rounds of nominations are done in good faith. People vote for their own favourites, not according to popularity among other voters. There is no way during the first round for people to adjust their voting after what others vote for. There is no way during the second round for people to adjust their voting after what others vote for.

    You want people to change their own votes according to how others vote. That is not ok.

  5. Over on the other thread, I’ve proposed a way to Grief the double nominations round which I think means I can’t support it.

  6. @Jim Henley: I support 3SV. As I’ve said in so many words, I view 3SV as giving fandom the tools to make social solutions to the social problem.

    Alright! I read the objection to my catharsis comment as an objection to using 3SV (or a mechanical solution in general) as a social solution. Given the problems inherent in an administrative solution, I figure it’s the best way to deal with trouble in general. A creep with delusions of grandeur runs a slate? No. Trolls try to run a Joyce pastiche by spambot through the award? No. With the added benefit that it’s the electorate’s choice vs something either can pin on administrative figures. They’ll have their Emmanuel Goldstein, but at least they have to invent him.

  7. A consensus seems to be forming around 3SV or some tweak thereof. This leads to the question: if this passes, is EPH+ still needed?

    A combination of 3SV(ish) and EPH would substantially address most of the issues people have. I do think that EPH+ would still marginally improve the finalist list, giving a slate less ability to influence it. I think it is up to fandom as to whether that improvement is enough for EPH+ to earn its keep, both in terms of time at the business meeting and in terms of (slightly) increased difficulty in explaining the overall system and its motivations. I’d personally think that the answer is “yes”, but I realize I’m not representative of fandom in this regard (and that could well be putting it mildly).

    I’m posting this message on all three threads because I don’t really know where it should go.

  8. Richard Gadsen:

    “Over on the other thread, I’ve proposed a way to Grief the double nominations round which I think means I can’t support it.”

    The same griefing method can be used against 3SV.

  9. The same griefing method can be used against 3SV.

    No, it can’t. It wouldn’t reach quorum.

  10. “No, it can’t. It wouldn’t reach quorum.”

    I do not agree with you. By nominating things that weren’t crap, i.e. human shields, griefers can become kingmakers. We can see this strategy at work this year with works like Penric’s Demon and others. It is very unlikely anyone would downvote this during a second phase, even if it was on a slate.

  11. Ah, I understand what you’re saying. And you’re right; under 3SV, slaters would still have the power to promote human shields at the expense of other works with marginally higher non-slate popularity. But:

    1. That has nothing at all to do with 3SV; it would happen before the longlist was announced.
    2. EPH+ would substantially limit this power. Based on my simulations, I’d say that at an absolute minimum, 2/5 of the finalists would be the same as they would have been without the slate, including the top non-slate candidate. On average, I’d guess it would be closer to 3/5 including the top 2/5. So a slate coordinator could look at the longlist, guess what would come in #3-6 without the slate, and, if they guess well enough, knock off 2 of those by aligning with the other two. That’s more power than I’d like them to have, but still probably not enough to hold their interest for the long term.
    3. EPH would also help; basically, one slot worse than the numbers above.
    4. X/6 would help too, though it increases the homework burden for the voting round.

  12. 1. It has something to do with 3SV as we are now discussing what voting system we would like and if it solves this kind of problem. 3SV by itself does not.

    2-3. I agree with this. But EPH/EPH+ would have the same effect in DN. In two stages. That is why I said this kind of griefing (kingmaking) could be used in both DN and 3SV with about the same success rate. Possibly less in DN (or Omnibus) as there would be two rounds of EPH/EPH+.

  13. Hampus Eckerman http://file770.com/?p=28946&cpage=5#comment-433718 “No. It is not the number of nominations an item gets that is important in EPH. What is important is if they are given by slatevoters, i.e. a group of persons that all vote on exactly the same items.”

    Yes, understood. I did not use correct terms. Apologies. Hope the sense of what was meant still comes through.
    http://file770.com/?p=28946&cpage=5#comment-433681
    Our concerns still stand and could still possibly (and easily) explain the recent results on last year’s nomination stats (see above main post).

    Regarding your mention of the original (2015?) testing on data from two different Worldcons: would have been happier if some multivariate analysis generating a voting (nominating) similarity cluster graph had been included in the analysis (I saw last year’s WSFS business meeting on YouTube: what a marathon session that was…). Such an analysis could have helped nullify one of our concerns had only one cluster appeared: the puppies. Had two appeared (which we suspect might) then that would argue against (or explain) the EPH efficacy recently demonstrated (again see above main post).

    Moving on… Conversely, the 4/6 rule (nominate 4 and vote 6) seems sensible and has the advantage of being easier to understand than EPH (whose complexity may lead some to claim it is an attempt to game the Hugos: a concern that was raised by a N. American Worldcon regular at the 2015 WSFS business meeting).

  14. SF2 Concatenation:

    “Yes, understood. I did not use correct terms. Apologies. Hope the sense of what was meant still comes through.”

    No. You came with a statement that seemed like total misunderstanding of how EPH works. Now you say that your concerns still stand.

    Why, when your first statement was so wrong? What concerns do you have, and why? In what way would EPH make it worse if there were two slates? It has been proven to work with multiple slates and it is quite obvious why if you look at the math.

  15. @Hampus Eckerman: the post-longlist portion of the nomination period is a de-facto second round. In the first portion, people vote for their own favourites; in the second portion, they vote on the longlist (and/or what are effectively write-ins, if they continue to nominate anything not on the longlist). I don’t see how that’s meaningfully different from the second round in 3SV/DN voting on what other people voted for in the first round.

  16. @SF2

    Just because there may be some very popular non-slated works does not make them like a pseudo-slate. The chances that a bloc of ballots would have all of them is still smaller than the slated works. If a small group of people all did just happen to nominate the same 4-5 works even without co-ordination, should their taste dominate the ballot?

    The category where unintentional bloc-voting did seem to be a problem in the past was BDP:SF. But I think many people see the fact that EPH lessens the chance of *unintentional* bloc-voting dominating a category as an additional benefit, not a bug.

    Frankly, I’m tired of people complaining that EPH is too complex to understand and that people will think it’s gaming the nominations. Clearly it’s not. Do those who assume it’s unfair think the results we’re seeing now are being doctored?

    4 and 6 may be simple, but it’s less effective. I’d be in favor of passing both. Unless there seems to be sufficient evidence that 4/6 would reduce the effectiveness of EPH.

  17. FYI, it isn’t approval voting being used right now. It’s bloc voting. That’s a very important distinction. I’m not sure why terminology is being tortured so much in this discussion in general. It’s STV, not SDV or EPH. Why does the terminology need to be tortured?

    It seems to me that your analysis is rather badly limited by a lack of data of how many people would choose to rank more or less than 5 candidates, and how they would rank those candidates. That’s a necessary limitation, but one I think you should be more up front about. There is real world data to draw on here. STV is used in national and subnational elections. For a slate/party with 29% of the vote to take 4/5 available spots would be completely unprecedented. I’d suggest your analysis is underestimating the effectiveness of STV here, and badly. Northern Ireland just had its election under STV less than 2 weeks ago, and in that election the largest party received 29.2% of the vote, and 35% of the seats. That’s pretty typical of STV – if anything, it’s more distortion than normal.

    I think that’s a useful guide. It strikes me as extremely unlikely that a slate with 29% of the vote would be able to manage any more than 2/5 nominees in any category.

    The 4/6 would be unnecessary with EPH, and probably less effective, as the non-puppy vote is likely going to be split between a greater number of choices, and better able to take advantage of additional/rankings choices. Just moving the number of nominees up by 1 would increase the overall proportionality though.

  18. @Ryan

    We are discussing the nomination stage. No ranking is done until the final ballot.

  19. @Ryan: My concern about your a priori reasoning is that it does not take cognizance of the fact that the paper linked in the top-line post here analyzes an actual real-world data set. You don’t say, based on your reading of their paper, where they went wrong.

  20. the little mole:

    ” I don’t see how that’s meaningfully different from the second round in 3SV/DN voting on what other people voted for in the first round.”

    That you do not does not mean that we others don’t. We have explained the difference and why we think your idea is less good.

  21. Hampus: “What concerns is it you have and why? In what way would EPH make it worse if there was two slates?”

    OK. Sorry again for not being clear. The fault is at this end for not spelling things out, so here’s another stab. The concern is this…

    EPH is designed to work against slates and was originally trialled theoretically; this testing was on hypothetical data.

    There is a worry. There are assumptions underpinning employing a technique such as EPH. The key one for us is whether or not, mathematically, slate voting is different to ‘normal’ voting?

    The assumption (or hypothesis) EPH operates under is that it is.

    The null hypothesis is that it isn’t.

    This assumption therefore needs to be tested (that is, if someone is into the science approach). This testing should be in addition to the trial using theoretical data (and the subsequent trial using real 2015 voting data).

    The above paper was written _after_ the proposal first went to WSFS (but before this year’s ratification). So the WSFS decision to accept EPH last year did not benefit from the paper we now have. Further trialling has now been done on real data (the 2015 Hugo results) and the conclusions from the resulting paper drawn to our attention in the main post above.

    EPH has been found not to be as effective as was hoped.

    Why?

    We contend that this may be because slate voting by the puppies is not as different from ‘normal’ voting as was assumed. (Note, we are not and never have been _certain_ but we have this concern.)

    Looking at past Hugo long lists (published after each award ceremony; for 2015 they are at the bottom, below the shortlist voting stats, see here: http://www.thehugoawards.org/content/pdf/2015HugoStatistics.pdf ) you can see the number of nomination ballots cast, percentages and (at the very bottom of it all) the number of works nominated in each category.

    This shows (as has also been mentioned a number of times by past administrators) that few works get a lot of votes and many works get few votes.
    (If you are into the science, it’s leptokurtic with a long tail.)

    All well and good, and all verifiable.

    Now, back to the assumption. EPH proposers assume slate voting and normal voting are statistically different in similarity. That is to say puppies have on their ballots all puppy titles and that non-puppy voters (let’s call them ‘normal’ voters – no disrespect meant to anyone so take ‘normal’ as meaning voters who previously voted in non-puppy or normal years) have non-puppy slate titles on their ballots and that there are a range of these including some (many) works that will not get near to being on the shortlist.

    OK. Now, if this is so and if in a normal year many normal voters vote for a few titles then this normal pattern of voting could (note ‘could’ as in hypothetically) be construed as a kind of virtual slate within a more random sea of many works getting just a very few nominations. This is not to say that normal Hugo voters have an intentional slate but that they are homing in on a few titles and that this can possibly be considered mathematically to be an unintentional slate.

    So while there may be many works that do not get on the shortlist, only a very few voters propose each of these titles. What we do get is many nominating a few titles and this is like a slate albeit a virtual one.

    If this counter hypothesis is true (and remember we made it last year) then we would predict that any measure to counter slate voting would be ineffective as it would also serve to counter normal voting (as, as mentioned above, this too is unwittingly a slate). EPH (to be teleological which is bad science but you’ll hopefully follow the drift) will not be able to distinguish between the two slates: remember, the purpose of EPH is to impede the puppy slate. Being unable to distinguish between the two it will have little effect on just one compared to another.

    As we have seen from the main post above, EPH is not as effective as was hoped.

    This means that our original concerns (see links given in earlier discussion) may have validity. (Note ‘may’ not ‘will’.) We had a concern and this led to a prediction that came to pass.

    Can this be tested? As with all good science, yes it can. (We need to test lest we too make a type II error: it could be that our prediction came to pass because of some unrelated factor we have not considered.) A multivariate analysis on the similarities of nomination forms can be used to produce a cluster graph. _If_ we are right then this should show two clusters amid a more random sea.
    _If_ we are wrong then there will only be one cluster and that will represent the puppies’ forms.

    And if we are right then things would not bode well for any revised EPH+.

    Further, in addition to all the above, it would be preferable if we could test EPH not just on 2015 voting but also on a previous non-Puppy year as a control. (Being old fashioned scientist types we feel happier with controls for comparison.) However it is likely that the raw nominating data for past Puppy-free years has been lost and is no longer available. “C’est la vie” as you English speakers say.

    Finally, Hampus, please do not mistake raising concerns for any lack of respect for the not inconsiderable work Bruce Schneier and Jameson Quinn have undertaken. _All_ work has its limitations and limitations need to be explored if error is to be ascertained.

    Hoping that this suffices as an explanation (as originally had hoped the initial contribution to this thread would suffice and don’t want this to drag on further).

  22. I’m quite sure that some of the information that will be made public in the report to the Business Meeting is the correlation of nominations between Puppy ballots versus between non-Puppy ballots. I would be very shocked if this wasn’t one of the things they specifically looked at.

  23. @SF2 Concatenation: As I recall, the studies run on sample data concentrated on mimicking Novel, as did the early treatment of the 1984 data the EPH team got hold of. EPH was not well-studied for scenarios modeled on categories where there’s a combination of fewer nominators and wider spread of candidates, such as the short fiction, related work and fan awards. People who did even cursory top-line analyses of the impact of EPH on those categories (me) or more in-depth toy models (Greg? Kyra?) had real doubts EPH would live up to its original goal of granting “booster slate” nominators a voice proportional to their percentage representation. The cause seems to be the opposite of non-slate noms naturally clustering into slate-like patterns. Rather, it’s a consequence of just how long the tails get.

    Looking at the table of results from the Quinn-Schneier paper, slate voting under the present system achieves a ~5x magnification of slate power, EPH drops it to ~3x and EPH+ to 2-2.5x. Personally I consider that a clear failure to achieve the original goal. And I don’t consider that marginal improvement from EPH to EPH+ to be worth the change. Other people choose to judge EPH on whether it keeps a single slate from sweeping all categories, and pronounce themselves pleased.

    Now, your concerns about competing slates are well taken. Last year, the RP and SP slates had heavy overlap. But we could imagine a year in which, say, two competing slates of about the size of the RP group – 200-300 folks – are in play. It strikes me as obvious that between them, the two slates would lock up every spot on the shortlist: EPH eliminates a candidate or two within each slate, but there’s no elimination across slates. So The Future Is 1488 slate gets 3 nominees and #TallGenocide gets 2, or vice versa.

  24. SF2 Concatenation:

    Ok, let’s take it from the beginning:

    EPH is an algorithm created to weaken the power of clusters of people who vote as a group on the same items. Read that sentence carefully.

    This means that if you have 100 persons who nominate the exact five items in one category, I will not be affected, even if I vote for one of the same items. Why? Because I am not part of the cluster. I am not part of the group that voted exactly the same for all nominees, I voted only for one item. So the cluster of people have their nominations weakened, but mine is not.

    Ok with that?

    So, let’s continue.

    We have group 1, slaters. 10 persons. All of them vote for the following items: A, B, C, D, E. A clear cluster of common voting, so their voting power is lessened.

    Then we have group 2. They vote for:
    A,Y,G,C,I
    R,E,P,U,A
    Z,A,T,Q,M
    …and so on.

    The only thing in common is that all voters in group 2 nominate A. They have nothing else in common. They are not a cluster that is block voting. So even if work A gets 10 nominations in group 1 and 10 nominations in group 2, the voting power in group 1 is lessened by the algorithm and the voting power of group 2 is not.

    The algorithm is not about how many nominations a candidate got. But about if those who nominated the candidate did it as a group that were voting for exactly the same candidates.

    So the fact that some candidates got more nominations than others is not a problem, which has been proven with real data from two worldcons. As long as the nominations didn’t come from a group that blockvoted, the votes aren’t affected.

  25. Also, the idea is diversity. That a small group should not be able to lock up the ballot. Will a few votes for ordinary voters be affected? Most likely. If so, it will only affect the least popular candidate in that category, meaning that in the worst case they would only get one of their candidates on the ballot instead of more.

    I see that as a feature.

  26. As Hampus said, and I said earlier, many of us want to break up unintentional slates too. It helps prevent “Best Episode of Doctor Who” situations.

  27. Doctor Who is a special case. I agree that breaking up that concentration is not a problem. If there is unintentional concentration of votes in other categories, though, it would seem to be a different kind of phenomenon – not a bloc of fans with similar interests (beyond the general community of interest that all Worldcon members tend to have), but convergence on (perceived) excellence. If a lot of people, non-conspiratorially, agree that a certain group of works are good, that is evidence that they are good. If EPH penalises this, I see it as a problem.

    Last year I raised the question whether EPH was intended to change results in the absence of slates, and Jameson assured me that it was not, and it had been mathematically proved that it did not. (I am not sure whether this would apply to BDP Short Form, which of course did not exist in 1983, against whose figures EPH was tested: but as I say, I don’t see this as a problem.) If it turns out that the normal concentration of votes is actually similar to that produced by slates, and therefore EPH would work against such concentrations, then either this is a problem for EPH, or it means that the wrong works have been getting nominated for the Hugos throughout their existence, which I see as a problematic conclusion.

    I think it most unlikely that this is true of most categories, for the reasons Jim gave. I think it might be true in one category: BDP Long Form. There it’s very likely that there will be a small group of works that stand out and so attract a lot of votes. In the File 770 straw poll this year, three works – The Martian, Fury Road and The Force Awakens – were way ahead of the field (together with Jessica Jones, which didn’t have the same success in the wider community), and the actual ballot confirmed their position. Lots of people must have voted for all three, not because they represent the interests of a particular group of fans, but because they are the outstanding works of the year. It would be very odd if this counted against them.

  28. If something is popular generally, then it’s not a small minority dominating the majority. EPH tries to make everyone’s choices more proportional. It reduces the effects of a small group of people all choosing the same works. If a large number of people converge on a small number of works, then those will make the ballot with EPH.

  29. @Jim Henley “Personally I consider that a clear failure to achieve the original goal. And I don’t consider that marginal improvement from EPH to EPH+ to be worth the change.”

    Apologies. Missed yours above. Yes concur with your view quoted above.

    However in addition to Puppy slates (of various flavour) our concern is also that genuine Hugo voters (the regulars etc) in homing in on excellence (whatever that may be) are in effect (unconsciously) creating a virtual slate against which EPH may have an effect.

    Would in an ideal world like to see EPH not just tested on 2015 real Hugo data but on an earlier pre-puppy year or two (as a control) to see if EPH affected outcome.

  30. Hampus “Ok, lets take it from the beginning…”

    All very well and good, but we are raising a _different_ point.

    Have had two goes now in trying to explain it to you (once in some detail).

    Don’t know where you are based but if it is in Europe then a few of us will be at the next Eurocon in Barcelona. We can discuss over a beer.

    Banging on about it here will just bore others.

    Andrew M. “Last year I raised the question whether EPH was intended to change results in the absence of slates, and Jameson assured me that it was not, and it had been mathematically proved that it did not.”

    That may well have been the intention but “last year” it could not have “been mathematically proved that it did not” if the testing on _real_ nominating data had taken place only _this_ year. (See main post above.)

    Also, notwithstanding this, as mentioned in our earlier contributions above, there has been no testing on pre-puppy year real-life Hugo nominating data as a control to see what effect EPH would have had on normal nominations. No testing on real data with controls = no mathematical proof. Sorry.

  31. I’d also be really curious to see EPH/EPH+ tested on 2016 data. Do we know if or when that might happen?

    I know we can’t count on that large a turnout every year, but I’d like to know how much it would have helped if EPH had already been in place. To get around twice as many nominators and have that pretty much negated by tighter slate discipline was very discouraging. If honest nominators feel that they’re just wasting their time, we’ll definitely get fewer nomination ballots.

  32. SF2 Concatenation: That may well have been the intention but “last year” it could not have “been mathematically proved that it did not” if the testing on _real_ nominating data had taken place only _this_ year.

    Also, notwithstanding this, as mentioned in our earlier contributions above, there has been no testing on pre-puppy year real-life Hugo nominating data as a control to see what effect EPH would have had on normal nominations. No testing on real data with controls = no mathematical proof. Sorry.

    Apparently you missed the testing which was done last year on the real-life 1983 nomination data.

    I know that there were many hundreds of posts made last year on the subject of Hugo nomination algorithms, but it’s not cool to come into the middle of the discussion and throw around absolutist statements when you don’t know all the backstory.

  33. Just to add to the above replies.
    EPH *proportionalizes* rather than punishes. It really doesn’t penalize a slate; it just ensures that a slate or any kind of ‘organic’ groupings get things on the final ballot in proportion to the size of their vote. It only weakens a slate’s ability to sweep a category – it doesn’t punish a slate but limits the number of works that a slate gets as finalists to the proportion of the votes it gets. This is important when considering ‘organic’ groupings – they don’t lose out with EPH but they don’t get to sweep a category either. That is what is so elegant about it and why it is cool regardless of puppies, griefers or anything else.
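
The proportionalizing behaviour described above can be illustrated with a minimal sketch of the SDV-LPE idea: each ballot's single point is split equally among its surviving works, and in each round the two works with the fewest points are compared and the one appearing on fewer ballots is eliminated. This is a simplified illustration, not the official tally procedure. The function name `eph_finalists`, the toy ballots, and the alphabetical tie-break are all assumptions made for this example; the actual proposal has its own tie rules.

```python
from collections import defaultdict
from fractions import Fraction


def eph_finalists(ballots, seats=5):
    """Simplified SDV-LPE sketch. `ballots` is a list of sets of work names."""
    remaining = set().union(*ballots) if ballots else set()
    while len(remaining) > seats:
        points = defaultdict(Fraction)  # divided points per work
        noms = defaultdict(int)         # number of ballots listing the work
        for ballot in ballots:
            live = ballot & remaining
            for work in live:
                # One point per ballot, split equally among surviving works.
                points[work] += Fraction(1, len(live))
                noms[work] += 1
        # Pick the two works with the fewest points.
        # ASSUMPTION: remaining ties are broken by name, purely for determinism.
        lowest_two = sorted(remaining, key=lambda w: (points[w], noms[w], w))[:2]
        # Of those two, eliminate the one with fewer nominations.
        loser = min(lowest_two, key=lambda w: (noms[w], points[w], w))
        remaining.remove(loser)
    return remaining


# Toy data (hypothetical): 4 identical slate ballots, 6 organic single-work ballots.
ballots = [{"S1", "S2"}] * 4 + [{"A"}] * 3 + [{"B"}] * 3
finalists = eph_finalists(ballots, seats=2)
```

In this toy case a plain count of nominations would give both seats to the slate works (4 nominations each against 3 each for A and B), while the sketch leaves one slate work and one organic work, roughly matching the slate's 40% share of the ballots. Which slate work and which organic work survive depends only on the arbitrary tie-break.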

  34. As far as I recall from the simulations done last year on the 1983 data, in about half the runs EPH would replace the 5th-place nominee with a different work, so its effect in a non-puppy year was said to be “minimal”.

  35. Would in an ideal world like to see EPH not just tested on 2015 real Hugo data but on an earlier pre-puppy year or two (as a control) to see if EPH affected outcome.

    We know it has been done for 2014 but the results were not reported in the paper. Quinn said in the other thread that what he called “organic” (that is, non-puppy) voters had their voting power reduced by 5-10% under EPH. Then he said he’d actually have to look up the exact number, and couldn’t tell us what it is because of his NDA.

  36. it could not have “been mathematically proved that it did not” if the testing on _real_ nominating data had taken place only _this_ year.

    A 2014 Administrator said at the August 2015 Business Meeting that he had tested it himself on 2014 data and “there was a change.” He then gave the 2014 data to the two researchers. But we are still waiting for any further information.

  37. Camestros Felapton on May 21, 2016 at 4:44 am said:
    Just to add to the above replies.
    EPH *proportionalizes* rather than punishes. It really doesn’t penalize a slate; it just ensures that a slate or any kind of ‘organic’ groupings get things on the final ballot in proportion to the size of their vote. It only weakens a slate’s ability to sweep a category – it doesn’t punish a slate but limits the number of works that a slate gets as finalists to the proportion of the votes it gets. This is important when considering ‘organic’ groupings – they don’t lose out with EPH but they don’t get to sweep a category either. That is what is so elegant about it and why it is cool regardless of puppies, griefers or anything else.

    (emphasis mine)
    This is why I really like it. It’s not just good for the current situation. It makes the final ballot more reflective of overall preferences regardless of whether the clumping is organic or slate driven. It would change a pre- or post-puppy year for the better too.

  38. If I’m catching SF2 Concatenation’s drift, they want to say that the goal of the Hugos is for fans to come together organically to choose the most excellent work.

    The more successful the fans are, the more the pursuit of the goal of the Hugos looks like a slate and the more EPH works against the very behavior that the Hugos exist to reward.

    I honestly don’t see why none of you agree with this.

  39. @Camestros:

    It really doesn’t penalize a slate; it just ensures that a slate or any kind of ‘organic’ groupings get things on the final ballot in proportion to the size of their vote.

    Of interest. Also. 😉

  40. @Brian Z:

    If I’m catching SF2 Concatenation’s drift, they want to say that the goal of the Hugos is for fans to come together organically to choose the most excellent work.

    The more successful the fans are, the more the pursuit of the goal of the Hugos looks like a slate and the more EPH works against the very behavior that the Hugos exist to reward.

    I honestly don’t see why none of you agree with this.

    This is not a crazy criticism so I’m responding.

    1. There’s a difference between “com[ing] together organically to choose the most excellent work” throughout the entire nominating and balloting process and cohering at the nomination stage. SF Concatenation 2, who hopefully is a real collective speaking institutionally for the website and not one person referring to themselves in the Royal Plural, may wish for the latter, but I don’t know that most people do.

    2. I didn’t grant the argument that “the more the pursuit of the goal of the Hugos looks like a slate and the more EPH works against the very behavior that the Hugos exist to reward” at the time for a couple reasons: I hadn’t pushed my top-line analysis far enough to verify that because I didn’t feel the need given the other obvious problems; and I hadn’t seen it verified by the people who did more in-depth toy models or the actual analysis of the real-world ballots. Jameson Quinn has now said that this factor does appear to reduce the strength of individual organic noms by 5-10%, though I don’t know that we’re completely clear on whether this is a factor in EPH’s failure to really “ensure[] that a slate or any kind of ‘organic’ groupings get things on the final ballot in proportion to the size of their vote” but it’s a clear potential point of failure, based on Jameson’s report.

    Ironically then, it would suggest that EPH fails at both ends: in a very long-tail category like short story, EPH fails to ensure proportionality because the organic votes are spread out. In a more concentrated field like novel, it fails to ensure proportionality because the organic votes are concentrated.

    It is not unfair to say that this is indeed a problem.

    Personally, I find myself wanting to hear more about Highlander Single Elimination.

  41. Brian Z on May 21, 2016 at 6:08 am said:
    The more successful the fans are, the more the pursuit of the goal of the Hugos looks like a slate and the more EPH works against the very behavior that the Hugos exist to reward.

    I don’t want any small group to dominate the ballot. Not a puppy slate. Or a small organic clump. If something is popular across fandom, then the little clumps will overlap on those and they will push through to the ballot with EPH.

  42. SF Concatenation:

    You know, I have never been to Spain. Maybe I’ll go to Eurocon at that. I know how to order beer in Spanish after all.

  43. SF Concatenation: A slate is not a mere list; it is accompanied by a call for action and is voted on in response to that call.

    It is nonsense to label as a “virtual slate” the convergence of choices that voters have made independently.

  44. Mike,

    I think SF2 Concatenation is actually saying that’s the problem. EPH can’t tell the difference between an organic clump and a slate, but it’s decreasing the power of both.

  45. That may well have been the intention but “last year” it could not have “been mathematically proved that it did not” if the testing on _real_ nominating data had taken place only _this_ year. (See main post above.)

    As JJ et al say, it was tested on real data, those for 1983. However, I agree entirely that it could not have been mathematically proved for any distribution not involving a slate, and it is quite easy to think of non-slate distributions for which it would not be true (quite possibly including the actual distribution in BDP Short Form). I mention this only as evidence for the intention; the aim was not to remedy a problem that the Hugos have always had, or to penalise the kind of convergence that always exists.

  46. EPH was never going to target slates only. It can’t actually tell the difference between honest organic clumps and a slate bloc. However, I think a disciplined slate will always converge more than an organic group and their power will be lessened more than the honest votes. At the same time, if a small group naturally converges on some works, I don’t want their choices overwhelming everything either. If it’s generally popular enough, EPH will help it by getting more points.

  47. Jim Henley on May 21, 2016 at 6:59 am said:

    @Camestros:

    It really doesn’t penalize a slate; it just ensures that a slate or any kind of ‘organic’ groupings get things on the final ballot in proportion to the size of their vote.

    Of interest. Also. 😉

    Noted 🙄

