E Pluribus Hugo Tested With Anonymized 2015 Data

By Jameson Quinn: [Originally left as a comment.] So, Bruce Schneier and I are working on an academic paper about the E Pluribus Hugo (EPH) proposed voting system. We’ve been given a data set of anonymized votes from 2015. I don’t want to give all the results away but here are a few, now that people are actually voting for this year’s Hugos:

  • A typical category had around 300 ballots which voted for more puppies than non-puppies, and about half of those ballots were for puppies exclusively. There were few ballots which voted for half or fewer puppies (typically only a few dozen). The average number of works per ballot per category was around 3.
  • There were some weak correlations among non-puppies, but nothing that remotely rivals the puppies’ coherence. In particular, correlations were low enough that even if voting patterns remained basically dispersed, raising the average works per ballot per category from 3 to 4 (33% more votes total) would probably have been as powerful in terms of promoting diverse finalists (that is, not all puppies) as adding over 25% more voters. In other words: if you want things you vote for to be finalists, vote for more things — vote for all the things you think may be worthy.
  • EPH would have resulted in 10 more non-puppy finalists overall; at least 1 non-puppy in each category (before accounting for eligibility and withdrawals).
  • SDV(*) would have resulted in 13 more non-puppy finalists overall.
  • Most other proportional systems would probably have resulted in 13 or 14 more.
  • The above numbers are based on assuming the same ballot set; that is, that voters would not have reacted to the different voting system by strategizing. Assuming voters strategize only when it is likely to pay off, that is a pretty safe assumption with EPH, and less safe with other proportional systems. Thus, other systems could in theory actually lead to fewer non-puppy nominees / less diversity than EPH.
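
For readers unfamiliar with the mechanics: under EPH, each nomination ballot contributes one point, divided equally among its works still in contention; the two works with the fewest points then face off, and the one with fewer total nominations is eliminated, repeating until only the finalist slots remain. A minimal sketch in Python (the ballots are invented for illustration, not drawn from the 2015 data, and real EPH has tie-breaking rules this omits):

```python
from collections import Counter

def eph(ballots, slots=5):
    """E Pluribus Hugo (SDV-LPE) tally, simplified: each ballot's
    single point is split among its surviving works; the two works
    with the fewest points face off, and the one with fewer total
    nominations is eliminated, until `slots` works remain.
    (Real EPH has extra tie-breaking rules; ties here fall back to
    name order, purely for determinism.)"""
    alive = {w for b in ballots for w in b}
    while len(alive) > slots:
        points, noms = Counter(), Counter()
        for b in ballots:
            live = [w for w in b if w in alive]
            for w in live:
                points[w] += 1.0 / len(live)
                noms[w] += 1
        # the two lowest-point works face elimination
        a, c = sorted(alive, key=lambda w: (points[w], noms[w], w))[:2]
        alive.remove(a if noms[a] <= noms[c] else c)
    return sorted(alive)

# A 3-ballot lockstep "slate" against dispersed natural votes; under
# plain plurality counting the slate would sweep all three slots.
ballots = [["S1", "S2", "S3"]] * 3 + [["A"], ["B"], ["C"], ["A", "B"]]
print(eph(ballots, slots=3))
```

With these toy ballots the slate still takes some slots, just no longer all of them, which is the proportional behavior the post describes.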

Feel free to promote this to a front page post if you want. Disclaimer: EPH is not intended to shut the puppies out, but merely to help ensure that the diversity of the nominees better reflects the diversity of taste of the voters.

(*) Editor’s note: I believe SDV refers to Single Divisible Vote.

Update 02/08/2016: Added to end of second bullet missing phrase, supplied by author. Corrected footnote, based on author’s comment.



407 thoughts on “E Pluribus Hugo Tested With Anonymized 2015 Data”

  1. Thanks Kevin. Note that in talking about write-ins, I have used that term because it is, almost universally, used as the catch-all backup in election systems to deal with issues in the nomination system. For simplicity, my actual proposal along these lines is to have a large pool of nominees (10-20) and, unlike today, to sort them in order of the number of nominations received. This sort discloses some minor information which traditionally has not been revealed, but it has a powerful benefit: it allows each voter to decide for themselves how many nominees they wish to give consideration to. Voters might decide to do the top 5 (i.e. what they have today), or “the top 5 that did not, in my estimation, come from slates” (what most non-slate voters probably prefer), or anything else. It is up to the voter.

    I derive this from the write-in concept because when doing a write-in, it is fruitless to write in something that is not already popular. A rational voter (who wants to affect the results) will only write in what they think can win. A sorted long-ish list allows that. It makes the response to nomination problems entirely in the hands of the voters in pure direct democracy, and is robust against almost all methods of corrupting the nominations, not just slates, which is why I like it.

    That write-ins require qualifications is quite common, by the way. You can write in your vote for President of the USA, but the candidate you name has to meet the requirements (natural born citizen over 35, etc.) And they can decline the result, too.

  2. @Brad Templeton
    Current: 4-5 slate candidates, sometimes 1 non-slate
    EPH: 3-4 slate candidates, 1-2 non-slate candidates
    EPH 4/6: 3-4 slate candidates, 2-3 non-slate candidates
    MRMR: 4-5 slate candidates, 5 non-slate candidates

    I’m pretty sure that 0 is going to kill your proposal. I could be wrong. But EPH isn’t guaranteed to pass ratification. Not just because it might not fix a problem but because a number of people don’t feel a change is needed. Of course that ignores the fact that the EPH team and you disagree on how 4/6 affects EPH. I’m staying out of that. Full analysis is needed from them before we can have any kind of discussion at that level.

  3. Sorry, Tasha, “0 is going to kill your proposal” — what is 0? Where can I find the prediction of the EPH authors on how it combines with 4/6, and how does it differ? My expectation is that since EPH does make a 5-slate very difficult, only slates-of-4 are practical and so having 4 or 5 nomination slots does not affect the pure slate. (It does affect the many slate voters who want to mix slate choices and non-slate, dropping them to 4 means a slater must choose.) If this latter factor is strong, then EPH 4/6 probably is more stable as 3 slate, 3 non-slate, but if there is major defection, you could get a preferable 2 slate, 4 non-slate in lucky cases.

  4. Any system that reveals the rank order of the finalists is a Bad Idea IMO. I say this because we have had more than one case where the winner turned out to have just barely made it onto the ballot, and I feel certain that if we told people the count of nominations each finalist had up front, there would be a bias toward the finalist with the most nominations, with people saying “I don’t want to throw my vote away.”

  5. @Brad Templeton
    Sorry 5 non-slate items = 0

    The EPH team is Jameson Quinn & Bruce Schneier. On the 1st page of comments on this post, Jameson Quinn stated that 4/6 weakens EPH:

    The “4” part of “4 and 6” helps not at all when you have EPH. In fact, as I was arguing above, such restrictions actually hurt a bit; it would be better to raise it, not lower it.

    The “6” part would help. But just rerunning the numbers and getting the 6th winner is not really the fair way to evaluate that, because if the limit were 6, slate voters could react strategically to that too (spreading their votes over 6 nominees). So doing a fair analysis of that question is not trivial. My unsubstantiated intuition is that even if the puppies had planned for 6 winners in 2015, if you looked at just the 6th-place finalists, they would end up being about 20-50% puppies. That would mean significantly fewer categories with only one non-puppy, but there would still be a few of those.

    ETA: I had no idea I’d been following this so closely. See what happens when you hang around with people and ask questions.

  6. Brad Templeton: Based on 2015 voting, I think there would be a strong preference for the MRMR result. The voters who all voted No Award over the slate (a majority) would, I believe, have been very happy to find the 5 non-slate candidates available as choices, and would have read and voted for them, with some also reading and voting for the slates, and they would have given them awards, rather than No Award. Do people disagree with that interpretation?

    I think that you are very, very mistaken about what sort of support it would get. But hey, feel free to propose it at the Business Meeting. I’ll be one of the people getting in line to speak against it, for numerous reasons.

    And I think if MRMR were enacted, people would be extremely unhappy to find that the promised result did not occur, because spreading their votes across the 15 non-slate finalists would result in some, if not all, of the slate finalists still being in the Top 5, and possibly even winning.

    Oh, and who gets to call themselves a Hugo Finalist? The authors whose works made the Top 5 during nominations? Or the authors whose works made the Top 5 in the final voting?

    This methodology would be an epic-level CF.

  7. There may be some conflation of two distinct proposals here. In the one I am calling MRMR, the final ballot is unsorted, and consists of 5 non-slate nominees plus most or all of the slate. Voters (as they did in 2015) would attempt to tell the difference if they wished. If there were no slate winners, the final ballot would be 5 non-slate nominees, just as it was before the puppies.

    The other proposal has a larger number of nominees all the time, but sorts them. I agree that showing the sort order is not ideal, perhaps as Kevin says “a bad idea,” but I suspect its negative effects would be quite minor, and certainly vastly, vastly less than the effect of what we have now, with several legitimate natural nominees pushed off the ballot, which is a “terrible idea.” Likewise — I spell all this out in the blog post, so I presume some may not have read it — both approaches increase the nominee pool, the latter proposal quite seriously, something I am not fond of, but once again, it’s much better than having a nominee pool that is mostly slate entries. The advantages (listed in the blog post) still overwhelm the disadvantages in my view.

    Are these the sort of objections you would have put forward? While I already listed them, I am interested in others.

    Tasha, why would the fact that the ballot contains 5 non-slate nominees (i.e. what the ballot would have had if no slates were involved) be sure to kill the proposal? I think that’s kinda the best feature — you get what we had a few years ago; isn’t that what people want?

    Back to the question of how bad ranking is (again, only one proposal has ranking, and it’s not the one I call MRMR) I wonder if there’s any evidence from other elections of what this does. First, I think in many cases, people already have a decent guess of what the ranking was, though it is only a guess. In races where the results are close, the ranking conveys very little information.

    And Kevin, you surely know as well as anybody that it is not possible to “throw your vote away” in STV. (Of course, it is possible for people to falsely believe that even with education about it.) I would actually predict that if there is an effect, it’s the exact opposite of what you describe: people might find themselves ranking the lower candidates higher on their ballot, wanting to “give them a boost” to avoid elimination in the first round. That’s also not super-rational, because if you want them to win, you should rank them high regardless of what you think others will do. That’s the reason we use STV — it is one of the most strategy-free voting systems out there.

    Overall, I would be interested in evidence, rather than speculation (yours or mine) on what would happen, but I’m having a hard time imagining it hurting the result more than having a ballot of 5 with 3 puppies and 2 natural nominees, and having the Hugo voters ignore the puppies and then have only 1 or 2 nominees to choose from, as they appear to have done in 2015. EPH seems to make a slate 5-sweep quite unlikely.

    As for 4/6, I still think it helps EPH and 5/6 helps it very slightly more, but that’s not the main issue I’m here to talk about, and I would be happy to see evidence either way on this.

  8. Brad Templeton: There may be some conflation of two distinct proposals here. In the one I am calling MRMR, the final ballot is unsorted, and consists of 5 non-slate nominees plus most or all of the slate.

    Who decides which nominees are “non-slate”? Who decides which nominees are “slate”?

    The deep-seated problems underlying that, and the fact that you are claiming that 4/6 actually helps EPH, tell me that analysis is not your strong suit.

    If you’re serious about convincing a bunch of geeks — many of whom are experts at data analysis — that your proposal has merit, you’re going to have to provide more convincing arguments — and a more convincing algorithmic solution — than you have.

  9. @JJ, I recommend you read the proposal where this is all outlined. That is the algorithmic part. EPH effectively (though not explicitly) attempts to separate slate from non-slate, by creating a system which de-weights slate nominations when they exist. A simple detection algorithm is probably easier.

    And as for “analysis is not your strong suit” — if that is your debate style, our discussion is at an end. 4/6 has both positive and negative effects on EPH, we await more data to see which is stronger.

  10. @Brad, I couldn’t find any algorithm in your linked post. Just talk of creating one from examining the nomination ballot data. Since that data won’t be made available, the idea seems moot at this point.

    From my examination of the ’84 data I suspect natural vote clustering is going to create a lot of false positives for a “simple detection algorithm”, such that most categories would be well into the double digits of nominees.

  11. @Brad Templeton: Tasha, why would the fact that the ballot contains 5 non-slate nominees (i.e. what the ballot would have had if no slates were involved) be sure to kill the proposal? I think that’s kinda the best feature — you get what we had a few years ago, isn’t that what people want?

    It’s like you’ve Not read a single one of my comments here explaining things.

    What part of “limit, not eliminate, slates” is hard for you to understand?

    They tried to explain it to you on ML. I’ve tried to explain it in multiple comments here. We’ve used clear declarative statements. None of us has been subtle. Slates get a place at the table; it’s what WE want. The fact that EPH still lets slates get slots is a feature, not a bug. The only question is whether they are getting a proportional place.

    You’re working off incorrect assumptions because you keep dismissing what people are telling you. You don’t believe us because you saw our outrage at SP/RP and you decided what we must want. Try reading what people are saying instead of deciding we mean something different.

    ETA: Notice how few people are disagreeing with my statements? That should give you a heads up. People here are perfectly happy to jump all over me when I’m wrong/mistaking their opinions/thoughts.

  12. Errhead — a proper algorithm should be planned with real world data. Data from old years is probably fine since you can model various slate strategies, though it doesn’t hurt to get the recent data. Access in confidence makes sense. EPH should have been planned with real slate data — there was an android simulator written which made the error I talked about above — presuming a slate is a bloc of lockstep ballots — and thus made incorrect predictions portraying the results much better than they are.

    My instincts, however, suggest an algorithm can be found. Indeed, if there is an algorithm to limit slates (like EPH), it strongly suggests there is an even better algorithm to detect slates. For example, one algorithm would be “any work which under EPH (or similar systems) gets X% fewer points than it has nominations.” In a slate of 3 with 80% cohesion, the three slate entries would average around 47 EPH points per 100 nominations, and I would guess — but need data — that the natural nominees would do much better.
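
    How such a threshold might be calibrated can be sketched numerically. Under one simple cohesion model (my own assumption, not necessarily the one behind the ~47 figure), each ballot naming a given slate work also names each of the other slate works independently with probability p, and nothing else; the work’s first-round EPH points per nomination is then the expected value of 1/(1 + number of co-slate works on that ballot):

```python
from math import comb

def eph_points_per_nomination(slate_size: int, p: float) -> float:
    """Expected first-round EPH points per nomination for a slate work,
    under a simplified cohesion model (an assumption of mine): each
    ballot naming the work also names each of the other slate works
    independently with probability p, and lists nothing else."""
    others = slate_size - 1
    # K = number of co-slate works splitting the ballot's single point
    return sum(comb(others, k) * p**k * (1 - p)**(others - k) / (1 + k)
               for k in range(others + 1))

for p in (0.6, 0.7, 0.8, 1.0):
    ratio = 100 * eph_points_per_nomination(3, p)
    print(f"cohesion {p:.0%}: about {ratio:.0f} EPH points per 100 nominations")
```

    Under this particular model, 80% cohesion for a slate of 3 comes out near 41 points per 100 nominations, the same ballpark as the ~47 quoted above; the exact figure depends on how “cohesion” is defined.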

    Tasha — indeed we should end our discussion. You think I’m not reading your words, and I know you’re not reading mine, because if you had read the post you would see that I describe one of the advantages of the system is that it lets all the slate candidates get on the ballot (along with the non-slates) so nobody is excluded and the fans decide. The system was specifically designed to meet the goal you imagine I didn’t read. But if you’re going to get into yelling and accusations about character rather than actually discussing facts and ideas, we’re done.

  13. Brad Templeton: if you’re going to get into yelling and accusations about character rather than actually discussing facts and ideas

    Tasha has neither “yelled at you” (the emphasis is because you don’t seem to be understanding the key aspects of what people are saying), nor has she made accusations about your character. Perhaps you’ve been reading too many of The Phantom’s posts and think that his technique is a successful one. It’s not, and it doesn’t reflect well on you.

    The reason you got a lot of pushback at Making Light — and why you’re getting a lot of pushback here — is that a lot of people think that what you’re suggesting is a really bad idea. Posting it again and again and browbeating people here is not likely to persuade those people that it is not a really bad idea.

    Write it up formally. Solicit a bunch of MAC II attendees who are willing to put their name on it as sponsors. (I think that you’ll find that part rather difficult.) Present it at the Business Meeting.

    Or, you could recognize that the fact that you’re not getting any traction here means that you’re not likely to get much traction anywhere else, either (the Puppies certainly won’t be supporting your proposal).

    And you’ve still never answered my question: Who gets to call themselves a Hugo Finalist? The authors whose works made the Top 5 during nominations? Or the authors whose works made the Top 5 in the final voting?

  14. there was an android simulator written which made the error I talked about above — presuming a slate is a bloc of lockstep ballots — and thus made incorrect predictions portraying the results much better than they are.

    Yeah, I wrote it and a web app, and agree one hundred percent. The ’84 data was intended as a placeholder for the actual data, so the apps could be written and debugged and ready to go when the anonymized data was released. The tie-breaking problem with the artificially injected slates, as well as the dirtiness of the data, make any results very suspect and only generally informative. There are probably some ways to make it slightly more effective, but when it became clear the real data wouldn’t be released soon, I stopped wasting time with development.

    In the ’84 data the nominees ranged between 41-97 EPH points per hundred ballots.

  15. So to be clear, you are saying that given the choice between the results below, you feel that only a small minority like myself would have preferred to see the MRMR ballot, and most would have preferred the EPH result or the Approval result.

    Some estimates for Best Novella from 2015.

    Approval: (Real ballot, resulting in No Award)
    Big Boys Don’t Cry, Tom Kratman
    Flow, Arlan Andrews Sr.
    One Bright Star to Guide Them, John C. Wright
    Pale Realms of Shade, John C. Wright
    The Plural of Helen of Troy, John C. Wright

    EPH: (estimate)
    Big Boys Don’t Cry, Tom Kratman
    Flow, Arlan Andrews Sr.
    One Bright Star to Guide Them, John C. Wright
    The Regular, Ken Liu
    The Slow Regard of Silent Things, Patrick Rothfuss

    EPH 4/6: (estimate)
    Big Boys Don’t Cry, Tom Kratman
    Flow, Arlan Andrews Sr.
    One Bright Star to Guide Them, John C. Wright
    The Regular, Ken Liu
    The Slow Regard of Silent Things, Patrick Rothfuss
    Yesterday’s Kin, Nancy Kress

    MRMR:
    Big Boys Don’t Cry, Tom Kratman
    Flow, Arlan Andrews Sr.
    Grand Jeté (The Great Leap), Rachel Swirsky
    One Bright Star to Guide Them, John C. Wright
    Pale Realms of Shade, John C. Wright
    The Mothers of Voorhisville, Mary Rickert
    The Plural of Helen of Troy, John C. Wright
    The Regular, Ken Liu
    The Slow Regard of Silent Things, Patrick Rothfuss
    Yesterday’s Kin, Nancy Kress

    If indeed a considerable majority actually feels the EPH result is the better result here, in terms of fairness to nominators’ wishes and the ability of fans to pick their winner, then indeed I am barking up the wrong tree. But the actual voting data tells me otherwise.

  16. errhead, thanks for that data. I would be interested in the distribution of EPH points per nomination for all the works from real data. Were there many works down in the 40s per 100, or were those outliers? I know various models suggested EPH rarely affected natural nominees, but a score of 41/100 is a very strong effect. I still have optimism that if it is possible to de-weight with any success, it is possible to detect with greater success, but it remains an open question just how good you can be (against real slates, theoretical slates, and idealized slates). A couple of people here are more tolerant of false negatives than I am. However, even if you miss some, every one you do detect makes room for what is probably a natural nominee. The balance of false positives and false negatives is always the issue with such algorithms.

    Idealized slates will never occur, because even if you had your followers hand over all their voting PINs so you could craft the slate completely, you would not output a solid bloc. It is too easy to spot, and as seen in your models it falls to multi-elimination. A smart slater would make each candidate get a different total as best they could, though you can’t be perfect, as others are nominating outside your control.

    A slater with all the PINs also has other strategies available which make it very difficult, even impossible, to write algorithms that de-weight or detect them, though human eyes might. Fortunately there is no sign yet of a slate movement so dedicated to the cause as to vote exactly as they are told.

  17. Makes sense. Certain categories have a lot of natural diversity, with Short Story often being the most diverse, and Semiprozine has been known as the least diverse (it had the same winner for ages, and many of the same nominees). So it makes sense a lot of those candidates are showing up on the same ballot. It does mean that (at least in that era) a semiprozine facing EPH would be wise to tell its fans, “Please nominate us if you love us, and please avoid nominating any of the usual suspects if you love us extra,” and it could in theory mean getting on the ballot with half the supporters. But you would be glared at, for sure. Fans, on the other hand, might conclude it on their own.

    There are times when several of the categories had “usual suspects” though the fiction, DP, BRW, Campbell do not have those. There are algorithms that detect clustering which could certainly do slate detection work, but it’s not out of the question that the semiprozine clustering, especially in the past, might have shared many parameters with slate clustering.

    Even though the fiction awards don’t have “usual suspects” when it comes to works, they do with authors, so we see more clustering on them than you might expect. Still, if a 3-slate expects a score of 40 and a 4-slate a score in the range of 30-35, they probably look quite distinct, particularly if measured in sigmas from the mean.

    Thanks a lot for calculating that data.

    It would be interesting to calculate results of 4/6 but of course nomination ballots are not ordered, so you don’t know what to do if you have a ballot of 5 and need to remove one.

  18. Brad Templeton: If indeed a considerable majority actually feels the EPH result is the better result here in terms of fairness to nominator’s wishes and ability for fans to pick their winner, then indeed I am barking up the wrong tree. But the actual voting data tells me otherwise.

    Brad, you have invented a bunch of hypothetical results which support your claims, and expect that people should judge the algorithms by what you have invented.

    *headdesk*

  19. I think it would be worthwhile to generate plausible simulated 4/6 ballots using the actual nomination data and various algorithms, and see what range of results pop out. (For example, for non-slate ballots, you could drop the last item; you could drop one item at random; you could drop the least popular item, measured globally over all ballots; you could weight the likelihood of dropping an item by global popularity. For slate ballots, you could simulate different slating strategies, using a four-item slate, a five-item slate, or even a six-item slate, using various criteria to decide the probability of SlatePitch6 getting added to the ballots, or the probability of a slate vs. a non-slate item being dropped from mixed ballots.)

    These results would have very, very limited validity, of course, but I think it would be better than having no results at all.
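
    A sketch of the non-slate ballot-shrinking step described above (one possible reading of those strategies, not a finished methodology; all ballot names and strategy labels are illustrative):

```python
import random
from collections import Counter

def to_four(ballots, strategy="drop_last", rng=None):
    """Shrink 5-item nomination ballots to 4 under different guesses
    about voter behavior, roughly following the strategies sketched
    above. Ballot order stands in for listing order; names and
    strategy labels are illustrative, not from any proposal."""
    rng = rng or random.Random(0)          # fixed seed for repeatability
    popularity = Counter(w for b in ballots for w in b)
    out = []
    for b in ballots:
        if len(b) <= 4:                    # already legal under 4/6
            out.append(list(b))
        elif strategy == "drop_last":      # drop the final listed item
            out.append(list(b[:-1]))
        elif strategy == "drop_random":    # drop one item at random
            i = rng.randrange(len(b))
            out.append([w for j, w in enumerate(b) if j != i])
        elif strategy == "least_popular":  # drop the globally rarest item
            drop = min(b, key=lambda w: popularity[w])
            out.append([w for w in b if w != drop])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return out

ballots = [["A", "B", "C", "D", "E"], ["A", "B"], ["C", "D", "E", "F", "A"]]
print(to_four(ballots, "least_popular"))
```

    Slate ballots would get their own shrinking model (e.g. a slate-master issuing a 4-item slate), and the shrunken ballots could then be re-run through whichever tally is being tested.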

  20. Steve — you could indeed get valid results using stochastic methods, which is to say run huge numbers of random scenarios (and the specific ones if you like) and then you can learn a range of results, as in, “What’s the worst this does” and “what’s the most this does?” If the worst, or something not far from it, is reasonable, then you know something. If it ranges all over, we don’t learn a lot.

    To model the slates, you can do the same, but since Sad Puppies was, in theory, built from a limited survey of their supporters, removing the least popular Sad Puppy is not a bad model. Rabid is different, though it’s still not a bad model, because if presented with a slate of 5 and only 4 slots, slate fans would drop their least favourite. However, a clever slate-master, knowing there are only 4 slots, will probably do a slate of only 4, or at most list 5th choices as “alternates.”

    Based on the stat in the OP, the average ballot had 3 nominees, and that’s including puppies, who probably had a higher average. So the first question is simple — how many non-slate ballots had a full 5 nominations? These are the only ones affected by 4/6 (not counting the likely recommendation that would spread in fandom to “make sure you use all of them”). 4/6’s negative effect under EPH is complex. With fewer slots, some natural nominees lose support, but those ballots also lose fewer points under EPH rules, because a ballot with fewer works is less likely to contain multiple surviving nominees. 4/6 has its purely positive result independent of EPH — adding another entry that is probably not slated.

  21. Speaking for myself, finding the time to read is very difficult; thus, more nominees come voting time is not a good feature for any new proposal, by my lights.

    Hopefully I did not misread the discussion about the larger nominating list.

  22. Shambles, indeed, the expectation is that you probably would not read more than usual. You would judge for yourself if some works made the ballot through collusion or unfair play, and I suspect most fans would feel no obligation to read or rank them; I certainly would not. Some fans would decide to read them, but that’s entirely up to them.

    I disagree with John Scalzi when he wrote that the right response was to read them and rank them. While I understand the source of that view and respect it, I see a parallel between that and the Hong Kong elections, where Beijing says to the people of HK, “Hey, it’s democracy. Just evaluate and vote for any of the candidates that Beijing has approved.” I side with those in Occupy Central who did not find that satisfactory.

    The only hard part is deciding what to do about works you know were on slates but which didn’t ask to be and other clues tell you they have merit. Again that’s subjective. For example, File 770 is on the Rabid Puppy list at present, but anybody who knows its history will not believe it made the ballot only because of that, if it does.

  23. Thanks for the clarification, Brad. Hmm, it’s a different approach, but I don’t know that I want a long list split into slate vs non-slate, which I assume would be done through some clustering analysis; it feels troubling to me at a visceral level.

    Food for thought though.

  24. You don’t see the analysis (though one could do a variant where you do.) If you see 9 nominees, you will strongly suspect that slating caused the expansion, and you would judge for yourself what took place. The goal (which not all share) is to give you the chance to vote for all the natural nominees, rather than just 1 or 2 of them, and you can ignore the slates, read nothing but the slates or any mixture you like. My goal is to leave that decision to the member. In 2015 with the slates we had no opportunity to vote for any of the natural nominees in many categories, and a majority decided that No Award was their only viable choice. I contend (though others seem to disagree) that those voters would have loved the opportunity to cast their ballot for an excellent natural nominee instead of for No Award, even if the only way to be fair to all was to expand the ballot.

  25. Brad, what if there are three competing slates? Does that mean that there will be 20 candidates in each field? That strikes me as… untidy. Also unworkable from an “I must read all of these!” position.

  26. Indeed, that would be bad, especially if you feel you must read all of them. However, we’re much worse off in this situation because with 3 strong unrelated slates, almost all approaches give most or all of the ballot to the slates, and you get 2015 with few to no natural nominees, and No Award the remaining choice. In this situation, you end up needing one of the de-weighting algorithms like EPH. When EPH is working, a slate needs X members (where X is a little less than the minimum number of nominations needed to make the ballot naturally) to get one slot, but almost 2X members to get 2 slots. Puppies appear to have 3X to 4X in some categories.

    However, under approval, it’s bad news if there are multiple slates. Strictly, under approval, a slate with X’ (the number of nominations for the top nominee) wins all 5 slots if cohesive, and 2 slates like that win 10 slots.
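
    The approval arithmetic is easy to check: the nomination stage today simply ranks works by raw nomination counts, so a cohesive bloc only needs to outvote the single most-nominated natural work to take every slot. A toy illustration (all numbers invented):

```python
from collections import Counter

def approval_top(ballots, slots=5):
    """Current-style nomination count: the top `slots` works by raw
    nomination totals (approval voting at the nomination stage)."""
    counts = Counter(w for b in ballots for w in b)
    return [w for w, _ in counts.most_common(slots)]

# 70 dispersed natural ballots vs. a cohesive 25-voter slate. The
# most-nominated natural work gets only 20 votes, so the slate's 25
# identical ballots sweep all five slots.
natural = [[f"N{i % 6 + 1}"] for i in range(60)]   # N1..N6, 10 each
natural += [["N1"]] * 10                           # boost N1 to 20
slate = [["S1", "S2", "S3", "S4", "S5"]] * 25
print(approval_top(natural + slate))
```

    Any de-weighting rule like EPH changes exactly this step, which is why the comparison above turns on how many times X (or X’) the slate can muster.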

    There is another solution, but it doesn’t have the sense of fairness to the slates that some folks want. That is to do MRMR where works that meet the slate test do not get added to the ballot as it is expanded. In that case, you get a maximum of 10 nominees — the top 5 slate nominees from any slate, plus the top 5 natural nominees.

    Note however, as I describe in the blog post, that all algorithmic tests are defeated by multiple slates, or one large slate acting like multiple slates. If you have 3X with EPH, you can get your 3 if you have complete control by casting 3 equal sets of single nominations. (They beat the naturals because the naturals have some clustering on their own.) Fortunately, current slates are not nearly this coordinated, nor are they likely to be.

    Note that against MRMR this also works, but is just vindictive. Algorithms will not spot a slate done by splitting into groups, and so the nominee pool would not be expanded, giving the slate candidates only 2 natural competitors. Of course 2015 suggests that even a single natural competitor means a loss for the slates, but it’s a poor victory for the truefen, as they did not really get to have a real competition.

    BTW, all of this is why algorithmic approaches are definitely my 2nd choice, but there are enough people online who go into flamewar mode at the suggestion of judgment based approaches or fan based approaches that I am not discussing them much here. But because they are not responses to slates, but rather to any attack, they are the superior choice. Algorithms will be adapted to by those trying to corrupt the system, unfortunately.

  27. Due to a family emergency – my mother’s health took a turn for the worse – I wasn’t at Sasquan. Due to it continuing to get worse I didn’t slog through the video of the Business Meeting. So I had no awareness of any of this until today. It’s important enough that I’ve dropped everything to go through all the comments, taken notes, and am putting my head in the noose.

    To begin with, I’m utterly appalled that the information was given out in such a blatantly biased fashion. Whether or not various people said it would be difficult or impossible to anonymize the data, the data was handed over for analysis. Then the NDA was broken and the whole thing went public. I may be wrong, but there also appears to be an assumption that all right-thinking people agree with EPH. Which simply isn’t true. There are even long-standing members of WSFS who don’t agree with it. But once you’ve put something into writing and out on the internet, life gets interesting. At least I assume it does.

    Meanwhile the Puppies, part of whose complaint is that people are locked out, have even more proof than just the EPH. From a totally pragmatic viewpoint proving them right isn’t the brightest idea anyone’s ever had. Then:

    Ed Green on February 8, 2016 at 5:24 pm said:
    So, two responsible adults violated the NDA. Now what do the Administrators do?

    Dave McCarty on February 8, 2016 at 5:29 pm said:
    The administrators accept their apology and continue to work with them on getting results useful to the WSFS business meeting at MidAmeriCon II.

    Dave McCarty – Hugo Administrator
    MidAmeriCon II

    Let me get this straight: the people who do have the data violate the NDA, apologize for doing so, and that’s that? No taking them off the job? No finding neutral parties? You don’t need to know about WSFS or Worldcons – an outline of the current voting system is useful, but after that, data is data. A statistician has volunteered – and been slapped down. I can think of one who owes me a favor and can probably pull in more. And if the point is to get “results useful to the WSFS business meeting” there’s plenty of time to get them out well before.

    As for violating the NDA: being a statistician, for example, doesn’t get someone off the hook. I was raised by them, proofed data, reviewed papers being considered for journals – an outside view is useful – write damn good surveys, etc. And certainly handled, or had people handle, proprietary data in one of my own fields. Best intentions or not, there is no excuse for breaking an NDA. Employee or volunteer, that’s that.

    If the data was handed out it should have gone to neutral people; they aren’t that hard to find; the NDA was broken; and July is soon enough to get it out and hashed over before the business meeting. So why not switch to the way it should have been done, assuming the data was handed out in the first place? I trust the Hugo Admins to be unbiased when counting votes, always have, always will. I just don’t trust their judgment in other things anymore.

    But done is done. So now what?

    First, contact someone on the ‘other’ side. Given the circumstances it should be someone handling SP4. People may feel more secure going through the person who audited Hugo results for three years to prove to Puppies that yes, the admins are honest. Seriously, not all of them hate ‘us’ and eat panels for breakfast. The loudest people get heard.

    Second, admit that you/we screwed up. Since it’s true it’s not a difficult admission to make.

    Third, discuss how to handle it. My suggestion is handing the whole kit’n’kaboodle to someone neutral. At the same time “You’ve seen the data, we get to see it” is entirely justifiable. As I said, discuss how to handle it.

    Fourth, quit insulting people as “puppy lovers” or “token puppies”, yelling, or all the other useful things. That is, go after me if you want – I started this by saying I’d had enough and was putting my head in the noose – but I don’t have time to deal with personal attacks anyway. And enough people know how much I care about the Hugos, as well as fandom-at-large and the Worldcon community. Those who don’t, consider the fact that I just kinda gulped and took on an entire thread.

    In all fairness I do have to warn you: there’s still a great deal left over from Mom’s death to get done so I won’t be checking this more than once or twice a day. Then Thursday we leave for Boskone. We’re not staying at the main hotel but I’ll see about lugging my laptop over to our Dealer’s table. Obviously normally I wouldn’t take on an entire group when I can’t stay on top of things; if nothing else it’s rude. But someone does have to take on how this started.

  28. Sorry, did the two who did this analysis (JQ and BS) violate the NDA and release the raw data to the public?

    I know that anonymization is quite difficult. With the totals released, even if you changed the names of all the works to “Work 1”, “Work 2”, etc., everybody would know which they were. However, works that didn’t get 5% could be given aliases safely, and most de-anonymization attacks (yes, that’s the word we use here, it’s not meant to be provocative) like to find unique patterns. For example, if you find a ballot with 4 regular nominations and the only nomination for an obscure work, you might trace it back to the author of that obscure work in some cases. But that’s much less likely for works that appeared on more than 5% of ballots. So this was one I thought could be pulled off. Another common de-anonymization attack occurs if you give out demographic info, like what state or country the ballot came from, but that’s not needed here (interesting though it may be).
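
    The aliasing idea could be sketched like this (hypothetical helper, not anything the administrators actually ran; the threshold and the list-of-lists ballot format are assumptions):

```python
from collections import Counter

def alias_rare_works(ballots, threshold=0.05):
    # Works appearing on fewer than `threshold` of all ballots -- the
    # main de-anonymization hook -- are replaced with opaque aliases;
    # frequent works stay readable, since their totals are public anyway.
    counts = Counter(w for b in ballots for w in b)
    cutoff = threshold * len(ballots)
    alias = {}
    def name(w):
        if counts[w] >= cutoff:
            return w
        if w not in alias:
            alias[w] = f"Alias-{len(alias) + 1}"
        return alias[w]
    return [[name(w) for w in b] for b in ballots]

ballots = [["Popular A", "Popular B"],
           ["Popular A", "Obscure X"],  # sole nomination: identifying
           ["Popular B"]]
print(alias_rare_works(ballots, threshold=0.5))
```

    Here “Obscure X” (on 1 of 3 ballots, below the cutoff) becomes “Alias-1” while the popular works keep their names, so voting patterns survive but the uniquely identifying title does not.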

  29. Elspeth Kovar: If the data was handed out it should have gone to neutral people

    It did. While Bruce Schneier does have some connection with going to Worldcons in the past, he’s pretty much an impartial observer here, and is in this in a professional capacity. Jameson Quinn has never been involved with Worldcon or fanac before and has no loyalties to any “side” whatsoever; he’s in this in a professional capacity.

    The NDA wasn’t broken, no nomination data has been released; what was broken by one person was the agreement not to discuss any of the findings before the official paper was released.

  30. @Elspeth Kovar

    1. So far as I’m aware one of the people who received the data isn’t pro- or anti- EPH, and so is as close to neutral as we’re likely to get.

    2. Getting shooed away by some people here (when making claims to expertise that may or may not be true, but can’t be verified, and with a history of being obnoxious to other commenters here) is hardly the same as being shooed away by the Hugo Administrators organising the testing.

    3. I’ve noticed that walking into somewhere acting as if I’m about to be verbally shredded tends to annoy people. That’s pretty counter-productive if my aim is to discuss the thing rather than provoke people into wanting to verbally shred me. Something to bear in mind in future.

    4. “Whole thing went public”? Not really. The data is still only in the hands of the Administrators and those testing EPH under NDA. We know that the data is being tested, and we have a small amount of information about the preliminary conclusions, but that’s not the “whole thing” by any means.

    5. If you know people who may be useful in assisting in testing the data, by all means write to the Administrators and make a case for including them in the testing. The worst that can happen is they say no, they would rather keep the number of people with copies of the data at a minimum.

    6. No, no-one who leads the Puppies or handles SP4 should get the data. They (all of the leaders of every iteration of the Puppies) have more than proved their animosity towards the WSFS, the Hugo Administrators, and the majority of Hugo voters. Giving them data that isn’t fully anonymised is not an acceptable solution.

  7. The people who inadvertently broke the NDA have apologised. So, they already admitted they screwed up.

    8. I’m very sorry about your mother.

  31. Elspeth, even though it must be difficult considering your situation, I would highly recommend reading more material beyond this thread. There is a lot of background here, and if you’re planning to “take on how this started” it would probably help to have more information and context. If you don’t have time for the Business Meeting videos, the minutes for Sasquan are available as a PDF at the bottom of this page, for example.

  32. @Elspeth Kovar – Then the NDA was broken and the whole thing went public.

    @Brad Templeton – Sorry, did the two who did this analysis (JQ and BS) violate the NDA and release the raw data to the public?

    No and no. For some reason my computer puts in additional characters when I try to link, but on 2/10 in the piece here titled MAC II Statement on Data Release for EPH Testing, Dave McCarty, the Hugo Administrator for MidAmeriCon II, said:

    The fact that we were doing it was not secret, there was no plan that the results of the analysis would be secret. The only faux pas was that it was agreed that results wouldn’t be made public until all analysis was complete and substantially agreed to by both parties, along with preparing a submission to the business meeting.
    Mr. Quinn’s report was early and incomplete from what had been agreed to. There remains joint analysis to do and information to prepare for the business meeting. That will still take a fair amount of time.

    No raw data was released; the Hugo Administrator refers to it as a faux pas rather than a violation of the NDA (that wording came from a member of the File 770 commentariat and is in no way authoritative); and if either of you remain concerned, a more useful course would probably be to take it up with the Hugo Administrator.

    If I were to start up a pool as to who and when an SP/RP blogger would be shocked, shocked about the (non) release of data, I’d pick VD in May.

  33. When is it planned to be released? I certainly hope it’s not planned to be released only a short time before the business meeting, because anybody preparing for that meeting will surely want time to learn as much as possible. I was kinda hoping, after the (non-binding) resolution, that this sort of analysis would be done last fall.

  34. One of the things I’ve found helpful is finding out the background information on issues before jumping all over someone. There are links here to Making Light. There were numerous and very long conversations there on EPH as it was being developed. There are also a number of posts here on File 770 from when EPH was first being discussed. I believe there are thousands of comments between the two on the proposal. That’s in addition to the business meeting minutes.

    When coming up with alternatives it’s important to consider additional factors. Worldcon recently defined finalist to distinguish it from nominee. This is why JJ keeps asking how Finalist would be determined if one were to expand the list beyond the shortlist for the final voting.

    No issue is a single issue. Chances are it has many other things it touches on or which touch on it. Which means solutions need to understand the larger culture of Worldcon and what other issues have come up in previous years or are scheduled to be brought up at the next business meeting.

  35. @Brad Templeton
    I just can’t help myself

    When is it planned to be released?

    As has been stated in the comments when the analysis is complete. Before the business meeting. Should be early enough for people to prepare.

    I recommend reading all the comments in the thread again as it seems you’ve missed some critical ones. Who knows what else you missed.

  36. Tasha Turner: No issue is a single issue. Chances are it has many other things it touches on or which touch on it. Which means solutions need to understand the larger culture of Worldcon and what other issues have come up in previous years or are scheduled to be brought up at the next business meeting.

    Nods. I got more educated following the discussions on Making Light that led to EPH. The WSFS constitution is an accretion of rulechanges* over time; if we were to write a constitution from scratch, we wouldn’t end up with what we currently have. So any proposed rule change has to take that history into account if it is to succeed. Even though there was/is support for EPH its ratification this year is not guaranteed. There are a number of Worldcon members who don’t think a rulechange is needed, or even if one is, don’t think EPH is the right rulechange to adopt.

    *We want what’s best for the WSFS & the Hugos but don’t (always) agree on how to get there.

  37. JJ et al.

    I’ve been getting my most recent information from this discussion but the rest from earlier ones and talking with people.

    Ed Green said the NDA had been broken and asked what was being done. Dave McCarty, the Hugo Administrator, said the apology was accepted. I’m going on the assumption that if Dave thought it hadn’t been broken he would have said so.

    My understanding was that someone on the side of EPH paid for Jameson Quinn attending. If people insist I’ll follow up on that but, given that a person I like a great deal strongly implied that he was the one who’d done so I’d prefer not to. I’m not sure who brought in Bruce.

    The raw data was not released to the public and I never said it was. What was released to the public was that the data – I’m still not certain how anonymized it was, or if it could be opened up, or what: I need to backtrack – had been given to people. That is where I consider this whole thing started; that is, this specific thing on File 770. Not the discussion of EPH, but data being handed to people.

    I missed Dave’s statement here that the fact that they were doing so was no secret. I was, as suggested, using additional sources. Given that they were surprised the information had been handed out it wasn’t public knowledge.

    Yes, that’s twice that I’ve referred to someone without giving a name. I don’t have permission to, so I won’t, but again . . . I know: Mike, would you say I probably have reliable sources, and even then check them?

    If the Administrator wanted people with no connection whatsoever with any of this it would have been simple enough to ask around. Guy in the Public Health Service would run the numbers for me. No connections to fandom, never a cent of money or anything paid for. Any concerns whatsoever and I’d also have been perfectly happy to just put Andy in touch with Dave, told him to do it, and stepped away.

    The people who inadvertently broke the NDA – that is, professionals who didn’t apply professional protocols – apologized. Here. If I’ve missed other places please give me the URLs and, if in a thread, a way to find the right place. That’s not snarky; if I’ve missed something I want to know.

    I’m not sure where the “being shooed away” came from, except if it was polite phrasing. I did say that another statistician had volunteered and been shut down by people here. Not by the Hugo Admin but don’t forget: we use a volunteer economy.

    I’m not walking in acting as if I expect to be shredded, I’m walking in having read everything thus far and knowing that even the rankest newbie isn’t going to get any slack. But there was probably a fair bit of bravado thrown in as well: despite being around for more than 15 years stepping up and going against the grain was and is daunting. Which isn’t an excuse for the bravado but hopefully will explain it.

    No, not everyone related to the Puppies holds ‘us’ in animosity. If nothing else one person involved spent several years auditing the Hugos to *prove* that they are trustworthy. There is, however, such a total lack of understanding as to be very difficult to bridge. Still, I wouldn’t normally suggest releasing the data to folks even distantly related to them for analysis except that it has been released to other people. Please go back and read, however. What I said was

    “Third, discuss how to handle it. My suggestion is handing the whole kit’n’kaboodle to someone neutral. At the same time “You’ve seen the data, we get to see it” is entirely justifiable. As I said, discuss how to handle it.”

    Thank you for sympathies for the loss of my mother, and for understanding that I’ve been rather busy. But her attitude was that if things mattered they mattered and not to be an idiot. I cared about things going on at the Worldcon, if I needed to be at the computer I should be.

  38. Elspeth Kovar: My understanding was that someone on the side of EPH paid for Jameson Quinn attending. If people insist I’ll follow up on that but, given that a person I like a great deal strongly implied that he was the one who’d done so I’d prefer not to.

    Mr Quinn’s presence at Sasquan was crowdfunded. Your friend probably contributed to that. I did, also — as did numerous other people. It was done on a shoestring budget, and it was done at Mr Quinn’s offering as someone with expertise which might be beneficial. I can guarantee you from personal observation that he has no loyalties to any “side” here, other than to the study of the data.

    Whatever you’ve been told, Mr Schneier and Mr Quinn are objective participants, and they have a great deal of experience and expertise.

    I am sorry to hear about your current circumstances and about your mother.

    As Wildcat and others have pointed out, while you’ve only just now become aware of all this, it has been the topic of intense discussion for 10 months now. There are indeed thousands of posts and other supporting documentation in numerous places on the Internet. The situation and the numerous possible responses to it have been discussed, and dissected, and modeled, to the nth degree.

    Just so you’re aware of what you’re signing on to — I’ve spent many dozens of hours over the last 10 months reading, educating myself, and participating in the discussion and analysis. I know that many others have done this as well.

    If you are truly committed to bringing yourself up to speed, that is what you are looking at. I’m not trying to discourage you — you sound as though, with your knowledge and experience, you would be an asset to the discussion — but I hope you will think long and hard about whether this is really where your time and effort should be going right now.

    But if so — welcome to the jungle.

  39. Ed Green said the NDA had been broken and asked what was being done. Dave McCarty, the Hugo Administrator, said the apology was accepted. I’m going on the assumption that if Dave thought it hadn’t been broken he would have said so.

    Of course the NDA was broken by the researchers releasing information and analysis about the ballots without the agreement of MAC II.

    Ed Green didn’t make that up. He said it because one of the other parties to the NDA told Jameson Quinn that he had broken it, and Jameson Quinn and Bruce Schneier then separately stated that they had broken it and issued apologies.

    Jameson Quinn on February 8, 2016 at 12:08 pm said:
    It appears I should have checked with other people before sharing anything here. I was complying with my understanding of the letter and spirit of the NDA I am under, but as a statistician I interpreted some of the words in that NDA with other than their everyday meaning. So, people have now told me:

    1. I should not make any further posts here discussing the 2015 data.

    2. All the people I’m talking to do still want to have meaningful analysis available to the public with all due haste, and in particular before MidAmeriCon.

    3. Anything I have said here should be regarded as provisional until confirmed.

    It was not my intention to stir anything up. It’s just that, with nominations open, I felt it was time to start sharing what I have, in a way that I felt respects privacy and the integrity and image of Sasquan and Worldcon. But that was not my call to make, and I sincerely apologize.

    Bruce Schneier on February 8, 2016 at 2:10 pm said:
    Just to be clear:

    We are still analyzing the data: not just from last year, but from previous years as well. There will be a more complete analysis, and it will be made public well before the next WSFS business meeting. We are still in discussions with the Hugo administrators on what to release and how.

    So, apologies for this early release of information. One of the conditions of us seeing the raw data was not to discuss it unilaterally, and we broke that agreement.

    We don’t know the precise text of the NDA, but the content is clear enough from that, plus the statements by the MidAmeriCon II WSFS Division Head Tammy Coxen and MAC II Hugo Administrator Dave McCarty:

    As previously announced, it was determined that the data was unable to be sufficiently anonymized for a general release, so the researchers were provided data under a non-disclosure agreement.

    There was to have been a coordinated release of the research findings between MidAmeriCon II and the researchers, which would have made clear the circumstances under which the data had been shared. Planning was already underway regarding that release, but as noted, analysis is still occurring. Our intention is to jointly share the research findings when they are complete, which will be well in advance of the business meeting at MidAmeriCon II.

    Dave McCarty on February 11, 2016 at 11:02 pm said:
    @Mike Glyer

    And is it correct that “none of this effort was secret”? Then who told Quinn he committed a faux pas by making preliminary comments about it?

    The fact that we were doing it was not secret, there was no plan that the results of the analysis would be secret. The only faux pas was that it was agreed that results wouldn’t be made public until all analysis was complete and substantially agreed to by both parties, along with preparing a submission to the business meeting.

    Mr. Quinn’s report was early and incomplete from what had been agreed to. There remains joint analysis to do and information to prepare for the business meeting. That will still take a fair amount of time.

    Dave McCarty – Hugo Administrator
    MidAmeriCon II

    The NDA may not prevent the researchers from disclosing the bare fact that they have access to the data. It may have only seemed so since as far as anyone could tell, this had not been publicly announced.

    But Mr. Quinn was certainly not allowed to publish analysis and commentary on EPH until a joint submission to the Business Meeting had been agreed on by both parties.

    And Mr. Quinn was certainly not allowed to use the data to advocate for a new strategy for nominators for the express purpose of altering the outcome of the 2016 Hugo nominations, and thus the 2016 final ballot, at any time under any circumstances.

    Thus, “faux pas” refers to Jameson’s mistaken understanding that he had not violated the NDA, even though he did so in a post which might as well have been titled “A Call to All Right-Thinking Fans to Help Us Stop the Puppies Based on My Knowledge of Their Own Secret Ballots.”

  40. And oh, yeah, Elspeth — in case you haven’t figured it out yet, Brian Z. is the resident Puppy Troll who spends most of his time posting false and disparaging things about EPH in an attempt to dissuade people from supporting it at Worldcon.

  41. You are on solid ground in characterizing my assessment of Mr. Quinn’s Original Post as “disparaging.”

  42. The whole point of the arrangement between Quinn/Schneier and the admins appears to have centred around avoiding releasing insufficiently anonymised personal data, and that has not happened. There also seems to have been an agreement not to release any analysis until jointly agreed to. That is what Quinn appears to have done, albeit without realising it was an issue, and for which he has apologised.
    Whether or not the two issues – data and analysis – happened to be written on the same piece of paper doesn’t prevent them from being two different issues. No data has been released. At most we’ve had an inadvertent sneak peek at the analysis.

  43. I will absolutely vote against any system with 10 nominees. I haven’t got the time to read twice the number of books to be able to vote. I’m also not fond of having twice the number of people being able to call themselves nominees, as that will dilute the honour.

  44. Elspeth Kovar:

    “A statistician has volunteered – and been slapped down. “

    I do think you mean “an anonymous conspiracy theorist”. He has been known here from before, which I think you might have missed.

    “First, contact someone on the ‘other’ side. Given the circumstances it should be someone handling SP4.”

    Contacting other conspiracy theorists doesn’t really help.

  45. JJ:

    “Just so you’re aware of what you’re signing on to — I’ve spent many dozens of hours over the last 10 months reading, educating myself, and participating in the discussion and analysis. I know that many others have done this as well.”

    To be honest, I think most of us regulars here could say many hundreds of hours instead of many dozens. To start from scratch will take time.

  46. @Elspeth

    I’m not sure how familiar you are with NDAs, but from what I’ve read thus far, I would agree with you that the NDA was breached (early and incomplete disclosure of analysis), but I would correct you in that it wasn’t broken (no leak of data that might identify Worldcon members or nomination ballots).

    Given this, it’s not entirely surprising that the MAC2 admin(s?) decided that there was no material breach, and that an apology sufficed.

  47. Hampus, as I said from the start, those are downsides to systems which increase the number of nominees. Your first downside (having to read more) depends on how many fans would feel obligated to read works they felt got on the ballot through improper action which probably would not have gotten on it without the collusion. We won’t learn that percentage with my intuitions or yours or anecdotal claims, though.

    The second objection I find less understandable. Given the presence of 4 slate nominees who pushed their way on, and 5 “natural” nominees who would have received nominations in the absence of collusion by slates, are you really saying that you feel it’s better that only the 4 slated works and one natural nominee get to call themselves finalists, rather than naming the 4 slated nominees and the 5 natural nominees as finalists? That it is better to deprive those natural nominees of the title, and deprive the fans of the ability to vote for them on the final ballot, than it is to increase the number of finalists?

    On another note:

    Has there been any statement of why the admins judged the data too difficult to anonymize? I am familiar with the challenges around anonymization, have seen the papers which outline how difficult it can be, and know that people often think anonymization will be easier than it is. But since you can hide, in this case, not just anything about the voter but also the names of all works which appeared on fewer than 20 ballots (except in Fan Artist, where it’s 10 ballots), it seems pretty hard to see a path to PII here. (In fact you could learn a lot, though not get a perfect simulation, if you simply removed all those entries and left a note of how many were removed from each ballot — they affect the results only by adding chaos to the early elimination rounds of most systems.)
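
    The removal-with-a-note variant in that parenthetical could be sketched as follows (hypothetical code; the list-of-lists ballot format and the toy threshold are assumptions):

```python
from collections import Counter

def strip_rare_works(ballots, min_ballots=20):
    # Drop every work appearing on fewer than `min_ballots` ballots,
    # but record per ballot how many entries were removed, so the noise
    # they add to early elimination rounds can still be approximated.
    counts = Counter(w for b in ballots for w in b)
    out = []
    for b in ballots:
        kept = [w for w in b if counts[w] >= min_ballots]
        out.append({"works": kept, "removed": len(b) - len(kept)})
    return out

ballots = [["A", "B", "rare1"], ["A", "rare2"], ["B"]]
print(strip_rare_works(ballots, min_ballots=2))
```

    Each rare work (here, anything on only one ballot) disappears from the published data, but the per-ballot removal count preserves enough shape to re-run most tallying systems approximately.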

    I would be interested to see a more detailed statement of why it was judged too risky. Of course, it could be that the risk is only apparent once you see the data, but if so, I would like to know even that one fact.

  48. Brad Templeton, your premise seems to be that there will only ever be one slate. Why else would you keep talking about 4 slate nominees and 5 natural nominees?

    But I honestly can’t understand why you don’t think there might, in future, be two slates, or five slates, or twenty slates. In which case your solution becomes unmanageable. A one-hundred-entry finalist list????

    Sorry, I just don’t think your proposition is workable. You haven’t even given us a mechanism to recognize slates (at least, not that I’ve seen), so how do you even define slates? Do you have an algorithm, or are you relying on the Hugo administrator’s judgment? In the latter case, I think the overwhelming vote at the Business Meeting, including from the administrators, will be HELL, NO. If the former, what is your algorithm?

  49. Hampus Eckerman: To be honest, I think most of us regulars here could say many hundreds of hours instead of many dozens. To start from scratch will take time.

    Well, yeah, “hundreds” was the word I really wanted to use there — but I was trying not to scare her off too badly. 😉

Comments are closed.