E Pluribus Hugo Tested With Anonymized 2015 Data

By Jameson Quinn: [Originally left as a comment.] So, Bruce Schneier and I are working on an academic paper about the E Pluribus Hugo (EPH) proposed voting system. We’ve been given a data set of anonymized votes from 2015. I don’t want to give all the results away but here are a few, now that people are actually voting for this year’s Hugos:

  • A typical category had around 300 ballots which voted for more puppies than non-puppies, and about half of those ballots were for puppies exclusively. There were few ballots which voted for half or fewer puppies (typically only a few dozen). The average number of works per ballot per category was around 3.
  • There were some weak correlations among non-puppies, but nothing that remotely rivals the puppies’ coherence. In particular, correlations were low enough that even if voting patterns remained basically dispersed, raising the average works per ballot per category from 3 to 4 (33% more votes total) would probably have been as powerful in terms of promoting diverse finalists (that is, not all puppies) as adding over 25% more voters. In other words: if you want things you vote for to be finalists, vote for more things — vote for all the things you think may be worthy.
  • EPH would have resulted in 10 more non-puppy finalists overall; at least 1 non-puppy in each category (before accounting for eligibility and withdrawals).
  • SDV(*) would have resulted in 13 more non-puppy finalists overall.
  • Most other proportional systems would probably have resulted in 13 or 14 more.
  • The above numbers are based on assuming the same ballot set; that is, that voters would not have reacted to the different voting system by strategizing. If strategizing is not used unless it is likely to be rational, that is a pretty safe assumption with EPH; less so with other proportional systems. Thus, other systems could in theory actually lead to fewer non-puppy nominees / less diversity than EPH.

Feel free to promote this to a front page post if you want. Disclaimer: EPH is not intended to shut the puppies out, but merely to help ensure that the diversity of the nominees better reflects the diversity of taste of the voters.

(*) Editor’s note: I believe SDV refers to Single Divisible Vote.

Update 02/08/2016: Added to end of second bullet missing phrase, supplied by author. Corrected footnote, based on author’s comment.


407 thoughts on “E Pluribus Hugo Tested With Anonymized 2015 Data”

  1. I do think it’s possible — wrote a message about it earlier in the thread, sorry you missed it (lots of messages, so easy to miss.)

    Though it’s unlikely to have two slates that are each as strong as the combined puppies were in 2015, and if you did they would be approaching being a majority of nominating fandom. Short of that, I have been following the doctrine that it is better to expand the ballot (with all its downsides) than to push the natural nominees off the ballot.

    On the other hand, two overlapping slates is what we had in 2015. They seem to have been about 20-30% of nominators.

    Keep in mind that I am refining proposals in response to feedback here. I maintain that the most effective proposal (a simplified version of write-in from a sorted longlist) is still the best choice, but I am trying to distill its value into proposals that address some of the flaws people see. It remains my favourite because it withstands not just slates and bought slots, but almost anything else. EPH, SDV and MRMR on the other hand are specific defences against the current problem (slates) and as such are inferior. They are fighting the last war. They are making you take your shoes off at airport security.

    I view the problem as “There is a large group of fans, together with allied non-fans, who wish to corrupt the nomination process to promote their choices artificially and unethically ahead of the real opinions of the membership as a whole.” Slates are their current method of doing that.

    One new thought on “who is a finalist?” in alternate systems, one possible definition could be, “The top 5 of the longlist PLUS the top 5 of the final vote.” Yes, it would be announced the day after Hugo voting closes who the top 5 of the final vote are. (Today that’s known long in advance.) Those people would sit at the front and get the right to claim being a Hugo finalist. Only the top 5 of the longlist would know long in advance to make plans to attend, but that’s how it is today, of course.

    Oddly, publishing the whole sorted longlist does answer the question of “how many to read” because the list is clearly too long to read them all and so it is more likely the voter would understand they need to select the 5 most likely in their view. Which for many fans would be their estimate of who the 5 natural nominees are.

    Again, some people object to that, so I am searching for alternatives that contain the main goals (be fair to all but don’t allow slates to push natural nominees unfairly off the ballot) but of course everything (including EPH, 4/6, do-nothing and many others) contains a series of trade-offs. Naturally, people disagree on which trade-offs are best.

  2. @Brad Templeton
    Italics used for emphasis not yelling.

    It seems you missed many, many discussions around the SP/RP items which made it onto the ballot and all the various reviews of it. You’ve also missed current discussions where people are again saying they will be reading slated items which make it to finalist. It’s unclear how many are like me & don’t read and how many follow GRRM and John Scalzi advice and read everything on the ballot.

    Has there been any statement of why the admins judged the data as too difficult to anonymize?

    I know after the business meeting there were a number of online discussions where the problems were explained in detail. You may be able to find them by searching file770 using Google. I suspect discussions also happened on ML but I don’t read there regularly. I have not kept links as I kept up with the issue as it was happening and wasn’t preparing a research paper for others. Your idea of how easy it is to anonymize Hugo data is laughable and we’ve been over the ground multiple times.

    Keeping up with the issue over the last 10 months IMHO requires:
    1. Reading posts related to EPH continuously over the 10 months as well as business meeting notes (or spending time playing catchup)

    2. Knowing issues which touch on it such as definition of finalist and other business meeting items going back 5 years as well as proposals for the upcoming business meeting

    3. Paying attention to the variety of people’s behavior SP1-SP3/RP1 and their stated intentions for SP4/RP2:
    A. Reading and voting based on quality of work GRRM/Scalzi style (lots and lots of people)
    B. Voting against slates and reading work (Steve Davidson is very vocal but not the only one to do this)
    C. Voting against slates Not reading (not nearly as big a number from what I can tell as #A)

    4. Knowing what the fans against EPH objections are

    I’m sorry if this feels like a personal attack but from someone who’s only followed this a bit (more in dozens of hours rather than hundreds) it looks like you’ve been working in a vacuum. You miss major comments in the very thread you’re in (EPH team’s belief of how 4/6 works with EPH and when the full analysis announcement will be made).

    It’s very frustrating to those who have spent dozens or hundreds of hours participating in numerous discussions on this and related issues for 10 months. As well as the way you keep dismissing everyone who disagrees with you as if they can’t possibly understand your brilliant idea or can’t really want a lesser extreme outcome. Logic suggests if everyone you talk to is insisting no it’s a bad idea and they’d vote against it they mean it. Especially if they’ve spent dozens/hundreds of hours keeping up with the issue over 10 months while you seem to either only pop in occasionally or can’t remember data from stuff you’ve read (which won’t help you in a business meeting).

    Yes I totally failed at my flounce/wasn’t going to talk with you anymore. Nobody’s perfect.

  3. @Brad Templeton One new thought on “who is a finalist?” in alternate systems, one possible definition could be, “The top 5 of the longlist PLUS the top 5 of the final vote.” Yes, it would be announced the day after Hugo voting closes who the top 5 of the final vote are.
    I believe that will require a proposal of its own as it goes against the currently redefined definition.

  4. @Tasha, wow that was a pretty big wall of text to manage to be so content free and boil down to “educate yourself.” Flouncing isn’t the only thing you fail at.

    Online discussions theorizing varied and contradictory reasons are pretty much the polar opposite of a statement from the Hugo admins.

    The business meeting ruled that the category nominations by ballot with other identifying information removed was sufficiently anonymized, that is the only fact of the matter so far. The Hugo admins seem to disagree, but like most of their decisions the actual rulings or reasons are not publicly available.

  5. @MC DuQuesne,

    The business meeting ruled that the category nominations by ballot with other identifying information removed was sufficiently anonymized, that is the only fact of the matter so far. The Hugo admins seem to disagree, but like most of their decisions the actual rulings or reasons are not publicly available

    Actually, per THIS LINK, it was the Sasquan Vice Chair, not the Hugo Administrators, who said the information would be released. Sasquan doesn’t (as I understand it) have any authority over the Hugo Administrators.

    Again, as I understand it, a NON-BINDING resolution was passed to the effect that IF the information could be sufficiently anonymized, THEN they requested (not required!) that it be released.

    My google-fu has failed me; I can’t find a link to the exact text of the resolution.

    Kevin Standlee, I’m sending up the Standlee Signal; can you clarify?

  6. MC DuQuesne: I plonked Ms. Turner, but indeed I have seen nothing further. Usually when one does an analysis of anonymization, one can explain (without specifics) what the difficulties were. For example, in the Massachusetts databases, they were able to say, “We found that if you narrowed a query by zip code, sex and birth-date, you could often identify a specific individual.” (In that famous case they did name the individual because it was the Governor, but not his private data, but it was not necessary to name him.) In the AOL search results case they were able to explain how certain types of queries contained PII.

    If the PII can’t be removed, then their decision makes sense. I am critical of not having told what sort of PII was at risk. In particular, did they consider obfuscating the names of the works, which would work on those that appeared on fewer than about 3% of ballots. You can’t easily obfuscate the names of the works whose counts were published, of course.

    This is no idle request. The business meeting wanted this data to be open if it can be, because members of the conventions seek to redesign the voting system, and they need this data to do a good job of it. Yes, it also allows the slates to learn more about how to attack the system, but for better or worse, the business meeting decided to go ahead anyway. The resolution may have not been binding, but it needs a good reason to be ignored, and “we can’t tell you why, but we don’t know how to anonymize it” isn’t enough of an answer for me. There may be people better able to advise on how to anonymize it.

    The thread pointed to below specifically refers to items that got just a handful of nominations. If no data are released on the names of the works that appeared on fewer than 20-50 ballots (depending on category) and those names are replaced with “work1”, “work2” etc. in some random way, it seems to deal with that concern. It is much harder to associate the appearance of a work with an individual if it appeared on 30 other ballots.

  7. @MC DuQuesne
    Brad Templeton needs to educate himself on the issues around Hugos and Worldcon and apparently offering ways & things needed to do so is too much data for a variety of readers.

    I’ve found in the past if someone is interested in becoming educated on an issue knowing what they need to know is often appreciated. Those who aren’t well they just keep spouting nonsense.

  8. @Cassy

    http://sasquan.org/wp-content/uploads/2013/07/2015-WSFS-Minutes-Complete.pdf

    “Moved, that the WSFS Business Meeting requests that the Administrators of the 2015 and 2016 Hugo Awards make publicly available anonymized raw nominating data from the 2015 and 2016 Hugo Awards, including the works nominated on each ballot in each category, but not including any information that could be used to relate ballots to the members who cast them;

    and Resolved, that it is the opinion of the WSFS Business Meeting that releasing such anonymized raw nominating data after the announcement of the results of the 2015 or 2016 Hugo Awards is not a violation of the privacy of members’ ballots.

  9. @Brad: This is just vague recollection, but I remember some of the PII issue being that some participants publicly share their full nomination ballot. A blogger who has an unusual combination of “public” nominees in any one category (or across several categories) could be identified.

    Just an example, imagine that Artist A was one of the Hugo Finalists for Best Fan Artist, so her name’s public in the data. Blogger Joanne posted her nomination ballot, and she nominated Artist A. And no one else, in that category.
    Now imagine it so happens that Joanne is the only person whose nomination in Best Fan Artist was Artist A and no additional nominations. Congrats, you can identify Joanne’s ballot in the anonymized data.

    That’s already an effective attack (in the data-security sense). But identifying Blogger Joanne’s ballot lets you identify all the other works Joanne nominated, even if they’ve been anonymized. In all the categories. Thaaaaaat’s a problem.
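A minimal sketch of the linkage attack described in this comment, in Python. The row layout `(ballot_id, category, work)`, the sample data, and the function names are all assumptions made for illustration; nothing here reflects the real Hugo data format. It also assumes the release keeps one ballot ID across categories, which is the case where the leak spreads beyond a single category.

```python
from collections import defaultdict

# Hypothetical released rows: (ballot_id, category, work).
# Assumption: the same ballot_id is reused across categories.
rows = [
    ("b1", "Best Fan Artist", "Artist A"),
    ("b1", "Best Novel", "work17"),       # obscured name of a rarely nominated work
    ("b2", "Best Fan Artist", "Artist A"),
    ("b2", "Best Fan Artist", "Artist B"),
    ("b3", "Best Fan Artist", "Artist B"),
]

def ballots_matching(rows, category, published_works):
    """IDs of ballots whose entries in `category` are exactly `published_works`."""
    by_ballot = defaultdict(set)
    for ballot_id, cat, work in rows:
        if cat == category:
            by_ballot[ballot_id].add(work)
    target = set(published_works)
    return sorted(b for b, works in by_ballot.items() if works == target)

def other_entries(rows, ballot_id, category):
    """Everything the identified ballot nominated outside `category`."""
    return [(cat, work) for b, cat, work in rows if b == ballot_id and cat != category]

# Joanne blogged that she nominated only Artist A for Best Fan Artist.
matches = ballots_matching(rows, "Best Fan Artist", ["Artist A"])

# A single match ties the ballot to Joanne and exposes her other categories.
leaked = other_entries(rows, matches[0], "Best Fan Artist") if len(matches) == 1 else []
```

If ballot IDs were issued separately per category, `other_entries` would find nothing, because no ID would appear in more than one category.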

  10. @MC DuQuesne,

    Moved, that the WSFS Business Meeting requests that the Administrators of the 2015 and 2016 Hugo Awards make publicly available anonymized raw nominating data from the 2015 and 2016 Hugo Awards, including the works nominated on each ballot in each category, but not including any information that could be used to relate ballots to the members who cast them;

    and Resolved, that it is the opinion of the WSFS Business Meeting that releasing such anonymized raw nominating data after the announcement of the results of the 2015 or 2016 Hugo Awards is not a violation of the privacy of members’ ballots.

    Thank you for finding this.

    You’ll note two things: First, this resolution does not say that the information HAS been anonymized, or even that it CAN be anonymized. It merely requests that anonymized data be made public. If the data cannot be anonymized, then obviously the data cannot be made public.

    (I resolve to hand over all the non-allergenic cookies to my friend from this box. Alas, the cookies all have peanut butter in them, to which she is allergic. Therefore, despite my resolution, I don’t give her any cookies.)

    Note also that this is a request, not a requirement. The Hugo Administrators are under no obligation whatsoever to comply.

  11. @Standback: Right — these are the sorts of scenarios the Hugo admins should publish in declining to carry out the request of the business meeting. Because if they posted those, then fans would (and have) suggested:

    a) It is not necessary to connect my nominations for Best Novel with my nominations for best Novella in the data. I mean that is useful data, but not for many algorithms.
    b) When the categories are independent, the fact that I published I nominated only Artist A is already public information. If I published, “I nominated Artist A and one other artist I don’t want to name” then you could learn something about me if there is only one ballot with Artist A and another popular artist. You need an example where somebody published online some highly unusual information about their choices in a category, naming some but not naming others, the others have to have appeared on more than the minimum number of ballots, and the pattern unique or very rare. Possible, but low probability. However, you learn nothing about the other categories as they should not be connected in the data.

    Anyway, this is the sort of situation I wanted to hear about, so thanks. While it is largely true that people who post ballot choices try to keep some secret, I do see situations where they want to keep it secret from the people they are voting or not voting for.

  12. @StandBack

    But you already said that blogger Joanne had posted her ballot, so how is her privacy in any way violated?

    Irregardless, minimum anonymization would render that attack void

    Here is an example of how Joanne’s data would appear in an anonymized database
    qxgd3434,1,1,”Skin Game”
    qxgd3434,1,2,”Trial by Fire”
    chek4311,2,1,”Flow”
    zpok2100,12,1,BFA211

    Ballot IDs are only unique to category, and anything receiving less than 5 votes is replaced by a simple identifier.
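The scheme in this comment could be sketched roughly as follows, in Python. The input layout `(member_id, category, [works])`, the placeholder format, and the function name `anonymize` are assumptions for illustration; the 5-vote threshold and the per-category ballot IDs come from the comment. The sketch does not deal with other possible leaks, such as the order in which ballots appear.

```python
import random
import secrets
from collections import Counter

def anonymize(ballots, min_votes=5, seed=None):
    """Turn (member_id, category, [works]) into rows of
    (ballot_id, category, position, work_or_placeholder)."""
    rng = random.Random(seed)
    # Count how many ballots in each category name each work.
    counts = Counter((cat, w) for _, cat, works in ballots for w in set(works))

    # Works below the threshold get a placeholder; shuffle first so the
    # placeholder number says nothing about the title.
    rare = {}
    for cat in {c for _, c, _ in ballots}:
        names = sorted(w for (c, w), n in counts.items() if c == cat and n < min_votes)
        rng.shuffle(names)
        for i, w in enumerate(names, 1):
            rare[(cat, w)] = f"cat{cat}-work{i}"

    rows = []
    for _member, cat, works in ballots:
        # Fresh random ID per (member, category), so one member's categories
        # cannot be joined. The member ID itself is dropped.
        ballot_id = secrets.token_hex(4)
        for pos, w in enumerate(works, 1):
            rows.append((ballot_id, cat, pos, rare.get((cat, w), w)))
    return rows
```

Note that this only hides rare titles and cross-category links. A ballot with a unique combination of works that all clear the threshold would still stand out within its category.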

  13. @cassy, yes it is a request, but it would be polite when declining such a request to state why, even if “because I don’t like you” is the reason.

    A closer analogy than your cookie example would be

    Moved to give all the fireworks to the classroom, including sparklers and smoke bombs, but not including anything that will blow your hand off.
    Resolved that releasing such safe and sane fireworks to the classroom will not endanger their safety.

    Then in denying the request the admins could say that they disagree and think sparklers could blow someone’s hand off.

  14. @mark, i fail to see anything unfounded in my postings, could you be more precise in what fault you find?

  15. MC DuQuesne : But you already said that blogger Joanne had posted her ballot, so how is her privacy in any way violated?

    Irregardless, minimum anonymization would render that attack void

    No, it wouldn’t.

    Suppose Joanne’s full Novel ballot looked like this:
    qxgd3434,1,1,”Skin Game”
    qxgd3434,1,2,”Trial by Fire”
    qxgd3434,1,3,”Novel featuring gay main character”
    qxgd3434,1,4,”Novel featuring gay main character”
    qxgd3434,1,5,”Novel featuring gay main character”

    Suppose that Joanne is the only voter who put the combination of both Skin Game and Trial by Fire on her ballot, and she’s published those choices on her blog. If there’s only one ballot with those 2 novels, everyone is going to know that it’s hers.

    Suppose also that she has not yet come out to her family, or most of her friends, as being gay.

    I can certainly see where this would be a huge privacy violation.

    Also, “irregardless” is not a real word.

  16. One of the reasons I don’t use a plonk script is because if someone posts a legitimate question addressing something I’ve posted, and I don’t respond to it, I look as though I’m deliberately avoiding responding because they’ve identified a flaw in what I’ve said, and I don’t have valid response.

    Which is pretty much how Brad Templeton looks in this thread.

  17. @MC DuQuesne

    You posted starting “The business meeting ruled that…”
    You’d had the status of what the business meeting actually did comprehensively explained to you before, which is what I linked you to.
    You now seem to be acknowledging in your reply to Cassy B that it was a request, not a binding ruling, so I am content.

  18. @jj The business meeting and I disagree with you on the validity of your hugely improbable imaginary scenario.

  19. MC DuQuesne wrote: @jj The business meeting and I disagree with you on the validity of your hugely improbable imaginary scenario.

    Neither the Business Meeting nor you have seen the raw data. An internationally respected security expert has. Whose judgment should we accept: yours or his, and why?

  20. @JJ,
    I use the plonkscript but with opacity set to 0.8.

    If you set opacity to 1.0, it completely blanks a comment out. At 0.8, it’s faded out enough (but not completely) that I can make out the identity of the commenter but it takes extra effort to actually read the comment. Whether I choose to read the comment is therefore a conscious decision. YMMV.

  21. Suppose Joanne’s full Novel ballot looked like this:
    qxgd3434,1,1,”Skin Game”
    qxgd3434,1,2,”Trial by Fire”
    qxgd3434,1,3,”Novel featuring gay main character”
    qxgd3434,1,4,”Novel featuring gay main character”
    qxgd3434,1,5,”Novel featuring gay main character”

    Yes, I pointed out this risk in my post (in not quite this way) and I do agree it’s a risk, though I rate its probability as low but not zero — the works in question need to all be on 88 or more ballots in the case of 2015, so while this precise scenario can’t happen, others are not impossible.

    A few options exist:

    a) Since PINs are just going out now, voters could be asked to check a box on the online ballot concerning privacy preferences. Done at the start, this could be opt-out.
    b) One could also mail the 2015 nominators and ask them to opt in. In the almost certain event that several do not opt in, their data could be released in aggregate only.

    In both cases, however, the puppies might decide not to opt in (or to opt out) since they might perceive the data release as being for use against slates. Though they are also complaining (some of them) that it’s unfair for JQ to get the data and not them.

    Longer term, it could be declared in advance what will be released, with a warning saying, “While the release will have your name removed, the names of works which get few nominees and the individual categories will be separated, you may want to exercise caution in any disclosure of part of your vote on a category if you wish complete privacy on the part you don’t disclose. (For example, you wish to hide works which may embarrass or compromise you, or works of people you know which are present or absent.) Your votes in other categories will not be compromised because of your disclosure about a different one.”

  22. MC Duquesne: The business meeting and I disagree with you on the validity of your hugely improbable imaginary scenario.

    Please point me to the exact verbiage where “the business meeting” said this. (Note that I was one of the people at the Business Meeting, so I know what was actually said — and what you’re claiming they said, they did not say.)

  23. @jj if the pdf of the motion passed by the business meeting that i quoted is incorrect you should inform the sasquan web admins so they can correct it. It clearly states that category nomination information is not a privacy violation.

  24. It states that it’s their opinion. An opinion unsupported by actually, you know, seeing the data. Rather like yours. So are you going to explain why we should go with your opinion over that of an internationally respected security expert who has actually seen the data in question?

  25. BTW, while I have no idea what it would reveal, one could count the number of sub-ballots (categories on a ballot) which might reveal private info in this way. To meet that test, they would need to have on them from 1 to 4 entries which appear in the detailed report (plus any other entries not named) and they must be the only ballot to have that set of entries, and there must be other entries. That’s not going to happen for 1, obviously and it’s probably unlikely for 2. It probably happens for 3 a few times (though almost never for puppies.) It probably happens somewhat often for 4, but in that case, I ponder how likely it is that a person published 4 of their selections, and made a 5th, but wish to keep it secret. Either way, we could count how many there are, and if they are few, those could be pulled, and if there are modest numbers, they could be emailed to ask for opt-in. If there are lots, it’s a no-go.

    Of course, pulling out 10% of the ballots would have another effect, which is that it makes it much harder to connect the random names in the released data with the real names, though in some cases it will be still doable. If it weren’t for the fact that a slate ballot is unlikely to meet any uniqueness test, you could actually pull a lot of ballots and still have a meaningful dataset for some uses, though clearly not as good as a full set. You don’t have to pull the whole ballot, just the entry which passes the uniqueness test.
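The uniqueness test proposed in this comment could be counted along these lines, in Python. The function name and the data layout (one set of works per sub-ballot, plus the set of works named in the detailed report) are assumptions for illustration. The sketch reads "the only ballot to have that set of entries" as an exact match on the published entries; a stricter test would also treat supersets as matches.

```python
from collections import Counter

def at_risk_subballots(subballots, published):
    """Indices of sub-ballots that could reveal a hidden entry.

    subballots: list of sets of works, one set per ballot in one category.
    published:  set of works named in the detailed report.
    A sub-ballot is flagged when it has 1 to 4 published works, no other
    sub-ballot shows the same published works, and it also carries at
    least one unpublished work.
    """
    visible = [frozenset(b & published) for b in subballots]
    freq = Counter(visible)
    return [
        i
        for i, (b, v) in enumerate(zip(subballots, visible))
        if 1 <= len(v) <= 4 and freq[v] == 1 and b - published
    ]

# Hypothetical category with three published works and a few obscure ones.
published = {"A", "B", "C"}
subs = [{"A", "B", "x"}, {"A", "B"}, {"A", "C", "y"}, {"C"}, {"C", "z"}]
flagged = at_risk_subballots(subs, published)
```

The length of `flagged` relative to the number of sub-ballots is the figure the comment suggests using to choose between pulling the entries, emailing for opt-in, or dropping the release.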

  26. MC Duquesne: if the pdf of the motion passed by the business meeting that i quoted is incorrect you should inform the sasquan web admins so they can correct it. It clearly states that category nomination information is not a privacy violation.

    There’s nothing wrong with the PDF. The problem is with what you’re claiming it says.

     
    “Moved, that the WSFS Business Meeting requests that the Administrators of the 2015 and 2016 Hugo Awards make publicly available anonymized raw nominating data from the 2015 and 2016 Hugo Awards, including the works nominated on each ballot in each category, but not including any information that could be used to relate ballots to the members who cast them

    The WSFS is requesting that the Hugo Admins release the nominating data, as long as it can be fully anonymized.

     
    and Resolved, that it is the opinion of the WSFS Business Meeting that releasing such anonymized raw nominating data after the announcement of the results of the 2015 or 2016 Hugo Awards is not a violation of the privacy of members’ ballots.

    The WSFS is saying, IF the data can be fully anonymized, then releasing it does not violate the privacy of the members’ ballots.

     
    Nowhere does the WSFS say that they know the data can be fully anonymized — which is what you are claiming they said.

  27. @MC DuQuesne

    If I understand it correctly, the business meeting resolved that it was not inherently a violation of privacy if and only if the data could be anonymised. The privacy section at the end doesn’t negate or override the bit that requires anonymisation. The Administrators have said it couldn’t be anonymised, therefore the privacy statement within that pdf doesn’t apply.

  28. @cally, I see no need to argue the superiority of democratic processes over your characterization of the opinion of one person with access to data having unknown levels of anonymization.

  29. What “democratic processes” are you speaking of? The ones where the Business Meeting said that they recommended, but did not require, the data be released if it could be fully anonymized? Had I been at the Business Meeting, I, too would have voted for that. I had several friends and acquaintances who were there and did vote for that. But my hypothetical vote, and their actual votes, don’t mean that the data is fully anonymizable. That’s independent of whether or not you or the Business Meeting hoped or believed it is.
    And honestly, I’ll believe Bruce Schneier, who has seen that data and has no axe to grind, rather than you, who hasn’t, and does.

  30. “Moved, that the WSFS Business Meeting requests that the Administrators of the 2015 and 2016 Hugo Awards make publicly available anonymized raw nominating data from the 2015 and 2016 Hugo Awards, including the works nominated on each ballot in each category, but not including any information that could be used to relate ballots to the members who cast them”

    It clearly states that the works nominated on each ballot is not considered the information that could be used to relate ballots to members who cast them, such as member number, address, email, name, etc.

    It separates data into “relate ballots to the members who cast them”, and data that doesn’t, and explicitly puts the category nomination data in the non-relating category that should be released.

    Some individuals may disagree and posit wildly implausible scenarios where the category nomination data would relate to specific users, but the business meeting disagreed

  31. This feels as though a group of people agreed “It probably won’t rain on February 29, and if it doesn’t, we should have a picnic” and then someone who really really wants a picnic starts trying to get everyone to say what they’ll bring, and organize car-pooling. “Of course we’re having a picnic. It’s not going to rain that day, see, right here we agreed that it won’t.”

    Meanwhile, I’m thinking that a picnic would be nice, but we can’t be sure on February 15 what the weather will be two weeks from now, and rainfall is not subject to a popular vote. Not even a vote of trained meteorologists, which I and most of my friends are not.

    I would like it to be possible to safely de-anonymize the data, but there are a lot of things I would like, and only some of them are true.

  32. @MC DuQuesne

    It clearly states that the works nominated on each ballot is not considered the information that could be used to relate ballots to members who cast them

    No, it doesn’t. If it did, it would say that. What it says is that information that can be used to link ballot to person will not be released. Now we know that the nominations themselves can be linked to individuals, those nominations can’t be publicly released. Because the resolution says not to do that.

  33. The important question of Brad’s that hasn’t been answered is has anyone found an official statement from the Hugo admins as to why they aren’t releasing the data or allowing other NDA releases. The anonymity issue may be a red herring.

  34. @MC DuQuesne

    I linked to their statements about four comments above yours. Here it is again. Yes, it is about anonymising the ballots. It has always been about anonymising the ballots. Conspiracy theories need not apply, unless you have some pressing evidence – and I mean evidence, not guesses, not theories, not suppositions – of another reason?

  35. That is the previous hugo admin, who has passed the data to the current admins who released it to one or more groups.

  36. Actually, I don’t believe Bruce has made a declaration on the difficulty of anonymization of this data. I have talked to him and he wants to keep pretty silent for now until they publish their paper.

    I did not see the part of the meeting where they made the request to release the data. Certainly the wording can be taken both ways, ie. “We at the business meeting think stripping the names etc. is sufficient to protect privacy” or that they added that phrase redundantly. To me it sure reads the former way — I mean what exactly does the latter phrase mean if the former phrase means “don’t release what you can’t anonymize?” However, somebody who was in the room might have a better sense of the meaning of that one.

    Either way, I do agree there is a small, but non-zero risk of disclosing some PII. Pretty small, but there could be more attacks and it’s hard for me to declare what’s small for somebody else. I would be cool with, “Identify ballot entries with unique properties, and e-mail the authors of those ballots to see if they will approve release, otherwise aggregate.”

    There is another, more time consuming approach. Let people run programs on the data and see the output, but not the data. Output to be reviewed by Hugo admins to be sure it’s seriously aggregated. Programmer signs contractual pledge to do nothing but extract broad aggregate statistics. Attacks are still possible this way (in fact, many famous attacks have been on attempts to do exactly this approach) but we’re not talking super high risk data here.

  37. MC Duquesne: It clearly states that the works nominated on each ballot is not considered the information that could be used to relate ballots to members who cast them, such as member number, address, email, name, etc.

    I am sorry that accurate reading comprehension is such a challenge for you.

    It clearly states:
    “Moved, that the WSFS Business Meeting requests that the Administrators of the 2015 and 2016 Hugo Awards make publicly available anonymized raw nominating data from the 2015 and 2016 Hugo Awards, including the works nominated on each ballot in each category, but not including any information that could be used to relate ballots to the members who cast them”

    which means: if the works nominated on each ballot in each category are part of the information that could be used to relate ballots to the members who cast them, then that information will not be released.

    You can’t just leave off half of what they said, and then claim that they’re saying something different because you’ve omitted part of it.

  38. @meredith, never claimed I did. It’s been clearly established that they can decline the request for any reason, I was wondering if the current admins had stated what their reason was.

    @JJ, you’re ignoring the complete sentence. Let me put it into words that might relate to your daily experience. “I want a burger with everything, including bacon and cheese, but nothing spicy,” then since you think you saw jalapeno flavored bacon in a store once you don’t put regular bacon on the burger. You will not have a happy customer when they come back to the drive thru.

  39. MC Duquesne: you’re ignoring the complete sentence. Let me put it into words that might relate to your daily experience. “I want a burger with everything, including bacon and cheese, but nothing spicy,” then since you think you saw jalapeno flavored bacon in a store once you don’t put regular bacon on the burger. You will not have a happy customer when they come back to the drive thru.

    No, you’re the one who is ignoring the whole sentence. What they are saying is:
    “Give the requestors a burger with everything, including bacon and cheese, but nothing spicy” — and if the only kind of bacon you have is spicy, then you have to leave that off, too.

  40. @JJ, you’re ignoring the complete sentence. Let me put it into words that might relate to your daily experience. “I want a burger with everything, including bacon and cheese, but nothing spicy,” then since you think you saw jalapeno flavored bacon in a store once you don’t put regular bacon on the burger. You will not have a happy customer when they come back to the drive thru.

    So you’re assuming that the person who’s assembling the burger and knows what all the ingredients are is wrong about whether or not there’s jalapeno in the bacon. Even though he’s seen the bacon, and you haven’t. You just assert, sight unseen, that the folks with more information than you have are wrong, and you’re right. Got it.

  41. Reading this thread is fun. Well now that I’m being ignored/plonked.

    It’s been repeated more times than I can count that the non-binding resolution only applied if the data could be fully anonymized. We have several times where the Hugo admins, backed by the EPH team, state it couldn’t be done (in this thread even). So people who have seen the actual data give facts based on their field of expertise and the raw data.

    A couple of guys who don’t like that ignore those statements and keep trying to twist what the non-binding agreement states. Ignoring facts.

    And they wonder why nobody wants them near the non-anonymized data. *head desk*

  42. Well, MC Duquesne and Templeton have proved quite comprehensively that they don’t have anything worthwhile to add to any conversation, so they get whited out.

  43. As one of the voters at the Business Meeting, it was certainly my opinion that a method for adequately anonymizing the data could be found. It was also my understanding that the Business Meeting had no authority whatsoever to substitute its judgment for that of the administrators, and that we were making a request, not a demand. Even if the motion unambiguously asserted that removing the nominators’ names from the data was enough to adequately anonymize it, if the administrators disagreed then they had not only the right but the duty to refuse.

    I feel the administrators had a duty—not a formal duty under the rules, but a moral duty—to consider and act upon the request of the Business Meeting. I’m satisfied that their response adequately addresses the spirit and the letter of the resolution.

  44. @Tasha
    There are clearly some people ignoring the resolution’s words and the facts.

    “non-binding resolution only applied if the data could be fully anonymized.” The word “fully” appears nowhere, and adding it attempts to put an impossible standard on the data release.
    The actual words. First clause:

    “the WSFS Business Meeting requests that the Administrators of the 2015 and 2016 Hugo Awards make publicly available anonymized raw nominating data from the 2015 and 2016 Hugo Awards”

    Nothing about perfectly, fully, or any other adjective, just anonymized.
    Second clause

    “including the works nominated on each ballot in each category”

    Specifically says the works in each category be included in the anonymized data. This clearly indicates that the data is still sufficiently anonymized even including this data. This data isn’t enough to identify voters.
    Third clause
    “but not including any information that could be used to relate ballots to the members who cast them”
    All the actual PII needs to be removed: name, address, phone number, etc. Whether this was done with the data given to the EPH researchers is unknown at this point.

    This first paragraph details which data needs to be released in this anonymized data, and which should be removed.

    Second paragraph

    “and Resolved, that it is the opinion of the WSFS Business Meeting that releasing such anonymized raw nominating data after the announcement of the results of the 2015 or 2016 Hugo Awards is not a violation of the privacy of members’ ballots.”

    This part makes it clear that even though the anonymization isn’t perfect after removing the PII, this particular data release of the category nominations does not rise to the level of a privacy violation.

    Now clearly the Sasquan admins were either too incompetent to anonymize the data to the level requested, or decided that their opinion on what constituted a privacy violation differed from the democratically determined will of the business meeting.

  45. MC Duquesne, those are some epic-level phrase-twisting contortions you’re going through, to pretend that the resolution says something different than what it actually says.

    The WSFS was not making any statement about whether the data was anonymizable, or claiming to know how, exactly, anonymization could be accomplished. It was saying IF the data was anonymizable. And it was saying that if the data was not anonymizable, it was not to be released.

    You say that “fully” is not specified anywhere. Well, neither is “partially”. “Anonymized” is like “pregnant”. Either you are, or you aren’t.

    Because the category votes for a person actually could be used to relate a ballot to a person, that made them non-anonymizable in the judgment of the Hugo Admins.

    Please stop trying to claim that the WSFS resolution says something other than what it says. I was there. I know what it says — and it’s not what you are claiming.
