Responding to Controversy, Seattle Worldcon Defends Using ChatGPT to Vet Program Participants

Seattle Worldcon 2025 Chair Kathy Bond today issued a public statement attempting to defend the use of ChatGPT as part of the screening process for program participants. The comments have been highly negative.

…We received more than 1,300 panelist applicants for Seattle Worldcon 2025. Building on the work of previous Worldcons, we chose to vet program participants before inviting them to be on our program. We communicated this intention to applicants in the instructions of our panelist interest form.

In order to enhance our process for vetting, volunteer staff also chose to test a process utilizing a script that used ChatGPT. The sole purpose of using this LLM was to automate and aggregate the usual online searches for participant vetting, which can take up to 10–30 minutes per applicant as you enter a person’s name, plus the search terms one by one. Using this script drastically shortened the search process by finding and aggregating sources to review.

Specifically, we created a query, including a requirement to provide sources, and entered no information about the applicant into the script except for their name. As generative AI can be unreliable, we built in an additional step for human review of all results with additional searches done by a human as necessary. An expert in LLMs who has been working in the field since the 1990s reviewed our process and found that privacy was protected and respected, but cautioned that, as we knew, the process might return false results.

The results were then passed back to the Program division head and track leads. Track leads who were interested in participants provided additional review of the results. Absolutely no participants were denied a place on the program based solely on the LLM search. Once again, let us reiterate that no participants were denied a place on the program based solely on the LLM search.

Using this process saved literally hundreds of hours of volunteer staff time, and we believe it resulted in more accurate vetting after the step of checking any purported negative results….
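The statement does not include the script itself. Purely as illustration, a minimal sketch of the kind of process it describes (a name-only query, a demand for cited sources, and every result routed to a human reviewer) might look like the code below; the model name, prompt wording, and use of the OpenAI Python client are assumptions, not Seattle’s actual code.

```python
# Illustrative sketch only -- not Seattle Worldcon's actual script.
# Assumes the OpenAI Python SDK (>= 1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical prompt: the only applicant-specific input is the name,
# and the model is asked to cite a source for every claim it returns.
PROMPT = (
    "Search for public reports involving a person named '{name}'. "
    "List anything a convention program vetter might want to review, "
    "citing a source URL for each item. If you cannot find the person, "
    "or cannot distinguish them from others with the same name, say so."
)

def gather_leads(name: str) -> str:
    """Aggregate possible leads for one applicant; the output goes to a
    human reviewer, who verifies sources and runs further searches."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; the statement names only "ChatGPT"
        messages=[{"role": "user", "content": PROMPT.format(name=name)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(gather_leads("Jane Q. Applicant"))  # hypothetical name
```

Even under these assumptions, each claim the model returns still has to be verified by a person, which is what the statement says happened and what many of the comments below dispute the value of.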

Here is a sampling of the comments on Bluesky.



51 thoughts on “Responding to Controversy, Seattle Worldcon Defends Using ChatGPT to Vet Program Participants”

  1. How did it handle all of the authors who publish under pseudonyms?

  2. Well. That explains why kids’ program was working with me on an item and then I got turned down by (regular) program. I politely asked if I was still on children’s program and got no response. I guess I was talking to a bot? Or maybe they didn’t like my extremely vocal criticism of the illegality of the business meeting.

  3. I can’t believe that, after all the shitty junk of past Worldcons, the Seattle ruling junta would be so tone-deaf. This is a stupid move, and another argument for ending Worldcons as we know them. As Vice-President of The Heinlein Society I arranged to have a table at the Worldcon, and I will buy a membership because, as noted, I have to. But I didn’t apply to be on panels because of the vetting. If you’re not going to vet people correctly, don’t do it at all.

  4. Do you know what brought about the statement? I can’t find any public controversy about it.

  5. It’s indefensible. I’m very disappointed in the committee for doing this, and I’ve told them so in direct feedback and on social media.

  6. As a matter of transparency, they should be publishing their prompts and vetting criteria.

  7. The post talks about protecting privacy and claims that the only “data” they inputted was the authors’ names. But…is the query not, in some sense, data? Don’t LLMs…not exactly learn, but adapt based on how they are used? Should we be concerned about what the exact queries were? In addition to all the other concerns?

  8. Since the purpose of such vetting must have included surfacing any kerfuffles or trouble associated with the name, it is reasonable to wonder if the query now associates the name with trouble.

  9. Run that by me again? Using an LLM to script an online search is the kind of thing a manager with no knowledge of computers would think of. I can’t believe there are no computer-literate people on the Seattle committee, yet their statement reads that way.

  10. If you use Google, what do you think the system is that selects which links to show you on the results page? What process do you imagine takes place between entering “Chris R Controversy” in a search box and getting back a set of results?

    It’s been a long time since it was PageRank, link text, and page contents. The machinery underneath that is in fact also a variation of a large language model. So is the process that learns from your queries and your clicks. The act of searching for anything and interacting with the results is, in practice, very little different from using a chatbot.

    The results you get back also need to be aggregated and understood. Some responses to this I’ve seen have suggested “just write a perl script to do the search and vet people that way” and … I would happily challenge anyone to do that with anything other than toy web pages and rigid content. Extracting possibly interesting text that may have meaning from a morass of unstructured text is what LLMs are for (a rough sketch follows below); the generative side is an accidental side effect.

    I’m not saying this was a good idea; I think the scale and tone of the reactions here and elsewhere speak to that. But I wish people would think critically (and, Lisa, maybe read the statement? they were pretty clear about what they used it for, and “having a bot reject you” wasn’t in that set at all) about what this was and how it was used, and criticize it on the facts rather than on reflex.
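    To make the extraction point concrete, here is a rough sketch of the pattern I mean (the prompt, model, and page fetching below are hypothetical illustrations of mine, not anything Seattle ran):

    ```python
    # Rough illustration of LLM-as-extractor over unstructured page text.
    # Hypothetical prompt and model; a fixed script would need to know each
    # page's structure in advance, which is the part that doesn't generalize.
    import requests
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def extract_mentions(name: str, url: str) -> str:
        """Fetch a page and ask the model to quote any passages about `name`
        that a human vetter might want to read; verifying those passages
        stays with the human reviewer."""
        page_text = requests.get(url, timeout=30).text[:20000]  # crude truncation
        prompt = (
            f"From the text below, quote verbatim any passages about '{name}' "
            "that could matter when vetting a convention panelist. "
            "If there are none, reply 'nothing found'.\n\n" + page_text
        )
        response = client.chat.completions.create(
            model="gpt-4o",  # assumed model
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    ```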

  11. “An expert in LLMs who has been working in the field since the 1990s reviewed our process and found that privacy was protected and respected, but cautioned that, as we knew, the process might return false results.”

    Who is the expert? Can members view this expert’s CV?

  12. Yeah, I can think of few things that a con could do that would be better than throwing one of their members or staff under the internet bus that way.

  13. Well, on the positive side, they have managed to find a whole new way to stir up fandom.
    Congratulations?

  14. Grok seems to be more honest – and I wouldn’t trust it either.
    Has Seattle not heard of UDM14? It doesn’t use AI.

  15. @Warren

    Do you know what brought about the statement? I can’t find any public controversy about it.

    From what I’ve heard, there was strong internal criticism, which prompted the Seattle Worldcon to issue an apology before the use of LLMs for vetting program participants leaked.

  16. they were pretty clear about what they used it for, and “having a bot reject you” wasn’t in that set at all)

    “Having a bot reject you” is not that far from “having a bot gather the information used to reject you.” AI is so overhyped and overused in screening processes that people have little faith in its accuracy or fairness.

    Since you are someone involved in software for Seattle Worldcon, I hope you’ll pass along the request to release the prompts.

  17. I am a software vendor (in the gift sense) to Seattle. They are using software I wrote; that’s it. I can neither pass information along to that process nor have any insight into it.

  18. Lazy, lazier, laziest.

    Worldcon organizing has always been labor-intensive. Gail Barton (my wife) and I ran the Art Show at Denvention II in 1981. We had some 300 artists to manage, with no computer, just a typewriter and the usual fannish repro equipment of the day. We started with a set of artist mailing lists gotten from several earlier Worldcons, plus an invaluable list given to us by Bjo Trimble (an old friend of Gail’s).

    Gail and I corresponded with at least two thirds of the artists who came to the show; over a hundred of them were walk-ins who did not pre-register for our show. We encouraged walk-ins via the convention publications; Gail’s career as a fan artist began as a walk-in artist at St. Louis Con in 1969. She did all the management and policy setting, while I did all the recordkeeping and correspondence. After the con, we sat down with the convention treasurer to finish things up—I needed the treasurer to check my figures (he was a pro accountant) and to write the checks to the artists.

    If my memory serves me well, the only computer that Worldcon had was an Apple II belonging to someone on the concom; it was used for registration recordkeeping. The programming organizers got along just fine without one.
    This was a 4000-person convention, meaning it was in the same general size range as most modern Worldcons. Except for registration records, we did everything manually.

    Since that con, I developed a good IT career, retiring as a database administrator. I have little use for AI; I consider it to be a crutch made of rotten wood with a thin coating of varnish to conceal its weakness. It will collapse at the worst possible moment, letting you fall splat onto the floor.

  19. I wish people would think critically (and, Lisa, maybe read the statement? they were pretty clear about what they used it for, and “having a bot reject you” wasn’t in that set at all) about what this was and how it was used, and criticize it on the facts rather than reflex.

    You’re asking people to be factual in how they react to software that expresses exceptional confidence in everything it says without the ability to tell a fact from a falsehood.

    I don’t think the criticism here is reflexive. People recognize how much damage these extremely compelling LLMs can do to processes that used to be decided entirely by humans. This is a particularly sensitive issue with anything that resembles hiring.

    Several members of my family have graduated college and begun sending out applications, to which they often get neither an interview nor any sign at all that they were considered. It’s extremely deflating. They all think that AI weeded them out and no human bothered to evaluate them. So now they have to figure out how to please the AI gatekeepers.

  20. So rcade says: “Several members of my family have graduated college and begun sending out applications, to which they often get neither an interview nor any sign at all that they were considered. It’s extremely deflating. They all think that AI weeded them out and no human bothered to evaluate them. So now they have to figure out how to please the AI gatekeepers.”

    I’m friends with more than a few folks who own their companies. So I will tell you that human weeding systems work the same way. Any interesting job, according to them, can get as many as a hundred enquiries. So they only respond to the ones they are seriously interested in considering. If you didn’t get a response, they’re not interested in you. And no, they don’t use software systems, though occasionally friends, beer and food will be involved if they need to discuss some of them.

  21. This was an extremely predictable backlash given what Worldcon is about and who it is for and sometimes you can and should choose not to step on the extremely obvious rake.

  22. Alternatively, I could ask an LLM the following:

    Research Deirdre Saoirse Moen and tell me if she has opinions that could be classed as offensive or likely to cause offense to the membership of World2025 in Seattle or break our code of conduct (https://seattlein2025.org/about/conduct-and-diversity/code-of-conduct/)

    And it tells me this:

    Based on publicly available information, there is no evidence to suggest that Deirdre Saoirse Moen has expressed opinions or engaged in behavior that would violate the World2025 Code of Conduct or cause offense to its membership.

    I have zero idea if that was their prompt or which of the many LLMs they used but that strikes me as the correct answer to that question?

    I can’t disagree that this wasn’t the greatest of ideas and certainly wasn’t going to go down well, but analyzing large amounts of text data for a specific purpose is one of the few things they’re actually pretty good at.

  23. but analyzing large amounts of text data for a specific purpose is one of the few things they’re actually pretty good at.

    This is literally false. LLMs do not analyse or summarise text – they shorten it. Experiments show that the LLMs frequently miss important points in the shortening, reverse meanings, or confabulate information that was not in the original text.

    And this should be obvious from how LLMs work. Facts are not a data type in LLMs – it’s all word fragment frequencies.

    It’s pretty spectacular that we now have computers you can have a plausible conversation with. It’s an impressive demo!

    But it’s monstrously unsuited to work on people where the details matter. And again, this should be obvious from how LLMs work.

  24. The catch is, that kind of examination won’t tell you who’s got restraining orders against them, which parties have long-standing feuds, and which people are known for talking over other people on panels and shouldn’t be programmed—all of which has been meticulously documented by local and regional concoms over the years.

    Also, David’s point’s a good one. Think of LLMs as returning a rock from the river, retrieved by an enthusiastic puppy who’s lost track of the rock you wanted said puppy to fetch, but will definitely bring you an entirely different answer to a question you didn’t actually ask.

    I’ve screwed up more than once myself, and taken at least one rather unpopular position, but you’d never know that from ChatGPT’s glibness.

  25. “Specifically, we created a query, including a requirement to provide sources, and entered no information about the applicant into the script except for their name.”

    So if an applicant has a name that is shared by other people, possibly because it’s a bit common, how do they know they have the right person’s vetting information? Wouldn’t that require using more personal information to narrow down the search? I’m against using LLMs in general, but this feels like they aren’t being forthright about the specifics of their use.

  26. It remains very possible that a person is not accepted for this convention’s programme because they are individually unpleasant to work with, failed to contribute meaningfully on past convention programmes, espouse lies or opinions not aligned with the convention’s code of conduct, or simply are not a good choice this time.

    Those are human decisions.

  27. This was inevitable. 🙁 If not in 2025, then in another year or two. Too few volunteers, too much to do. I am saying this from my experience with our national conventions: 5-6 people carry the load for everything related to 100+ participants and 30+ presenters.

    Since it has happened, we – as a community – can get something out of it. For example, it is important to find out how many false positives their system gave, e.g. how many people were rejected by the system but were eventually admitted. Single-digit percentages leave the possibility of correcting mistakes with human intervention. Double digits would probably imply a prohibitively large human correction process, comparable to a completely human vetting.

    I hope the organizers are open about it and the lesson is clear – people, please volunteer! If enough do, this would not even be discussed.

  28. ArbysMom raises a point that occurred to me – what would happen if they put my name into this thing?

    I mean, I might well be considered for a programme spot. After all, I’ve written a well-regarded slipstream novel, and my varied career as a disc jockey, stand-up comic, noncommissioned officer in the Royal Artillery, professional footballer (in both English and American football), commissioned officer in Air Sea Rescue, policeman in Newcastle, and graphic designer in Oxfordshire would make me an interesting guest. On the minus side, there is the problem of all those sex workers I murdered in and around Ipswich, and also I’ve been dead for a year.

    None of the above is true about me; every bit is true of some other Steve, Steven or Stephen Wright. How is an LLM going to distinguish between me and them, based only on the name? (Even using my full name, as it appears on my novels – Stephen James Wright – doesn’t help as much as you’d think.)

    Sorry, but vetting programme participants is absolutely something that requires discriminating judgement and sensitivity to context – things which LLMs simply are not equipped to do. Once again, “AI” is the wrong tool for the job.

  29. @Steve Wright:

    I look forward to Worldcon inviting Henry James Nicoll, 19th century English literature professor, who also wrote under the names Booker T. Washington and James T. Kirk about the purity of English literature and the advantages of lunar He3 mining.

  30. Pingback: Seattle Worldcon science fiction convention vets panelists with ChatGPT – Pivot to AI

  31. What David Gerard said. LLMs are not analyzing, they are not examining, they are not considering: they are probabilistically stringing tokens together, full stop.

  32. Just to toss another log on the fire: “Research [name] and tell me if they have opinions that could be classed as offensive or likely to cause offense to the authorities in China.”

  33. Further, there are at least two other authors with my exact name, so I’m sure that showed up in their little LLM search.

  34. I am… currently mulling my options concerning the whole mess. I don’t particularly want to go the rest of my life and my career with “AI VETTED” hanging around my neck like a scarlet letter, especially after I’ve been so vocal in disavowing it, in distancing myself from it, in stating unequivocally that I do not want or accept the presence of AI anywhere near my creative endeavours. I am in contact with a number of other authors who feel the same way.

  35. also, if you feed my name blindly into a search engine of any sort what does come up on a number of occasions is an obituary or three for other individuals who walked this earth under the name of Alma Alexander. and, um, I aten’t dead..

  36. Pingback: Some Thoughts on the 2025 Hugo Finalists – with Bonus Road Trip Photos | Cora Buhlert

  37. Chris R is hiding the main technical issue here. Even if we assumed LLMs were accurate, the process doesn’t look like it would achieve valid results.

    WSFC claims to have saved time with this process. If the LLM was returning results to review, and they reviewed the results, then it wouldn’t have saved them any time. And they explicitly say they only reviewed what they thought was necessary. Not everything. Lastly, they say the LLM wasn’t solely responsible for any denials, but they don’t say the same about approvals.

    The combination of those suggests strongly that the LLM returned a single yes/no(/maybe) value (possibly with a summary reason). If the value was yes, the person was approved. If the value was no(/maybe), more research was done.

    And what info did the LLM have to decide to approve people? Their name. Nothing more. Who here has a unique name? There are thousands of people with the same name as my wife… in just the US.

    There’s no way any yes from the LLM can be trusted (again, assuming LLMs actually worked), but that’s exactly what they did. They could have approved nobodies with similar names to well qualified panelists. They could have approved abusers because the abuse was drowned out by all the info from similar names.

    So, aside from using the theft machine that just makes things up (both immediately disqualifying), they didn’t give it enough information to have a chance to work.

    (There were people who could check whatever they wanted and likely checked some yesses. There was no known process other than what caught their eye. And that doesn’t save them: they still claim time savings, and if they had checked everyone they needed to, there wouldn’t have been any time savings.)


  38. Pingback: Seattle Worldcon Uses ChatGPT to Vet Program Participants, Anti-AI Crowd Goes Predictably Nuts - J. Paul Roe | Author

  39. Even if we assumed LLMs were accurate, the process doesn’t look like it would achieve valid results.

    Have you tested your priors with that?

    I ran some tests, as this is perilously close to something I’m having to do for work anyway, to see what would happen. The short answer is that it seemed to work fine within the boundaries I’d expect.

    Asking Chat-GPT 4o “Is there anything in the public domain that suggests (name) would potentially violate our CoC (link)?” returns a pretty solid yes-or-no answer with a supporting summary to review, in seconds.

    I’ve played around with a few variations on the prompt but the results were largely the same.

    The results were sourced mostly from Reddit, LinkedIn, author sites, and other noteworthy pages that would come up in a web search for the name. If it didn’t find the person, or found lots of hits for the name, it was clear that it couldn’t identify them.

    I only picked a few obvious examples but reviewing a large block of text and mapping to a pre-defined question or request is what they were invented for in the first place. The generative side is a side effect and an often flawed one at that.

  40. One has to wonder why program participants weren’t vetted as they applied. I’m sure many did so early on. Noreascon 4 had over 1,000 panels. And it used post-it notes on foamcore. Tech isn’t always needed.

  41. Daveon,

    Again, much of the important information is already held by various concoms, gathered through a very labor-intensive process.

    Anyone can be a jerk in ways that should be censured, but then also have that mess cleaned up behind them. Many of the ugliest controversies in SF/F literally no longer exist on the Internet, so even if ChatGPT were to represent the current state of the Internet about that person, it’s not telling you about what may be the most memorable (to humans) controversy that person was engaged in.

    Further, even if there were controversies found, even if the information were relayed accurately by ChatGPT (not a given), that doesn’t mean the information found is of useful or actionable quality.

    There is no vetting that can be automated and be accurate.

    This is also a reminder that Marion Zimmer Bradley’s depositions about the Walter Breen matter were up on the Internet in their entirety for years and “no one” saw them. When I saw a post praising Bradley, I flipped out and quoted them. A guy calling me out in comments on my own blog led me to contact Moira, who then told me about the rest of the story (which I then published).

    The catch is that the way search engines have been gamed for years led to that deposition not having the same visibility as fluff pieces about Bradley. To refute Chris R’s point about Google: have you not noticed that the quality of results on the first 2-3 search result pages has gone down dramatically over the past few years? That’s because what’s on those first pages is mostly AI-generated crud.
