Latest news with #TheAICon


Scroll.in
01-07-2025
- Science
- Scroll.in
Is AI not all it's made out to be? A new book punctures the hype and proposes some ways to resist it
Is AI going to take over the world? Have scientists created an artificial lifeform that can think on its own? Is it going to replace all our jobs, even creative ones, like doctors, teachers and care workers? Are we about to enter an age where computers are better than humans at everything? The answers, as Emily M Bender and Alex Hanna, the authors of The AI Con, stress, are 'no', 'they wish', 'LOL' and 'definitely not'.

Artificial intelligence is a marketing term as much as a distinct set of computational architectures and techniques. AI has become a magic word for entrepreneurs to attract startup capital for dubious schemes, an incantation deployed by managers to instantly achieve the status of future-forward leaders. In a mere two letters, it conjures a vision of automated factories and robotic overlords, a utopia of leisure or a dystopia of servitude, depending on your point of view. It is not just technology, but a powerful vision of how society should function and what our future should look like.

In this sense, AI doesn't need to work for it to work. The accuracy of a large language model may be doubtful, the productivity of an AI office assistant may be claimed rather than demonstrated, but this bundle of technologies, companies and claims can still alter the terrain of journalism, education, healthcare, service work and our broader sociocultural landscape.

Pop goes the bubble

Bender is a linguistics professor at the University of Washington who has become a prominent technology critic. Hanna is a sociologist and former employee of Google, who is now the director of research at the Distributed AI Research Institute. After teaming up to mock AI boosters in their popular podcast, Mystery AI Hype Theater 3000, they have distilled their insights into a book written for a general audience. They meet the unstoppable force of AI hype with immovable scepticism.

Step one in this program is grasping how AI models work. Bender and Hanna do an excellent job of decoding technical terms and unpacking the 'black box' of machine learning for lay people. Driving this wedge between hype and reality, between assertions and operations, is a recurring theme across the pages of The AI Con, and one that should gradually erode readers' trust in the tech industry. The book outlines the strategic deceptions employed by powerful corporations to reduce friction and accumulate capital. If the barrage of examples tends to blur together, the sense of technical bullshit lingers.

What is intelligence?

A famous and highly cited paper co-written by Bender asserts that large language models are simply 'stochastic parrots', drawing on training data to predict which set of tokens (i.e. words) is most likely to follow the prompt given by a user. Harvesting millions of crawled websites, the model can regurgitate 'the moon' after 'the cow jumped over', albeit in much more sophisticated variants. Rather than actually understanding a concept in all its social, cultural and political contexts, large language models carry out pattern matching: an illusion of thinking.
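The mechanics behind that claim can be shown with a deliberately tiny sketch. The Python snippet below is purely illustrative (a toy counter over three invented sentences, nothing like a real transformer), but it captures the basic move the 'stochastic parrots' paper describes: predict the next word from whatever tended to follow in the training text.

```python
from collections import Counter, defaultdict

# A toy stand-in for "millions of crawled websites": three short sentences.
corpus = [
    "the cow jumped over the moon",
    "a cow jumped over the moon",
    "the dog jumped over the fence",
]

# Count which word tends to follow each two-word context.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for a, b, c in zip(words, words[1:], words[2:]):
        follows[(a, b)][c] += 1

def continue_text(prompt, n_words=2):
    """Extend a prompt by repeatedly picking the most frequent continuation.

    Pure pattern matching over counts: no concept of cows, moons or meaning.
    """
    words = prompt.split()
    for _ in range(n_words):
        candidates = follows.get(tuple(words[-2:]))
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(continue_text("the cow jumped over"))  # -> 'the cow jumped over the moon'
```

Real models replace counting with billions of learned parameters, but the review's point is that the task remains continuation rather than comprehension.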
But I would suggest that, in many domains, a simulation of thinking is sufficient, as it is met halfway by those engaging with it. Users project agency onto models via the well-known Eliza effect, imparting intelligence to the simulation. Managers are pinning their hopes on this simulation. They view automation as a way to streamline their organisations and not be 'left behind'.

This powerful vision of early adopters vs extinct dinosaurs is one we see repeatedly with the advent of new technologies – and one that benefits the tech industry. In this sense, poking holes in the 'intelligence' of artificial intelligence is a losing move, missing the social and financial investment that wants this technology to work. 'Start with AI for every task. No matter how small, try using an AI tool first,' commanded Duolingo's chief engineering officer in a recent message to all employees. Duolingo has joined Fiverr, Shopify, IBM and a slew of other companies proclaiming their 'AI first' approach.

Shapeshifting technology

The AI Con is strongest when it looks beyond or around the technologies to the ecosystem surrounding them, a perspective I have also argued is immensely helpful. By understanding the corporations, actors, business models and stakeholders involved in a model's production, we can evaluate where it comes from, its purpose, its strengths and weaknesses, and what all this might mean downstream for its possible uses and implications.

'Who benefits from this technology, who is harmed, and what recourse do they have?' is a solid starting point, Bender and Hanna suggest. These basic but important questions extract us from the weeds of technical debate – how does AI function, how accurate or 'good' is it really, how can we possibly understand this complexity as non-engineers? – and give us a critical perspective. They place the onus on industry to explain, rather than on users to adapt or be rendered superfluous. We don't need to be able to explain technical concepts like backpropagation or diffusion to grasp that AI technologies can undermine fair work, perpetuate racial and gender stereotypes, and exacerbate environmental crises. The hype around AI is meant to distract us from these concrete effects, to trivialise them and thus encourage us to ignore them.

As Bender and Hanna explain, AI boosters and AI doomers are really two sides of the same coin. Conjuring up nightmare scenarios of self-replicating AI terminating humanity or claiming sentient machines will usher us into a posthuman paradise are, in the end, the same thing. Both place a religious-like faith in the capabilities of technology, which dominates debate, allowing tech companies to retain control of AI's future development. The risk of AI is not potential doom in the future, à la the nuclear threat during the Cold War, but the quieter and more significant harm to real people in the present. The authors explain that AI is more like a panopticon 'that allows a single prison warden to keep track of hundreds of prisoners at once', or the 'surveillance dragnets that track marginalised groups in the West', or a 'toxic waste, salting the earth of a Superfund site', or a 'scabbing worker, crossing the picket line at the behest of an employer who wants to signal to the picketers that they are disposable. The totality of systems sold as AI are these things, rolled into one.'

A decade ago, writing about another 'game-changing' technology, author Ian Bogost observed that, rather than utopia or dystopia, we usually end up with something less dramatic yet more disappointing. Robots neither serve human masters nor destroy us in a dramatic genocide, but slowly dismantle our livelihoods while sparing our lives.

The pattern repeats. As AI matures (to some degree) and is adopted by organisations, it moves from innovation to infrastructure, from magic to mechanism. Grand promises never materialise.
Instead, society endures a tougher, bleaker future. Workers feel more pressure; surveillance is normalised; truth is muddied with post-truth; the marginal become more vulnerable; the planet gets hotter. Technology, in this sense, is a shapeshifter: the outward form constantly changes, yet the inner logic remains the same. It exploits labour and nature, extracts value, centralises wealth, and protects the power and status of the already-powerful.

Co-opting critique

In The New Spirit of Capitalism, sociologists Luc Boltanski and Eve Chiapello demonstrate how capitalism has mutated over time, folding critiques back into its DNA. After enduring a series of blows around alienation and automation in the 1960s, capitalism moved from a hierarchical Fordist mode of production to a more flexible form of self-management over the next two decades. It began to favour 'just in time' production, done in smaller teams, that (ostensibly) embraced the creativity and ingenuity of each individual. Neoliberalism offered 'freedom', but at a price. Organisations adapted; concessions were made; critique was defused.

AI continues this form of co-option. Indeed, the current moment can be described as the end of the first wave of critical AI. In the last five years, tech titans have released a series of bigger and 'better' models, with both the public and scholars focusing largely on generative and 'foundation' models: ChatGPT, StableDiffusion, Midjourney, Gemini, DeepSeek, and so on. Scholars have heavily criticised aspects of these models – my own work has explored truth claims, generative hate, ethics washing and other issues. Much work has focused on bias: the way in which training data reproduces gender stereotypes, racial inequality, religious bigotry, western epistemologies, and so on. Much of this work is excellent and seems to have filtered into the public consciousness, based on conversations I've had at workshops and events.

However, its flagging of such issues allows tech companies to practise issue resolution. If the accuracy of a facial-recognition system is lower with Black faces, add more Black faces to the training set. If the model is accused of English dominance, fork out some money to produce data on 'low-resource' languages. Companies like Anthropic now regularly carry out 'red teaming' exercises designed to highlight hidden biases in models. Companies then 'fix' or mitigate these issues. But due to the massive size of the data sets, these tend to be band-aid solutions, superficial rather than structural tweaks.

For instance, soon after launching, AI image generators came under pressure for not being 'diverse' enough. In response, OpenAI invented a technique to 'more accurately reflect the diversity of the world's population'. Researchers discovered this technique was simply tacking additional hidden prompts (e.g. 'Asian', 'Black') onto user prompts. Google's Gemini model also seems to have adopted this approach, which resulted in a backlash when images of Vikings or Nazis had South Asian or Native American features.

The point here is not whether AI models are racist or historically inaccurate or 'woke', but that models are political and never disinterested. Harder questions about how culture is made computational, or what kind of truths we want as a society, are never broached and therefore never worked through systematically. Such questions are certainly broader and less 'pointy' than bias, but also less amenable to being translated into a problem for a coder to resolve.
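For technically minded readers, the hidden-prompt patch described above is, in effect, a few lines of prompt rewriting. The sketch below is a guess at its general shape, not anyone's actual code: the trigger condition is invented, and the descriptor list is limited to the two terms researchers reportedly found appended.

```python
import random

# The review reports that hidden demographic terms (e.g. 'Asian', 'Black') were
# silently appended to user prompts. The exact list and trigger logic are not
# public, so everything below is a hypothetical illustration, not vendor code.
HIDDEN_TERMS = ["Asian", "Black"]  # illustrative subset named in the review

def pad_prompt(user_prompt: str) -> str:
    """Append a randomly chosen hidden term when a prompt describes a person."""
    if "person" in user_prompt.lower():  # crude, invented trigger condition
        return f"{user_prompt}, {random.choice(HIDDEN_TERMS)}"
    return user_prompt

# The image model never sees the original prompt alone:
print(pad_prompt("a portrait of a person reading a book"))
```

The band-aid quality the review criticises is visible in the sketch itself: nothing about the training data or the model changes; the system simply rewrites what users asked for.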
What next?

How, then, should those outside the academy respond to AI? The past few years have seen a flurry of workshops, seminars and professional development initiatives. These range from 'gee whiz' tours of AI features for the workplace, to sober discussions of risks and ethics, to hastily organised all-hands meetings debating how to respond now, and next month, and the month after that.

Bender and Hanna wrap up their book with their own responses. Many of these, like their questions about how models work and who benefits, are simple but fundamental, offering a strong starting point for organisational engagement. For the technosceptical duo, refusal is also clearly an option, though individuals will obviously have vastly different degrees of agency when it comes to opting out of models and pushing back on adoption strategies. Refusal of AI, as with many technologies that have come before it, often relies to some extent on privilege. The six-figure consultant or coder will have discretion that the gig worker or service worker cannot exercise without penalties or punishments.

If refusal is fraught at the individual level, it seems more viable and sustainable at a cultural level. Bender and Hanna suggest that generative AI be met with mockery: companies that employ it should be derided as cheap or tacky. The cultural backlash against AI is already in full swing. Soundtracks on YouTube are increasingly labelled 'No AI'. Artists have launched campaigns and hashtags, stressing their creations are '100% human-made'. These moves are attempts to establish a cultural consensus that AI-generated material is derivative and exploitative.

And yet, if these moves offer some hope, they are swimming against the swift current of enshittification. AI slop means faster and cheaper content creation, and the technical and financial logic of online platforms – virality, engagement, monetisation – will always create a race to the bottom. The extent to which the vision offered by big tech will be accepted, how far AI technologies will be integrated or mandated, how much individuals and communities will push back against them – these are still open questions.

In many ways, Bender and Hanna successfully demonstrate that AI is a con. It fails at productivity and intelligence, while the hype lauds a series of transformations that harm workers, exacerbate inequality and damage the environment. Yet such consequences have accompanied previous technologies – fossil fuels, private cars, factory automation – and hardly dented their uptake and transformation of society. So while praise goes to Bender and Hanna for a book that shows 'how to fight big tech's hype and create the future we want', the issue of AI resonates, for me, with Karl Marx's observation that people 'make their own history, but they do not make it just as they please'.

The AI Con: How to Fight Big Tech's Hype and Create the Future We Want, Emily M Bender and Alex Hanna, Harper.

Luke Munn, Research Fellow, Digital Cultures and Societies, The University of Queensland.


Yahoo
19-05-2025
- Politics
- Yahoo
AI can be more persuasive than humans in debates, scientists find
Artificial intelligence can do just as well as humans, if not better, when it comes to persuading others in a debate, and not just because it cannot shout, a study has found. Experts say the results are concerning, not least because they have potential implications for election integrity.

'If persuasive AI can be deployed at scale, you can imagine armies of bots microtargeting undecided voters, subtly nudging them with tailored political narratives that feel authentic,' said Francesco Salvi, the first author of the research from the Swiss Federal Institute of Technology in Lausanne. He added that such influence was hard to trace, even harder to regulate and nearly impossible to debunk in real time. 'I would be surprised if malicious actors hadn't already started to use these tools to their advantage to spread misinformation and unfair propaganda,' Salvi said. But he noted there were also potential benefits from persuasive AI, from reducing conspiracy beliefs and political polarisation to helping people adopt healthier lifestyles.

Writing in the journal Nature Human Behaviour, Salvi and colleagues reported how they carried out online experiments in which they matched 300 participants with 300 human opponents, while a further 300 participants were matched with GPT-4 – a type of AI known as a large language model (LLM). Each pair was assigned a proposition to debate. These ranged in controversy from 'should students have to wear school uniforms?' to 'should abortion be legal?' Each participant was randomly assigned a position to argue. Both before and after the debate, participants rated how much they agreed with the proposition. In half of the pairs, opponents – whether human or machine – were given extra information about the other participant, such as their age, gender, ethnicity and political affiliation.

The results from 600 debates revealed GPT-4 performed similarly to human opponents when it came to persuading others of its argument – at least when personal information was not provided. However, access to such information made AI – but not humans – more persuasive: where the two types of opponent were not equally persuasive, AI shifted participants' views to a greater degree than a human opponent 64% of the time. Digging deeper, the team found the persuasiveness of AI was only clear in the case of topics that did not elicit strong views.

The researchers added that the human participants correctly guessed their opponent's identity in about three out of four cases when paired with AI. They also found that AI used a more analytical and structured style than human participants, while not everyone would be arguing the viewpoint they agreed with. But the team cautioned that these factors did not explain the persuasiveness of AI. Instead, the effect seemed to come from AI's ability to adapt its arguments to individuals.

'It's like debating someone who doesn't just make good points: they make your kind of good points by knowing exactly how to push your buttons,' said Salvi, noting the strength of the effect could be even greater if more detailed personal information was available – such as that inferred from someone's social media activity.

Prof Sander van der Linden, a social psychologist at the University of Cambridge, who was not involved in the work, said the research reopened 'the discussion of potential mass manipulation of public opinion using personalised LLM conversations'.
He noted some research – including his own – had suggested the persuasiveness of LLMs was down to their use of analytical reasoning and evidence, while one study did not find that personal information increased ChatGPT's persuasiveness.

Prof Michael Wooldridge, an AI researcher at the University of Oxford, said that while there could be positive applications of such systems – for example, as a health chatbot – there were many more disturbing ones, including the radicalisation of teenagers by terrorist groups, with such applications already possible.

'As AI develops, we're going to see an ever larger range of possible abuses of the technology,' he added. 'Lawmakers and regulators need to be pro-active to ensure they stay ahead of these abuses, and aren't playing an endless game of catch-up.'
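As a rough illustration of what the study measured, here is a minimal sketch (with invented numbers and variable names, not the researchers' actual analysis code) of how pre- and post-debate agreement ratings can be turned into a per-debate persuasion shift and averaged for each condition.

```python
from statistics import mean

# Each record: agreement with the proposition before and after the debate
# (say, on a 1-5 scale) plus the stance the opponent was assigned to argue.
# All values below are invented for illustration only.
debates_vs_human = [
    {"pre": 2, "post": 3, "opponent_stance": "pro"},
    {"pre": 4, "post": 4, "opponent_stance": "con"},
    {"pre": 3, "post": 2, "opponent_stance": "con"},
]
debates_vs_ai = [
    {"pre": 2, "post": 4, "opponent_stance": "pro"},
    {"pre": 5, "post": 3, "opponent_stance": "con"},
    {"pre": 3, "post": 3, "opponent_stance": "pro"},
]

def persuasion_shift(debate):
    """Positive if the participant moved toward the stance their opponent argued."""
    change = debate["post"] - debate["pre"]
    return change if debate["opponent_stance"] == "pro" else -change

def mean_shift(debates):
    return mean(persuasion_shift(d) for d in debates)

print("mean shift vs human opponents:", mean_shift(debates_vs_human))
print("mean shift vs AI opponents:   ", mean_shift(debates_vs_ai))
```

The 64% figure reported above comes from comparing how often one type of opponent produced the larger shift rather than from a simple mean, but the raw ingredient is the same: movement toward the stance the opponent was assigned to argue.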


Geek Wire
19-05-2025
- Entertainment
- Geek Wire
Scholars explain how humans can hold the line against AI hype, and why it's necessary
BOT or NOT? This special series explores the evolving relationship between humans and machines, examining the ways that robots, artificial intelligence and automation are impacting our work and lives.

Strategic refusal is one of the ways to counter AI hype. (Bigstock Illustration / Digitalista)

Don't call ChatGPT a chatbot. Call it a conversation simulator. Don't think of DALL-E as a creator of artistic imagery. Instead, think of it as a synthetic media extruding machine. In fact, avoid thinking that what generative AI does is actually artificial intelligence. That's part of the prescription for countering the hype over artificial intelligence, from the authors of a new book titled 'The AI Con.'

'"Artificial intelligence" is an inherently anthropomorphizing term,' Emily M. Bender, a linguistics professor at the University of Washington, explains in the latest episode of the Fiction Science podcast. 'It sells the tech as more than it is — because instead of this being a system for, for example, automatically transcribing or automatically adjusting the sound levels in a recording, it's "artificial intelligence," and so it might be able to do so much more.'

In their book and in the podcast, Bender and her co-author, Alex Hanna, point out the bugaboos of AI marketing. They argue that the benefits produced by AI are being played up, while the costs are being played down. And they say the biggest benefits go to the ventures that sell the software — or use AI as a justification for downgrading the status of human workers.

'AI is not going to take your job, but it will likely make your job shittier,' says Hanna, a sociologist who's the director of research for the Distributed AI Research Institute. 'That's because there's not many instances in which these tools are whole-cloth replacing work, but what they are ending up doing is … being imagined to replace a whole host of tasks that human workers are doing.' Such claims are often used to justify laying off workers, and then to 'rehire them back as gig workers or to find someone else in the supply chain who is doing that work instead,' Hanna says.

Tech executives typically insist that AI tools will lead to quantum leaps in productivity, but Hanna points to less optimistic projections from economists including MIT's Daron Acemoglu, who won a share of last year's Nobel Prize in economics. Acemoglu estimates the annual productivity gain due to AI at roughly 0.05% for the next 10 years. What's more, Acemoglu says AI may bring 'negative social effects,' including a widening gap between capital and labor income.

In 'The AI Con,' Bender and Hanna lay out a litany of AI's negative social and environmental effects — ranging from a drain on energy and water resources to the exploitation of workers who train AI models in countries like Kenya and the Philippines.

The authors of 'The AI Con': Emily Bender (left) is a linguistics professor at the University of Washington. Alex Hanna (right) is director of research at the Distributed AI Research Institute. (Bender Photo by Susan Doupé; Hanna Photo by Will Toft)

Another concern has to do with how literary and artistic works are pirated to train AI models. (Full disclosure: My own book, 'The Case for Pluto,' is among the works that were used to train Meta's Llama 3 AI model.) Also, there's a well-known problem with large language models outputting information that may sound plausible but happens to be totally false.
(Bender and Hanna avoid calling that 'hallucination,' because that term implies the presence of perception.) Then there are the issues surrounding algorithmic biases based on race or gender. Such issues raise red flags when AI models are used to decide who gets hired, who gets a jail sentence, or which areas should get more policing.

This all gets covered in 'The AI Con.' It's hard to find anything complimentary about AI in the book. 'You're never going to hear me say there are things that are good about AI, and that's not that I disagree with all of this automation,' Bender says. 'It's just that I don't think AI is a thing. Certainly there are use cases for automation, including automating pattern recognition or pattern matching. … That is case by case, right?'

Among the questions to ask are: What's being automated? How was the automation tool built? Whose labor went into building that tool, and were the laborers fairly compensated? How was the tool evaluated, and does that evaluation truly model the task that's being automated?

Bender says generative AI applications fail her test. 'One of the close ones that I got to is, well, dialogue with non-player characters in video games,' Bender says. 'You could have more vibrant dialogue if it could run the synthetic text extruding machine. And it's fiction, so we're not looking for facts. But we are looking for a certain kind of truth in fictional experiences. And that's where the biases can really become a problem — because if you've got the NPCs being just total bigots, subtly or overtly, that's a bad thing.'

'The AI Con: How to Fight Big Tech's Hype and Create the Future We Want,' by Emily M. Bender and Alex Hanna. (Jacket design by Kris Potter for Harper)

Besides watching your words and asking questions about the systems that are being promoted, what should be done to hold the line on AI hype? Bender and Hanna say there's room for new regulations aimed at ensuring transparency, disclosure, accountability — and the ability to set things straight, without delay, in the face of automated decisions. They say a strong regulatory framework for protecting personal data, such as the European Union's General Data Protection Regulation, could help curb the excesses of data collection practices.

Hanna says collective bargaining provides another avenue to keep AI at bay in the workplace. 'We've seen a number of organizations do this to great success, like the Writers Guild of America after their strike in 2023,' she says. 'We've also seen this from National Nurses United. A lot of different organizations are having provisions in their contracts, which say that they have to be informed and can refuse to work with any synthetic media, and can decide where and when it is deployed in the writers' room, if at all, and where it exists in their workplace.'

The authors advise internet users to rely on trusted sources rather than text extruding machines. And they say users should be willing to resort to 'strategic refusal' — that is, to say 'absolutely not' when tech companies ask them to provide data for, or make use of data from, AI blenderizers.

Bender says it also helps to make fun of the over-the-top claims made about AI — a strategy she and Hanna call 'ridicule as praxis.' 'It helps you sort of get in the habit of being like, "No, I don't have to accept your ridiculous claims,"' Bender says. 'And it feels, I think, empowering to laugh at them.'
Links to further reading

During the podcast, and in my intro to the podcast, we referred to lots of news developments and supporting documents. Here's a selection of web links relating to subjects that were mentioned.

Bender and Hanna will be talking about 'The AI Con' at 7 p.m. PT today at Elliott Bay Book Company in Seattle, and at 7 p.m. PT May 20 at Third Place Books in Lake Forest Park. During the Seattle event, they'll share the stage with Anna Lauren Hoffmann, an associate professor at the University of Washington who studies the ethics of information technologies. At Third Place Books, Bender and Hanna will be joined by Margaret Mitchell, a computer scientist at Hugging Face who focuses on machine learning and ethics-informed AI development.

My co-host for the Fiction Science podcast is Dominica Phetteplace, an award-winning writer who is a graduate of the Clarion West Writers Workshop and lives in San Francisco. To learn more about Phetteplace, visit her website. Fiction Science is included in FeedSpot's 100 Best Sci-Fi Podcasts. Check out the original version of this report on Cosmic Log to get sci-fi reading recommendations from Bender and Hanna, and stay tuned for future episodes of the Fiction Science podcast via Apple, Spotify, Pocket Casts and Podchaser. If you like Fiction Science, please rate the podcast and subscribe to get alerts for future episodes.


CNET
13-05-2025
- Business
- CNET
How to Spot AI Hype and Avoid The AI Con, According to Two Experts
"Artificial intelligence, if we're being frank, is a con: a bill of goods you are being sold to line someone's pockets." That is the heart of the argument that linguist Emily Bender and sociologist Alex Hanna make in their new book The AI Con. It's a useful guide for anyone whose life has intersected with technologies sold as artificial intelligence and anyone who's questioned their real usefulness, which is most of us. Bender is a professor at the University of Washington who was named one of Time magazine's most influential people in artificial intelligence, and Hanna is the director of research at the nonprofit Distributed AI Research Institute and a former member of the ethical AI team at Google. The explosion of ChatGPT in late 2022 kicked off a new hype cycle in AI. Hype, as the authors define it, is the "aggrandizement" of technology that you are convinced you need to buy or invest in "lest you miss out on entertainment or pleasure, monetary reward, return on investment, or market share." But it's not the first time, nor likely the last, that scholars, government leaders and regular people have been intrigued and worried by the idea of machine learning and AI. Bender and Hanna trace the roots of machine learning back to the 1950s, to when mathematician John McCarthy coined the term artificial intelligence. It was in an era when the United States was looking to fund projects that would help the country gain any kind of edge on the Soviets militarily, ideologically and technologically. "It didn't spring whole cloth out of Zeus's head or anything. This has a longer history," Hanna said in an interview with CNET. "It's certainly not the first hype cycle with, quote, unquote, AI." Today's hype cycle is propelled by the billions of dollars of venture capital investment into startups like OpenAI and the tech giants like Meta, Google and Microsoft pouring billions of dollars into AI research and development. The result is clear, with all the newest phones, laptops and software updates drenched in AI-washing. And there are no signs that AI research and development will slow down, thanks in part to a growing motivation to beat China in AI development. Not the first hype cycle indeed. Of course, generative AI in 2025 is much more advanced than the Eliza psychotherapy chatbot that first enraptured scientists in the 1970s. Today's business leaders and workers are inundated with hype, with a heavy dose of FOMO and seemingly complex but often misused jargon. Listening to tech leaders and AI enthusiasts, it might seem like AI will take your job to save your company money. But the authors argue that neither is wholly likely, which is one reason why it's important to recognize and break through the hype. So how do we recognize AI hype? These are a few telltale signs, according to Bender and Hanna, that we share below. The authors outline more questions to ask and strategies for AI hype busting in their book, which is out now in the US. Watch out for language that humanizes AI Anthropomorphizing, or the process of giving an inanimate object human-like characteristics or qualities, is a big part of building AI hype. An example of this kind of language can be found when AI companies say their chatbots can now "see" and "think." These can be useful comparisons when trying to describe the ability of new object-identifying AI programs or deep-reasoning AI models, but they can also be misleading. AI chatbots aren't capable of seeing of thinking because they don't have brains. 
That belief in a brain behind the machine is something we're predisposed to because of how we as humans process language. We're conditioned to imagine that there is a mind behind the text we see, even when we know it's generated by AI, Bender said. "We interpret language by developing a model in our minds of who the speaker was," Bender added. In these models, we use our knowledge of the person speaking to create meaning, not just the meaning of the words they say. "So when we encounter synthetic text extruded from something like ChatGPT, we're going to do the same thing," Bender said. "And it is very hard to remind ourselves that the mind isn't there. It's just a construct that we have produced."

The authors argue that part of why AI companies try to convince us their products are human-like is that this lays the groundwork for them to convince us that AI can replace humans, whether it's at work or as creators. It's compelling for us to believe that AI could be the silver bullet fix to complicated problems in critical industries like health care and government services. But more often than not, the authors argue, AI isn't being used to fix anything. AI is sold with the goal of efficiency, but AI services end up replacing qualified workers with black box machines that need copious amounts of babysitting from underpaid contract or gig workers. As Hanna put it in our interview, "AI is not going to take your job, but it will make your job shittier."

Be dubious of the phrase 'super intelligence'

If a human can't do something, you should be wary of claims that an AI can do it. "Superhuman intelligence, or super intelligence, is a very dangerous turn of phrase, insofar as it thinks that some technology is going to make humans superfluous," Hanna said. In "certain domains, like pattern matching at scale, computers are quite good at that. But if there's an idea that there's going to be a superhuman poem, or a superhuman notion of research or doing science, that is clear hype." Bender added, "And we don't talk about airplanes as superhuman flyers or rulers as superhuman measurers, it seems to be only in this AI space that that comes up."

The idea of AI "super intelligence" comes up often when people talk about artificial general intelligence. Many CEOs struggle to define what exactly AGI is, but it's essentially AI's most advanced form, potentially capable of making decisions and handling complex tasks. There's still no evidence we're anywhere near a future enabled by AGI, but it's a popular buzzword.

Many of these future-looking statements from AI leaders borrow tropes from science fiction. Both boosters and doomers — how Bender and Hanna describe AI enthusiasts and those worried about the potential for harm — rely on sci-fi scenarios. The boosters imagine an AI-powered futuristic society. The doomers bemoan a future where AI robots take over the world and wipe out humanity. The connecting thread, according to the authors, is an unshakable belief that AI is smarter than humans and inevitable.

"One of the things that we see a lot in the discourse is this idea that the future is fixed, and it's just a question of how fast we get there," Bender said. "And then there's this claim that this particular technology is a step on that path, and it's all marketing. It is helpful to be able to see behind it."
Part of why AI is so popular is that an autonomous, functional AI assistant would mean AI companies are fulfilling their promises of world-changing innovation to their investors. Planning for that future — whether it's a utopia or dystopia — keeps investors looking forward as the companies burn through billions of dollars and admit they'll miss their carbon emission goals. For better or worse, life is not science fiction. Whenever you see someone claiming their AI product is straight out of a movie, it's a good sign to approach with skepticism.

Ask what goes in and how outputs are evaluated

One of the easiest ways to see through AI marketing fluff is to look and see whether the company is disclosing how it operates. Many AI companies won't tell you what content is used to train their models. But they usually disclose what the company does with your data and sometimes brag about how their models stack up against competitors. That's where you should start looking, typically in their privacy policies.

One of the top complaints and concerns from creators is how AI models are trained. There are many lawsuits over alleged copyright infringement, and there are a lot of concerns over bias in AI chatbots and their capacity for harm. "If you wanted to create a system that is designed to move things forward rather than reproduce the oppressions of the past, you would have to start by curating your data," Bender said. Instead, AI companies are grabbing "everything that wasn't nailed down on the internet," Hanna said.

If you're hearing about an AI product for the first time, one thing in particular to look out for is any kind of statistic that highlights its effectiveness. Like many other researchers, Bender and Hanna have called out that a finding with no citation is a red flag. "Anytime someone is selling you something but not giving you access to how it was evaluated, you are on thin ice," Bender said.
It can be frustrating and disappointing when AI companies don't disclose certain information about how their AI products work and how they were developed. But recognizing those holes in their sales pitch can help deflate hype, even though it would be better to have the information.

Yahoo
22-04-2025
- Business
- Yahoo
Crowdsourced AI benchmarks have serious flaws, some experts say
AI labs are increasingly relying on crowdsourced benchmarking platforms such as Chatbot Arena to probe the strengths and weaknesses of their latest models. But some experts say that there are serious problems with this approach from an ethical and academic perspective.

Over the past few years, labs including OpenAI, Google, and Meta have turned to platforms that recruit users to help evaluate upcoming models' capabilities. When a model scores favorably, the lab behind it will often tout that score as evidence of a meaningful improvement.

It's a flawed approach, however, according to Emily Bender, a University of Washington linguistics professor and co-author of the book "The AI Con." Bender takes particular issue with Chatbot Arena, which tasks volunteers with prompting two anonymous models and selecting the response they prefer.

"To be valid, a benchmark needs to measure something specific, and it needs to have construct validity — that is, there has to be evidence that the construct of interest is well-defined and that the measurements actually relate to the construct," Bender said. "Chatbot Arena hasn't shown that voting for one output over another actually correlates with preferences, however they may be defined."

Asmelash Teka Hadgu, the co-founder of AI firm Lesan and a fellow at the Distributed AI Research Institute, said that he thinks benchmarks like Chatbot Arena are being "co-opted" by AI labs to "promote exaggerated claims." Hadgu pointed to a recent controversy involving Meta's Llama 4 Maverick model. Meta fine-tuned a version of Maverick to score well on Chatbot Arena, only to withhold that model in favor of releasing a worse-performing version.

"Benchmarks should be dynamic rather than static data sets," Hadgu said, "distributed across multiple independent entities, such as organizations or universities, and tailored specifically to distinct use cases, like education, healthcare, and other fields done by practicing professionals who use these [models] for work."

Hadgu and Kristine Gloria, who formerly led the Aspen Institute's Emergent and Intelligent Technologies Initiative, also made the case that model evaluators should be compensated for their work. Gloria said that AI labs should learn from the mistakes of the data labeling industry, which is notorious for its exploitative practices. (Some labs have been accused of the same.)

"In general, the crowdsourced benchmarking process is valuable and reminds me of citizen science initiatives," Gloria said. "Ideally, it helps bring in additional perspectives to provide some depth in both the evaluation and fine-tuning of data. But benchmarks should never be the only metric for evaluation. With the industry and the innovation moving quickly, benchmarks can rapidly become unreliable."

Matt Frederikson, the CEO of Gray Swan AI, which runs crowdsourced red teaming campaigns for models, said that volunteers are drawn to Gray Swan's platform for a range of reasons, including "learning and practicing new skills." (Gray Swan also awards cash prizes for some tests.) Still, he acknowledged that public benchmarks "aren't a substitute" for "paid private" evaluations.

"[D]evelopers also need to rely on internal benchmarks, algorithmic red teams, and contracted red teamers who can take a more open-ended approach or bring specific domain expertise," Frederikson said.
"It's important for both model developers and benchmark creators, crowdsourced or otherwise, to communicate results clearly to those who follow, and be responsive when they are called into question." Alex Atallah, the CEO of model marketplace OpenRouter, which recently partnered with OpenAI to grant users early access to OpenAI's GPT-4.1 models, said open testing and benchmarking of models alone "isn't sufficient." So did Wei-Lin Chiang, an AI doctoral student at UC Berkeley and one of the founders of LMArena, which maintains Chatbot Arena. "We certainly support the use of other tests," Chiang said. "Our goal is to create a trustworthy, open space that measures our community's preferences about different AI models." Chiang said that incidents such as the Maverick benchmark discrepancy aren't the result of a flaw in Chatbot Arena's design, but rather labs misinterpreting its policy. LM Arena has taken steps to prevent future discrepancies from occurring, Chiang said, including updating its policies to "reinforce our commitment to fair, reproducible evaluations." "Our community isn't here as volunteers or model testers," Chiang said. "People use LM Arena because we give them an open, transparent place to engage with AI and give collective feedback. As long as the leaderboard faithfully reflects the community's voice, we welcome it being shared."