Latest news with #Outlier

Business Insider
6 days ago
xAI hired gig workers to boost Grok on a key AI leaderboard and 'beat' Anthropic's Claude in coding
Tech companies are fiercely competing to build the best AI coding tools — and for xAI, the top rival to beat seems to be Anthropic. Elon Musk's AI company used contractors to train Grok on coding tasks with the goal of topping a popular AI leaderboard, and explicitly told them they wanted it to outperform Anthropic's Claude 3.7 Sonnet tool, documents obtained by Business Insider show. The contractors, hired through Scale AI's Outlier platform, were assigned a project to "hillclimb" Grok's ranking on WebDev Arena, an influential leaderboard from LMArena that pits AI models against each other in web development challenges, with users voting for the winner. "We want to make the in-task model the #1 model" for LMArena, reads one Scale AI onboarding doc that was active in early July, according to one contractor who worked on the project. Contractors were told to generate and refine front-end code for user interface prompts to "beat Sonnet 3.7 Extended," a reference to Anthropic's Claude model. xAI did not reply to a BI request for comment. In the absence of universally agreed-upon standards, leaderboard rankings and benchmark scores have become the AI industry's unofficial scoreboard. For labs like OpenAI and Anthropic, topping these rankings can help attract funding, new customers, lucrative contracts, and media attention. Anthropic's Claude, which has multiple models, is considered one of the leading players for AI coding and consistently ranks near the top of many leaderboards, often alongside Google and OpenAI. Anthropic cofounder Ben Mann said on the "No Priors" podcast last month that other companies had declared "code reds" to try to match Claude's coding abilities, and he was surprised that other models hadn't caught up. Competitors like Meta are using Anthropic's coding tools internally, BI previously reported.
The Scale AI dashboard and project instructions did not specify which version of Grok the project was training, though it was in use days before the newest model, Grok 4, came out on July 9. On Tuesday, LMArena ranked Grok 4 in 12th place for web development. Models from Anthropic ranked in joint first, third, and fourth. The day after Grok 4's launch, Musk posted on X claiming that the new model "works better than Cursor" at fixing code, referring to the popular AI-assisted developer tool.
"You can cut & paste your entire source code file into the query entry box on … and @Grok 4 will fix it for you! This is what everyone @xAI does. Works better than Cursor." — Elon Musk (@elonmusk) July 10, 2025
In a comment to BI, Scale AI said it does not overfit models by training them directly on a test set. The company said it never copies or reuses public benchmark data for large language model training and told BI it was engaging in a "standard data generation project using public signals to close known performance gaps." Anastasios Angelopoulos, the CEO of LMArena, told BI that while he wasn't aware of the specific Scale project, hiring contractors to help AI models climb public leaderboards is standard industry practice. "This is part of the standard workflow of model training. You need to collect data to improve your model," Angelopoulos said, adding that it's "not just to do well in web development, but in any benchmark."
The race for leaderboard dominance
The industry's focus on AI leaderboards can drive intense — and not always fair — competition. Sara Hooker, the head of Cohere Labs and one of the authors of "The Leaderboard Illusion," a paper published by researchers from universities including MIT and Stanford, told BI that "when a leaderboard is important to a whole ecosystem, the incentives are aligned for it to be gamed."
In April, after Meta's Llama 4 model shot up to second place on LMArena, developers noticed that the model variant that Meta used for public benchmarking was different from the version released to the public. This sparked accusations from AI researchers that Meta was gaming the leaderboard. Meta denied the claims, saying the variant in question was experimental and that evaluating multiple versions of a model is standard practice. Although xAI's project with Scale AI asked contractors to help "hillclimb" the LMArena rankings, there is no evidence that they were gaming the leaderboard. Leaderboard dominance doesn't always translate into real-world ability. Shivalika Singh, another author of the paper, told BI that "doing well on the Arena doesn't result in generally good performance" or guarantee strong results on other benchmarks. Overall, Grok 4 ranked in the top three for LMArena's core categories of math, coding, and "Hard Prompts." However, early data from Yupp, a new crowdsourced leaderboard and LMArena rival, showed that Grok 4 ranked 66th out of more than 100 models, highlighting the variance between leaderboards. Nate Jones, an AI strategist and product leader with a widely read newsletter, said he found Grok's actual abilities often lagged behind its leaderboard hype. "Grok 4 crushed some flashy benchmarks, but when the rubber met the road in my tests this week Grok 4 stumbled hard," he wrote in his Substack on Monday. "The moment we set leaderboard dominance as the goal, we risk creating models that excel in trivial exercises and flounder when facing reality."

Business Insider
10-07-2025
Is your chatbot judging you? How Big Tech is cracking down on 'preachy' AI.
It's not just what AI says — it's how it says it. Major tech firms like Google and Meta are using contractors to spot, flag, and in some cases rewrite 'preachy' chatbot responses, training documents obtained by Business Insider reveal. Freelancers for Alignerr and Scale AI's Outlier have been instructed to spot and remove any hint of a lecturing or nudging tone from chatbot answers, including in conversations about sensitive or controversial topics. In one Google project run by Outlier, codenamed Mint, contractors were given lists of sample responses to avoid. A preachy response was defined as one where 'the model nudges/urges the user to change their point of view, assumes negative user intent, judges the user, or tries to actively promote an unsolicited opinion.' One sample prompt asked if it's 'worse to be homeless or get the wrong sandwich in your order.' The project guidelines flagged the following reply as preachy: 'Comparing the experience of homelessness to getting the wrong sandwich is not an appropriate comparison.' Contractors were asked to rate responses on a scale, with responses classed as 'very preachy, judgemental, or assumes bad intent' scoring the lowest. For Google's project Mint, examples of preachy phrasing include 'It is important to remember…,' 'I urge you to…,' or lengthy explanations for why a question can't be answered. Preachiness tone guidelines appear in five sets of project documents reviewed by BI, and the word 'preach' appears 123 times in Mint alone. Meta declined to comment. Google, Scale AI, and Alignerr did not respond to requests for comment.
'A sticky situation for developers'
As tech companies race to develop and monetize their AI chatbots, they're spending big to make large language models sound like helpful, fun friends, not bossy parents.
AI firms need to strike the right balance between nudging users away from bad behavior and spoiling the user experience, which could drive them to a competitor or raise questions about bias. AI and human behavior researchers told BI that 'preachiness' is among the most important aspects for model companies to tackle because it can instantly put people off. 'It's a really sticky situation for the developers,' said Luc LaFreniere, a psychology professor at Skidmore College who studies AI-human interaction. 'AI is trying to be both a tool and something that feels human. It's trained to give answers, but we don't want to be preached at.' Malihe Alikhani, an assistant professor of AI at Northeastern University and a visiting fellow at the Brookings Institution, said consumers prefer chatbots that give them options, rather than ones that present directions, especially if they're perceived as moralizing. 'That undermines the user experience and can backfire, especially for people who come to chatbots seeking a nonjudgmental space,' she told BI.
Even when you want to do bad things
Tech companies aren't just worried about preachiness on everyday topics. They're also training their AI bots to avoid a holier-than-thou tone in situations involving harmful or hateful speech. LaFreniere said the idea of a truly neutral bot is wishful thinking. 'It's actually a fantasy, this idea of not being judgmental,' he said. 'By nature, we as humans make judgments, and that's in all the training data.' He said that even so-called 'neutral' bots are always making value calls. 'Its algorithm is, to an extent, a judgment-making algorithm,' LaFreniere said. 'That's all moral territory — even if the bot tries not to sound heavy-handed.' One example from Google's project Mint shows that an answer, which the doc labels 'neutral,' still makes a judgment call. Training a model to avoid a judgmental tone can also create new problems, Alikhani told BI.
'When bots are engineered to avoid sounding judgmental or directive, they can come across as supportive, but in a very flattened, affectless way,' she said. 'This may not 'replace' real emotional support, but it can displace it, especially for people who are already vulnerable or isolated.' The bigger issue, Alikhani said, is that people may not notice how much a bot shapes their conversation. Users might think they're getting nonjudgmental empathy, but they're chatting with a system designed to avoid anything confrontational or probing, she said.
Sycophantic AI
AI labs have publicly addressed instances in which bots have acted obsequiously. In April, OpenAI CEO Sam Altman acknowledged that the company's GPT-4o chatbot had become 'too sycophant-y and annoying,' after users complained the bot was constantly flattering them and agreeing with whatever they said.
'the last couple of GPT-4o updates have made the personality too sycophant-y and annoying (even though there are some very good parts of it), and we are working on fixes asap, some today and some this week. at some point will share our learnings from this, it's been interesting.' — Sam Altman (@sama) April 27, 2025
Anthropic's chatbot Claude has its own public instructions for avoiding a preachy tone. According to the model's latest system prompt, updated in May, Claude is instructed to assume that users are acting legally and in good faith, even if a request is ambiguous. If Claude can't or won't fulfill a request, it's trained not to explain why, since that 'comes across as preachy and annoying,' the guidelines say. Instead, it's supposed to offer a helpful alternative if possible, or simply keep its refusal brief. Tech companies face a high-stakes challenge in striking the right balance between making AI a useful tool and a human-like companion. 'There's an intense race to be the top AI right now,' said LaFreniere.
'Companies are willing to take risks they wouldn't otherwise take, just to keep users happy and using their bots.' 'In this kind of arms race, anything that risks losing users can feel like risking total failure,' he added.


Glasgow Times
21-06-2025
I tried Glasgow's newest food and drink tour
Gillian says she was inspired to start the business as she loves 'doing food tours everywhere I go' and wanted to combine the city's food scene with its history. She told the Glasgow Times: 'I did my first ever food tour in New York in 2008 and I just got hooked and thought this would be a great idea to set up in Glasgow.' She continued: 'I thought it would be great to combine the two [food and history] because Scotland's produce is amazing, and it's not always recognised, and Glasgow's history is really interesting. We love sharing stories of Glasgow's past. 'I also wanted to support local and independent businesses. You don't get many about these days so it's great to support them as well as the local community and economy.' The Merchant City Stroll, which launched in April, was developed alongside Visit Glasgow, with Gillian using sources such as Slow Food Glasgow's Sustainable Food Directory to develop a route around the oldest part of the city. She explained: 'I looked at the route and how it would not only cover local and sustainable produce but also logistics, looking at how I would incorporate the city's history into the tour. 'This tour is all about great food and drink, trying traditional and multicultural tastings - it's a city of many cultures. The businesses I work with focus on sustainable produce.' She added: 'We not only want locals to come and enjoy their city, but also want visitors to come and explore the city the way I see it. Glasgow's got so much to offer.' The Merchant City Stroll starts at the Mercat Cross, where I met our tour guide Rae, who was a fountain of knowledge on Glasgow's history. I was on the tour alongside a family of four visiting from the USA. Our first stop was a short walk around the corner to coffee shop and bakery Outlier on London Road.
Outlier, which opened in 2022, makes all of their baked goods on site. Here we were each given a coconut macaroon, which gives a nod to multicultural influences in Scottish cuisine. This is not my favourite sweet treat, but on taking a bite I was surprised by how much I enjoyed this one. It was nice and moist with the perfect amount of coconut flavour for me. On leaving Outlier, we headed into the historic Barras and to Crossbill Distilling, who have been producing their range of award-winning Scottish gins in the city since 2017 and are the only distillery in the country to source their own juniper berries in Scotland. I didn't plan on sipping on straight gin before 12pm (specifically their Green Dry, which takes on the flavours of blue and red juniper berries and juniper needles), but it was surprisingly smooth, and you could pick up the fresh flavour from the needles. Our next stop was the award-winning Bare Bones Chocolate, which came at the perfect time as it gave us an escape from a heavy dump of rain that started suddenly (but thankfully stopped just as quickly). At this stop, we were able to sample their chocolate, and we each got a hot chocolate with a homemade marshmallow and were told a little about the process. The hot chocolate was really rich and was a nice size, and I liked having the freedom to choose which chocolate we sampled. The second half of the tour was much more food-heavy as we made our way to Merchant Square for stop number three, Table Twenty Eight. We were served a small portion of Barra scallops with asparagus and squid ink risotto, which gives a nod to the city's Italian population and influence. The scallops were perfectly cooked, and the risotto and asparagus were delicious.
We then made a quick dash across the road to Mharsanta, where we each got a taste of Stobcross Lowland Single Malt whisky, a small bowl of Cullen skink and shared some plates of haggis, neeps and tatties with a whisky cream sauce. The Cullen skink in particular stood out to me here: it was creamy and full of flavour, with a lot of smoked haddock flakes in the portion. For our final food stop, we headed out east again to Drygate brewery, where we had a small portion of fish and chips (well, skinny fries) with a glass of their Seven Peaks IPA, which is also in the batter. The batter was nice and crisp with delicious flaky fish, and by the end of this last stop I was officially stuffed. Throughout the tour we also stopped at sites such as The Barras, Glasgow Cathedral, the Necropolis, and St Andrew's in the Square, as well as seeing several of the city's murals, including the Big Yin, Saint Mungo and Fellow Glasgow Residents. I really enjoyed the balance of history and food, learnt a lot of interesting facts about Glasgow I didn't know thanks to Rae, and found there to be a good mix of dishes on the tour. Although the tour lasted four hours, with the stops spaced out as they were it came to only about 5,000 steps in total, so it wasn't a tiring walk. Both the Merchant City and West End tours are limited to a maximum of 10 people, and I think being on a small tour is better for being able to hear Rae, as well as for being able to chat with other guests. My tour-mates have done several food and drink tours in different cities, with them saying the Boston Food Tour would be hard to beat. So, what did they think of the Merchant City Stroll? 'This is our favourite one we've ever done,' they said, also praising the mix of food and history.
Next time I have friends visiting from outside Glasgow and I'm not sure where to take them, I think this would be the perfect option. The Merchant City Stroll takes place Monday to Thursday and Saturday and costs £95 per person. You can find out more at


Indian Express
14-06-2025
Meta poaches 28-year-old Scale AI CEO after taking multibillion dollar stake in startup
Facebook-owner Meta has invested in Scale AI in a deal that values the data-labeling startup at $29 billion and brings in its 28-year-old CEO, Alexandr Wang, to play a prominent role in the tech giant's artificial intelligence strategy. Meta will take a 49% stake for $14.3 billion, according to two sources familiar with the matter. 'We will deepen the work we do together producing data for AI models and Alexandr Wang will join Meta to work on our superintelligence efforts,' Meta said in a statement that did not disclose financial terms. The main driver for Meta's substantial investment in Scale was to secure Wang to lead its new superintelligence unit, according to a separate source briefed on the discussions. The sources were not authorised to speak to media and declined to be identified. Meta didn't immediately respond to a request for comment. Wang, who was born in Los Alamos, New Mexico, to Chinese immigrant physicists, dropped out of MIT to co-found Scale. He was quickly lauded as one of Silicon Valley's most promising entrepreneurs, raising funding from blue-chip venture capital firms and achieving billionaire status in his 20s. He has also cultivated relationships with top tech executives such as OpenAI CEO Sam Altman and has since leveraged his influence to build connections in Washington D.C., testifying in front of Congress and securing the federal government as a big client. Meta, once recognized as a leader in open-source AI models, has suffered from staff departures and has postponed the launches of new open-source AI models that could rival competitors like Google, OpenAI, and China's DeepSeek. By poaching Wang, who does not come from a research background but has built a major AI business, Meta CEO Mark Zuckerberg is betting that Meta's AI efforts can be turned around by an adept business leader more in the mold of Altman than the research scientists at the helm of most competing labs. 
Scale said the deal values it at $29 billion and that its chief strategy officer, Jason Droege, will serve as its interim CEO. The social media giant doesn't plan to take a board seat in Scale, one of the sources added. A few employees from Scale, a company with 1,500 people, will join Wang in moving to Meta, Wang said in a note to employees on Thursday. Wang will remain on Scale's board. The cash investment would rank as Meta's second-largest ever after its $19 billion buyout of WhatsApp. It's unclear if this deal will come under any regulatory scrutiny. Meta has been sued by the U.S. Federal Trade Commission, which alleges it illegally acquired Instagram and WhatsApp to stifle competition. Founded in 2016, Scale provides vast amounts of accurately labeled data, which is pivotal for training sophisticated tools like OpenAI's ChatGPT. To do so, Scale set up subsidiary platforms such as Remotasks and Outlier to recruit and manage gig workers who manually label the data. It was valued at nearly $14 billion in a May 2024 funding round that included Nvidia, Amazon and Meta among its backers. Despite the large investment sum, the deal might not be all good for Scale. Many AI labs that are clients of Scale could decide to stop using its services if they worry that, since Wang still sits on Scale's board, Meta might gain an inside track on rivals' priorities around data. Still, the deal is a win for early venture capital investors in Scale, such as Accel and Index Ventures, who can cash out half of their stake in the startup.


Economic Times
13-06-2025
From Scale AI to Meta's AI boss: Who is Alexandr Wang, the 28-year-old MIT dropout gunning for OpenAI?
Meta Platforms is investing $15 billion for a 49% stake in Scale AI, a data-labelling startup now valued at $29 billion. The deal, confirmed by both companies on Thursday, marks a strategic shift for Meta as it races to reclaim its edge in artificial intelligence. The main draw? Alexandr Wang. The 28-year-old MIT dropout and CEO of Scale AI will join Meta to lead its newly formed superintelligence team. This unit is tasked with building systems that push beyond today's artificial intelligence capabilities—towards artificial superintelligence (ASI). 'We will deepen the work we do together producing data for AI models and Alexandr Wang will join Meta to work on our superintelligence efforts,' Meta said in a statement. While Meta will not take a seat on Scale AI's board, the deal will see a few of Scale's 1,500 employees join Wang at Meta. Wang will remain a board member at Scale. Wang's background is far from typical. Born in Los Alamos, New Mexico, to Chinese immigrant physicists, he entered the tech world early. He worked at Quora before dropping out of MIT after his freshman year. In 2016, alongside Lucy Guo, he co-founded Scale AI via startup accelerator Y Combinator. 'Long-term, we want to power any human-powered process for any company,' Wang told the YC blog. At just 24, he became the world's youngest self-made billionaire. Though Guo exited the startup a few years later, Wang built Scale AI into a data backbone for many of the world's leading AI labs. The company has raised over $680 million, including $100 million from Peter Thiel's Founders Fund. Today, Forbes estimates his personal net worth at $3.6 billion. 'Focus on building the business and then the rest will kind of take care of itself,' he told Business Insider in 2020. Wang has become a familiar face in Washington, frequently engaging with lawmakers on the national security implications of AI. In 2018, a visit to China convinced him that America's future in warfare would hinge on AI leadership.
'The race for AI global leadership is well underway, and our nation's ability to efficiently adopt and implement AI will define the future of warfare,' Wang has said publicly. Founded in 2016, Scale AI helps train frontier AI models by offering large volumes of labelled data. Its platforms—Remotasks and Outlier—enlist gig workers to annotate massive datasets. This labelled data is critical for training AI systems like ChatGPT. The company began by serving autonomous vehicle clients such as Toyota, Honda, and Waymo. It has since expanded to support OpenAI, Microsoft, and even the US government, which uses its services to analyse satellite imagery from Ukraine. Scale's revenues in 2024 hit $870 million and are projected to more than double to $2 billion in 2025. Bloomberg reports this would push its valuation to $25 billion. However, the startup's rapid ascent hasn't been without controversy. Investigations have highlighted harsh working conditions for its offshore gig workforce, who are paid as little as $1 per hour. These workers are primarily based in countries such as Kenya and the Philippines. For Meta, the deal isn't just an investment—it's a statement. With this deal, Meta is signalling a departure from the traditional research-led approach it once followed. Challenges, including high-profile exits and delayed model releases, have weighed on Meta's AI progress. The company's LLaMA open-source models were meant to disrupt the industry, but lukewarm adoption and team churn have slowed its momentum. Meta's long-time AI chief, Yann LeCun, remains a key figure. Yet his scepticism about large language models (LLMs) as a path to artificial general intelligence (AGI) has reportedly diverged from mainstream Silicon Valley thinking. By bringing in Wang—who built Scale into a billion-dollar business without a research pedigree—CEO Mark Zuckerberg is now betting on a different kind of leadership.
A business mind like Sam Altman's, rather than a research scientist. Meta is reportedly luring talent from OpenAI and Google with seven- to nine-figure pay packages to staff its 50-person superintelligence lab.
'This was a deeply unique moment': Wang steps into new role
In a message to employees, Wang acknowledged the emotional weight of leaving Scale. 'The idea of not being a Scalien was, frankly, unimaginable. But as I spent time truly considering it, I realized this was a deeply unique moment, not just for me, but for Scale as well,' he wrote. He assured Scale's staff that proceeds from Meta's investment would go to shareholders and vested equity holders. At Meta, Wang will lead an ambitious mission: to build AI that not only catches up to its rivals but moves beyond them. Superintelligence remains a theoretical concept—but with Wang at the helm, Meta is making a $15 billion wager that it can become reality.