There is a vast hidden workforce behind AI

Mint, 09-06-2025
WHEN DEEPSEEK, a hotshot Chinese firm, released its cheap large language model late last year it overturned long-standing assumptions about what it will take to build the next generation of artificial intelligence (AI). This will matter to whoever comes out on top in the epic global battle for AI supremacy. Developers are now reconsidering how much hardware, energy and data are needed. Yet another, less discussed, input in machine intelligence is in flux too: the workforce.
To the layman, AI is all robots, machines and models. It is a technology that kills jobs. In fact, there are millions of workers involved in producing AI models. Much of their work has involved tasks like tagging objects in images of roads in order to train self-driving cars and labelling words in the audio recordings used to train speech-recognition systems. Technically, annotators give data the contextual information computers need to work out the statistical associations between components of a dataset and their meaning to human beings. In fact, anyone who has completed a CAPTCHA test, selecting photos containing zebra crossings, may have inadvertently helped train an AI.
This is the "unsexy" part of the industry, as Alex Wang, the boss of Scale AI, a data firm, puts it. Although Scale AI says most of its contributor work happens in America and Europe, across the industry much of the labour is outsourced to poor parts of the world, where lots of educated people are looking for work. The Chinese government has teamed up with tech companies, such as Alibaba and JD.com, to bring annotation jobs to far-flung parts of the country. In India the IT industry body, Nasscom, reckons annotation revenues could reach $7bn a year and employ 1m people there by 2030. That is significant, since India's entire IT industry is worth $254bn a year (including hardware) and employs 5.5m people.
Annotators have long been compared to parents, teaching models and helping them make sense of the world. But the latest models don't need their guidance in the same way. As the technology grows up, are its teachers becoming redundant?
Data annotation is not new. Fei-Fei Li, an American computer scientist known as "the godmother of AI", is credited with firing the industry's starting gun in the mid-2000s when she created ImageNet, the largest image dataset at the time. Ms Li realised that if she paid college students to categorise the images, which was then how most researchers did things, the task would take 90 years. Instead, she hired workers around the world using Mechanical Turk, an online gig-work platform run by Amazon. She got some 3.2m images organised into a dataset in two and a half years. Soon other AI labs were outsourcing annotation work this way, too.
Over time developers got fed up with the low-quality annotation done by untrained workers on gig-work sites. AI-data firms, such as Sama and iMerit, emerged. They hired workers across the poor world. Informal annotation work continued but specialist platforms emerged for AI work, like those run by Scale AI, which tests and trains workers. The World Bank reckons that between 4.4% and 12.4% of the global workforce is involved in gig work, including annotation for AI. Krystal Kauffman, a Michigan resident who has been doing data work online for a decade, reckons that tech companies have an interest in keeping this workforce hidden. "They are selling magic: this idea that all these things happen by themselves," Ms Kauffman says. "Without the magic part of it, AI is just another product."
A debate in the industry has been about the treatment of the workers behind AI. Firms are reluctant to share information on wages. But American annotators generally consider $10-20 per hour to be decent pay on online platforms. Those in poor countries often get $4-8 per hour. Many must use monitoring tools that track their computer activity and are penalised for being slow. Scale AI has been hit with several lawsuits over its employment practices. The firm denies wrongdoing and says: "We plan to defend ourselves vigorously."
The bigger issue, though, is that basic annotation work is drying up. In part, this was inevitable. If AI was once a toddler who needed a parent to point things out and to help it make sense of the world around it, the technology has grown into an adolescent who needs occasional specialist guidance and advice. AI labs increasingly use pre-labelled data from other AI labs, which use algorithms to apply labels to datasets.
Take the example of self-driving tractors developed by Blue River Technology, a subsidiary of John Deere, an agricultural-equipment giant. Three years ago the group's engineers in America would upload pictures of farmland into the cloud and provide iMerit staff in Hubli, India, with careful instructions on what to label: tractors, buildings, irrigation equipment. Now the developers use pre-labelled data. They still need iMerit staff to check that labelling and to deal with "edge cases", for example where a dust cloud obscures part of the landscape or a tree throws shade over crops, confusing the model. A process that took months now takes weeks.
From baby steps
The most recent wave of AI models has changed data work more dramatically. Since 2022, when OpenAI first let the public play with its ChatGPT chatbot, there has been a rush of interest in large language models. Data from Pitchbook, a research firm, suggest that global venture-capital funding for AI startups jumped by more than 50% in 2024 to $131.5bn, even as funding for other startups fell. Much of it is going into newer techniques for developing AI, which do not need data annotated in the same way. Iva Gumnishka at Humans in the Loop, a social enterprise, says firms doing low-skilled annotation for older computer-vision and natural-language-processing clients are being "left behind".
There is still demand for annotators, but their work has changed. As businesses start to deploy AI, they are building smaller specialised models and looking for highly educated annotators to help. It has become fairly common for adverts for annotation jobs to require a PhD or skills in coding and science. Now that researchers are trying to make AI more multilingual, demand for annotators who speak languages other than English is growing, too. Sushovan Das, a dentist working on medical-AI projects at iMerit, reckons that annotation work will never disappear. "This world is constantly evolving," he says. "So the AI needs to be improved time and again."
New roles for humans in training AI are emerging. Epoch AI, a research firm, reckons the stock of high-quality text available for training may be exhausted by 2026. Some AI labs are hiring people to write chunks of text and lines of code that models can be trained on. Others are buying synthetic data, created using computer algorithms, and hiring humans to verify it. "Synthetic data still needs to be good data," says Wendy Gonzalez, the boss of Sama, which has operations in east Africa.
The other role for workers is in evaluating the output from models and helping to hammer it into shape. That is what got ChatGPT to perform better than previous chatbots. Xiaote Zhu at Scale AI provides an example of the sort of open-ended tasks being done on the firm's Outlier platform, which was launched in 2023 to facilitate the training of AI by experts. Workers are presented with two responses from a chatbot recommending an itinerary for a holiday to the Maldives. They need to select which response they prefer, rate it, explain why the answer is good or bad and then rewrite the response to improve it.
Ms Zhu's example is a fairly anodyne one. Yet human feedback is also crucial to making sure AI is safe and ethical. In a document that was published after the launch of ChatGPT in 2022, OpenAI said it had hired experts to "qualitatively probe, adversarially test and generally provide feedback" on its models. At the end of that process the model refused to respond to certain prompts, such as requests to write social-media content aimed at persuading people to join al-Qaeda, a terrorist group.
Flying the nest
If AI developers had their way they would not need this sort of human input at all. Studies suggest that as much as 80% of the time that goes into the development of AI is spent on data work. Naveen Rao at Databricks, an AI firm, says he would like models to teach themselves, just as he would like his own children to do. "I want to build self-efficacious humans," he says. "I want them to have their own curiosity and figure out how to solve problems. I don't want to spoon-feed them every step of the way."
There is a lot of excitement about unsupervised learning, which involves feeding models unlabelled data, and reinforcement learning, which uses trial and error to improve decision-making. AI firms, including Google DeepMind, have trained machines to win at games like Go and chess by playing millions of contests against themselves and tracking which strategies work, without any human input at all. But that self-taught approach doesn't work outside the realms of maths and science, at least for the moment.
Tech nerds everywhere have been blown away by how cheap and efficient DeepSeek's model is. But they are less impressed by DeepSeek's attempt to train AI using feedback generated by computers rather than humans. The model struggled to answer open-ended questions, producing gobbledygook in a mixture of languages. "The difference is that with Go and chess the desired outcome is crystal clear: win the game," says Phelim Bradley, co-founder of Prolific, another AI-data firm. "Large language models are more complex and far-reaching, so humans are going to remain in the loop for a long time."
Mr Bradley, like many techies, reckons that more people will need to get involved in training AI, not fewer. Diversity in the workforce matters. When ChatGPT was released a few years ago, people noticed that it overused the word "delve". The word became seen as "AI-ese", a telltale sign that the text was written by a bot. In fact, annotators in Africa had been hired to train the model and the word "delve" is more commonly used in African English than it is in American or British English. In the same way as workers' skills and knowledge are transferred to models, their vocabulary is, too. As it turns out, it takes more than just a village to raise a child.
Clarification: This article has been amended to reflect Scale AI's claim that most of its labour is based in America and Europe.