
Latest news with #ImageNet

Top AI researchers say language is limiting. Here's the new kind of model they are building instead.

Business Insider

13-06-2025



As OpenAI, Anthropic, and Big Tech invest billions in developing state-of-the-art large language models, a small group of AI researchers is working on the next big thing. Computer scientists like Fei-Fei Li, the Stanford professor famous for inventing ImageNet, and Yann LeCun, Meta's chief AI scientist, are building what they call "world models."

Unlike large language models, which determine outputs based on statistical relationships between the words and phrases in their training data, world models predict events based on the mental constructs that humans make of the world around them. "Language doesn't exist in nature," Li said on a recent episode of Andreessen Horowitz's a16z podcast. "Humans," she said, "not only do we survive, live, and work, but we build civilization beyond language."

Computer scientist and MIT professor Jay Wright Forrester, in his 1971 paper "Counterintuitive Behavior of Social Systems," explained why mental models are crucial to human behavior:

Each of us uses models constantly. Every person in private life and in business instinctively uses models for decision making. The mental images in one's head about one's surroundings are models. One's head does not contain real families, businesses, cities, governments, or countries. One uses selected concepts and relationships to represent real systems. A mental image is a model. All decisions are taken on the basis of models. All laws are passed on the basis of models. All executive actions are taken on the basis of models. The question is not to use or ignore models. The question is only a choice among alternative models.

If AI is to meet or surpass human intelligence, then the researchers behind it believe it should be able to make mental models, too. Li has been working on this through World Labs, which she cofounded in 2024 with an initial backing of $230 million from venture firms like Andreessen Horowitz, New Enterprise Associates, and Radical Ventures.
"We aim to lift AI models from the 2D plane of pixels to full 3D worlds — both virtual and real — endowing them with spatial intelligence as rich as our own," World Labs says on its website. Li said on the No Priors podcast that spatial intelligence is "the ability to understand, reason, interact, and generate 3D worlds," given that the world is fundamentally three-dimensional. Li said she sees applications for world models in creative fields, robotics, or any area that warrants infinite universes. Like Meta, Anduril, and other Silicon Valley heavyweights, that could mean advances in military applications by helping those on the battlefield better perceive their surroundings and anticipate their enemies' next moves. The challenge of building world models is the paucity of sufficient data. In contrast to language, which humans have refined and documented over centuries, spatial intelligence is less developed. "If I ask you to close your eyes right now and draw out or build a 3D model of the environment around you, it's not that easy," she said on the No Priors podcast. "We don't have that much capability to generate extremely complicated models till we get trained." To gather the data necessary for these models, "we require more and more sophisticated data engineering, data acquisition, data processing, and data synthesis," she said. That makes the challenge of building a believable world even greater. At Meta, chief AI scientist Yann LeCun has a small team dedicated to a similar project. The team uses video data to train models and runs simulations that abstract the videos at different levels. "The basic idea is that you don't predict at the pixel level. You train a system to run an abstract representation of the video so that you can make predictions in that abstract representation, and hopefully this representation will eliminate all the details that cannot be predicted," he said at the AI Action Summit in Paris earlier this year. 
That creates a simpler set of building blocks for mapping out trajectories for how the world will change at a particular time. LeCun, like Li, believes these models are the only way to create truly intelligent AI. "We need AI systems that can learn new tasks really quickly," he said recently at the National University of Singapore. "They need to understand the physical world — not just text and language but the real world — have some level of common sense, and abilities to reason and plan, have persistent memory — all the stuff that we expect from intelligent entities."
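LeCun's idea of predicting in an abstract representation rather than at the pixel level can be illustrated with a toy sketch. Everything below is invented for illustration (a one-dimensional "video" of a moving blob, an encoder that just finds the brightest pixel) and is not Meta's actual architecture; the point is only that the encoder discards unpredictable per-pixel noise, so a very simple dynamics model can make accurate predictions in the abstract space.

```python
import random

random.seed(0)

def make_frame(position, width=40, noise=0.3):
    """A 1D 'video frame': a bright blob at `position` plus per-pixel noise."""
    return [max(0.0, 1.0 - abs(i - position)) + random.uniform(-noise, noise)
            for i in range(width)]

def encode(frame):
    """Abstract representation: the index of the brightest pixel.
    The encoding deliberately throws away the unpredictable noise."""
    return max(range(len(frame)), key=lambda i: frame[i])

# A 'video' of a blob moving 2 pixels per step.
video = [make_frame(5 + 2 * t) for t in range(10)]

# Learn the dynamics in latent space: here, a constant velocity.
latents = [encode(f) for f in video]
velocity = latents[1] - latents[0]

# Predict the latent of the next (unseen) frame and compare with reality.
predicted = latents[-1] + velocity
actual = encode(make_frame(5 + 2 * 10))
print(predicted, actual)
```

In a real world model the encoder and the predictor are learned networks; here both are hand-written to keep the idea visible: predicting the single latent number is easy, while predicting every noisy pixel would be impossible.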

There is a vast hidden workforce behind AI

Mint

09-06-2025



WHEN DEEPSEEK, a hotshot Chinese firm, released its cheap large language model late last year, it overturned long-standing assumptions about what it will take to build the next generation of artificial intelligence (AI). This will matter to whoever comes out on top in the epic global battle for AI supremacy. Developers are now reconsidering how much hardware, energy and data are needed. Yet another, less discussed, input in machine intelligence is in flux too: the workforce.

To the layman, AI is all robots, machines and models. It is a technology that kills jobs. In fact, there are millions of workers involved in producing AI models. Much of their work has involved tasks like tagging objects in images of roads in order to train self-driving cars and labelling words in the audio recordings used to train speech-recognition systems. Technically, annotators give data the contextual information computers need to work out the statistical associations between components of a dataset and their meaning to human beings. Anyone who has completed a CAPTCHA test, selecting photos containing zebra crossings, may have inadvertently helped train an AI.

This is the "unsexy" part of the industry, as Alex Wang, the boss of Scale AI, a data firm, puts it. Although Scale AI says most of its contributor work happens in America and Europe, across the industry much of the labour is outsourced to poor parts of the world, where lots of educated people are looking for work. The Chinese government has teamed up with tech companies, such as Alibaba, to bring annotation jobs to far-flung parts of the country. In India the IT industry body, Nasscom, reckons annotation revenues could reach $7bn a year and employ 1m people there by 2030. That is significant, since India's entire IT industry is worth $254bn a year (including hardware) and employs 5.5m people.

Annotators have long been compared to parents, teaching models and helping them make sense of the world.
But the latest models don't need their guidance in the same way. As the technology grows up, are its teachers becoming redundant?

Data annotation is not new. Fei-Fei Li, an American computer scientist known as "the godmother of AI", is credited with firing the industry's starting gun in the mid-2000s when she created ImageNet, the largest image dataset at the time. Ms Li realised that if she paid college students to categorise the images, which was then how most researchers did things, the task would take 90 years. Instead, she hired workers around the world using Mechanical Turk, an online gig-work platform run by Amazon. She got some 3.2m images organised into a dataset in two and a half years. Soon other AI labs were outsourcing annotation work this way, too.

Over time developers got fed up with the low-quality annotation done by untrained workers on gig-work sites. AI-data firms, such as Sama and iMerit, emerged. They hired workers across the poor world. Informal annotation work continued, but specialist platforms emerged for AI work, like those run by Scale AI, which tests and trains workers. The World Bank reckons that between 4.4% and 12.4% of the global workforce is involved in gig work, including annotation for AI.

Krystal Kauffman, a Michigan resident who has been doing data work online for a decade, reckons that tech companies have an interest in keeping this workforce hidden. "They are selling magic—this idea that all these things happen by themselves," Ms Kauffman says. "Without the magic part of it, AI is just another product."

A debate in the industry has been about the treatment of the workers behind AI. Firms are reluctant to share information on wages. But American annotators generally consider $10-20 per hour to be decent pay on online platforms. Those in poor countries often get $4-8 per hour. Many must use monitoring tools that track their computer activity and are penalised for being slow.
Scale AI has been hit with several lawsuits over its employment practices. The firm denies wrongdoing and says: "We plan to defend ourselves vigorously."

The bigger issue, though, is that basic annotation work is drying up. In part, this was inevitable. If AI was once a toddler who needed a parent to point things out and to help it make sense of the world around it, the technology has grown into an adolescent who needs occasional specialist guidance and advice. AI labs increasingly use pre-labelled data from other AI labs, which use algorithms to apply labels to datasets.

Take the example of self-driving tractors developed by Blue River Technology, a subsidiary of John Deere, an agricultural-equipment giant. Three years ago the group's engineers in America would upload pictures of farmland into the cloud and provide iMerit staff in Hubli, India, with careful instructions on what to label: tractors, buildings, irrigation equipment. Now the developers use pre-labelled data. They still need iMerit staff to check that labelling and to deal with "edge cases", for example where a dust cloud obscures part of the landscape or a tree throws shade over crops, confusing the model. A process that took months now takes weeks.

From baby steps

The most recent wave of AI models has changed data work more dramatically. Since 2022, when OpenAI first let the public play with its ChatGPT chatbot, there has been a rush of interest in large language models. Data from Pitchbook, a research firm, suggest that global venture-capital funding for AI startups jumped by more than 50% in 2024 to $131.5bn, even as funding for other startups fell. Much of it is going into newer techniques for developing AI, which do not need data annotated in the same way. Iva Gumnishka at Humans in the Loop, a social enterprise, says firms doing low-skilled annotation for older computer-vision and natural-language-processing clients are being "left behind".
There is still demand for annotators, but their work has changed. As businesses start to deploy AI, they are building smaller specialised models and looking for highly educated annotators to help. It has become fairly common for adverts for annotation jobs to require a PhD or skills in coding and science. Now that researchers are trying to make AI more multilingual, demand for annotators who speak languages other than English is growing, too. Sushovan Das, a dentist working on medical-AI projects at iMerit, reckons that annotation work will never disappear. "This world is constantly evolving," he says. "So the AI needs to be improved time and again."

New roles for humans in training AI are emerging. Epoch AI, a research firm, reckons the stock of high-quality text available for training may be exhausted by 2026. Some AI labs are hiring people to write chunks of text and lines of code that models can be trained on. Others are buying synthetic data, created using computer algorithms, and hiring humans to verify it. "Synthetic data still needs to be good data," says Wendy Gonzalez, the boss of Sama, which has operations in east Africa.

The other role for workers is in evaluating the output from models and helping to hammer it into shape. That is what got ChatGPT to perform better than previous chatbots. Xiaote Zhu at Scale AI provides an example of the sort of open-ended tasks being done on the firm's Outlier platform, which was launched in 2023 to facilitate the training of AI by experts. Workers are presented with two responses from a chatbot recommending an itinerary for a holiday to the Maldives. They need to select which response they prefer, rate it, explain why the answer is good or bad and then rewrite the response to improve it. Ms Zhu's example is a fairly anodyne one. Yet human feedback is also crucial to making sure AI is safe and ethical.
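The kind of preference-ranking task described above (pick the better response, rate it, explain why, rewrite it) maps naturally onto a small data structure, which is then turned into (chosen, rejected) pairs of the sort used to train reward models from human feedback. The sketch below is purely illustrative: the field names and examples are invented, not Scale AI's actual schema.

```python
from dataclasses import dataclass

@dataclass
class PreferenceJudgment:
    # Field names are illustrative, not any real platform's schema.
    prompt: str
    response_a: str
    response_b: str
    preferred: str      # "a" or "b"
    rating: int         # worker's 1-5 score for the preferred answer
    rationale: str      # why it is better
    rewrite: str        # the worker's improved version

judgments = [
    PreferenceJudgment(
        prompt="Suggest a five-day Maldives itinerary.",
        response_a="Day 1: beach. Day 2: beach. Day 3: beach.",
        response_b="Day 1: arrive, sunset cruise. Day 2: snorkelling trip.",
        preferred="b",
        rating=4,
        rationale="B gives concrete, varied activities.",
        rewrite="Day 1: arrive, check in, sunset cruise from the jetty.",
    ),
    PreferenceJudgment(
        prompt="Suggest a five-day Maldives itinerary.",
        response_a="A balanced mix of reefs, sandbanks and spa days.",
        response_b="The Maldives is a country in the Indian Ocean.",
        preferred="a",
        rating=3,
        rationale="B does not answer the question.",
        rewrite="A balanced plan: reefs on days 1-2, sandbank picnic day 3.",
    ),
]

# Convert each judgment into a (chosen, rejected) training pair.
pairs = [
    (j.response_b, j.response_a) if j.preferred == "b"
    else (j.response_a, j.response_b)
    for j in judgments
]
```

The ratings, rationales and rewrites carry extra signal beyond the binary preference; real pipelines use them in various ways, but the core artifact a reward model consumes is the ordered pair.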
In a document that was published after the launch of ChatGPT in 2022, OpenAI said it had hired experts to "qualitatively probe, adversarially test and generally provide feedback" on its models. At the end of that process the model refused to respond to certain prompts, such as requests to write social-media content aimed at persuading people to join al-Qaeda, a terrorist group.

Flying the nest

If AI developers had their way they would not need this sort of human input at all. Studies suggest that as much as 80% of the time that goes into the development of AI is spent on data work. Naveen Rao at Databricks, an AI firm, says he would like models to teach themselves, just as he would like his own children to do. "I want to build self-efficacious humans," he says. "I want them to have their own curiosity and figure out how to solve problems. I don't want to spoon-feed them every step of the way."

There is a lot of excitement about unsupervised learning, which involves feeding models unlabelled data, and reinforcement learning, which uses trial and error to improve decision-making. AI firms, including Google DeepMind, have trained machines to win at games like Go and chess by playing millions of contests against themselves and tracking which strategies work, without any human input at all.

But that self-taught approach doesn't work outside the realms of maths and science, at least for the moment. Tech nerds everywhere have been blown away by how cheap and efficient DeepSeek's model is. But they are less impressed by DeepSeek's attempt to train AI using feedback generated by computers rather than humans. The model struggled to answer open-ended questions, producing gobbledygook in a mixture of languages. "The difference is that with Go and chess the desired outcome is crystal clear: win the game," says Phelim Bradley, co-founder of Prolific, another AI-data firm.
'Large language models are more complex and far-reaching, so humans are going to remain in the loop for a long time." Mr Bradley, like many techies, reckons that more people will need to get involved in training AI, not fewer. Diversity in the workforce matters. When ChatGPT was released a few years ago, people noticed that it overused the word 'delve". The word became seen as 'AI-ese", a telltale sign that the text was written by a bot. In fact, annotators in Africa had been hired to train the model and the word 'delve" is more commonly used in African English than it is in American or British English. In the same way as workers' skills and knowledge are transferred to models, their vocabulary is, too. As it turns out, it takes more than just a village to raise a child. Clarification: This article has been amended to reflect Scale AI's claim that most of its labour is based in America and Europe.

Why Data Curation Is The Key To Enterprise AI

Forbes

07-04-2025



Nick Burling, Senior Vice President of Product at Nasuni.

All the enterprise customers and end users I'm talking to these days are dealing with the same challenge. The number of enterprise AI tools is growing rapidly as ChatGPT, Claude and other leading models are challenged by upstarts like DeepSeek. There's no single tool that fits all, and it's dizzying to try to analyze all the solutions and determine which ones are best suited to the particular needs of your company, department or team. What's been lost in the focus on the latest and greatest models is the paramount importance of getting your data ready for these tools in the first place. To get the most out of the AI tools of today and tomorrow, it's important to have a complete view of your file data across your entire organization: the current and historical digital output of every office, studio, factory, warehouse and remote site, involving every one of your employees. Curating and understanding this data will help you deploy AI successfully.

The potential of effective data curation is clear in the development of self-driving cars. Robotic vehicles can rapidly identify and distinguish between trees and cars in large part because of a dataset called ImageNet. This collection contains more than 14 million images of common everyday objects that have been labeled by humans. Scientists were able to train object recognition algorithms on this data because it was curated. They knew exactly what they had.

Another example is the use of machine learning to identify early signs of cancer in radiological scans. Scientists were able to develop these tools in part because they had high-quality data (radiological images) and a deep understanding of the particulars of each image file. They didn't attempt to develop a tool that analyzed all patient data or all hospital files. They worked with a curated segment of medical data that they understood deeply.
Now, imagine you're managing AI adoption and strategy at a civil engineering firm. Your goal is to utilize generative AI (GenAI) to streamline the process of creating proposals. And you've heard everyone in the AI world boasting about how this is a perfect use case. A typical civil engineering firm is going to have an incredibly broad range of files and complex models. Project data is going to be multimodal: a mix of text, video, images and industry-specific files. If you were to ask a standard GenAI tool to scan this data and produce a proposal, the result would be garbage.

But let's say all this data was consolidated, curated and understood at a deeper level. Across tens of millions of files, you'd have a sense of which groups own which files, who accesses them often, what file types are involved and more. Assuming you had the appropriate security guardrails in place to protect the data, you could choose a tool specifically tuned for proposals and securely give that tool access to only the relevant files within your organization. Then, you'd have something truly useful that helps your teams generate better, more relevant proposals faster.

Even with curation, there can be challenges. Let's say a project manager (PM) overseeing multiple construction sites wants to use a large language model (LLM) to automatically analyze daily inspection reports. At first glance, this would seem to be a perfect use case, as the PM would be working with a very specific set of files. In reality, though, the reports would probably come in different formats, ranging from spreadsheets to PDFs and handwritten notes. The dataset might include checklists or different phrasings representing the same idea. A human would easily recognize this collected data as variations of a site inspection report, but a general-purpose LLM wouldn't have that kind of world or industry knowledge. A tool like this would likely generate inaccurate and confusing results.
Yet, having curated and understood this data, the PM would still be in a much better position. They'd recognize early that the complexity and variation in the inspection reports would lead to challenges, and save the organization the expense and trouble of investing in an AI tool for this application.

The opportunities that could grow out of organization-wide data curation stretch far beyond specific departmental use cases. Because most of your organization's data resides within your security perimeter, no AI model has been trained on those files. You have a completely unique dataset that hasn't yet been mined for insights. You could take the capabilities of the general AI models developed in training on massive, general datasets and (with the right security framework in place) fine-tune them to your organization's unique gold mine of enterprise data.

This is already happening at an industry scale. The virtual paralegal Harvey has been fine-tuned on curated legal data, including case law, statutes, contracts and legal briefs. BioBERT, a model optimized for medical research, was trained on a curated dataset of biomedical texts. The researchers who developed this tool did so because biomedical texts use such particular, specialized language.

Whether you want to embark on an ambitious project to create a fine-tuned model or select the right existing tool for a department or project team's needs, it all starts with data curation. In this period of rapid change and model evolution, the one constant is that if you don't know what sort of data you have, you're not going to know how to use it.
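As a concrete sketch of what "giving a tool access to only the relevant files" might look like, here is a minimal Python example over invented file-metadata records. The field names, file paths and rules are all hypothetical, not any vendor's product; the point is that curation is just disciplined filtering on metadata you already understand.

```python
# Hypothetical file-metadata inventory; all records are invented.
files = [
    {"path": "proposals/bridge_2023.docx", "owner": "civil-team",
     "type": "docx", "confidential": False},
    {"path": "hr/salaries.xlsx", "owner": "hr",
     "type": "xlsx", "confidential": True},
    {"path": "proposals/tunnel_bid.pdf", "owner": "civil-team",
     "type": "pdf", "confidential": False},
    {"path": "site-photos/drone_001.mp4", "owner": "survey",
     "type": "mp4", "confidential": False},
]

ALLOWED_TYPES = {"docx", "pdf"}  # formats the proposal tool can ingest

def curate(files, group):
    """Select only files the AI tool should see: the right team's files,
    in an ingestible format, and nothing flagged confidential
    (the 'security guardrail' from the article)."""
    return [f for f in files
            if f["owner"] == group
            and f["type"] in ALLOWED_TYPES
            and not f["confidential"]]

corpus = curate(files, "civil-team")
```

A real deployment would pull this metadata from a file platform rather than a hand-written list, but the shape of the decision is the same.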

The Definition of Transfer Learning

Yahoo

03-04-2025



This article is published by a partner of TIME.

Transfer learning is a machine learning technique that allows a model trained on one task to be repurposed or fine-tuned for a related task, drastically reducing the amount of data and computational resources needed. This method leverages pre-trained models on large datasets to perform well in new, often smaller, domains with limited labeled data. It has become increasingly popular in fields such as natural language processing, computer vision, and speech recognition, where vast amounts of data and time are typically required for training models from scratch.

  • Pre-trained models: In transfer learning, models are initially trained on large datasets, often unrelated to the target task. For example, models like BERT and GPT-4o in natural language processing are pre-trained on diverse text.
  • Fine-tuning: After training on a large dataset, the model is fine-tuned on a smaller, domain-specific dataset. This involves adjusting the weights of the neural network to optimize performance for the new task.
  • Feature extraction: One key characteristic is that lower layers of a neural network trained on a large dataset capture general features, while higher layers are fine-tuned to features specific to the target task.
  • Domain adaptation: Transfer learning allows models to adapt to tasks in a different but related domain. For example, a model trained on general images can be fine-tuned to identify specific objects in medical images or satellite imagery.
  • Image classification: A model trained on a large image dataset such as ImageNet can be repurposed for a new, smaller dataset.
  • Natural Language Processing (NLP): In NLP, large models like GPT-4o and BERT are trained on billions of words from the internet. These pre-trained models can then be fine-tuned for specific tasks such as sentiment analysis, question-answering, or text summarization with a much smaller amount of task-specific data.
  • Speech recognition: A speech recognition system trained on a broad dataset can be fine-tuned for recognizing specific accents or dialects in different languages. For example, a general English speech recognition system could be adapted to recognize Australian English or Indian English with limited labeled data.
  • Reduced training time: Since the model has already learned general features during pre-training, training for a new task is much faster, often requiring fewer resources and less time.
  • Less data required: Transfer learning allows models to achieve high performance even with a limited amount of labeled data, making it particularly useful in situations where data collection is expensive or time-consuming.
  • Better performance with small datasets: Transfer learning often results in better performance on smaller datasets than training a model from scratch, because the model has already learned a robust representation from the large dataset.
  • Cross-domain applicability: It enables knowledge from one domain (e.g., image recognition) to be applied to another related domain (e.g., medical imaging), enabling a wider range of applications for pre-trained models.
  • Task similarity requirement: Transfer learning works best when the source task (the one used to pre-train the model) is similar to the target task. If the two tasks are very different, transfer learning may not be effective or may even degrade performance.
  • Overfitting risk: When fine-tuning a model on a small dataset, there is a risk of overfitting, where the model becomes too specialized on the limited new data and fails to generalize well to unseen examples.
  • Computational resource requirements for pre-training: Although transfer learning reduces the resources needed for fine-tuning, pre-training large models on vast datasets is still computationally expensive and often requires high-performance hardware such as GPUs or TPUs.
  • Knowledge transfer limitations: Not all knowledge learned from one domain can be transferred effectively to another. For instance, a model trained on natural images may not transfer well to more specialized areas, like recognizing satellite images, where features are quite different.

Transfer learning is a powerful technique in machine learning, allowing models to adapt to new tasks efficiently by leveraging pre-trained knowledge. This approach not only reduces the need for large amounts of labeled data but also accelerates the development of AI systems across various domains, from healthcare to NLP. However, it does have its limitations, especially when the source and target tasks are not closely related or when the pre-training phase is highly resource-intensive. Despite these challenges, transfer learning remains one of the most effective methods for improving model performance and accelerating AI research in numerous fields.
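The feature-extraction idea described above (frozen lower layers, small trainable head) can be sketched without any ML libraries. In this toy Python example a fixed feature map stands in for the pre-trained lower layers, and only a logistic-regression "head" is trained on the new task with plain SGD; all data and numbers are invented for illustration.

```python
import math
import random

random.seed(1)

def pretrained_features(x, y):
    """Stand-in for the frozen lower layers of a pre-trained network:
    a fixed feature map that is NOT updated during fine-tuning."""
    return [x, y, x * y, 1.0]

# Small labeled dataset for the new task: is the point above the line y = -x?
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
data = [((x, y), 1 if x + y > 0 else 0) for x, y in points]

# Only the new head (one linear layer over the frozen features) is trained.
weights = [0.0, 0.0, 0.0, 0.0]
lr = 0.5
for _ in range(50):                      # a few epochs of logistic-regression SGD
    for (x, y), label in data:
        feats = pretrained_features(x, y)
        z = sum(w * f for w, f in zip(weights, feats))
        p = 1.0 / (1.0 + math.exp(-z))
        for i, f in enumerate(feats):    # gradient step on the head only
            weights[i] += lr * (label - p) * f

# Training accuracy of the fine-tuned head.
accuracy = sum(
    ((sum(w * f for w, f in zip(weights, pretrained_features(x, y))) > 0)
     == (label == 1))
    for (x, y), label in data
) / len(data)
```

Because the frozen features already contain what the head needs (the raw coordinates), a tiny amount of task-specific training suffices; this mirrors why transfer learning works well when the source features suit the target task, and fails when they do not.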

CHM Makes AlexNet Source Code Available to the Public

Associated Press

20-03-2025



Mountain View, California, March 20, 2025 (GLOBE NEWSWIRE) -- In partnership with Google, the Computer History Museum (CHM), the leading museum exploring the history of computing and its impact on the human experience, today announced the public release and long-term preservation of the source code for AlexNet, the neural network that kickstarted today's prevailing approach to AI.

"Google is delighted to contribute the source code for the groundbreaking AlexNet work to the Computer History Museum," said Jeff Dean, chief scientist, Google DeepMind and Google Research. "This code underlies the landmark paper 'ImageNet Classification with Deep Convolutional Neural Networks,' by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, which revolutionized the field of computer vision and is one of the most cited papers of all time." For more information about the release of this historic source code, see CHM's blog post.

By the late 2000s, Hinton's graduate students at the University of Toronto were beginning to use graphics processing units (GPUs) to train neural networks for image recognition tasks, and their success suggested that deep learning could be a solution to general-purpose AI. Sutskever, one of the students, believed that the performance of neural networks would scale with the amount of data available, and the arrival of ImageNet provided the opportunity.

Completed in 2009, ImageNet was a dataset of images developed by Stanford professor Fei-Fei Li that was larger than any previous image dataset by several orders of magnitude. In 2011, Sutskever persuaded Krizhevsky, a fellow graduate student, to train a neural network for ImageNet. With Hinton serving as faculty advisor, Krizhevsky did so on a computer with two NVIDIA cards. Over the course of the next year, he continuously refined and retrained the network until it achieved performance superior to its competitors. The network would ultimately be named AlexNet, after Krizhevsky.
In describing the AlexNet project, Hinton told CHM, "Ilya thought we should do it, Alex made it work, and I got the Nobel Prize." Before AlexNet, very few machine learning researchers used neural networks. After it, almost all of them would. Google eventually acquired the company started by Hinton, Krizhevsky and Sutskever, and a Google team led by David Bieber worked with CHM for five years to secure its release to the public.

About CHM Software Source Code

The Computer History Museum has the world's most diverse archive of software and related material. The stories of software's origins and impact on the world provide inspiration and lessons for the future to global audiences—including young coders and entrepreneurs. The Museum has released other historic source code such as APPLE II DOS, IBM APL, Apple MacPaint and QuickDraw, Apple Lisa, and Adobe Photoshop. Visit our website to learn more.

About CHM

The Computer History Museum's mission is to decode technology—the computing past, digital present, and future impact on humanity. From the heart of Silicon Valley, we share insights gleaned from our research, our events, and our incomparable collection of computing artifacts and oral histories to convene, inform, and empower people to shape a better future.

Carina Sweet
Computer History Museum
(650) 810-1059
[email protected]
