
Latest news with #multimodal

Alibaba unveils new AI model for image creation, as open-source approach gains recognition

South China Morning Post

2 days ago



Alibaba Group Holding has launched a new artificial intelligence (AI) model, Qwen VLo, said to be capable of generating and editing images with a finesse akin to that of a human artist, intensifying the competition in multimodal models as the tech giant seeks to redefine itself as an AI leader.

Released on Friday, Qwen VLo was a "comprehensive upgrade" from previous models like Qwen VL and Qwen2.5 VL, the company said. It could better understand input and create more precise images, accommodate open-ended instructions, and support multiple languages, including Chinese and English. A preview is now available on Qwen Chat. Qwen VLo also supports diverse input and output formats, offering increased flexibility for users and making it well suited to creating posters, illustrations, web banners and social media covers.

Alibaba owns the South China Morning Post.

The new model adds to the intense competition in China's AI landscape, as rivals such as ByteDance and SenseTime strive to introduce their own multimodal models designed to interpret various types of input data, including text, video and audio. In contrast, traditional AI models handle only one type of input.

Alibaba has been doubling down on AI and cloud computing as it moves to streamline its sprawling operations. In February, the company pledged to invest more than 380 billion yuan (US$52 billion) in AI infrastructure over the next three years.

LanceDB raises $30 million for multimodal AI data infrastructure

Reuters

6 days ago



By Anna Tong

SAN FRANCISCO, June 24 (Reuters) - Artificial intelligence data infrastructure company LanceDB has raised a $30 million series A round, the company said on Tuesday. The round was led by Theory Ventures, with participation from CRV, Y Combinator, Databricks Ventures and Runway.

LanceDB's platform stores and processes the data that AI companies use to build advanced AI models, with a focus on multimodal AI models capable of processing and integrating various types of data, including text, video, images and audio. As AI models get larger, software that helps companies manage data on cloud-based systems is increasingly important, and multimodal data sources have become more valuable as AI model makers have used up the most easily accessible scraped data.

"We are nowhere near close to being tapped out on multimodal data sources, and the next wave of AI must have sight and voice built-in," LanceDB CEO Chang She told Reuters.

LanceDB's software is particularly useful for enterprises that need to store and process multimodal data such as PDFs and videos to enable agentic AI workflows, in which AI mimics humans in using software applications, She said. LanceDB works with AI model makers such as Runway, Midjourney, World Labs and Harvey to help manage multimodal data, the company said.
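The retrieval pattern behind this kind of infrastructure, storing embedding vectors alongside mixed-media payloads and answering queries by vector similarity, can be sketched in a few lines. The sketch below is a library-free toy, not LanceDB's actual API; the records, vectors and payload names are invented for illustration, and a real deployment would use learned embeddings and an on-disk index rather than an in-memory scan:

```python
# Minimal nearest-neighbour search over a tiny "multimodal" store: each record
# pairs an embedding vector with a payload (a PDF page, a video timestamp, an
# audio file). Systems like LanceDB persist and index such records; this toy
# just scans them in memory using cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

records = [  # vectors would come from text/image/audio embedding models
    {"vector": [0.9, 0.1, 0.0], "payload": "report.pdf, page 3"},
    {"vector": [0.1, 0.8, 0.2], "payload": "demo.mp4, 00:12"},
    {"vector": [0.0, 0.2, 0.9], "payload": "voicemail.wav"},
]

def search(query_vector, top_k=2):
    """Return the payloads of the top_k records most similar to the query."""
    ranked = sorted(records, key=lambda r: cosine(query_vector, r["vector"]),
                    reverse=True)
    return [r["payload"] for r in ranked[:top_k]]

# A query embedded with the same (hypothetical) model as the records.
hits = search([0.85, 0.15, 0.05])
```

The same shape generalizes to agentic workflows: the agent embeds whatever it is looking at (a PDF, a frame, a transcript) and retrieves the nearest stored items regardless of their original modality.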


Google's Gemini 2.5 Stable Build Released : An AI That Can Do It All

Geeky Gadgets

7 days ago



What if the future of artificial intelligence wasn't just smarter, but fundamentally more versatile? With the release of Gemini 2.5, Google has unveiled a new leap in AI technology, setting a new standard for what's possible. Imagine an AI capable of seamlessly analyzing text, audio, images, video and even code, all in a single workflow. This isn't just an incremental update; it's a bold redefinition of how AI can integrate into our lives, from transforming app development to decoding the complexities of multilingual communication. In an era where efficiency and adaptability are paramount, Gemini 2.5 doesn't just meet expectations; it reshapes them.

Matthew Berman explores how Gemini 2.5's multimodal capabilities and innovative training frameworks are pushing the boundaries of AI performance. From its ability to process sprawling datasets with a 1-million-token context window to its resource-efficient architecture, this release promises to empower developers, researchers and businesses alike. But what truly sets Gemini 2.5 apart? Beyond its technical prowess, it's the model's real-world applications, like analyzing intricate video content or assisting with complex coding tasks, that make it stand out. This release isn't just a milestone for Google but a pivotal moment for the AI landscape as a whole.

What Sets Gemini 2.5 Apart?

Gemini 2.5 is engineered to process complex, multimodal inputs, including text, audio, images, video and even code repositories. This versatility unlocks a wide array of applications, ranging from software development to video content analysis. Key features that distinguish Gemini 2.5 include:

  • 1-Million-Token Context Window: enables the processing of extensive datasets while maintaining coherence over long contexts, making it ideal for tasks requiring in-depth analysis.
  • Dynamic Thinking Budgets: optimize computational resource allocation, improving reasoning capabilities and tool integration.
  • Sparse Mixture of Experts Architecture: activates only the components needed for a given task, ensuring high performance with minimal resource consumption.

These features make Gemini 2.5 not only a high-performing model but also a resource-efficient one, addressing the growing demand for scalable and versatile AI systems.

Performance and Practical Applications

Gemini 2.5 is built for speed, cost efficiency and adaptability, making it suitable for a wide range of real-world applications. Its advanced capabilities excel in areas such as:

  • Translation and Classification: processes multilingual content with high accuracy, enabling seamless communication across languages.
  • Coding and Development: assists developers in integrated development environments (IDEs) and performs repository-level tasks with precision.
  • Video Understanding: analyzes intricate video content to extract actionable insights, supporting industries like media, security and education.

For example, Gemini 2.5 can streamline app development workflows, generate coherent outputs for coding projects, or analyze complex video data to uncover patterns and trends. Its ability to handle long-context reasoning and multimodal interactions makes it an indispensable tool for developers, researchers and businesses.
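The sparse Mixture of Experts idea mentioned above, running only the few sub-networks relevant to an input rather than the whole model, can be illustrated with a toy router. This is a generic sketch of the technique, not Gemini's (unpublished) implementation; the experts and gate weights here are made up for the example:

```python
# Toy sparse Mixture-of-Experts routing: a gating function scores every
# expert for an input, but only the top-k experts are actually run, so
# per-input compute scales with k rather than the total number of experts.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sparse_moe(x, experts, gate_weights, k=2):
    """Route input x to the top-k experts and mix their outputs."""
    # Gate scores: one linear score per expert.
    scores = [sum(w * xi for w, xi in zip(ws, x)) for ws in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalise only over the chosen experts.
    probs = softmax([scores[i] for i in top])
    # Only the selected experts execute; the rest cost nothing.
    return sum(p * experts[i](x) for p, i in zip(probs, top))

# Four trivial "experts"; only 2 run per input.
experts = [lambda x, s=s: s * sum(x) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[0.1, 0.0], [0.9, 0.1], [0.0, 0.5], [0.2, 0.2]]
out = sparse_moe([1.0, 2.0], experts, gate_weights, k=2)
```

In a real model the experts are feed-forward networks inside each transformer layer and the gate is learned jointly with them, but the efficiency argument is the same: capacity grows with the number of experts while per-token compute stays roughly constant.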
Innovative Training Framework

The exceptional performance of Gemini 2.5 is rooted in its robust training framework. By using diverse datasets that include text, code, images, audio and video, the models achieve a comprehensive understanding of each data modality. Key training innovations include:

  • Reinforcement Learning with Verifiable Rewards: improves reasoning accuracy and ensures reliable outputs.
  • Distillation Techniques: produce smaller, efficient models without sacrificing performance, making them accessible for a broader range of applications.

These advancements enable Gemini 2.5 to deliver high-quality results while maintaining efficiency, making it a valuable asset for addressing complex AI challenges across industries.

Commitment to AI Safety and Ethics

Google has prioritized safety and ethical considerations in the development of Gemini 2.5, implementing measures to ensure responsible AI usage. These include:

  • Automated Red Teaming: identifies vulnerabilities and enhances the robustness of the models.
  • Low Memorization Rates: minimize the risk of reproducing sensitive or copyrighted information in outputs.
  • Factual Accuracy: ensures that the models produce reliable and trustworthy results.

These safeguards reflect Google's commitment to addressing concerns about data security, ethical AI use and the potential risks associated with advanced AI technologies.

Technical Innovations Driving Gemini 2.5

The Gemini 2.5 models are powered by Google's TPU V5P architecture, which serves as the computational backbone for their advanced capabilities.
This architecture enhances several critical aspects of the models:

  • Efficient Token Usage: particularly advantageous for tasks like video understanding, where large datasets are common.
  • Multimodal Reasoning: supports seamless integration and analysis of diverse data types, enabling more comprehensive insights.
  • Generative Capabilities: produces contextually relevant outputs across a variety of domains, from creative content generation to technical problem-solving.

These technical advancements ensure that Gemini 2.5 remains a robust and reliable platform for tackling complex AI challenges with precision and speed.

Addressing Limitations and Future Potential

Despite its new features, Gemini 2.5 is not without limitations:

  • Screen Reading: it struggles with tasks requiring detailed text extraction from screens, which may limit its utility in certain scenarios.
  • Long-Context Generative Reasoning: it may require external frameworks to optimize performance on tasks involving extended contexts.

While these limitations highlight areas for improvement, ongoing research and development is likely to address them in future iterations. Even with these constraints, Gemini 2.5 remains a highly effective solution for most use cases.

Demonstrating Versatility in Action

The practical applications of Gemini 2.5 underscore its versatility and adaptability. The models have been successfully employed in tasks such as:

  • playing complex strategy games like Pokémon, showcasing their ability to handle intricate decision-making processes;
  • analyzing video content to derive actionable insights, supporting industries like marketing, security and entertainment;
  • simulating operations such as solving a Rubik's Cube, demonstrating their problem-solving capabilities.

These examples highlight the diverse and demanding tasks that Gemini 2.5 can handle, making it a valuable resource for developers, researchers and businesses seeking innovative AI solutions.

Media Credit: Matthew Berman
