Latest news with #Gemma3


Time of India
7 days ago
- Business
- Time of India
Gemini goes local as Google courts Indian developers
Bengaluru: When Indian developers previously queried Google's AI models, their requests travelled thousands of kilometres to servers in other countries before returning with responses. That status quo shifted when Google unveiled four announcements at its developer-focused I/O event in Bengaluru on Wednesday, one of them being the localisation of AI processing within India's borders. "Indian developers can now use the powerful AI capabilities of Gemini 2.5 Flash here in India," Bikram Singh Bedi, vice president of Google Cloud Asia Pacific, told TOI. "Processing will now be available in India, and this is going to be critical from a perspective of data residency as well as low latency."

The announcement addresses two critical concerns for Indian businesses – data residency regulations and latency. Previously, queries to Google's AI models would route through servers in the US or other global regions; that is no longer the case. "Certain applications need low latency, especially the ones where you're looking for real-time responses," Bedi explained.

The importance of low latency becomes clear when considering real-world applications. For video streaming services, even milliseconds of delay can mean the difference between smooth playback and frustrating buffering. Financial trading platforms require split-second responses, whilst customer service chatbots need immediate replies to maintain natural conversations. Manufacturing systems monitoring equipment breakdowns cannot afford delays that might result in costly production stoppages.

The second major announcement centred on Firebase, Google's popular development platform. "We have deeply integrated the Gemini 2.5 Pro into our development platforms - both AI Studio and Firebase Studio," Bedi revealed. "Developers can now use multimodal prompts — video, image, speech, text — and they can build full-stack AI applications with AI-generated templates and powerful agentic features." The integration, Bedi said, allows developers to give simple prompts directly within the code editor to generate complete applications.

Hardware constraints, a perennial concern for Indian developers targeting budget smartphones, formed the backdrop to announcement number three. Google unveiled Gemma 3, the newest member of its open-source family, and highlighted the Gemma 3n variant optimised for devices with as little as 2GB of RAM. "Gemma 3 is significantly ahead of anything else out there and they're supporting 140 languages, including six Indian languages," Bedi said.

Skills, rather than silicon, framed the final set piece. Last year's Gen AI Exchange programme—an online academy and hackathon series launched by Google and supported by the central govt—registered 270,000 learners and reached five million developers through satellite events. "Courses completed have topped thirty thousand, but that is only the warm-up," Bedi said, announcing a second edition hackathon that opens for entries next month. Winners will receive Google Cloud credits, mentoring, and a fast track to showcase their projects at next year's I/O.
The exchange, launched first in India and now spreading worldwide, is designed to close what Google and other analysts peg as a severe skills gap across IT and security roles. According to Bedi, enterprises in India are rapidly adopting AI in a slew of different verticals. "Look at Federal Bank of India - they are leveraging our AI to improve customer service. They have this friendly AI personal assistant called Fedi," Bedi explained. "They are seeing a 25% rise in customer satisfaction and 50% saving in customer care cost." Mahindra & Mahindra is another example of a large Indian conglomerate leveraging AI in diverse ways, said Bedi. "They are using our Google Cloud Vertex AI platform for cutting-edge work in R&D, engineering, simulations, and manufacturing plants. They're looking at use cases like zero breakdown, energy consumption optimization, among others," Bedi said. Uttar Pradesh, Bedi said, is building an open agricultural network on Gemini "to put micro climate data and market prices in every farmer's pocket". Such examples, he argued, show that generative AI has moved from a curiosity to a basic requirement for organisations and state govts that want to stay competitive.

Business Standard
7 days ago
- Business
- Business Standard
From comics to chatbots, startups adopt Google AI for local impact
Eight Indian startups demonstrated applications built on Google's AI platforms at the Google I/O Connect India 2025 conference, showcasing how local companies are leveraging the tech giant's cloud infrastructure to tackle challenges across education, governance, commerce and media. The demonstrations highlighted how India's entrepreneurial ecosystem has embraced Google's AI Studio, Cloud and Vertex AI services to build scalable solutions tailored to the country's diverse market needs.

One such startup, Sarvam—selected for the INDIAai Mission—is building AI tools tailored to India's cultural and linguistic diversity. Its open-source Sarvam-Translate model, built on Google's Gemma 3, delivers accurate, context-rich translations across all 22 official Indian languages. Gemma 3's multilingual efficiency helped cut training and inference costs, enabling Sarvam to scale the model, which now handles over 100,000 translation requests weekly. The API also powers Samvaad, Sarvam's conversational AI platform, which has processed more than 10 million conversation turns in Indian languages. 'Gemma breaks down Indian language text into fewer tokens on average, which directly improves the model's ability to represent and learn from these languages efficiently,' said Pratyush Kumar, founder, Sarvam.

In the entertainment sector, Dashverse is using Google's Veo 3, Lyria 2 on Vertex AI, and Gemini to build Dashtoon Studio and Frameo—AI-native platforms that turn text prompts into comics and cinematic videos. These tools support its consumer apps, Dashtoon and Dashreels, which now serve over 2 million users. The company has also produced a 90-minute AI-generated Indian mythology epic using Veo 3, and is using Lyria 2 to help users create soundtracks that adapt in real time to narrative pacing on platforms like Dashreels.

Similarly, Toonsutra is using Google's Lyria 2 and Gemini 2.5 Pro on Vertex AI to add dynamic music and lifelike character speech to its Indian-language webcomics. Images are animated with Veo 3's image-to-video feature, creating a more immersive and interactive storytelling experience. By combining advanced AI with culturally rooted narratives, Toonsutra is pushing the boundaries of vernacular digital entertainment.

On the enterprise side, AI startup CoRover is using Google's Gemini to power customisable, multilingual chatbots for businesses, enabling communication in over 100 languages with near 99 per cent accuracy. Its solutions, including BharatGPT, have supported more than 1 billion users and facilitated over 20 billion interactions across 25,000 enterprises and developers.
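Sarvam's tokenization point can be checked informally with a tokenizer comparison. The sketch below is illustrative only: it assumes the Hugging Face `transformers` library and access to the gated Gemma 3 and Llama 3.1 checkpoints (both require accepting their licences), and the model IDs and sample sentences are this article's own choices, not Sarvam's setup.

```python
# Rough, illustrative check of per-language token counts.
# Assumes `pip install transformers` and approved access to the gated repos below.
from transformers import AutoTokenizer

# Illustrative model IDs; swap in whichever tokenizers you want to compare.
MODELS = ["google/gemma-3-4b-it", "meta-llama/Llama-3.1-8B"]

samples = {
    "Hindi": "भारत एक विविधतापूर्ण देश है और यहाँ कई भाषाएँ बोली जाती हैं।",
    "Tamil": "இந்தியா பல மொழிகள் பேசப்படும் ஒரு பன்முகத்தன்மை கொண்ட நாடு.",
    "English": "India is a diverse country where many languages are spoken.",
}

for model_id in MODELS:
    tok = AutoTokenizer.from_pretrained(model_id)
    for lang, text in samples.items():
        # Fewer tokens for the same sentence generally means cheaper training
        # and inference for that language.
        n_tokens = len(tok.encode(text, add_special_tokens=False))
        print(f"{model_id:35s} {lang:8s} {n_tokens:3d} tokens")
```

A lower count for Indian-language sentences under the Gemma tokenizer would be consistent with the efficiency claim quoted above, though a proper comparison would use a larger, representative corpus.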


Geeky Gadgets
27-05-2025
- Business
- Geeky Gadgets
From 2GB to 1TB: How to Maximize AI on Any Local Desktop Setup
What if your local desktop could rival the power of a supercomputer? As AI continues its meteoric rise, the ability to run complex models locally—on setups ranging from modest 2GB systems to machines with a staggering 1TB of memory—is no longer a distant dream. But here's the catch: not all hardware is created equal, and choosing the wrong configuration could leave you stuck with sluggish performance or wasted potential. From lightweight models like Gemma3 to the resource-hungry Deepseek R1, the gap between what your hardware can handle and what your AI ambitions demand is wider than ever. So, how do you navigate this rapidly evolving landscape and make the most of your setup? This comprehensive comparison by Dave unpacks the hidden trade-offs of running AI locally, from the surprising efficiency of entry-level systems to the jaw-dropping capabilities of high-end configurations. You'll discover how memory, GPUs, and CPUs shape the performance of AI workloads, and why token generation speed could be the metric that transforms your workflow. Whether you're a curious hobbyist or a professional looking to optimize large-scale deployments, this deep dive will help you decode the hardware puzzle and unlock the full potential of local desktop AI. After all, the future of AI isn't just in the cloud—it's sitting right on your desk.

Why Run AI Models Locally?
Running AI models on local hardware offers several distinct advantages over cloud-based solutions. It provides greater control over data, ensuring privacy and security, while also reducing long-term costs associated with cloud subscriptions. Additionally, local deployment eliminates latency issues, allowing faster processing for time-sensitive tasks. However, the success of local AI deployment depends heavily on aligning your hardware's specifications with the demands of the AI models you intend to use. For instance, lightweight models like Gemma3 can operate effectively on systems with minimal resources, making them ideal for basic applications. In contrast, advanced models such as Deepseek R1 require robust setups equipped with substantial memory and processing power to function efficiently. Understanding these requirements is essential for achieving optimal performance.

The Role of Memory in AI Performance
Memory capacity plays a pivotal role in determining the performance of AI models. Tests conducted on systems ranging from 2GB to 1TB of memory reveal significant trade-offs between cost, speed, and scalability. Here's how different setups compare:
- 2GB systems: Suitable for lightweight tasks such as license plate recognition or basic image classification, but they struggle with larger, more complex models due to limited memory bandwidth.
- 8GB systems: Capable of handling mid-sized models, these setups offer moderate performance but experience slower token generation speeds, particularly with larger datasets.
- 128GB and above: High-memory configurations excel at running advanced models, offering faster processing speeds and greater scalability for demanding workloads.

One critical metric to consider is token generation speed, which improves significantly with higher memory configurations. Systems with more memory are better equipped to process large datasets and execute complex models, making them indispensable for tasks such as natural language processing, image generation, and predictive analytics.

Video: Local Desktop AI Compared, 2GB to 1024GB (Dave's Garage, YouTube).

Hardware Configurations: Matching Systems to Workloads
Different hardware configurations cater to varying AI workloads, and selecting the right setup is crucial for achieving efficient performance. Below is a breakdown of how various configurations perform:
- Low-end systems: Devices like the Jetson Orin Nano (2GB RAM) are limited to lightweight models and basic applications, such as object detection or simple automation tasks.
- Mid-range GPUs: Options such as the Tesla P40 (8GB RAM) and RTX 6000 ADA (48GB RAM) strike a balance between cost and performance. These systems can handle larger models with moderate efficiency, making them suitable for small to medium-scale AI projects.
- High-end systems: Machines like the Apple M2 Mac Pro (128GB RAM) and 512GB Mac M4 are designed for advanced models like Deepseek R1. These setups provide the memory and processing power needed for large-scale AI workloads, including deep learning and complex simulations.

CPU-only setups, while less common, can also support massive models when paired with extensive memory. For example, systems equipped with 1TB of RAM can handle computationally intensive tasks, though they may lack the speed and efficiency of GPU-accelerated configurations. This highlights the importance of matching hardware capabilities to the specific computational demands of your AI tasks.

AI Models: Size and Complexity Matter
The size and complexity of AI models are key factors influencing their hardware requirements. Smaller models, such as Gemma3 with 1 billion parameters, are well-suited for low-memory setups and can perform tasks like text summarization or basic image recognition. These models are ideal for users with limited hardware resources or those seeking cost-effective solutions. In contrast, larger models like Deepseek R1, which scale up to 671 billion parameters, demand high-memory systems and advanced GPUs or CPUs to function efficiently. These models are designed for tasks requiring significant computational power, such as advanced natural language understanding, generative AI, and large-scale data analysis. The disparity in hardware requirements underscores the importance of tailoring your setup to the specific needs of your AI applications.

Key Performance Insights
Testing AI models across various hardware configurations has revealed several critical insights that can guide your decision-making:
- Memory capacity: Higher memory directly correlates with improved processing speed and scalability, making it a crucial factor for running complex models.
- Unified memory architecture: Found in Apple systems, this feature enhances AI workloads by allowing seamless access to shared memory resources, improving overall efficiency.
- Consumer-grade hardware: While affordable, these systems often struggle with large-scale models due to limitations in memory and processing power, making them less suitable for demanding applications.

These findings emphasize the need to carefully evaluate your hardware options based on the size, complexity, and computational demands of your AI tasks.

Optimizing Local AI Deployment
To achieve efficient and cost-effective AI performance on local desktop hardware, consider the following strategies:
- Ensure your hardware configuration matches the size and complexity of the AI models you plan to run. This alignment is critical for avoiding performance bottlenecks.
- Use tools like Ollama to simplify the process of downloading, configuring, and running AI models locally. These tools can streamline deployment and reduce setup time (a minimal sketch follows at the end of this article).
- Invest in high-memory systems if your workload involves large-scale models or extensive data processing. While the upfront cost may be higher, the long-term benefits in performance and scalability are significant.

By following these recommendations, you can maximize the performance of your local AI deployments while staying within budget and ensuring efficient resource utilization.

Challenges and Future Developments
Despite recent advancements, consumer hardware still faces limitations when supporting the largest AI models. Memory constraints, processing speed, and scalability remain significant challenges, particularly for users with budget-friendly setups. However, ongoing developments in GPUs, CPUs, and memory architectures are expected to address these issues, paving the way for more powerful and accessible AI systems. Emerging technologies, such as quantum computing and next-generation GPUs, hold the potential to transform local AI deployment. These advancements promise to deliver unprecedented processing power and efficiency, allowing broader adoption of AI across industries and applications.
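To make the Ollama workflow and the token-generation-speed metric concrete, here is a minimal sketch, assuming the Ollama daemon and its Python client are installed and that your hardware can hold the model; the `gemma3:1b` tag and the prompt are illustrative choices, and the timing fields are the per-request stats Ollama returns with a non-streaming response.

```python
# Minimal sketch: run a small model locally with Ollama and estimate
# token generation speed. Assumes the Ollama daemon is running and the
# Python client is installed (`pip install ollama`).
import ollama

MODEL = "gemma3:1b"  # illustrative tag; use any model your hardware can hold

ollama.pull(MODEL)  # downloads the weights if they are not already present

response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "Summarise why local inference reduces latency."}],
)

# Ollama's final response stats include eval_count (generated tokens) and
# eval_duration (nanoseconds), which give an approximate tokens-per-second figure.
tokens = response["eval_count"]
seconds = response["eval_duration"] / 1e9
print(response["message"]["content"])
print(f"~{tokens / seconds:.1f} tokens/sec on this machine")
```

Running the same snippet with progressively larger model tags is a quick way to see where a given machine's memory becomes the bottleneck.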

Business Standard
22-05-2025
- Business
- Business Standard
Gemma 3n: All about Google's open model for on-device AI on phones, laptops
At its annual Google I/O conference, Google unveiled Gemma 3n, a new addition to its Gemma 3 series of open AI models. The company said that the model is designed to run efficiently on everyday devices like smartphones, laptops, and tablets. Gemma 3n shares its architecture with the upcoming generation of Gemini Nano, the lightweight AI model that already powers several on-device AI features on Android devices, such as voice recorder summaries on Pixel smartphones.

Gemma 3n model: Details
Google says Gemma 3n makes use of a new technique called Per-Layer Embeddings (PLE), which allows the model to consume much less RAM than similarly sized models. Although the model comes in 5 billion and 8 billion parameter versions (5B and 8B), this memory optimisation brings its RAM usage closer to that of a 2B or 4B model. In practical terms, this means Gemma 3n can run with just 2GB to 3GB of RAM, making it viable for a much wider range of devices.

Gemma 3n model: Key capabilities
- Audio input: The model can process sound-based data, enabling applications like speech recognition, language translation, and audio analysis.
- Multimodal input: With support for visual, text, and audio inputs, the model can handle complex tasks that involve combining different types of data.
- Broad language support: Google said that the model is trained in over 140 languages.
- 32K token context window: Gemma 3n supports input sequences up to 32,000 tokens, allowing it to handle large chunks of data in one go—useful for summarising long documents or performing multi-step reasoning.
- PLE caching: The model's internal components (embeddings) can be stored temporarily in fast local storage (like the device's SSD), helping reduce the RAM needed during repeated use.
- Conditional parameter loading: If a task doesn't require audio or visual capabilities, the model can skip loading those parts, saving memory and speeding up performance.

Gemma 3n model: Availability
As part of the Gemma open model family, Gemma 3n is provided with accessible weights and licensed for commercial use, allowing developers to tune, adapt, and deploy it across a variety of applications. Gemma 3n is now available as a preview in Google AI Studio.
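For developers who want to try the preview programmatically, a minimal sketch of calling the model through the Gemini API (the same backend AI Studio exposes) with the `google-genai` Python SDK is shown below. The model ID `gemma-3n-e4b-it` is an assumption and may differ from what AI Studio currently lists, and an AI Studio API key is required.

```python
# Minimal sketch: query the Gemma 3n preview through the Gemini API (AI Studio).
# Assumes `pip install google-genai` and a GEMINI_API_KEY environment variable;
# the model name below is an assumption -- check AI Studio's model list.
import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemma-3n-e4b-it",  # hypothetical/preview model ID
    contents="Summarise in two sentences how per-layer embeddings reduce "
             "the RAM needed to run a 5B-parameter model on a phone.",
)
print(response.text)
```

The same call shape works for any model AI Studio exposes, so switching between the preview and a production Gemini model is just a change of the `model` string.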


Techday NZ
22-05-2025
- Business
- Techday NZ
Gemma 3n AI model brings real-time multimodal power to mobiles
Gemma 3n, a new artificial intelligence model architected for mobile and on-device computing, has been introduced as an early preview for developers. Developed in partnership with mobile hardware manufacturers, Gemma 3n is designed to support real-time, multimodal AI experiences on phones, tablets, and laptops. The model extends the capabilities of the Gemma 3 family by focusing on performance and privacy in mobile scenarios.

The new architecture features collaboration with companies such as Qualcomm Technologies, MediaTek, and Samsung System LSI. The objective is to optimise the model for fast, responsive AI that can operate directly on device, rather than relying on cloud computing. This marks an extension of the Gemma initiative towards enabling AI applications in everyday devices, utilising a shared foundation that will underpin future releases across platforms like Android and Chrome.

According to information provided, Gemma 3n is also the core of the next generation of Gemini Nano, which is scheduled for broader release later in the year, bringing expanded AI features to Google apps and the wider on-device ecosystem. Developers can begin working with Gemma 3n today as part of the early preview, helping them to build and experiment with local AI functionalities ahead of general availability.

The model has performed strongly in chatbot benchmark rankings. One chart included in the announcement ranks AI models by Chatbot Arena Elo scores, with Gemma 3n noted as ranking highly amongst both popular proprietary and open models. Another chart demonstrates the model's mix-and-match performance with respect to model size.

Gemma 3n benefits from Google DeepMind's Per-Layer Embeddings (PLE) innovation, which leads to substantial reductions in RAM requirements. The model is available in 5 billion and 8 billion parameter versions, but, according to the release, it can operate with a memory footprint comparable to much smaller models—2 billion and 4 billion parameters—enabling operation with as little as 2GB to 3GB of dynamic memory. This allows the use of larger AI models on mobile devices or via cloud streaming, where memory overhead is often a constraint.

The company states, "Gemma 3n leverages a Google DeepMind innovation called Per-Layer Embeddings (PLE) that delivers a significant reduction in RAM usage. While the raw parameter count is 5B and 8B, this innovation allows you to run larger models on mobile devices or live-stream from the cloud, with a memory overhead comparable to a 2B and 4B model, meaning the models can operate with a dynamic memory footprint of just 2GB and 3GB."

Additional technical features of Gemma 3n include optimisations that allow the model to respond approximately 1.5 times faster on mobile devices compared to previous Gemma versions, with improved output quality and lower memory usage. The announcement highlights innovations such as Per-Layer Embeddings, KVC sharing, and advanced activation quantisation as contributing to these improvements.

The model also supports what the company calls "many-in-1 flexibility." Utilizing a 4B active memory footprint, Gemma 3n incorporates a nested 2B active memory footprint submodel through the MatFormer training process. This design allows developers to balance performance and quality needs without operating separate models, composing submodels on the fly to match a specific application's requirements. Upcoming technical documentation is expected to elaborate on this mix-and-match capability.
Security and privacy are also prioritised. The development team states that local execution "enables features that respect user privacy and function reliably, even without an internet connection."

Gemma 3n brings enhanced multimodal comprehension, supporting the integration and understanding of audio, text, images, and video. Its audio functionality supports high-quality automatic speech recognition and multilingual translation. Furthermore, the model can accept inputs in multiple modalities simultaneously, enabling the parsing of complex multimodal interactions. The company describes the expansion in audio capabilities: "Its audio capabilities enable the model to perform high-quality Automatic Speech Recognition (transcription) and Translation (speech to translated text). Additionally, the model accepts interleaved inputs across modalities, enabling understanding of complex multimodal interactions." A public release of these features is planned for the near future.

Gemma 3n features improved performance in multiple languages, with notable gains in Japanese, German, Korean, Spanish, and French. This is reflected in benchmark scores such as a 50.1% result on WMT24++ (ChrF), a multilingual evaluation metric.

The team behind Gemma 3n views the model as a catalyst for "intelligent, on-the-go applications." They note that developers will be able to "build live, interactive experiences that understand and respond to real-time visual and auditory cues from the user's environment," and design advanced applications capable of real-time speech transcription, translation, and multimodal contextual text generation, all executed privately on the device.

The company also outlined its commitment to responsible development. "Our commitment to responsible AI development is paramount. Gemma 3n, like all Gemma models, underwent rigorous safety evaluations, data governance, and fine-tuning alignment with our safety policies. We approach open models with careful risk assessment, continually refining our practices as the AI landscape evolves."

Developers have two initial routes for experimentation: exploring Gemma 3n via a cloud interface in Google AI Studio using browser-based access, or integrating the model locally through Google AI Edge's suite of developer tools. These options enable immediate testing of Gemma 3n's text and image processing capabilities. The announcement states: "Gemma 3n marks the next step in democratizing access to cutting-edge, efficient AI. We're incredibly excited to see what you'll build as we make this technology progressively available, starting with today's preview."
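As a rough sketch of the interleaved multimodal input described above, the snippet below sends an image together with a text instruction through the Gemini API using the `google-genai` SDK. The model ID is again an assumption (substitute one that accepts image input), the photo path is a placeholder, and on-device integration via Google AI Edge would use different tooling.

```python
# Minimal sketch: interleave an image and a text prompt in a single request.
# Assumes `pip install google-genai pillow`, a GEMINI_API_KEY environment
# variable, and a local photo.jpg; the model ID below is an assumption.
import os
from PIL import Image
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
image = Image.open("photo.jpg")  # placeholder path

response = client.models.generate_content(
    model="gemma-3n-e4b-it",  # hypothetical preview ID; use a model that accepts images
    contents=[image, "Describe what is happening in this picture in one sentence."],
)
print(response.text)
```

The `contents` list is what carries the interleaving: text, images, and (where supported) audio parts can be mixed in a single request, which mirrors the multimodal behaviour the announcement describes.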