
Latest news with #Docs

Google Drive could soon help you decode documents faster (APK teardown)

Android Authority

a day ago



[Image credit: Edgar Cervantes / Android Authority]

TL;DR

  • Google Drive on Android is working on introducing Gemini-based PDF summary capabilities, similar to those found in Drive on the web.
  • Users will soon be able to access PDF summaries directly within the PDF viewer via a three-dot menu or header icon.
  • The app is also working on multi-file and limited folder summarization features that allow content across various Docs and PDFs to be summarized simultaneously.

Google Drive has received many Gemini-related features over the past months. However, many of these features come first to Google Drive on the web before making their way to the Android app. We've now spotted that Google Drive on Android could soon serve PDF summaries through the PDF viewer. Further, the ability to summarize files through the file viewer window will also support selecting multiple files to summarize them together.

You're reading an Authority Insights story on Android Authority. Discover Authority Insights for more exclusive reports, app teardowns, leaks, and in-depth tech coverage you won't find anywhere else.

An APK teardown helps predict features that may arrive on a service in the future based on work-in-progress code. However, it is possible that such predicted features may not make it to a public release.

Google Drive on the web already serves automatic PDF summaries. Now, Google Drive v2.25.250 for Android includes code for PDF summarization through Gemini. We managed to activate the feature ahead of its release to give you an early look.

In the near future, users will likely be able to summarize PDFs that are either stored in their Google Drive or received through Google Drive. They won't need to exit the PDF viewer and ask Gemini within Drive to summarize the file; instead, they will be able to trigger Gemini from within the PDF viewer. Users will be able to tap the three-dot icon on the top right and choose the Summarize this file option.
They can also tap the Gemini icon in the header and type in their request to summarize the PDF. Note that this feature is unlikely to work with offline PDFs that you open through Google Drive's PDF viewer, so don't try summarizing downloaded files. Your best bet is to keep the PDF files on Drive to get a summary.

Unfortunately, these PDF summaries are not displayed automatically yet, as they are on the web. This shouldn't be a big deal, though, as you will be able to manually initiate a summary for any PDF you need summarized.

We've previously spotted Google Drive working on letting users summarize any folder from the file viewer window, and then on letting users summarize individual PDF and Doc files too. In addition, the file summarization feature will also make it easier to summarize multiple files and folders at once. Selecting multiple files should be straightforward, as users will likely be able to select multiple PDFs and Docs by long-pressing them. We managed to activate this feature ahead of its release for an early look.

Users will also be able to select a folder along with the files, but they may be restricted to selecting only one folder. We couldn't get the process to work when multiple folders were selected; as you can see in the last screenshot above, the Gemini icon no longer stays highlighted. Thankfully, there seems to be no limit to how many PDFs and Docs you can select, which should easily cover most needs.

Got a tip? Talk to us! Email our staff at news@. You can stay anonymous or get credit for the info, it's your choice.

Gemini AI Explained: A Deep Dive Into Google's Multimodal Assistant

Yahoo

3 days ago

  • Business


Generative AI has rapidly moved from science fiction to everyday utility, transforming the way we work, learn, and create. In this evolving landscape, Google's multimodal AI platform, Gemini, stands out as a highly capable AI system wrapped in a chatbot interface. Gemini is engineered to process text, images, audio, and code with an eye toward real-world problem-solving and deep integration across Google's ever-changing ecosystem.

Gemini is built to support multimodal interaction, meaning it can interpret and generate not just text, but also images, audio, and code. More than a simple chatbot, Gemini operates as a foundational platform across Google's digital ecosystem. It's accessible on the web or through apps, browser extensions, and integrations in Workspace tools like Gmail and Docs, where it can assist with summarization, drafting, analysis, and more. The big idea is flexibility: Gemini is designed to adapt to a wide range of user needs across personal, creative, and professional tasks.

To get a sense of what Gemini is striving toward, consider Google's longer-term vision for it, which points beyond one-off interactions. Gemini is being positioned as a general-purpose AI interface: something that can function as a bridge between people and increasingly complex digital environments. In that light, Gemini represents not just an AI product, but a look at how Google imagines users will interact with technology in the years ahead.

[Image credit: ExtremeTech/Gemini]

Gemini actually started out as a totally different and more limited Google AI, named Bard. Development under the Gemini name began with its initial release in December 2023. The first version, Gemini 1.0, introduced Google's vision for a multimodal AI assistant, capable of processing not just text, but also images and audio. It marked a foundational shift from Bard, which was primarily text-based, to a more versatile platform built for a wider variety of uses.
Most recently, in June 2025, Google launched Gemini 2.5 Pro and 2.5 Flash. These models introduced enhanced multimodal reasoning, native support for audio and video inputs, and a more refined 'thinking budget' system that allows the model to dynamically allocate compute resources based on task complexity. Flash, in particular, was optimized for low-latency, high-throughput tasks, making it (according to Gemini itself) "ideal for enterprise-scale deployments." Gemini 2.5 also extended integration across Google's ecosystem, reinforcing its role as a general-purpose assistant for both individual users and enterprise teams.

Like other major AI chatbots in its class, Gemini is powered by a large language model (LLM). In Gemini's case, it's based on Google DeepMind's Gemini 1.5 Pro and 2.5 Pro models, which are part of a larger family of large multimodal models built on a Mixture-of-Experts (MoE) transformer architecture. This design allows the system to dynamically route tasks to specialized 'experts' within the model, improving efficiency and performance across a wide range of inputs. It's worth noting that Google researchers were instrumental in developing the original transformer architecture back in 2017, a breakthrough that laid the foundation for nearly all modern large language models, including Gemini.

Transformers are a type of neural network architecture that excels at processing many different types of information in bulk, especially language, images, and audio. They work by analyzing the relationships between elements in a sequence (like words in a sentence or frames in a video) all at once, as opposed to one by one. Although they first gained traction in natural language processing, transformers are now widely used in audio and visual applications, powering everything from speech recognition and music generation to real-time closed captioning and video analysis.
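To make that "all at once" processing concrete, here's a minimal sketch of scaled dot-product attention, the core transformer operation. This is the generic textbook formulation in NumPy, not Gemini's actual code; the function name and toy dimensions are ours.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Every position in the sequence attends to every other position
    simultaneously, instead of processing elements one by one."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise relevance scores
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V                              # blend values by relevance

# Four "tokens", each an 8-dimensional vector; self-attention uses Q = K = V.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(tokens, tokens, tokens)
print(out.shape)  # (4, 8): one context-aware vector per token
```

Real models add learned projections for Q, K, and V, multiple attention heads, and many stacked layers, but the simultaneous pairwise comparison shown here is the core idea.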
Mixture-of-experts systems, meanwhile, let multiple "sub-AIs" join forces to handle different parts of the same task, in order to produce a higher-quality, more polished result. Together, these technologies empower modern AI heavyweights like Gemini, Copilot, ChatGPT, and their kin.

Thanks to its technological ancestry, one of Gemini's core strengths is its ability to handle multimodal input and output. Users can upload a photo, a video clip, a spreadsheet, or a block of code, whereupon Gemini can interpret the content, reason about it, and generate a relevant response. Gemini can do things like summarize a PDF, analyze a chart, or generate a caption for an image. According (again) to Gemini itself, the model's design emphasizes "fluid, context-aware interaction across formats and Workspaces, rather than siloed, single-mode tasks."

On the server side, Gemini uses a resource management mechanism known as a 'thinking budget': a configurable system that allocates additional computational resources to queries that require more thorough analysis. For example, when tackling a multi-step math problem, interpreting a legal document, or generating code with embedded logic, Gemini can spend more time and processing power to improve accuracy and coherence. This feature is especially prominent in the Gemini 2.5 Pro and Flash models, where developers can either let the model decide when to think more deeply or manually configure the budget to balance speed and depth.

As a generative AI, Gemini is built from the ground up to generate novel responses to prompts or queries from the user. Ask it a question and it'll answer you, with a limited memory for your conversation history. You can ask it to explain current events, use it to find vacation destinations, work through an idea, and so on.
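The expert routing that a mixture-of-experts design relies on can be illustrated with a toy sketch. This is a generic top-k gating scheme in NumPy, not Gemini's implementation; the names (`moe_forward`, the linear "experts") are purely illustrative.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Toy Mixture-of-Experts layer: a gating network scores every expert,
    but only the top-k highest-scoring experts actually run on the input."""
    logits = x @ gate_w                       # one gating score per expert
    chosen = np.argsort(logits)[-top_k:]      # indices of the k best experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                  # softmax over chosen experts only
    # Unselected experts are skipped entirely, which is where the
    # efficiency win of MoE architectures comes from.
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))

rng = np.random.default_rng(1)
dim, n_experts = 8, 4
gate_w = rng.standard_normal((dim, n_experts))
# Each "expert" here is just a small linear map standing in for a sub-network.
expert_mats = [rng.standard_normal((dim, dim)) for _ in range(n_experts)]
experts = [lambda v, W=W: v @ W for W in expert_mats]

y = moe_forward(rng.standard_normal(dim), gate_w, experts)
print(y.shape)  # (8,)
```

In a production MoE transformer the experts are feed-forward sub-networks inside each layer and routing happens per token, but the select-then-combine pattern is the same.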
Its Deep Research feature allows you to have a bit of back-and-forth with the AI to refine your research plan before it really gets into the work of answering your question, like a really stiff, corporate guy Friday. Think "Alfred Pennyworth, but sponsored by Google instead of Wayne Industries."

To give a sense of Gemini's image creation capabilities, we've included a few examples of images created using Gemini. For example, here's what you get when you ask it for photorealism:

[Image credit: ExtremeTech/Gemini]

We asked Gemini to create a photorealistic image of "a tufted titmouse sitting on a branch of an oak tree," and it filled in the rest of the details. It can also handle surrealism: in this case, we had it produce a surrealist image of three cubes, one of raw amethyst, one of patinaed copper, and one of... watermelon.

[Image credit: ExtremeTech/Gemini]

Gemini also has a knack for rendering a subject of choice in wildly different artistic styles, such as art nouveau, sumi-e, impressionist painting, pointillism, et cetera.

[Image credit: ExtremeTech/Gemini]

We tried out Gemini's Deep Research feature and found it produced thorough and well-sourced research reports on several unrelated topics, such as motor vehicle safety, the Hubble telescope, the safety and efficacy of various herbal supplements, and (on a whim) mozzarella cheese.

Despite its technical sophistication, Gemini still faces the same limitations that affect other large language models. Issues include hallucination (confidently generating incorrect or misleading information) as well as occasional struggles with ambiguous prompts or complicated reasoning. While Gemini's long memory for context helps reduce some of these issues, it can't eliminate them entirely. The model's performance can also vary depending on what it's doing: for instance, interpreting complex images or audio clips may yield less consistent results than text-based tasks.
Google continues to refine Gemini's output quality, but anyone using generative AI should verify any load-bearing or otherwise critical information, especially in high-stakes or professional contexts.

On the ethical-AI front, Gemini has encountered its share of controversy. In early 2024, its image-generation feature was briefly suspended after users discovered that it produced historically inaccurate or racially incongruous depictions, like racially diverse Nazis: an overcorrection in an attempt to promote diversity. The incident spotlighted the difficulty of balancing inclusivity with factual accuracy and raised broader questions about how AI systems are trained, tested, and deployed. Google responded by pausing the feature and committing to more rigorous oversight, but the episode underscores the ongoing challenge of aligning AI behavior with social expectations and ethical norms.

And then there's privacy. Uff da. Listen, you probably already knew this, but in case you didn't: Google absolutely strip-mines your every keystroke, click, and search query for data it can use to 1) make more money and 2) improve its products, in that order. That's the bargain, and they're not subtle about it. Say what you will about the mortifying ordeal of being known; right above the input box, Gemini places a standard disclaimer that chats are reviewed in order to analyze their contents and improve the user experience. That may or may not matter for your purposes, but better the devil you know, eh?

As of June 2025, Google offers three tiers of service for Gemini. The free tier is available to anyone with a Google account. The paid AI Pro subscription will run you $20 a month. The Pro tier includes access to AI video creation and filmmaking tools like Flow and Whisk, powered by Google's video creation model, Veo 2. It also includes the ability to use Gemini through your Gmail and Google Docs, plus a couple of terabytes of storage.
College students get a free upgrade to the Pro tier through the end of finals in 2026. For seriously committed AI users at the enterprise level, there's also a "Google AI Ultra" subscription, available for between $125 and $250 per month, depending on whether it's on sale. The Ultra subscription offers additional perks, including 30 TB of storage, early access to Project Mariner (an "agentic research prototype"), and a YouTube Premium individual plan.

Google has laid out an ambitious vision for Gemini: to evolve it into a universal AI assistant capable of reasoning, planning, and acting across devices and modalities. According to DeepMind CEO Demis Hassabis, the long-term goal is to develop Gemini into a 'world model': an AI system that can simulate aspects of the real world, understand context, and take action on behalf of users. This includes integrating capabilities like video understanding, memory, and real-time interaction, with early versions already appearing in Gemini Live and Project Astra demos.

In the near term, Google is working to weave Gemini in from top to bottom throughout its ecosystem, from Android and Chrome to Google Search and smart devices. The assistant is expected to become more proactive, context-aware, and personalized: surfacing recommendations, managing tasks, and even controlling hardware like smart glasses. Some of these developments are already underway, with Gemini 2.5 models supporting audio-visual input, native voice output, and long-context reasoning for more complex workflows.

All this product integration is very shiny and impressive, but if you're familiar with Google's track record (or the "Google graveyard"), it also starts to feel a little precarious. Cantilevered, even. Google's history of launching and later discontinuing high-profile products, ranging from Google Reader to Stadia, has earned it a somewhat checkered reputation.
While Gemini currently enjoys strong internal support and integration across flagship services, its long-term survival will depend on sustained user adoption, successful monetization, and Google's willingness to iterate rather than pivot. For now, Gemini represents one of the company's most comprehensive and promising bets on the longevity of AI—but in the Google ecosystem, even the most promising tools aren't guaranteed to last.

OpenAI goes after Google Docs, Microsoft's Word for web with new ChatGPT features

Economic Times

4 days ago

  • Business


OpenAI has developed features that let people collaborate on documents and communicate via chat in ChatGPT, The Information reported, citing two people who have seen the designs. Designs for the collaboration features suggest OpenAI could even consider developing related productivity features such as file storage, the report said.

These features would compete with rival Google's Docs and with Word for the web from Microsoft, OpenAI's biggest shareholder and business partner. OpenAI has already eaten into traffic for Google's flagship Search, with ChatGPT becoming popular for web searches. Google recently rolled out its 'AI Mode' for Search in India, after announcing the feature during its I/O event last month. The feature employs Gemini 2.5 to improve Google search results and experience. Meanwhile, the AI major has been hurting business for Microsoft's Copilot AI with its ChatGPT Enterprise, Bloomberg reported. The two tech companies also compete for customers of AI coding assistants and of AI models accessed via application programming interfaces (APIs) by more advanced developers. Notably, Microsoft's multibillion-dollar investments in OpenAI were crucial in making the latter a forerunner in the AI space.

It's still not clear if OpenAI will release these features, The Information reported. But in the event it does, it will open a new front in the rivalry with Google and further complicate its ties with Microsoft. OpenAI is trying to secure Microsoft's blessing for a restructuring plan for its for-profit unit that oversees ChatGPT. Last week, the Financial Times reported that Microsoft is prepared to abandon its high-stakes negotiations with OpenAI over the future of the alliance.

Also Read: Group that opposed OpenAI's restructuring raises concerns about new revamp plan

Attractive bundle

A combination of enterprise AI and document collaboration features would make ChatGPT more attractive for businesses, which prefer productivity bundles like Google Workspace and Microsoft 365 for their workforce.
OpenAI says it has increasingly generated revenue from ChatGPT Team, and has even offered discounts on such subscriptions recently. The company has projected roughly $15 billion in revenue from business subscriptions to ChatGPT in 2030, up from $600 million in revenue in 2024.

In the works

OpenAI product chief Kevin Weil and other company executives first discussed and showed off designs for the document collaboration feature nearly a year ago, the report said. Not much came of it due to workforce limitations and other priorities. The feature got a second wind in October with OpenAI's Canvas, a ChatGPT feature to draft documents and code with AI. More recently, OpenAI developed but hasn't launched software allowing multiple ChatGPT customers to communicate about their shared work in the app, The Information said.

Last week, OpenAI announced the rollout of ChatGPT Record mode for ChatGPT Pro, Enterprise, and Edu users. The feature, however, has limited utility since ChatGPT doesn't offer file storage or other productivity features.


Leak reveals Grok might soon edit your spreadsheets

Yahoo

5 days ago

  • Business


Leaked code suggests xAI is developing an advanced file editor for Grok with spreadsheet support, signaling the company's push to compete with OpenAI, Google, and Microsoft by embedding AI copilots into productivity tools. 'You can talk to Grok and ask it to assist you at the same time you're editing the files!' writes reverse engineer Nima Owji, who leaked the finding. TechCrunch has reached out to xAI to confirm the findings and learn more.

xAI hasn't explicitly detailed its strategy for pursuing interactive, multimodal AI workspaces, but it has dropped a series of announcements that point to how the company is thinking about these tools. In April 2025, xAI launched Grok Studio, a split-screen workspace that lets users collaborate with Grok on generating documents, code, reports, and browser games. It also launched the ability to create Workspaces, which let you organize files and conversations in a single place.

While OpenAI and Microsoft have similar tools, Google's Gemini Workspace for Sheets, Docs, and Gmail appears to be the most similar to what xAI is reportedly building. Google's tools can edit Docs and Sheets and allow you to chat with Gemini while looking at or editing documents. The difference is that Gemini Workspace only works within Google's own ecosystem.

It's not clear what types of files xAI's editor might support aside from spreadsheets, or whether xAI plans to build a full productivity suite that could compete with Google Workspace or Microsoft 365. If Owji's findings are accurate, the advanced editor would be a step toward Elon Musk's ambitions to turn X into an 'everything app' that includes docs, chat, payments, and social media.
