
Latest news with #ProjectMariner

AI isn't just standing by. It's doing things — without guardrails

Los Angeles Times

2 days ago

  • Los Angeles Times


Just two and a half years after OpenAI stunned the world with ChatGPT, AI is no longer only answering questions — it is taking actions. We are now entering the era of AI agents, in which large language models don't just passively provide information in response to your queries; they actively go into the world and do things for — or potentially against — you. AI has the power to write essays and answer complex questions, but imagine if you could enter a prompt and have it make a doctor's appointment based on your calendar, or book a family flight with your credit card, or file a legal case for you in small claims court.

An AI agent submitted this op-ed. (I did, however, write the op-ed myself because I figured the Los Angeles Times wouldn't publish an AI-generated piece, and besides, I can put in random references like I'm a Cleveland Browns fan because no AI would ever admit to that.) I instructed my AI agent to find out what email address The Times uses for op-ed submissions and the requirements for the submission, and then to draft the email title, draft an eye-catching pitch paragraph, attach my op-ed and submit the package. I pressed 'return,' 'monitor task' and 'confirm.' The AI agent completed the tasks in a few minutes.

A few minutes is not speedy, and these were not complicated requests. But with each passing month the agents get faster and smarter. I used Operator by OpenAI, which is in research preview mode. Google's Project Mariner, which is also a research prototype, can perform similar agentic tasks. Multiple companies now offer AI agents that will make phone calls for you — in your voice or another voice — and have a conversation with the person at the other end of the line based on your instructions. Soon AI agents will perform more complex tasks and be widely available for the public to use. That raises a number of unresolved and significant concerns.

Anthropic does safety testing of its models and publishes the results.
One of its tests showed that the Claude Opus 4 model would potentially notify the press or regulators if it believed you were doing something egregiously immoral. Should an AI agent behave like a slavishly loyal employee, or a conscientious one? OpenAI publishes safety audits of its models. One audit showed the o3 model engaged in strategic deception, defined as behavior that intentionally pursues objectives misaligned with user or developer intent. A passive AI model that engages in strategic deception is troubling, but it becomes dangerous if that model actively performs tasks in the real world autonomously. A rogue AI agent could empty your bank account, make and send fake incriminating videos of you to law enforcement, or disclose your personal information to the dark web.

Earlier this year, programming changes were made to xAI's Grok model that caused it to insert false information about white genocide in South Africa into responses to unrelated user queries. This episode showed that large language models can reflect the biases of their creators. In a world of AI agents, we should also beware that the creators of the agents could take control of them without your knowledge.

The U.S. government is far behind in grappling with the potential risks of powerful, advanced AI. At a minimum, we should mandate that companies deploying large language models at scale disclose the safety tests they performed and the results, as well as the security measures embedded in the system. The bipartisan House Task Force on Artificial Intelligence, on which I served, published a unanimous report last December with more than 80 recommendations. Congress should act on them. We did not discuss general-purpose AI agents because they weren't really a thing yet. To address the unresolved and significant issues raised by AI, which will only be magnified as AI agents proliferate, Congress should turn the task force into a House Select Committee.

Such a specialized committee could put witnesses under oath, hold public hearings and employ a dedicated staff to help tackle one of the most significant technological revolutions in history. AI moves quickly. If we act now, we can still catch up.

Ted Lieu, a Democrat, represents California's 36th Congressional District.

Opera's new 'fully agentic' browser can surf the web for you

Engadget

28-05-2025

  • Business
  • Engadget


It was only earlier this year that Norway's Opera released a new browser, and now it's adding yet another offering to an already crowded field. Opera is billing Neon as a "fully agentic browser." It comes with an integrated AI that can chat with users and surf the web on their behalf. Compared to competing agents, the company says Neon is faster and more efficient at navigating the internet on its own because it parses webpages by analyzing their layout data.

Building on Opera's recent preview of Browser Operator, Neon can also complete tasks for you, like filling out a form or doing some online shopping. The more you use Neon to write, the more it will learn your personal style and adapt to it. All of this happens locally, to ensure user data remains private.

Additionally, Neon can make things for you, including websites, animations and even game prototypes, according to Opera. If you ask Neon to build something particularly complicated or time-consuming, it can continue the task even when you're offline. This part of the browser's feature set depends on a connection to Opera's servers in Europe, where privacy laws are more robust than in North America. "Opera Neon is the first step towards fundamentally re-imagining what a browser can be in the age of intelligent agents," the company says.

If all of this sounds familiar, it's because other companies, including Google and OpenAI, have been working on similar products. In the case of Google, the search giant began previewing Project Mariner, an extension that adds a web-surfing agent to Chrome, last December. OpenAI, similarly, has been working on its own "Operator" mode since the start of the year. Neon, therefore, sees Opera attempting to position itself as an innovator in hopes of claiming market share, but the company has a difficult task ahead.
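Opera hasn't published how Neon's layout parsing works. As a rough illustration of why layout data helps, though, a page can be boiled down to a short list of actionable elements that an agent can reason over, instead of raw pixels or full HTML. This is a minimal sketch of that general idea using Python's standard-library HTML parser, not Opera's actual implementation; the sample page is invented.

```python
# Minimal sketch: reduce a page to the elements an agent could act on
# (click, fill, select). Illustrative only -- not Opera Neon's real parser.
from html.parser import HTMLParser

ACTIONABLE = {"a", "button", "input", "select", "textarea"}

class ActionMap(HTMLParser):
    """Collect every element an agent could click, fill, or select."""
    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in ACTIONABLE:
            # attrs arrives as (name, value) pairs; flatten into a dict
            self.elements.append({"tag": tag, **dict(attrs)})

# Hypothetical page fragment for the demo
page = """
<form action="/subscribe">
  <input type="email" name="email" placeholder="you@example.com">
  <button type="submit">Join waitlist</button>
</form>
<a href="/pricing">See pricing</a>
"""

parser = ActionMap()
parser.feed(page)
for el in parser.elements:
    print(el["tag"], el.get("name") or el.get("href") or "")
```

An agent working from this compact action map only has to choose among three candidate actions, which is one plausible reason a layout-aware approach could be faster than screenshot-driven navigation.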
According to data from StatCounter, only about 2.09 percent of internet users use Opera to access the web. Chrome, by contrast, commands a dominant 66.45 percent of the market. That's a hard hill to climb when your competitors are working on similar features. It's also worth asking whether an agentic browser is something people really want. Opera suggests Neon is smart enough to book a trip for you. That sounds great in theory, but what if the agent makes an error and books the wrong connecting flight? A certain amount of friction ensures users pay attention and check things on their own. If you want to try Neon for yourself, you can join the wait list.

Can AI really replace your keyboard and mouse?

Digital Trends

28-05-2025

  • Business
  • Digital Trends


'Hey ChatGPT, left-click on the enter password field in the pop-up window appearing in the lower left quadrant of the screen and fill XUS&(#($J, and press Enter.' Fun, eh? No, thanks. I'll just move my cheap mouse and type the 12 characters on my needlessly clicky keyboard, instead of speaking the password out loud in my co-working space.

It's pretty cool to see ChatGPT understand your voice command, book a cheap ticket for eight people to watch a Liverpool match at Anfield, and land you at the checkout screen. But hey, will you trust it with the password? Or won't you just type the password on a physical keyboard? Imagine going all-in on AI, only to realize that the last-mile step, where you REALLY need a keyboard or mouse, is not possible, and you're now stuck. That's exactly the question many have been asking after seeing flashy AI agent and automation videos from the likes of Google, OpenAI, and Anthropic.

It's a legitimate question

AI was the overarching theme at Google's I/O event earlier this year. By the end of the keynote, I was convinced that Android smartphones are not going to be the same again. And by extension, neither is any platform where Gemini is going to land — from Workspace apps such as Gmail to navigation on Google Maps while sitting in a car. The most impressive demo was Project Mariner, and the next research prototype of Project Astra. Think of it as a next-gen conversational assistant that lets you talk and get real stuff done, without ever tapping on the screen or pulling up the keyboard. You can shift your queries from a user manual hosted on a brand's website to instructional YouTube videos, without ever repeating the context. It's almost as if the true concept of memory has arrived for AI. In a web browser, it's going to book you tickets, landing you on the final page where you simply have to confirm that all the details are as requested before you proceed with the payment.
That leads one to wonder whether the keyboard and mouse are dead concepts for digital input as voice interactions come to the forefront of AI.

The burden of error

Now, as odd as that sounds, your computer already comes with voice-based control for navigating the operating system. On Windows PCs and macOS, you can find the voice access tools as part of the accessibility suite. There are a handful of shortcuts available to speed up the process, and you can create your own as well. With the advent of next-gen AI models, we're talking about ditching the keyboard and mouse for everyone, not just offering voice control as an assistive technology.

Imagine a combination of Claude Computer Use and the eye-tracked input from Apple's Vision Pro headset coming together. In case you're unfamiliar, Anthropic's Computer Use is a, well, computer use agent. Anthropic says it lets the AI 'use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text.' Now, think of a scenario where your intent is given as voice to Claude, picked up by the onboard mics, and the task is executed. For whatever final step is required of you, gestures fill the gap. The Vision Pro has demonstrated that eye-tracked controls are possible and work with a high degree of accuracy.

Away from headsets, voice-controlled AI can still work on an average computer. Hume AI, in partnership with Anthropic, is building a system called Empathic Voice Interface 2 (EVI 2) that turns voice commands into computer input. It's almost like talking to Alexa, but instead of ordering broccoli, the AI assistant understands what we are saying and turns it into keyboard or mouse input.

All that sounds terrific, but let's think of a few realistic scenarios: you will still need a keyboard for fine-tuned media edits, for making minor changes on a coding canvas, or for filling cells in a sheet.
Imagine saying, 'Hey Gemini, put four thousand eight hundred and ninety-five dollars in cell D5 and label it as air travel expense.' Yeah, I know. I'd just type it, too.

The last mile, not the end

If you go through demos of AI Mode in Search, the Project Mariner agent, and Gemini Live, you will get a glimpse of voice computing. All these AI advancements sound stunningly convenient, until they're not. For example, at what point does it get too irritating to say things like 'Move to the dialog box in the top-left corner and left-click on the blue button that says Confirm'? It's too cumbersome, even if all the steps before it were performed autonomously by an AI.

And let's not forget the elephant in the room: AI has a habit of going haywire. 'At this stage, it is still experimental—at times cumbersome and error-prone,' warns Anthropic about Claude Computer Use. The situation is not too dissimilar with OpenAI's Operator agent, or the similarly named tool currently in development at Opera, the folks behind a pretty cool web browser.

Removing the keyboard and mouse from an AI-boosted computer is like driving a Tesla with full self-driving (FSD) enabled, but with no steering wheel, leaving only the brake and accelerator pedals. The car is definitely going to take you somewhere, but you need to take over if some unexpected event transpires. In the computing context, think of the troubleshooter, where you MUST be in the driving seat.

But let's assume that an AI model, driven primarily by voice (and captured by the mic on your preferred computing machine), lands you at the final step where you need to close the workflow, like making a payment. Even with passkeys, you will need to at least confirm your identity by entering a password, opening an authenticator app, or touching a fingerprint sensor.
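The spreadsheet command quoted above hides real work for an assistant: it has to pull out a cell reference, convert spoken number words into a value, and capture the label. The sketch below is a hypothetical parser for exactly that one phrasing; it is not Gemini's actual pipeline, and the command grammar is invented for illustration.

```python
# Hypothetical sketch: turn the spoken spreadsheet command into
# (cell, amount, label). Not any real assistant's pipeline.
import re

UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
         "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
SCALES = {"thousand": 1000, "million": 10**6}

def words_to_number(text):
    """'four thousand eight hundred and ninety-five' -> 4895."""
    total, current = 0, 0
    for word in text.replace("-", " ").lower().split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:  # 'thousand'/'million' close out a group
            total += current * SCALES[word]
            current = 0
    return total + current

def parse_command(cmd):
    """Extract (cell, amount, label) from one fixed command shape."""
    m = re.search(r"put (.+?) dollars in cell ([A-Z]+\d+)"
                  r" and label it as (.+)", cmd)
    amount_words, cell, label = m.groups()
    return cell, words_to_number(amount_words), label.rstrip(" ?.")

print(parse_command("put four thousand eight hundred and ninety-five dollars "
                    "in cell D5 and label it as air travel expense"))
```

Even this toy version shows why the author would "just type it": the spoken form is far longer than `D5 = 4895`, and any misheard word corrupts the value silently.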
No OS maker or app developer (especially one dealing with identity verification) would let an AI model have open control over this critical task. It's just too risky to automate with an AI agent, even with conveniences like passkeys coming into the picture. Google often says that Gemini will learn from memory and your own interactions. But it all begins with actually letting it monitor your computer usage, which is fundamentally reliant on keyboard and mouse input. So yeah, we're back to square one.

Go virtual? It's a long wait

When we talk about replacing the computer mouse and keyboard with AI (or any other advancement), we are merely talking about substituting them with a proxy, and then landing at a familiar replacement. There is plenty of research material out there on virtual mice and keyboards, dating back at least a decade, long before the seminal 'transformers' paper pushed the AI industry into the next gear. In 2013, DexType released an app that tapped into the tiny Leap Motion hardware to enable a virtual typing experience in the air. No touch screen required, nor any fancy laser projector like the Humane AI Pin. Leap Motion died in 2019, but the idea didn't.

Meta is arguably the only company with a realistic software and hardware stack ready for an alternative form of input-output in computing, something it calls human-computer interaction (HCI). The company has been working on wrist-worn wearables that enable an entirely different form of gesture-based control. Instead of tracking the spatial movement of fingers and limbs, Meta is using a technique called electromyography (EMG), which turns the electrical motor-nerve signals generated in the wrist into digital input for controlling devices. And yes, cursor and keyboard input are very much part of the package.
At the same time, Meta also claims that these gestures will be faster than a typical key press, because we are talking about electrical signals traveling from the hand straight to a computer, instead of finger movement. 'It's a much faster way to act on the instructions that you already send to your device when you tap to select a song on your phone, click a mouse or type on a keyboard today,' says Meta.

Fewer replacements, more repackaging

There are two problems with Meta's approach, with or without AI coming into the picture. The concept of a cursor is still very much there, and so is the keyboard, even if in a digital format. We are just switching from the physical to the virtual. The replacement being pushed by Meta sounds very futuristic, especially with Meta's multimodal Llama AI models coming into the picture.

Then there's the existential dilemma. These wearables are still very much in the realm of research labs, and when they come out, they won't be cheap, at least for the first few years. Even barebones third-party apps like WowMouse are bound to subscriptions and held back by OS limitations. I can't imagine ditching my cheap $100 keyboard for an experimental voice- or gesture-driven device, let alone having it replace the full keyboard and mouse input for my daily workflow. Most importantly, it will take a while before developers embrace natural language-driven inputs in their apps. That's going to be a long, drawn-out process.

What about alternatives? Well, we already have apps such as WowMouse, which turns your smartwatch into a gesture recognition hub for finger and palm movements. However, it only serves as a replacement for cursor and tap gestures, not a full-fledged keyboard experience. But again, letting apps access your keyboard is a risk that OS overlords will protest. Remember keyloggers?

At the end of the day, we are at a point where the conversational capabilities of AI models and their agentic chops are making a huge leap.
But they will still require you to cross the finish line with a mouse click or a few key presses, rather than fully replacing them. Besides, voice commands are just too cumbersome when you can hit a keyboard shortcut or click a mouse instead of narrating a long chain of instructions. In a nutshell, AI will reduce our reliance on physical input, but it won't replace it. At least, not for the masses.

Project Mariner AI Web Browser: First Tests and Impressions

Geeky Gadgets

28-05-2025

  • Business
  • Geeky Gadgets


What if your browser could think for itself—retrieving data, navigating websites, and even running code—all without you lifting a finger? That's the bold promise behind Google's Project Mariner, an experimental AI agent designed to tackle browser-based tasks with minimal human intervention. But does it deliver on this vision of autonomy, or does it stumble under the weight of its ambition? In its first five tests, Project Mariner showcased moments of brilliance, such as extracting YouTube metrics with ease, but also revealed critical flaws, particularly when faced with secure platforms or complex interactions. These early trials offer a fascinating glimpse into the future of AI-driven productivity—and the hurdles we'll need to overcome to get there.

All About AI explores the strengths and shortcomings of Project Mariner across five diverse scenarios, from retrieving live stream details to executing Python code. Along the way, you'll discover where this AI agent shines—like its ability to handle basic form interactions—and where it falters, such as its struggles with external AI tools like ChatGPT. Whether you're intrigued by the potential of browser-based automation or curious about the challenges of creating a truly autonomous agent, these insights will leave you pondering just how close we are to a future where AI can seamlessly navigate the digital world on our behalf.

Project Mariner Overview

Task 1: Retrieving Video Metrics

Project Mariner successfully retrieved the view count of a specific YouTube video, showcasing its ability to navigate websites and extract relevant data. This task highlighted the agent's competence in basic web navigation and information retrieval. By efficiently locating the video and extracting the desired metrics, it demonstrated a solid foundation for handling straightforward search tasks. However, its success in this scenario also raises questions about how it might perform when faced with more complex or dynamic web environments.
Task 2: Email and Live Stream Information Retrieval

The agent achieved partial success in gathering details about a live stream event but encountered significant difficulties with email-related tasks. When tasked with logging into Gmail to send an email, Project Mariner struggled to complete the process autonomously. Even with manual login assistance, it was unable to navigate the platform effectively. This limitation highlights its current inability to handle secure platform interactions, which is a critical area for improvement. The challenges faced in this scenario emphasize the need for enhanced capabilities in managing authentication protocols and executing tasks within secure environments.

Task 3: Website Navigation and Form Interaction

In this scenario, Project Mariner navigated to the DeepMind diffusion model page and interacted with a waitlist form. It successfully located the form and modified its fields, demonstrating its capability for basic form interaction. However, certain actions required user input, indicating a reliance on manual intervention for more complex tasks. While its performance in locating and modifying form elements was commendable, the agent's limited autonomy in this area suggests that further development is needed to enable it to handle more intricate interactions independently.

Task 4: Python Code Execution

Project Mariner identified an online platform for executing Python code and successfully ran a simple script. This task underscored its ability to locate suitable platforms and perform basic code execution. However, the agent required additional user instructions to complete the task, suggesting that its problem-solving capabilities in coding environments are still evolving.
Despite these limitations, its performance in this area was among the most promising of the five tests, indicating potential for further development in programming-related tasks.

Task 5: Interaction with ChatGPT

When tasked with accessing ChatGPT for a discussion on software engineering, the agent encountered navigation errors and failed to complete the task. This revealed significant challenges in interacting with external AI tools, particularly when navigating complex interfaces or meeting platform-specific requirements. The inability to complete this task underscores a critical gap in Project Mariner's functionality, highlighting the need for improved adaptability and error-handling mechanisms when engaging with external systems.

Key Observations

Project Mariner's performance across the five tests revealed a combination of strengths and weaknesses. These observations provide a clearer understanding of its current capabilities and the areas that require further development.

Strengths:

  • The agent demonstrated effectiveness in retrieving information, navigating websites, and executing simple scripts, showcasing its potential for handling straightforward tasks.

Weaknesses:

  • It struggled with secure platform interactions, email automation, and navigating external AI tools, highlighting critical gaps in its functionality.
  • Its limited autonomy in handling complex tasks often necessitated user intervention, reducing its overall efficiency and independence.
  • Occasional errors in task execution, particularly in scenarios involving intricate interfaces or multi-step processes, further emphasized the need for refinement.
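A pattern runs through all five tests: the agent completes the routine steps itself but hands control back for steps it cannot do, such as logins and other secure interactions. That hand-off can be sketched as a toy human-in-the-loop runner; the step names and the blocked set below are illustrative, not how Project Mariner actually partitions work.

```python
# Toy sketch of the human-in-the-loop split seen in the tests:
# the agent does routine steps, the user handles blocked ones.
# Step names and the BLOCKED set are invented for illustration.
def run_task(steps, agent_can_do):
    """Partition steps into (done_by_agent, needs_user)."""
    done, needs_user = [], []
    for step in steps:
        (done if agent_can_do(step) else needs_user).append(step)
    return done, needs_user

# Secure interactions the tests showed the agent failing at
BLOCKED = {"login", "2fa", "payment"}

steps = ["open gmail", "login", "draft email", "attach file", "send"]
done, needs_user = run_task(steps, lambda s: s not in BLOCKED)
print("agent:", done)
print("user :", needs_user)
```

The fewer steps that land in the `needs_user` bucket, the more the agent earns the "autonomous" label; the email test above is essentially a case where that bucket was too full for the task to finish.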
Future Prospects and Development Needs

Project Mariner demonstrates significant potential as a browser-based AI agent, particularly for tasks involving basic web navigation and simple code execution. However, its current limitations in handling secure platforms, interacting with external AI tools, and executing autonomous operations indicate that substantial improvements are required. Addressing these challenges will be essential for unlocking its full potential and allowing it to handle more complex and independent tasks effectively. By focusing on enhancing its problem-solving capabilities, adaptability, and error-handling mechanisms, Project Mariner could evolve into a more robust and versatile tool for a wide range of applications.

Media Credit: All About AI
