Latest news with #AIvideo


GSM Arena
5 days ago
Gemini can now turn images into videos
Google has added a new feature to Veo 3, Gemini's AI video-making arm: starting today, you can turn an image into a video. The clips are capped at eight seconds, and they include sound. To use the feature, select Videos from the tool menu in the prompt box, upload an image, then describe the scene and add any audio instructions you might have. That's it: your image will turn into a video. Once it's complete, you can share it or download it. You can also use the thumbs up and down buttons on your generated videos to give Google feedback, which it will use to improve the experience further. This photo-to-video capability is starting to roll out to Google AI Pro and Ultra subscribers in "select countries around the world", and the same capabilities are available in Flow, Google's AI filmmaking tool. Google says over 40 million Veo 3 videos have been generated through the Gemini app and Flow over the last seven weeks, and this new functionality will undoubtedly make that number skyrocket in the near future. Note that these AI-generated videos include a visible watermark to show they are AI generated, as well as an invisible SynthID digital watermark.


CNET
6 days ago
Over 40 Million AI Videos Have Been Made With Google Veo 3 Since May: How My Expert Testing Went
Google's most advanced AI video model continues to be dominant. Veo 3 made a splash when it was released at Google I/O in May, boasting one big, audible difference from other AI video generators: It can create sound, synchronized to a scene's action. And people have been taking advantage of it -- more than 40 million Veo 3 videos have been created in the last seven weeks, across Gemini and its new filmmaker-focused tool called Flow, Google said in a blog post on Thursday. Upgrades like the ones also announced Thursday mean that number will likely only increase. Google is introducing a new photo-to-video capability through the Gemini app. With the feature, you can upload a picture you took, or one you generated through Imagen, and Veo 3 will animate it. In your prompt, you can describe the audio you want in the video, like dialogue. The image-to-video capability is rolling out now on the web and throughout the week on mobile. Even before these updates, Veo 3 users were quick to share their impressive-looking videos online. I spend a lot of time testing and reviewing AI, specifically image and video generators, and I've seen enough slop and hallucinations to approach it all with skepticism. But after seeing the videos, I knew I had to dive in and put Veo 3 to the test. Without spoiling anything, I walked away from Veo feeling like this was the next natural step for Google, with one feature in particular giving the company an edge that might make it a more serious contender in the AI creative space. But there are serious limits and annoyances that I hope are addressed soon. Here's how my experience went and what you need to know.
Veo 3 availability, pricing and privacy
There are a couple of different ways to access Veo 3. Unfortunately, all of them will require you to pay up in some way. Veo 3 is currently available through Google AI Pro, Google AI Ultra, Flow and Google Vertex.
Google recently expanded access to a version of Veo 3 (Veo 3 fast) to its cheaper $20 per month plan, Google AI Pro. Pro users get limited access to Veo 3, which is good if you just want to play around with it. To get full access, you'll need Google AI Ultra -- the newest, priciest tier at $250 per month. (It's currently half off for $125 per month for three months.) Flow is Google's new filmmaking-focused AI tool, available for those paying Pro and Ultra subscribers. Vertex is Google's AI enterprise platform, and you'll know if you have access to it. Google's Gemini privacy policy says the company can collect your info to improve its technologies, which is why it recommends not sharing any confidential information with Gemini. You also agree to Google's prohibited use policy, which outlaws the creation of abusive or illegal content.
My wild ride with Veo 3
The most impressive thing about Veo 3 is its new audio generation capabilities. You don't have to tell Gemini in your prompt that you want sound; it will automatically add it. This is a first among competitors like OpenAI's Sora and Adobe's Firefly and it certainly gives Google a huge edge. While the AI audio is a nice perk, it isn't perfect. If you're familiar with the somewhat clunky nature of AI-generated music and dialogue, you'll be able to identify it immediately. But there were times when it flowed more naturally. The clashing metal sounds and grunts in my alien fight scene were timed perfectly to their attacks, something that would've been difficult to add on my own afterward. But the dinosaur-like aliens also literally say "roar" and "hiss" instead of making those noises. My kayaker's paddling very nearly matched up with the water sloshing sound. The nature ambience in that video was particularly lovely and added a layer of depth that's been missing from AI videos. To give Veo a challenge, I wanted overlapping sound in this beach bonfire party scene. What I got was fine, but nothing show-stopping.
My dream beach bonfire partiers didn't sound like any party I've ever been to, but still, points for being first and relatively unproblematic. Of course, while the audio was nice, it doesn't take away from the weird eccentricities that continue to plague AI generators. I ran into a few hiccups, mostly with people's faces, a notoriously hard thing for AI to mimic. But compared to the glaringly obvious errors I ran into with Veo 2, the new generation does appear to have made real improvements, as Google claimed. I run into hallucinations a lot when I'm testing AI image and video generators, so the first thing I do is check whether a service gives me the ability to edit its output. Veo 3 doesn't offer any editing tools, which is a bummer. It's certainly something that's going to make it less useful for professional creators, who are used to more fine-tuned editing tools and need to make precise tweaks for their projects. You can send a follow-up prompt asking for specific changes. For example, I asked Veo to change the angle in the previous video so I could see her face, which the program handled well. With Veo 3, you'll typically have to wait 3 to 5 minutes for a new, edited video to load, though. Veo 3 has the longest generation time of any AI video generator I've tested. But the addition of audio to the videos excuses the longer wait time in my eyes. The worst part of Veo 3 is how quickly I hit my daily generation limit. After only five videos, I was barred for an entire 24-hour period -- something that really annoyed me and made it much harder to assess. Google's VP of Gemini and Google Labs, Josh Woodward, said in a post on X/Twitter that Ultra subscribers like me have the highest number of generations that reset daily, in the regular Gemini app and in Flow. And for me, that limit in Gemini was five videos. Flow's limit is 125, according to Woodward.
I reached out to Google to get clarity on what the daily limit is for Ultra users creating through Gemini that Woodward mentions. Here's the response: "Google AI Ultra subscribers get the highest level of access to Veo 3, our state-of-the-art video generation model, which they can use in both the Gemini app and Flow, our new AI filmmaking tool." The limits are another sign that this isn't a tool meant for professional creation and iterative editing. You need to spend time thoughtfully crafting your prompt and if Google flubs a face or glitches, you're likely to run out of credits fast and end up out of luck. Veo 3 is better suited for AI enthusiasts who want to dip their toes in video creation, not creators experimenting with AI.
Is Veo 3 worth the cost?
After an underwhelming experience with Veo 2, I had reservations about what to expect in the usefulness and accuracy of Veo 3. But the new model was impressive, the audio especially, even though it's still missing some key features. Let me be clear: There is no rational reason to spend hundreds of dollars on a Google AI Ultra plan only to use Veo 3. If you want to dabble for fun, I recommend starting with the cheaper Google AI Pro plan, or using Veo 2 for hundreds less per month. The Ultra plan does offer other features, like YouTube Premium, 30 terabytes of space and access to the newest Gemini models. So if you want any of those things, then, yeah, pay up and go play around with Veo 3. But it's not worth it on its own. Veo 3 isn't the revolutionary upgrade those social media posts might lead you to believe. It's the next generation, better than last month's Veo 2, and it shows real promise in Google's future AI video endeavors. But be prepared to pay up if you want to try it out.

Fast Company
6 days ago
- Entertainment
These Pixar and Apple alums want to change the way you create generative AI video
Intangible is the first tool that could make generative AI video truly usable. The new web app—created by Pixar, Apple, Google, and Unity alumni—is trying to change the user experience of generative AI video by letting you fully control your video using a 3D interface, thus solving the lack of control of current text prompts. Think about it as a 3D animation program that lets you control the stage, characters, and camera in your film, with a generative AI rendering engine that will turn those elements into reality. Intangible's current version feels half-baked, and it will not produce The Godfather yet, but it's definitely a step in the right direction for the generative AI video user experience. 'To deliver professional-grade results in creative industries like film, advertising, events, and games, the directors, producers, and every creative on the team needs control over set design, shot composition, art direction, pacing, cameras, and more to deliver on the creative vision,' Intangible chief product officer Charles Migos tells me over email. 'Current AI models are reliant on extensive prompting, and language alone isn't enough to convey creative intent. By providing generative AI models with spatial intelligence, Intangible allows creatives to get closer to professional-grade results with less prompting, more feel, and more control.' Migos is right that we need a better way to control the imagination of generative AI video engines. While generative AI video is getting to the point at which it is truly indistinguishable from reality, creating it is like rolling the dice. There's still a chasm between the vision in your mind and what comes out of Google's Veo 3 or Kling. This makes it pretty much unusable for everything but memes, skits, storyboards, and the occasional ad stunt. 
While some AI models let you set camera paths or define some characters and objects using images, the prompts that 'create' the videos are inherently limited by the interpretable nature of language. Every person and AI visualizes any given text differently. That's the beauty of reading a book, but it's a limitation when it comes to creating what you have in mind. That's why Alfred Hitchcock meticulously planned his films using storyboards, so that everyone in the production could truly visualize the 'intangible' nature of his imagination to faithfully capture Cary Grant's desperation as a biplane tried to kill him in North by Northwest.
Spatial intelligence
Migos and CEO Bharat Vasan believe that to truly unleash the power of generative AI for video production, we must add 'spatial intelligence' to the interface. Computer vision expert Fei-Fei Li, known as the godmother of artificial intelligence, has defined spatial intelligence as the ability, both in humans and artificial intelligence systems, to perceive, interpret, reason about, and interact with the three-dimensional world. This involves not just recognizing objects, but understanding their positions, relationships, and functions within a physical space, and being able to act upon that understanding. 'By building in interactive 3D from the outset, Intangible's world model gives generative AI image and video generation models the ability to be more precise, without extensive prompting,' Vasan says. This precision is what current text-to-video tools fundamentally lack. When you describe a scene in words, you're forcing the AI to interpret spatial relationships through language—an inherently imprecise translation that often results in the AI changing things and adding objects or actions that you didn't have in mind.
Intangible grounds generative AI models in structured 3D scenes with real camera control and spatial logic, which Vasan says 'provides best-in-class coherence in the results, which we further improve with object descriptions, reference imagery, and fine-tuning models [LoRAs, or low-rank adaptations]. The goal is to address one of the biggest complaints about current AI video tools: the lack of coherence and continuity between frames.'
How it works
The platform allows users to build custom 3D scenes using drag-and-drop objects, set up cameras, and control them. The interface is pretty simple: You can start from a preset scene or with a blank world. There's a general viewport that shows you the scene, with a ground ready for you to start dropping buildings, characters, and other objects from a library of more than 5,000 assets. At the bottom of the interface, a toolbox gives you access to all you need. To the left, icons allow you to open a scene panel in which you can add and reorder all the shots that will form your final video. In the center, a central prompt allows you to add new objects using text. To its left, there are three icons to add objects to the scene. The first one allows you to display a palette to pick an object from the library of premade assets. Then there is an icon to add primitives—like spheres, cubes, or pyramids—to create your own basic objects. Finally, a third button lets you add what the company calls 'interactables': cameras, characters, waypoints to tell the camera where to move, and 'populators,' which will fill your scene with variations of the same objects, like bushes or shrubs in a forest. Working in this interface is pretty straightforward. Objects in the scene can be moved around with standard 3D handles, with arrows to move, cubes to scale, and arches to rotate the objects in all three axes.
The interface—at least using Chrome on my MacBook Air 15 with an M2 chip—was sluggish but usable, with some serious pauses at the beginning of the session, which got better later on. To the right of the prompt field, there are two icons that switch between edit and visualization modes. The latter opens a side panel on the right of the screen that contains all you need to tell the generative AI how to render your scene: how the objects look, how they interact with each other, what the lighting and the atmosphere look like, and anything else you want to define. There are also options to set up the time of the day or the final look of your video, which includes modes like photorealism, 3D cartoon, or film noir. Once you write your prompt, click the 'generate' button . . . and that's it. The idea is good. I tried it (it's free for now), and it works-ish. I started from one of the templates, a Roman urban scene. I quickly added an elephant, positioned and scaled it up with the object handles, and then I clicked on the visualization icon to set the prompt (a premade one was already there), and clicked on 'generate.' The results were just okay. Intangible does what the company claims, but it still makes mistakes. You can see it in the way it rendered this scene with a giant elephant in a Roman street. The Colosseum is gone, replaced by a mountain and some pointy things I can't identify. There are rendering mistakes as well, and the people are wearing the wrong clothes—that is, unless I missed the history class in which they teach that Romans wore jeans and Daisy Dukes. Once you have your shot, you can turn it into a video. This is where things get disappointing. I thought Intangible would use its own generative AI engine to directly interpret the 3D scene itself—as Nvidia demonstrated six years ago—and turn it into a final photorealistic video using the objects to guide the final rendering.
In reality, it feeds your still image to the latest version of Kling—a popular, pretty realistic rendering engine from China that can turn any image into a living video, following a prompt. If you are a 3D artist, you will be better off combining your current workflow using Kling or any other image-to-video generative AI (as some people are already doing). If you are starting from scratch with 3D software, Intangible can work for you even if it is nowhere near perfect. The software will get better: 'In the next three years, we expect tools like Intangible will be able to cover all aspects of preproduction and digital production for existing forms of media,' Migos and Vasan tell me. They also believe that 'AI tools bring an opportunity to expand visual storytelling as an art form, creating new categories that human creativity thrives in, as linear, interactive, and immersive media blend. . . . We expect tools like Intangible to be both simple and powerful enough that it empowers a new generation of creatives, not just those who are technical or prompting experts.' For now, despite the glitches, Intangible's premise is the right one: People need a better way to control AI video because text is not a good interface when you are trying to visualize an idea. Spatial intelligence may be the key to solving it. At the very least, this new software shows that, when it comes to artificial intelligence, we still need to work on a better, more natural, and precise user experience.


CNET
13-05-2025
Sony Xperia 1 VII Lets You Shoot Video Without Looking
Sony's latest flagship Android phone, the Xperia 1 VII, packs a variety of exciting tech features from its Snapdragon 8 Elite chip to its promise of AI-based audio quality upscaling. But it's the video tools that really caught my attention, in particular the 'AI Camerawork' and 'Autoframing' functions that apparently let you shoot steady, professional-looking video without even looking at your phone. It certainly sounds like a novel idea, but this phone needs novel ideas -- and plenty of them -- to justify its whopping price tag. At £1,399, the Xperia 1 VII is significantly more expensive than both the equivalent iPhone 16 Pro and the Samsung Galaxy S25 Ultra. Both those phones have seriously impressed us in their full reviews so Sony will have its work cut out for it if it hopes to pry that much cash out of your hands. The phone is up for preorder today in the UK, although Sony currently has no plans to bring it to the US. For reference, that £1,399 UK price converts to about $1,860. Ouch.
AI video shooting
The AI video tools certainly seem to be the big reason to choose this phone. While I have yet to test it myself, Sony's press materials suggest that it works by using a wide-angle lens, AI-based subject tracking and "posture estimation technology" to keep your subject in frame. The idea is that you only need to roughly point your phone in the vague direction of your subject and the phone will do the rest. It sounds like it could be great for things like skateboarding videos where you and your friend are speeding down the street, although how it really performs in such high-paced scenarios remains to be seen. It's not just the video camera that's been given the AI treatment. Sony says the phone has its "best sound quality to date" thanks to AI-based algorithms that actively upscale compressed, streamed music to make it sound as good as it can.
Sony has even equipped the phone with a wired headphone jack to keep audiophiles happy. Sony also says it uses technology inherited from its Bravia TVs for better looking colors on its 6.5-inch display. I'm quite surprised at its low resolution though; the Xperia 1 VII's 1,080x2,340-pixel resolution gives it a pixel density of only 396 pixels per inch (ppi). That's quite a bit below the iPhone 16 Pro's 460ppi or the S25 Ultra's 501ppi and for the Xperia's price, I'd have expected more. Still, I'll reserve judgment on the overall quality until I'm able to see it for myself. Other features include a triple rear camera system, IP68 water resistance and a 5,000mAh battery. Sony's spec sheet simply states "USB PD fast charging" and makes no reference to the actual charging speed. The company does say it'll support the phone with six years of security updates, which is fair, although a year less than what Samsung or Google offer for their much cheaper handsets.
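The 396ppi figure for the Xperia 1 VII can be checked from the resolution and screen size quoted in the article. Here is a minimal sketch in Python, assuming the 6.5-inch figure is the screen diagonal (the usual convention) and that density is measured along that diagonal. The function name `pixels_per_inch` is only illustrative.

```python
import math


def pixels_per_inch(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Return pixel density, assuming diagonal_in is the screen diagonal in inches."""
    # Length of the diagonal in pixels, via the Pythagorean theorem.
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_px / diagonal_in


# Resolution and screen size as quoted in the article.
xperia_ppi = pixels_per_inch(1080, 2340, 6.5)
print(f"Xperia 1 VII: {xperia_ppi:.1f} ppi")  # prints: Xperia 1 VII: 396.5 ppi
```

Under those assumptions the result is roughly 396.5, which is consistent with the 396ppi quoted for the phone.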