
Latest news with #AndrejKarpathy

This startup thinks email could be the key to usable AI agents

TechCrunch

2 days ago


This startup thinks email could be the key to usable AI agents

AI companies are pushing agents as the next Great Workplace Disruptor, but experts say they're still not ready for primetime. They often struggle with autonomous decision-making, can't cooperate with other agents, fail at confidentiality awareness, and integrate poorly into existing systems. Industry pioneers like Andrej Karpathy and Ali Ghodsi have said that, as with the deployment of autonomous vehicles, humans need to be in the loop for agents to succeed.

Startup Mixus understands that and has built an AI agent platform that not only keeps humans in the workflow but also lets those humans interact with agents directly from their email or Slack. 'We're meeting customers where they are today,' Elliot Katz, Mixus co-founder, told TechCrunch. 'Where is every person in the workforce today? For the most part, they're on email. And so because we can do this through email, we believe that's a way we can democratize access [to agents].'

Mixus only beta-launched out of Stanford in late 2024, but it has already raised $2.3 million in pre-seed funding and brought on customers including clothing store chain Rainbow Shops, along with others across finance and tech.

Ease of use is Mixus's biggest selling point, from how it helps create agents to how users interact with them. Users can create one or more agents from simple text prompts. For someone in sales, that prompt might look like: 'Create an agent that finds all open tasks in Jira in project mixus-dummy, and send me a report with information on all tasks that are overdue. Draft emails to all the assignees who have overdue tasks, and have me review them in the chat and with simple clear formatting for email (no attachments/docs). Once I verify, send the emails. Run it now. And moving forward, run it every Monday at 7am PST.' If Mixus works reliably, this is a huge unlock for the AI agent space.
Most agentic AI tools today either give you a pre-built assistant, a la ChatGPT or Gemini, or require developers to build custom agents using frameworks like LangChain, AutoGen, or CrewAI.

With Mixus, users can set up their agents within Mixus's platform via a chat function – through written or vocal prompts – or by simply emailing instructions to agent@ Then Mixus will build, run, and manage single- or multi-step agents directly from the inbox.

'Most of the world, most of America, doesn't even know what an AI agent is or why it's helpful for them, and they've definitely never used one,' Katz said, noting that older workers might have an especially hard time learning how to use agents. 'We're trying to reach all these people that have never used [agents], but could very much benefit from an AI.'

Image Credits: Mixus

Katz and his co-founder Shai Magzimof demoed the technology for me, showing how easy it is to add human verifiers for your agents by simply instructing at which step they should come to you for oversight.
For example, they ran an agent to do research on TechCrunch reporters before pitching them. The agent would identify and gather the latest technology news and trends, analyze the information to identify potential story angles for a TechCrunch reporter, and compile a research report summarizing the findings. At the last stage, the agent was directed to send the information to Katz for verification. Once approved, the agent would send the completed research report to Magzimof.

The founders stressed that humans can be in the loop as much or as little as a business or enterprise dictates – Magzimof said organizations can set up company-wide rules, like ensuring an email gets checked by a human if it's being sent externally. Mixus doesn't always require human oversight. For example, if an agent has already run a Jira integration hundreds of times and hasn't messed it up yet, a human may trust it to continue that task autonomously. Or as Katz put it: 'We enable colleague oversight. We don't mandate colleague oversight.'

Bringing other colleagues into the workflow is as easy as tagging them in the chat with an agent or even copying them on the email to the agent. That's another standout compared to agents on the market today. Most models are single-user, and while Notion AI and Slack GPT let users collaborate in shared spaces, they don't go the step further of letting the AI manage conversations and tasks between teammates in real time.

Another core feature of Mixus is its ability to store memory. 'We created Spaces so that every team, every person, every group of people can have a shared memory,' Magzimof said. 'Then all my agents, all my files, all the people can be in that very specific Space's memory.' While ChatGPT and Claude both support memory, their enterprise plans don't yet support shared agent memory across users.

What else can Mixus do? A running list of Mixus's capabilities as an AI agent.
Image Credits: Mixus

The founders ran me through a roughly hour-long demo showing a range of use cases and abilities. Its agents do seem miraculous, reflecting a degree of autonomy and memory that puts Mixus on the high end of the AI agent spectrum. That is, if the product works as reliably as it did in the demo.

Like other agents, Mixus can integrate with other tools, from Gmail to Jira, and users can trigger agents to run immediately or on a schedule. Agents in Mixus can run and edit documents or spreadsheets inline, similar to ChatGPT, Microsoft Copilot, and Google Gemini, though those are often limited to sandboxed environments. Mixus also enables agents to autonomously navigate organizational context – like figuring out who in an organization owns a particular task by looking through Jira tickets. That kind of cross-tool, org-aware reasoning is still rare among today's agent platforms.

Built on a combination of Anthropic's Claude 4 and OpenAI's o3, Mixus agents also have access to the web, which Magzimof says can be used for tasks like live research or monitoring. He described it as 'Google Alerts on steroids.'

Taken together, Mixus appears to be less of a productivity tool and more like a tireless digital colleague – one of the most ambitious attempts yet to reimagine AI as a true collaborator. If it works as advertised, your next 'coworker' might not be human, but it might get through your inbox faster than you do.

AI's elusive coding speedup

Axios

15-07-2025


AI's elusive coding speedup

A surprising new study finding that AI tools can reduce programmers' productivity is upending assumptions about the technology's world-changing potential.

Why it matters: Software runs our civilization, and AI is already transforming the business of making it — but no one really knows whether AI will decimate programming jobs, turn every coder into a miracle worker, or both.

Driving the news: The study by METR, a nonprofit independent research outfit, looked at experienced programmers working on large, established open-source projects. It found that these developers believed that using AI tools helped them perform 20% faster — but they actually worked 19% slower. The study appears rigorous and well-designed, but it's small (only 16 programmers participated, completing 246 tasks).

Zoom out: For decades, industry visionaries have dreamed of a holy grail called "natural language programming" that would allow people to instruct computers using everyday speech, without needing to write code. As large language models' coding prowess became evident, it appeared this milestone had been achieved. "The hottest new programming language is English," declared AI guru (and OpenAI cofounder) Andrej Karpathy on X early in 2023, soon after ChatGPT's launch. In February 2025, Karpathy also coined the term "vibe-coding" — meaning the quick creation of rough-code prototypes for new projects by just telling your favorite AI to whip up something from scratch. The most fervent believers in software's AI-written future say that human beings will do less and less programming, and engineers will turn into some combination of project manager, specifications-refiner and quality-checker. Either that, or they'll be unemployed.

Zoom in: AI-driven coding tends to be more valuable in building new systems from the ground up than in extending or refining existing systems, particularly when they're big.
While innovative new products get the biggest buzz and make the largest fortunes, the bulk of software work in most industries consists of more mundane maintenance labor. Anything that makes such work more efficient could save enormous amounts of time and money.

Yes, but: This is where the METR study found AI actually slowed experienced programmers down. One key factor was that human developers found AI-generated code unreliable and ended up devoting extra time to reviewing, testing and fixing it. "One developer notes that he 'wasted at least an hour first trying to [solve a specific issue] with AI' before eventually reverting all code changes and just implementing it without AI assistance," the study says.

Between the lines: The study authors note that AI coding tools are improving at a rapid enough rate that their findings could soon be obsolete. They also warn against generalizing too broadly from their findings and note the many counter-examples of organizations and projects that have made productivity gains with coding tools. One inescapable caution from the study's findings: don't trust self-reporting of productivity outcomes. We're not always the best judges of our own efficiency. Another is that it's relatively easy to measure productivity in terms of "task completion" but very hard to assess total added value in software-making. Thousands of completed tickets can be meaningless — if, for instance, a program is about to be discontinued. Meanwhile, one big new insight can change everything in ways no productivity metric can capture.

The big picture: The software community is divided over whether to view the advent of AI coding with excitement or dread.
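As a footnote on the study's headline numbers: "19% slower" refers to a change in task completion time, which is easy to compute but, as the authors note, easy to misjudge by self-report. A minimal sketch of that arithmetic (the hour figures are illustrative, not from the study):

```python
def time_change_pct(baseline_hours: float, observed_hours: float) -> float:
    """Percent change in task time relative to a no-AI baseline (positive = slower)."""
    return (observed_hours / baseline_hours - 1) * 100

# Illustrative numbers: a task taking 2.0h without AI and 2.38h with AI
# reproduces the study's headline 19% slowdown.
print(round(time_change_pct(2.0, 2.38)))
```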

How CTOs Can Rein In Vibe Coding Cybersecurity Risks

Forbes

14-07-2025


How CTOs Can Rein In Vibe Coding Cybersecurity Risks

Founder & CEO of Excellent Webworld. A tech innovator with 12+ years of experience in IT, leading 900+ successful projects globally.

In 2025, "vibe coding"—creating software simply by describing your requirements in plain English (i.e., writing a prompt)—has become the IT industry's biggest buzzword. AI tools like Cursor, Lovable and Firebase AI have democratized software creation, enabling even nontechnical users to launch apps and prototypes at unprecedented speed. Andrej Karpathy, who coined the term "vibe coding" in February 2025, explains: "It's not really coding—I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works."

While AI-generated code delivers speed and faster time to market, a darker reality is emerging: The same ease that lets anyone spin up a website in minutes allows cybercriminals to do the same. For example:

• In 2025, attackers exploited GitLab Duo's AI coding assistant through hidden prompts, causing AI-generated code to leak private source code and inject malicious HTML.

• Similarly, a Stanford student used prompt injection on Bing Chat to reveal hidden system instructions, exposing sensitive internal data, a direct result of AI-generated responses trusting manipulated user prompts.

Urgent action is needed to safeguard digital assets. In this article, I'll share my thoughts on the dark side of vibe coding, based on my readings and analysis, and why business leaders must rethink their cybersecurity strategies.

How Vibe Coding Accelerates Cyberattacks

AI-powered vibe coding tools often import external software components automatically. However, these components aren't always thoroughly checked, creating significant business risks. Some of these components may even be malware in disguise. Hackers use "slopsquatting" and "typosquatting," uploading fake software packages with names nearly identical to trusted ones.
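One common defense against typosquatting is fuzzy-matching requested dependency names against an allowlist before installation, so near-misses of trusted names get flagged. Below is a deliberately minimal sketch using only Python's standard library; the allowlist and package names are illustrative, not a real security tool:

```python
import difflib

# Illustrative allowlist of packages a team actually depends on.
TRUSTED_PACKAGES = {"requests", "numpy", "pandas", "flask"}

def check_dependency(name: str) -> str:
    """Classify a requested package as trusted, suspicious, or unknown."""
    if name in TRUSTED_PACKAGES:
        return "trusted"
    # A near-miss of a trusted name is a classic typosquatting signal.
    close = difflib.get_close_matches(name, TRUSTED_PACKAGES, n=1, cutoff=0.8)
    if close:
        return f"suspicious: close to trusted package '{close[0]}'"
    return "unknown: review before installing"

print(check_dependency("requests"))   # a name on the allowlist
print(check_dependency("requestss"))  # a one-character near-miss
```

A real pipeline would run a check like this (or a dedicated SCA tool) automatically on every dependency change rather than relying on developers to eyeball package names.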
If a company's AI tool pulls in one of these malicious packages, it can trigger data breaches, system failures or costly downtime. Another significant threat is that, as a recent study found, major AI code tools produce insecure code: nearly 48% of AI-generated code snippets had exploitable vulnerabilities.

These aren't just theoretical risks. One prominent case involved the Storm-2139 cybercrime group, which hijacked Azure OpenAI accounts by exploiting stolen API credentials. The group bypassed Microsoft's security measures, generating policy-violating and potentially harmful outputs at scale. As a result, security teams are facing serious consequences from AI coding. For example, a recent survey found that, while accidentally installing malicious code was relatively rare, 60% of these incidents were rated as highly significant when they did occur.

The Human Factor: Overreliance And Erosion Of Security Skills

Vibe coding enables people without technical backgrounds—business managers, marketers and more—to build apps using AI tools. However, many lack cybersecurity training, so critical safety steps are often skipped. The problem grows when teams trust AI-generated code too much, believing it's safe just because a machine produced it. As organizations lean on AI, they risk losing essential security skills and oversight. Without human review and ongoing training, hidden threats can slip through, putting the entire business at risk. In my experience advising digital transformation projects, I've seen teams skip code reviews when using AI tools, assuming the technology is infallible. This overconfidence can be costly; one overlooked vulnerability can compromise an entire system.

The Real-World Business Impact Of Security Breaches From Vibe Coding

Compliance violations will likely grow, as AI-generated code can fail to meet stringent regulatory standards. With the advent of the EU AI Act and stricter U.S.
cybersecurity frameworks, regulators now require organizations to demonstrate robust controls over AI-generated software. Noncompliance can mean monetary penalties, restricted market access and lasting reputational damage that can be very difficult to overcome. For enterprise leaders, the message is clear: Unchecked AI-generated code introduces systemic vulnerabilities that threaten financial performance and long-term resilience, both crucial for any organization to thrive in today's digital economy.

What Business Leaders Must Do To Prevent This Nightmare

Business leaders face a crossroads as AI-enabled "vibe coding" reshapes software development. The convenience and speed are undeniable, as are the hidden cybersecurity risks. To protect your organization, take these proactive steps:

• Deploy automated security scanning tools to catch vulnerabilities in real time.
• Mandate human code reviews for all AI-generated outputs.
• Schedule regular, independent security audits to detect hidden threats.
• Embed security checks throughout the software development life cycle.
• Educate all teams about the risks of AI-driven code to build a security-first culture.
• Closely monitor AI tool usage; treat every new piece of code as a potential risk.
• Establish clear policies for AI code adoption and escalation protocols.

These steps must be continuous, not just periodic, to keep pace with evolving threats. As AI redefines what's possible, those prioritizing security will not only mitigate risk but also unlock new growth opportunities. Companies that thrive will treat cybersecurity as a catalyst for innovation, embedding trust and resilience into every digital initiative. The choice is clear: Lead the charge in securing the AI-driven era, or risk being left vulnerable.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.

How DevSecOps is powering the vibecoding movement in Indian tech

Time of India

07-07-2025


How DevSecOps is powering the vibecoding movement in Indian tech

India's tech scene is shifting in ways few could have imagined just a few years ago. One of the buzzwords driving this change, vibecoding, isn't just hype. It marks a deeper shift in how software gets built: less about rigid syntax, more about expressing ideas and letting AI do the heavy lifting. Developers, both seasoned and new, are working alongside tools like GitHub Copilot to move from thought to code in a matter of seconds. But with this evolution comes a big caveat: security. When code is being spun out so quickly, old security models that came in at the end of the process just don't cut it anymore. That's where DevSecOps finds its footing. It's not some optional extra; it's becoming the scaffolding that holds this entire AI-driven building process together.

So, what exactly is vibecoding? The term comes from Andrej Karpathy, a well-known voice in AI. Vibecoding describes a new style of software development, one where you tell the machine what you want and it gets you there without needing to type every semicolon yourself. Think of it as coding by intention. Tools like Copilot or Codex make this possible by turning natural language into functioning code. In India, where there's no shortage of tech talent and curiosity, vibecoding is taking off. It opens the door for people who may not have deep coding backgrounds, enabling designers, analysts, even hobbyists to contribute in meaningful ways. That kind of inclusivity is reshaping who gets to build, and how fast they can go from idea to execution, resulting in faster prototyping and quicker time-to-market.

Security in the age of AI-generated code

While vibecoding offers tremendous benefits, it also introduces new security concerns. AI-generated code may carry vulnerabilities inherited from training data or produce insecure implementations if user prompts are vague. The high volume and speed of generated code make traditional, reactive security methods insufficient.
Security now needs to be embedded from the very beginning, not bolted on as an afterthought. That's where DevSecOps becomes crucial, integrating security into every stage of development to keep pace with this evolving paradigm.

DevSecOps: Fueling secure vibecoding at scale

DevSecOps refers to the integration of security into every phase of the software development lifecycle (SDLC), from design and development to testing and deployment. As vibecoding accelerates development timelines, DevSecOps ensures that this speed doesn't come at the cost of safety. Here's how:

1. Embedding security early ('Shift left and everywhere')

The core principle of DevSecOps is 'shifting left': embedding security checks as early as possible. When AI is generating large volumes of code, tools like static application security testing (SAST) and software composition analysis (SCA) can run seamlessly within development environments. They offer immediate feedback on vulnerabilities and third-party risks, allowing developers to address issues upfront rather than post-release. Additionally, DevSecOps promotes 'everywhere' security, extending protection into runtime environments and incident response systems to ensure ongoing vigilance.

2. Automation that keeps up with the pace

Vibecoding doesn't just move fast; it outpaces the old development cycle by a mile. DevSecOps helps keep that momentum going by automating security tasks within CI/CD pipelines. Instead of pausing to run manual checks, teams rely on built-in scans, simulated attacks, and automated policy validation. These tools flag issues early, often before a human even reviews the code. It's a way to stay secure without stepping off the gas.

3. Building a security-conscious mindset

More than just tools, DevSecOps is about mindset. In a vibecoding world, where even non-traditional developers are shaping software, everyone needs at least a basic sense of what secure coding looks like.
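To make the idea concrete: at its simplest, a SAST-style "shift left" check just scans source text for known-dangerous constructs before code is merged. The toy sketch below is illustrative only, with a hand-picked pattern list; real scanners of the kind the article describes do far deeper analysis:

```python
import re

# Illustrative red-flag patterns a toy SAST pass might look for in Python code.
RED_FLAGS = {
    r"\beval\(": "eval() on dynamic input enables code injection",
    r"\bexec\(": "exec() on dynamic input enables code injection",
    r"shell=True": "subprocess with shell=True risks command injection",
    r"verify=False": "disabling TLS verification invites MITM attacks",
}

def scan_source(source: str) -> list[str]:
    """Return a warning for each red-flag pattern found in the source text."""
    warnings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, reason in RED_FLAGS.items():
            if re.search(pattern, line):
                warnings.append(f"line {lineno}: {reason}")
    return warnings

snippet = 'subprocess.run(cmd, shell=True)\nprint("ok")\n'
for warning in scan_source(snippet):
    print(warning)
```

Wired into a pre-commit hook or CI step, even a check this simple gives the immediate, pre-merge feedback that the 'shift left' principle calls for.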
That might mean understanding how vague prompts can lead to risky output, or simply knowing what red flags to watch for. The goal isn't perfection; it's awareness baked into the creative process.

4. Making compliance less of a bottleneck

For Indian companies working across borders, compliance is a constant challenge. DevSecOps helps by embedding legal and regulatory checks into the development flow itself. Whether it's GDPR, local data laws, or sector-specific policies, these guardrails run quietly in the background, cutting risk and proving that trust isn't optional; it's foundational.

5. Tearing down the silos

DevSecOps isn't just about tools; it's about people working together. In fast-paced, AI-driven development, silos between developers, security experts, and ops teams can slow things down or let issues slip through the cracks. DevSecOps helps break those walls. Security teams can offer real-time guidance on how to handle AI-generated code safely, while ops folks ensure that what gets shipped is secure and stable. When everyone's in the loop, the whole system runs smoother and safer.

Tackling roadblocks and peering into the future

Adopting DevSecOps across India's tech ecosystem isn't exactly a walk in the park. The journey is riddled with real-world challenges: a talent gap that keeps widening, organizational pushback rooted in traditional hierarchies, and the technical headache of embedding security into sprawling, often outdated codebases. Yet, things are shifting. Platforms are beginning to offer more holistic training by blending DevSecOps with generative AI, and that fusion is slowly demystifying both fields for the next generation of developers. That said, the future isn't just about patching up today's gaps. As AI evolves, DevSecOps is expected to become far more dynamic, with self-repairing security systems that don't just react but anticipate.
In the era of vibecoding, where software is shaped as much by instinct and creative drive as by logic, these adaptive systems could become the linchpin. They won't just protect code; they'll empower creators to build boldly without fear of leaving vulnerabilities in their wake.

Final thoughts

Vibecoding is a new lens through which we're reimagining software development: quicker, sharper, and more inclusive. But speed and creativity mean little without security. This is where DevSecOps earns its stripes as a core philosophy baked into the build process. As India rides the crest of its AI revolution, DevSecOps will remain the silent force ensuring that this wave of progress doesn't crash under its own weight. For every line of code written in rhythm and flow, there must be an underlying cadence of security. And in that balance lies the true promise of the future.

Technobabble: We need a whole new vocabulary to keep up with the evolution of AI

Mint

06-07-2025


Technobabble: We need a whole new vocabulary to keep up with the evolution of AI

The artificial intelligence (AI) news flow does not stop, and it's becoming increasingly obscure and pompous. China's MiniMax just spiked efficiency and context length, but we are not gasping. Elon Musk says Grok will "redefine human knowledge," but is that a new algorithm or just hot air? Andrej Karpathy's "Software 3.0" sounds clever but lacks real-world bite. Mira Murati bet $2 billion on "custom models," a term so vague it could mean anything. And only by testing Kimi AI's "Researcher" did we get why it's slick and different.

Technology now sprints past our words. As machines get smarter, our language lags. Buzzwords, recycled slogans and podcast quips fill the air but clarify nothing. This isn't just messy, it's dangerous. Investors chase vague terms, policymakers regulate without definitions and the public confuses breakthroughs with sci-fi. We're in a tech revolution with a vocabulary stuck in the dial-up days.

We face a generational shift in technology without a stable vocabulary to navigate it. This language gap is not a side issue. It is a core challenge that requires a new discipline: a fierce scepticism of hype and a deep commitment to the details. The instinct to simplify is a trap. Once, a few minutes was enough to explain breakthrough apps like Google or Uber. Now, innovations in robotics or custom silicon resist such compression. Understanding OpenAI's strategy or Nvidia's product stack requires time, not sound-bites. We must treat superficial simplicity as a warning sign.

Hot areas like AI 'agents' or 'reasoning layers' lack shared standards or benchmarks. Everyone wants to sell a 'reasoning model,' but no one agrees on what that means or how to measure it. Most corporate announcements are too polished to interrogate, and their press releases are not proof of defensible innovation. Extraordinary claims need demos, user numbers and real-world metrics. When the answers are fuzzy, the claim is unproven.
In today's landscape, scepticism is not cynicism. It is discipline. This means we must get comfortable with complexity. Rather than glossing over acronyms, we must dig in. Modern tech is layered with convenient abstractions that make understanding easier, but often too easy. A robo-taxi marketed as 'full self-driving' or a model labelled 'serverless' demands that we look beneath the surface. We don't need to reinvent every wheel, but a good slogan should never be an excuse for missing what is critical.

The only way to understand some tools is to use them. A new AI research assistant, for instance, only feels distinct after you use it, not when you read a review of what it can or cannot accomplish.

In this environment, looking to the past or gazing towards the distant future is a fool's errand. History proves everything and nothing. You can cherry-pick the dot-com bust or the advent of electricity to support any view. It's better to study what just happened than to force-fit it into a chart of inevitability. The experience of the past two years has shattered most comfortable assumptions about AI, compute and software design. The infographics about AI diffusion or compute intensity that go viral on the internet often come from people who study history more than they study the present. It's easier to quote a business guru than to parse a new AI framework, but we must do the hard thing: analyse present developments with an open mind even when the vocabulary doesn't yet exist.

The new 'Nostradami' of artificial intelligence: This brings us to the new cottage industry of AI soothsaying. Over the past two years, a fresh crop of 'laws' has strutted across conference stages and op-eds, each presented as the long-awaited Rosetta Stone of AI.
We're told to obey the Scaling Law (just add more data), respect the Chinchilla Law (actually, add exactly 20 times more tokens) and reflect on the reanimated Solow Paradox (productivity still yawns, therefore chatbots are overrated). When forecasts miss the mark, pundits invoke Goodhart's Law (metrics have stopped mattering) or Amara's Law (overhype now, under-hype later). The Bitter Lesson tells us to buy GPUs (graphics processing units), not PhDs. Cunningham's Law says wrong answers attract better ones. Our favourite was when the Victorian-era Jevons Paradox was invoked to argue that a recent breakthrough wouldn't collapse GPU demand. We're not immune to this temptation and have our own Super-Moore Law; it has yet to go viral.

These laws and catchphrases obscure more than they reveal. The 'AI' of today bears little resemblance to what the phrase meant in the 1950s or even late 2022. The term 'transformer,' the architecture that kicked off the modern AI boom, is a prime example. Its original 2017 equation exists now only in outline. The working internals of today's models—with flash attention, rotary embeddings and mixture-of-experts gating—have reshaped the original methods so thoroughly that the resulting equations resemble the original less than general relativity resembles Newton's laws. This linguistic mismatch will only worsen as robotics grafts cognition onto actuators and genomics borrows AI architecture for DNA editing. Our vocabulary, built for a slower era, struggles to keep up.

Beneath the noise, a paradox remains: staying genuinely current is both exceedingly difficult and easier than ever. It's difficult because terminology changes weekly and breakthroughs appear on preprint servers, not in peer-reviewed journals.
However, it's easier because we now have AI tools that can process vast amounts of information, summarize dense research and identify core insights with remarkable precision. Used well, these technologies can become the most effective way to understand technology itself. And that's how sensible investment in innovation begins: with a genuine grasp of what's being invested in. The author is a Singapore-based innovation investor for GenInnov Pte Ltd
