
Optimizing AI apps in a million-token world
The context size problem in large language models is nearly solved.
In recent months, models like GPT-4.1, LLaMA 4, and DeepSeek V3 have reached context windows ranging from hundreds of thousands to millions of tokens. We're entering a phase where entire documents, threads, and histories can fit into a single prompt. It marks real progress—but it also brings new questions about how we structure, pass, and prioritize information.
WHAT IS CONTEXT SIZE (AND WHY WAS IT A CHALLENGE)?
Context size defines how much text a model can process in one pass. It is measured in tokens: small chunks of text such as words or parts of words. For years, it shaped the way we worked with LLMs: splitting documents, engineering recursive prompts, summarizing inputs—anything to avoid truncation.
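To make this concrete, here is a minimal sketch of how developers often estimate whether a prompt will fit a context window. It uses the rough rule of thumb of about four characters per token for English text; real tokenizers vary by model, so treat the numbers as approximations, and the function names here are illustrative, not from any particular library:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.

    Real tokenizers (BPE variants) differ per model; this heuristic only
    approximates whether a prompt is likely to fit a context window.
    """
    return max(1, len(text) // 4)


def fits_context(text: str, window: int = 128_000) -> bool:
    """Check whether a prompt likely fits a given context window."""
    return estimate_tokens(text) <= window


# A 1,000-character document is roughly 250 tokens.
print(estimate_tokens("x" * 1000))  # 250
```

For anything beyond a quick sanity check, you would swap the heuristic for the model's actual tokenizer, since token counts drive both truncation and billing.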
Now, models like LLaMA 4 Scout can handle up to 10 million tokens, while DeepSeek V3 and GPT-4.1 support roughly 128K and 1 million tokens respectively. With those capabilities, many of those older workarounds can be rethought or even removed.
FROM BOTTLENECK TO CAPABILITY
This progress unlocks new interaction patterns. We're seeing applications that can reason and navigate across entire contracts, full Slack threads, or complex research papers. These use cases were out of reach not long ago. However, just because models can read more does not mean they automatically make better use of that data.
The paper 'Why Does the Effective Context Length of LLMs Fall Short?' examines this gap. It shows that LLMs often attend to only part of the input, especially the more recent or emphasized sections, even when the prompt is long. Another study, 'Explaining Context Length Scaling and Bounds for Language Models,' explores why increasing the window size does not always lead to better reasoning. Both pieces suggest that the problem has shifted from managing how much context a model can take to guiding how it uses that context effectively.
Think of it this way: Just because you can read every book ever written about World War I doesn't mean you truly understand it. You might scan thousands of pages, but still fail to retain the key facts, connect the events, or explain the causes and consequences with clarity.
What we pass to the model, how we organize it, and how we guide its attention are now central to performance. These are the new levers of optimization.
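As a toy illustration of organizing and guiding attention, consider a hypothetical prompt builder that labels each section with an explicit header and places the task at the end, where recency effects tend to help. The function and section names here are assumptions for the sketch, not a prescribed format:

```python
def build_prompt(task: str, sections: list[tuple[str, str]]) -> str:
    """Assemble context with explicit headers and place the task last.

    Long-context studies suggest models attend more reliably to clearly
    delimited input, with emphasis on recent material, so the instruction
    goes at the end of the prompt.
    """
    parts = [f"### {title}\n{body}" for title, body in sections]
    parts.append(f"### Task\n{task}")
    return "\n\n".join(parts)


prompt = build_prompt(
    task="Summarize the termination clauses.",
    sections=[
        ("Contract excerpt", "Section 9: Either party may terminate..."),
        ("Prior summary", "The agreement covers a 3-year term."),
    ],
)
print(prompt)
```

The point is not the specific delimiters but the principle: labeled, ordered sections give the model structural signals instead of an undifferentiated wall of text.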
CONTEXT WINDOW ≠ TRAINING TOKENS
A model's ability to accept a large context does not guarantee that it has been trained to handle it well. Some models were exposed only to shorter sequences during training. That means even if they accept 1M tokens, they may not make meaningful use of all that input.
This gap affects reliability. A model might slow down, hallucinate, or misinterpret input if overwhelmed with too much or poorly organized data. Developers need to verify whether the model was fine-tuned for long contexts, or simply adapted to accept them.
WHAT CHANGES FOR ENGINEERS
With these new capabilities, developers can move past earlier limitations. Manual chunking, token trimming, and aggressive summarization become less critical. But this does not remove the need for data prioritization.
Prompt compression, token pruning, and retrieval pipelines remain relevant. Techniques like prompt caching help reuse portions of prompts to save costs. Mixture-of-experts (MoE) models, like those used in LLaMA 4 and DeepSeek V3, optimize compute by activating only relevant components.
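As an illustration of the token-pruning idea, here is a hypothetical sketch that scores context chunks by word overlap with the query and keeps only the best ones until a budget is spent. A real retrieval pipeline would use embeddings or a reranker rather than raw word overlap; the function names and the word-count budget are assumptions for this example:

```python
def score_chunk(chunk: str, query: str) -> float:
    """Score a chunk by word overlap with the query (toy relevance metric)."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0


def prune_context(chunks: list[str], query: str, budget: int) -> list[str]:
    """Keep the highest-scoring chunks until the word-count budget is spent."""
    ranked = sorted(chunks, key=lambda c: score_chunk(c, query), reverse=True)
    kept, used = [], 0
    for chunk in ranked:
        cost = len(chunk.split())
        if used + cost <= budget:
            kept.append(chunk)
            used += cost
    return kept


chunks = [
    "The contract renewal date is March 2026.",
    "Lunch options near the office include tacos.",
    "Renewal terms require 60 days notice.",
]
print(prune_context(chunks, "contract renewal date", budget=10))
```

Even with million-token windows, pruning like this keeps the prompt focused and cheaper, which is why retrieval pipelines have not disappeared.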
Engineers also need to track which parts of a prompt the model actually uses. Output quality alone does not guarantee effective context usage. Monitoring token relevance, attention distribution, and consistency over long prompts introduces new challenges that go beyond latency and throughput.
IT IS ALSO A PRODUCT AND UX ISSUE
For end users, the shift to larger contexts introduces more freedom—and more ways to misuse the system. Many users drop long threads, reports, or chat logs into a prompt and expect perfect answers. They often do not realize that more data can sometimes cloud the model's reasoning.
Product design must help users focus. Interfaces should clarify what is helpful to include and what is not. This might mean offering previews of token usage, suggestions to refine inputs, or warnings when the prompt is too broad. Prompt design is no longer just a backend task, but rather part of the user journey.
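A hypothetical token-usage preview could be as simple as the sketch below, which uses a crude characters-per-token heuristic to warn users when a prompt nears the window limit. A production UI would call the model's actual tokenizer, and the thresholds and wording here are assumptions:

```python
def preview_prompt(text: str, window: int = 128_000) -> str:
    """Return a short UX message previewing estimated token usage.

    Uses a crude ~4 chars/token heuristic; a production interface would
    use the model's real tokenizer for accurate counts.
    """
    tokens = max(1, len(text) // 4)
    pct = 100 * tokens / window
    if pct > 90:
        return f"~{tokens} tokens ({pct:.0f}% of window): consider trimming."
    return f"~{tokens} tokens ({pct:.0f}% of window)."


print(preview_prompt("hello world"))
```

Surfacing this kind of feedback before submission nudges users toward focused prompts instead of letting them discover truncation or degraded answers after the fact.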
THE ROAD AHEAD: STRUCTURE OVER SIZE
Larger context windows open important doors. We can now build systems that follow extended narratives, compare multiple documents, or process timelines that were previously out of reach.
But clarity still matters more than capacity. Models need structure to interpret, not just volume to consume. This changes how we design systems, how we shape user input, and how we evaluate performance.
The goal is not to give the model everything. It is to give it the right things, in the right order, with the right signals. That is the foundation of the next phase of progress in AI systems.
