
Latest news with #AmericanInvitationalMathematicsExamination

Grok 4 vs Grok 3: What makes Elon Musk's newest AI model the "world's most powerful AI"

Time of India

6 days ago

  • Business
  • Time of India

Grok 4 vs Grok 3: What makes Elon Musk's newest AI model the "world's most powerful AI"

Elon Musk's xAI has released Grok 4, just five months after Grok 3's debut earlier this year. The latest model promises a quantum leap in performance, achieving perfect scores on maths competitions while commanding a premium $300 monthly subscription. While Grok 3 established the foundation with strong reasoning capabilities and mainstream accessibility, Grok 4 now positions itself as the "world's most powerful AI model", marking xAI's rapid ascent in advanced AI territory. Here's a comparison between xAI's Grok 4 and Grok 3.

Grok 4 vs Grok 3: The performance comparison

Grok 4 dominates academic benchmarks. On the American Invitational Mathematics Examination (AIME), Grok 4 achieved a perfect 100% score compared with Grok 3's 52.2%. The graduate-level physics question-answering benchmark GPQA shows Grok 4 scoring 87% against Grok 3's 75.4%. Most impressively, Grok 4 scored 25.4% on Humanity's Last Exam without tools, outperforming Google's Gemini 2.5 Pro (21.6%) and OpenAI's o3 (21%). With tools enabled, the Grok 4 Heavy variant reaches 44.4%, nearly double Gemini's 26.9%.

The ARC-AGI-2 benchmark, which tests visual pattern recognition, shows Grok 4 achieving 16.2%, twice the performance of the next-best commercial model, Claude Opus 4. On coding tasks, Grok 4 handles 256,000 tokens of context compared with Grok 3's 131,072, enabling it to process significantly larger codebases. Grok 3 was trained on 200,000 GPUs with 10x more compute than Grok 2; Grok 4's training details remain undisclosed, but its performance improvements suggest even greater computational resources.

Grok 4 vs Grok 3: Upgraded technical capabilities

Grok 4 represents a fundamental shift in AI design. Unlike Grok 3, which offered both reasoning and non-reasoning modes, Grok 4 operates exclusively as a reasoning model. This architectural change trades quick responses for deeper, more accurate problem-solving. The context window expansion from 131,072 tokens (Grok 3) to 256,000 tokens (Grok 4) enables processing of documents roughly twice as large. Grok 4 also integrates real-time data from the X, Tesla and SpaceX platforms, providing current information that Grok 3 lacked.

Multimodal capabilities distinguish the models significantly. Grok 4 supports text and vision, with image generation coming soon, while Grok 3 focused primarily on text-based interactions. xAI plans specialised variants including Grok 4 Code (August 2025) and video generation models (October 2025).

Grok 4 vs Grok 3: Pricing and availability

The cost difference reflects the capability gap. Grok 3 maintains $3 per million input tokens and $15 per million output tokens through xAI's API. Grok 4 uses identical API pricing but introduces the SuperGrok Heavy subscription at $300 monthly, the highest among major AI providers. This premium positioning targets enterprise users and researchers requiring cutting-edge performance. OpenAI, Google and Anthropic offer similar ultra-premium tiers, but none matches xAI's $300 monthly price point.

Both models integrate into X's social platform, but Grok 4's launch follows controversy around Grok 3's generation of antisemitic content and misinformation. xAI addressed these issues by removing "politically incorrect" guidance from system prompts and implementing stricter safeguards.
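Per-token API pricing like the figures quoted above translates into a simple per-request cost formula. A minimal illustrative sketch (the request sizes below are hypothetical, not from the article):

```python
# Illustrative cost calculation at the quoted Grok API rates:
# $3 per million input tokens, $15 per million output tokens.
INPUT_RATE = 3.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 15.00 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one API call at the quoted rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A hypothetical call that fills half of Grok 4's 256,000-token
# context window and returns a 4,000-token answer:
print(round(request_cost(128_000, 4_000), 4))  # 0.444
```

At these rates a single context-saturating request costs well under a dollar, which is why the $300 monthly figure attaches to the SuperGrok Heavy subscription tier rather than to API usage.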

DeepSeek's upgraded foundational model excels in coding and maths

South China Morning Post

25-03-2025

  • Business
  • South China Morning Post

DeepSeek's upgraded foundational model excels in coding and maths

Chinese artificial intelligence (AI) star DeepSeek has upgraded its open-source V3 large language model, adding parameters and improving its capabilities in coding and solving mathematical problems.

DeepSeek-V3-0324, named after its predecessor and its launch date, has "enhanced reasoning capabilities, optimised front-end web development and upgraded Chinese writing proficiency", according to a notice on the company's website. The new version and DeepSeek V3 are both foundation models trained on vast data sets that can be applied in different use cases, including that of a chatbot. DeepSeek R1, the company's reasoning model, is based on DeepSeek V3.

The updated foundation model has improved on several benchmarks, especially the American Invitational Mathematics Examination (AIME), where it scored 59.4 compared with 39.6 for its predecessor, while gaining 10 points on LiveCodeBench to reach 49.2, DeepSeek data showed.

[Photo: screens displaying the logos of DeepSeek and OpenAI's AI chatbot ChatGPT, January 29, 2025. Photo: AFP]

Compared with DeepSeek V3, which has 671 billion parameters and is released under the company's own commercial licence, the new 685-billion-parameter model uses the MIT software licence, the most popular on the developer platform GitHub.

Launched on the AI community Hugging Face as well as on the company's own website, DeepSeek-V3-0324 is now the top trending model on Hugging Face, where it has received positive comments on its performance.
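The reported gains can be tabulated directly from the figures above. A small illustrative sketch (the prior LiveCodeBench score of 39.2 is inferred from the stated 10-point gain to 49.2):

```python
# Benchmark scores reported by DeepSeek for V3 vs. the updated
# V3-0324 release, as quoted in the article.
scores = {
    "AIME":          {"V3": 39.6, "V3-0324": 59.4},
    "LiveCodeBench": {"V3": 39.2, "V3-0324": 49.2},  # 39.2 inferred
}

for bench, s in scores.items():
    delta = s["V3-0324"] - s["V3"]
    print(f"{bench}: {s['V3']} -> {s['V3-0324']} (+{delta:.1f})")
```

The AIME jump (+19.8 points) is roughly twice the coding gain, consistent with the release notes' emphasis on reasoning and maths.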

ByteDance advances DeepSeek work in AI reasoning with open-source project led by intern

South China Morning Post

21-03-2025

  • Business
  • South China Morning Post

ByteDance advances DeepSeek work in AI reasoning with open-source project led by intern

TikTok owner ByteDance, which has invested heavily in artificial intelligence (AI), has unveiled a new system that it claims improves on the work done by DeepSeek in training AI reasoning models.

DAPO, or Decoupled Clip and Dynamic Sampling Policy Optimisation, is a scalable reinforcement learning algorithm that helps a large language model (LLM) achieve better complex reasoning behaviour, such as self-verification and iterative refinement, according to a research paper published earlier this week by ByteDance and Tsinghua University's Institute for AI Industry Research.

The algorithm outperformed the reinforcement learning approach used in DeepSeek's R1 reasoning model, scoring 50 points on the American Invitational Mathematics Examination (AIME) 2024 using Alibaba Group Holding's Qwen2.5-32B base model, compared with the 47 points attained by R1's approach on the same Alibaba model, the paper showed. Alibaba owns the South China Morning Post. Notably, DAPO achieved the better result with 50 per cent fewer training steps.

[Photo: TikTok owner ByteDance has invested heavily in artificial intelligence. Photo: Digitimes]

The achievement drew positive academic and industry comments. Google DeepMind engineer Philipp Schmid, who shared the project on X, said the new method was "better than" DeepSeek's "group relative policy optimisation (GRPO)" in reinforcement learning. GRPO is one of DeepSeek's training methods that enables a model to learn by comparing different actions and making updates based on a "group" of observations.
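The "group" comparison described above is the core of GRPO: the model samples several answers to the same prompt, scores them, and rates each answer relative to its group rather than against a separate learned value function. A minimal sketch of that group-relative step, not ByteDance's or DeepSeek's implementation (DAPO further modifies this recipe with decoupled clipping and dynamic sampling):

```python
# Group-relative scoring as in GRPO: each sampled answer's advantage
# is its reward normalised against the mean and spread of the group.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalise each reward against the group's mean and spread."""
    mu = mean(rewards)
    sigma = stdev(rewards) or 1.0  # guard against a zero-spread group
    return [(r - mu) / sigma for r in rewards]

# Four sampled answers to one maths problem, scored 1 if correct, else 0.
# Correct answers get positive advantages, wrong ones negative:
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Answers that beat the group average are reinforced and the rest are discouraged, so when every sampled answer ties (all right or all wrong) the group carries no learning signal; dynamic sampling, one of DAPO's changes, addresses exactly that case by filtering out such groups.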
