🧱 AI Hit a Wall

Now What?

Wendy
December 19, 2025

Plus: DeepSeek-V3.2 vs GPT-5 vs Gemini 3: Our Hands-On Test on Real Coding & Reasoning Tasks

If bigger models aren’t enough… what actually gets us to AGI? Why are agents still smart and dumb at the same time? And why is stability suddenly the real bottleneck in AI coding?

What's on FIRE 🔥

🧠 Demis Hassabis: AGI Needs 50% Scaling + 50% Innovation
📚 AI Sources From AI Fire
⭐ Today's AI Highlights
🥇 Top AI Papers of the Week
🏅 AI Tools
📊 AI Chart

IN PARTNERSHIP WITH THESYS

Turn your n8n workflows into intelligent AI apps in minutes.

C1 by Thesys turns any n8n workflow into a smart, adaptive AI app - with interactive UIs instead of walls of text.

From chatbots to AI agents for research, analytics or automation, no coding and no changes to your workflow logic.

Thesys is the UI your n8n workflows have been missing.

AI INSIGHTS

🧠 Demis Hassabis: AGI Needs 50% Scaling + 50% Innovation

On the Google DeepMind Podcast, Demis Hassabis (CEO and co-founder of Google DeepMind) told Hannah Fry the key bet is simple: AGI won’t come from bigger models alone. You need scale plus new research, split roughly 50/50.

Key takeaways:

Agents are the shift: AI is moving from “chat” to systems that plan and act.
Jagged intelligence is the gap: models can solve elite problems, then fail easy logic. Not consistent yet.
Confidence scores are missing: AI should say “I’m unsure” instead of hallucinating.
World models matter: projects like Genie and SIMA train agents in simulated worlds, with robotics as the long-term goal.
Science is still the big unlock: after AlphaFold, DeepMind is pushing materials, fusion (with Commonwealth Fusion), and quantum error correction.
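
To make the "confidence scores" takeaway concrete, here is a minimal sketch of an answer that abstains when confidence is low. It is our illustration, not anything DeepMind described: the function name answer_or_abstain, the 0.75 threshold, and the use of average token log-probabilities as a confidence proxy are all assumptions, and that proxy is known to be crude.

```python
import math

def answer_or_abstain(answer, token_logprobs, threshold=0.75):
    """Return the answer only if a rough confidence score clears the threshold.

    Hypothetical helper: token_logprobs is assumed to be a list of
    per-token log-probabilities for the generated answer.
    """
    if not token_logprobs:
        return "I'm unsure"
    # exp(mean log-prob) is the geometric mean of the token probabilities.
    confidence = math.exp(sum(token_logprobs) / len(token_logprobs))
    return answer if confidence >= threshold else "I'm unsure"
```

With high per-token probabilities the answer passes through; with low ones the function falls back to "I'm unsure" rather than guessing.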

Why it matters: The next wave isn’t smarter chat. It’s autonomous agents - so reliability has to catch up fast.

PRESENTED BY HUBSPOT

The Future of AI in Marketing. Your Shortcut to Smarter, Faster Marketing.

Unlock a focused set of AI strategies built to streamline your work and maximize impact. This guide delivers the practical tactics and tools marketers need to start seeing results right away:

7 high-impact AI strategies to accelerate your marketing performance
Practical use cases for content creation, lead gen, and personalization
Expert insights into how top marketers are using AI today
A framework to evaluate and implement AI tools efficiently

Stay ahead of the curve with these top strategies AI helped develop for marketers, built for real-world results.

Download the Free Report

AI SOURCES FROM AI FIRE

1. DeepSeek-V3.2 vs GPT-5 vs Gemini 3: Our hands-on test on real coding & reasoning tasks. We found a workflow that outperforms others in the market

2. If we started a business in 2026, we'd ignore all the noise & do this instead. This business strategy proves that finding expensive problems to solve is the fastest route to six figures now

TODAY IN AI

AI HIGHLIGHTS

🧠 Google just dropped FunctionGemma, a tiny 270M-parameter on-device model built for function calling. Google reports an accuracy jump from 58% to 85% after fine-tuning. The model runs fully offline and turns natural language into real actions on phones and edge devices.

🧩 Anthropic expanded Claude in Chrome to all paid tiers and added full Claude Code integration. You can build in your terminal, debug in the browser, and let Claude read console errors and DOM state directly.

📄 Mistral introduced OCR 3, a faster and cheaper OCR model for enterprise docs. It beats OCR 2 on forms, tables, handwriting, and scans, with pricing as low as $1 per 1,000 pages via batch API.

🛠️ Anthropic open-sourced Agent Skills, a universal standard for sharing agent capabilities. Skills now work across Claude, OpenAI Codex, Cursor, VS Code, GitHub, and more, making workflows portable.

📌 OpenAI rolled out Pinned Chats on web, iOS, and Android. You can now pin key conversations for instant access, which finally makes long-term ChatGPT workflows easier to manage.

💰 Big AI Fundraising: Galbot raised $300M at a $3B valuation, bringing total funding to $800M as interest in AI humanoid robots accelerates. The company is already deploying robots across manufacturing, logistics, retail, and healthcare, with partners like Toyota, Hyundai, and Bosch, and thousands of units on order.

TOP AI PAPERS OF THE WEEK

Next-Embedding Prediction Makes Strong Vision Learners
Michigan, NYU, and Princeton bring the next-token prediction idea to vision by predicting the next embedding instead. NEPA hits 83.8% ImageNet top-1 with a simpler self-supervised setup. (University of Michigan • NYU • Princeton)
Evaluating LLMs in Scientific Discovery
Toronto and Harvard introduce SDE, a benchmark showing LLMs know science - but still struggle with iterative research and discovery loops. (University of Toronto • Harvard)
Memory in the Age of AI Agents
NUS and Fudan map how agent memory really works, separating it from RAG and LLM memory, and laying out a clear framework for long-horizon agents. (NUS • Fudan University)
Universal Reasoning Model (URM)
Ubiquant’s URM boosts recurrent reasoning to set new SOTA on ARC-AGI and Sudoku, pushing progress on abstract reasoning benchmarks. (Ubiquant)
Kling-Omni Technical Report
Kuaishou’s Kling-Omni unifies video generation, editing, and reasoning - delivering cinematic-quality video with strong control across tasks. (Kuaishou Technology)

NEW EMPOWERED AI TOOLS

⚡ Nimbalyst is the local WYSIWYG editor & session manager where PMs and Devs iterate with Claude Code on the full context: markdown docs, diagrams, data models, mockups, and code.
🎨 Loki.Build designs and ships studio-grade landing pages with AI, using a live editor with SEO & hosting built in.
🔍 Userology AI is an AI user research agent that recruits users, runs sessions & delivers clear insights.
📊 Vurge brings AI web scraping to Google Sheets, pulling data from any site in seconds.

AI CHART

🧩 GPT-5.2-Codex Solves Long Coding Drift

OpenAI released GPT-5.2-Codex to fix a common issue in AI coding: losing track during long sessions.

The key change is context compaction. The model compresses past steps but keeps intent and state. It remembers the plan, not the noise.

This helps it:

Stay focused during long refactors
Debug across many steps without drifting
Handle multi-file changes better

Built for engineers using Codex in real projects, not one-off prompts.
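
The compaction idea can be pictured with a toy sketch. This is not how GPT-5.2-Codex does it; OpenAI's mechanism is not described here. The Session class, compact, and build_context are hypothetical names, and a real system would presumably summarize old steps with a model rather than just count them.

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    plan: str                                   # the intent that must survive compaction
    state: dict = field(default_factory=dict)   # durable facts, e.g. files already changed
    steps: list = field(default_factory=list)   # verbose step-by-step log
    compacted: int = 0                          # how many old steps were folded away

def compact(session, keep_recent=2):
    """Drop all but the most recent steps, remembering only how many were dropped."""
    overflow = len(session.steps) - keep_recent
    if overflow > 0:
        session.compacted += overflow
        session.steps = session.steps[overflow:]
    return session

def build_context(session):
    """Assemble the text a model would see: plan and state first, then recent steps."""
    lines = [f"PLAN: {session.plan}", f"STATE: {session.state}"]
    if session.compacted:
        lines.append(f"SUMMARY: {session.compacted} earlier steps compacted")
    lines.extend(session.steps)
    return "\n".join(lines)
```

The point of the sketch is the ordering of priorities: the plan and state are never touched by compact, while the step log is the only thing that shrinks.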

Benchmarks:

SWE-Bench Pro: 56.4%
Terminal-Bench 2.0: 64.0%

Small gains. Much better stability.

Hit reply and say Hello – we'd love to hear from you!
Like what you're reading? Forward it to friends, and they can sign up here.

Cheers,
The AI Fire Team