🧱 AI Hit a Wall. Now What?


If bigger models aren’t enough… what actually gets us to AGI? Why are agents still smart and dumb at the same time? And why is stability suddenly the real bottleneck in AI coding?

IN PARTNERSHIP WITH THESYS

C1 by Thesys turns any n8n workflow into a smart, adaptive AI app - with interactive UIs instead of walls of text.

From chatbots to AI agents for research, analytics or automation, no coding and no changes to your workflow logic.

Thesys is the UI your n8n workflows have been missing.

AI INSIGHTS


On the Google DeepMind Podcast, Demis Hassabis (CEO, co-founder of Google DeepMind) told Hannah Fry the key bet is simple: AGI won’t come from bigger models alone. You need scale + new research, 50/50.

Key takeaways:

  • Agents are the shift: AI is moving from “chat” to systems that plan and act.

  • Jagged intelligence is the gap: models can solve elite problems, then fail easy logic. Not consistent yet.

  • Confidence scores are missing: AI should say “I’m unsure” instead of hallucinating.

  • World models matter: projects like Genie and SIMA train agents in simulated worlds, with robotics as the long-term goal.

  • Science is still the big unlock: after AlphaFold, DeepMind is pushing materials, fusion (with Commonwealth Fusion), and quantum error correction.

Why it matters: The next wave isn’t smarter chat. It’s autonomous agents - so reliability has to catch up fast.

PRESENTED BY HUBSPOT

The Future of AI in Marketing. Your Shortcut to Smarter, Faster Marketing.

Unlock a focused set of AI strategies built to streamline your work and maximize impact. This guide delivers the practical tactics and tools marketers need to start seeing results right away:

  • 7 high-impact AI strategies to accelerate your marketing performance

  • Practical use cases for content creation, lead gen, and personalization

  • Expert insights into how top marketers are using AI today

  • A framework to evaluate and implement AI tools efficiently

Stay ahead of the curve with these top strategies AI helped develop for marketers, built for real-world results.

AI SOURCES FROM AI FIRE

1. DeepSeek-V3.2 vs GPT-5 vs Gemini 3: Our hands-on test on real coding & reasoning tasks. We found a workflow that outperforms others in the market

2. If we started a business in 2026, we'd ignore all the noise & do this instead. This business strategy proves that finding expensive problems to solve is the fastest route to six figures now

TODAY IN AI

AI HIGHLIGHTS

🧠 Google just dropped FunctionGemma, a tiny 270M-parameter on-device model built for function calling. It boosts accuracy from 58% to 85%, runs fully offline, and turns natural language into real actions on phones and edge devices.

🧩 Anthropic expanded Claude in Chrome to all paid tiers and added full Claude Code integration. You can build in your terminal, debug in the browser, and let Claude read console errors and DOM state directly.

📄 Mistral introduced OCR 3, a faster and cheaper OCR model for enterprise docs. It beats OCR 2 on forms, tables, handwriting, and scans, with pricing as low as $1 per 1,000 pages via batch API.

🛠️ Anthropic open-sourced Agent Skills, a universal standard for sharing agent capabilities. Skills now work across Claude, OpenAI Codex, Cursor, VS Code, GitHub, and more, making workflows portable.

📌 OpenAI rolled out Pinned Chats on web, iOS, and Android. You can now pin key conversations for instant access, which finally makes long-term ChatGPT workflows easier to manage.

💰 Big AI Fundraising: Galbot raised $300M at a $3B valuation, bringing total funding to $800M as interest in AI humanoid robots accelerates. The company is already deploying robots across manufacturing, logistics, retail, and healthcare, with partners like Toyota, Hyundai, and Bosch, and thousands of units on order.

TOP AI PAPERS OF THE WEEK

  1. Next-Embedding Prediction Makes Strong Vision Learners
    Michigan, NYU, and Princeton bring next-token prediction to vision. NEPA hits 83.8% ImageNet top-1 with a simpler self-supervised setup. (University of Michigan • NYU • Princeton)

  2. Evaluating LLMs in Scientific Discovery
    Toronto and Harvard introduce SDE, a benchmark showing LLMs know science - but still struggle with iterative research and discovery loops. (University of Toronto • Harvard)

  3. Memory in the Age of AI Agents
    NUS and Fudan map how agent memory really works, separating it from RAG and LLM memory, and laying out a clear framework for long-horizon agents. (NUS • Fudan University)

  4. Universal Reasoning Model (URM)
    Ubiquant’s URM boosts recurrent reasoning to set new SOTA on ARC-AGI and Sudoku, pushing progress on abstract reasoning benchmarks. (Ubiquant)

  5. Kling-Omni Technical Report
    Kuaishou’s Kling-Omni unifies video generation, editing, and reasoning - delivering cinematic-quality video with strong control across tasks. (Kuaishou Technology)

NEW EMPOWERED AI TOOLS

  1. Nimbalyst is the local WYSIWYG editor & session manager where PMs and Devs iterate with Claude Code on the full context: markdown docs, diagrams, data models, mockups, and code.

  2. 🎨 Loki.Build designs and ships studio-grade landing pages with AI, using a live editor with SEO & hosting built in.

  3. 🔍 Userology AI is an AI user research agent that recruits users, runs sessions & delivers clear insights.

  4. 📊 Vurge brings AI web scraping to Google Sheets, pulling data from any site in seconds.

AI CHART


OpenAI released GPT-5.2-Codex to fix a common issue in AI coding: losing track during long sessions.

The key change is context compaction. The model compresses past steps but keeps intent and state. It remembers the plan, not the noise.

This helps it:

  • Stay focused during long refactors

  • Debug across many steps without drifting

  • Handle multi-file changes better

Built for engineers using Codex in real projects, not one-off prompts.
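
To make the idea of context compaction concrete, here is a minimal sketch in Python. It is not how GPT-5.2-Codex works internally (OpenAI has not published that here); it swaps in a simple heuristic that drops old, noisy log steps while always keeping plan and state steps. The `Step`, `Session`, and `compact` names, the step kinds, and the character-count budget are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One past step in a long coding session (hypothetical structure)."""
    kind: str  # "plan", "state", or "log"
    text: str


@dataclass
class Session:
    steps: list[Step] = field(default_factory=list)

    def size(self) -> int:
        # Crude stand-in for a token count: total characters.
        return sum(len(s.text) for s in self.steps)


def compact(session: Session, budget: int, keep_recent: int = 2) -> Session:
    """Drop older log steps, oldest first, until the session fits the budget.

    Plan and state steps are always kept, and so are the most recent
    `keep_recent` steps, whatever their kind.
    """
    steps = session.steps
    protected_from = max(len(steps) - keep_recent, 0)
    total = session.size()
    kept = []
    for idx, step in enumerate(steps):
        droppable = step.kind == "log" and idx < protected_from
        if droppable and total > budget:
            total -= len(step.text)  # this noisy step is discarded
            continue
        kept.append(step)
    return Session(kept)


session = Session([
    Step("plan", "refactor auth module"),
    Step("log", "x" * 500),   # verbose tool output from early in the session
    Step("log", "y" * 500),
    Step("state", "tests: 3 failing"),
    Step("log", "z" * 50),    # recent output, protected by keep_recent
])
compacted = compact(session, budget=200)
print([s.kind for s in compacted.steps])
```

A real system would more likely summarize old steps than delete them, but the shape is the same: the plan and current state survive, and the bulk of the transcript does not.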

Benchmarks:

  • SWE-Bench Pro: 56.4%

  • Terminal-Bench 2.0: 64.0%

Small gains. Much better stability.

We read your emails, comments, and poll replies daily

How would you rate today’s newsletter?

Your feedback helps us create the best newsletter possible


Hit reply and say Hello – we'd love to hear from you!
Like what you're reading? Forward it to friends, and they can sign up here.

Cheers,
The AI Fire Team
