🔬 AI Thinks Like Scientists

And Ships Real Products

Gemini hit research-grade reasoning. OpenAI merged 1,500 AI-written PRs. Meta boosted reasoning with tiny updates. What does this mean for your work?

IN PARTNERSHIP WITH HUBSPOT

Want to get the most out of ChatGPT?

ChatGPT is a superpower if you know how to use it correctly.

Discover how HubSpot's guide to AI can elevate both your productivity and creativity to get more things done.

Learn to automate tasks, enhance decision-making, and foster innovation with the power of AI.

AI INSIGHTS

Gemini 3 Deep Think Hits Research-Grade Reasoning

Google just upgraded Gemini 3 Deep Think - its most advanced reasoning mode.

This isn’t about better chat. It’s built for hard science.

Early test:
A Rutgers mathematician used it to review a complex physics paper.
Deep Think found a logical flaw human peer review missed.

New highs:

• 48.4% on Humanity’s Last Exam (no tools)
• 84.6% on ARC-AGI-2
• 3455 Elo on Codeforces
• Gold-medal-level results on Math, Physics, and Chemistry Olympiads

It can now:

• Model physical systems
• Analyze complex data
• Turn sketches into 3D-printable files

Why it matters: AI is shifting from chatbot to research engine.

PRESENTED BY WISPR FLOW

Keep pace with your calendar

Dictate investor updates, board notes, and daily rundowns and get final-draft writing you can paste immediately. Wispr Flow preserves nuance and uses voice snippets for repeatable founder comms. Try Wispr Flow for founders.

AI SOURCES FROM AI FIRE

1. I fired my marketing team and hired 3 AI agents instead. Copy this exact system to automate content, ads, and sales without chaos

2. 7 hidden Gemini 3.0 hacks Google doesn’t show you. Turn it from a chatbot into a full work assistant in 10 minutes

TODAY IN AI

AI HIGHLIGHTS

🧠 MiniMax just launched M2.5, an open-source model matching GPT-5 & Opus 4.6 on coding - but far cheaper. It’s already powering most of MiniMax’s internal tasks. Full details here.

🎬 ByteDance launched Seedance 2.0, its SOTA video model with cinematic control, stable motion, and audio-video generation. Benchmarks are live, access is still limited. See it here.

⚠️ OpenAI is retiring GPT-4o, GPT-4.1, and o4-mini from ChatGPT today, despite user pushback. Most usage has shifted to GPT-5.2. Read the official announcement here.

📢 Ex-OpenAI researcher Zoë Hitzig quit after ChatGPT began testing ads, warning about manipulation risks from user data. Her full NYT op-ed is here.

🚀 Elon Musk says xAI’s recent exits were forced as part of a reorg for “speed of execution,” after multiple co-founders left. See Musk’s statement here.

💰 Big AI Fundraising: Anthropic raises $30B in Series G at a $380B valuation, led by GIC and Coatue, with backing from Microsoft, NVIDIA, Sequoia, and BlackRock. Claude Code now exceeds $2.5B ARR as Anthropic scales enterprise AI across AWS, Google Cloud, and Azure.

HOT PAPERS OF THE WEEK

  1. AI Doesn’t Reduce Work - It Intensifies It
    Harvard researchers found AI made teams work faster, for longer hours, and across a broader range of tasks, not less. Productivity rose - but so did burnout risk. (Harvard University)

  2. Gemini Deep Think in Research-Level Math & Science
    Google shows Gemini Deep Think solving Olympiad-level math and advancing PhD research via agent Aletheia - even contributing to publishable results. (Google)

  3. Why LLMs Pass Benchmarks but Fail Reasoning
    Stanford’s survey maps reasoning failures into fundamental, domain-specific, and robustness gaps, exposing why strong scores don’t mean true reasoning. (Stanford University)

  4. Learning to Reason in 13 Parameters
    Meta trains an 8B model to 91% on GSM8K by updating just 13 trainable parameters via RL. TinyLoRA shows reasoning gains don’t require massive updates. (Meta)

NEW EMPOWERED AI TOOLS

  1. 🧘 Lovon AI Therapy offers voice-based AI support, letting you talk it out anytime you need.

  2. 🔌 ZenMux provides an enterprise LLM gateway, with unified API, smart routing & automatic compensation.

  3. 🆚 Code Arena lets you prompt once and compare AI-built apps, exporting ready-to-run code for free.

  4. GPT-5.3-Codex-Spark is an ultra-fast coding model, built for real-time collaboration with 128K context.

AI BREAKTHROUGH

OpenAI Shipped a 1M-Line Product With 0 Human Code

OpenAI ran a bold experiment: ship a real product where every line of code is written by Codex + GPT-5. In just 5 months, they generated ~1 million lines of code and merged 1,500 pull requests, averaging 3.5 PRs per engineer per day.

No manual coding. Humans only steer. Engineers focused on designing strict architecture, building feedback loops, and making the repository fully legible to agents.

Codex can now reproduce bugs, fix them, test the UI with Chrome DevTools, open pull requests, respond to feedback, and merge changes end-to-end.

The shift is clear: in an agent-first world, code is cheap. System design is the real leverage.
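
For a sense of scale, the figures quoted in this section can be run through a quick back-of-envelope check. This is only a rough sketch: the 21 working days per month is our assumption, not something OpenAI reported, and the implied team size is an estimate derived from it rather than a published number.

```python
# Back-of-envelope check of the figures quoted in this section.
LINES_OF_CODE = 1_000_000        # "~1 million lines of code"
MERGED_PRS = 1_500               # "merged 1,500 pull requests"
MONTHS = 5                       # "in just 5 months"
PRS_PER_ENGINEER_PER_DAY = 3.5   # "3.5 PRs per engineer per day"

# Assumption (not from the source): about 21 working days per month.
WORKING_DAYS_PER_MONTH = 21


def summarize():
    """Derive rough per-PR and per-day figures from the quoted totals."""
    working_days = MONTHS * WORKING_DAYS_PER_MONTH
    lines_per_pr = LINES_OF_CODE / MERGED_PRS
    prs_per_day = MERGED_PRS / working_days
    # Team size implied by the quoted per-engineer average.
    implied_engineers = prs_per_day / PRS_PER_ENGINEER_PER_DAY
    return {
        "working_days": working_days,
        "lines_per_pr": lines_per_pr,
        "prs_per_day": prs_per_day,
        "implied_engineers": implied_engineers,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.1f}")
```

Under that assumption, the numbers work out to roughly 667 lines per PR and a team of about four engineers, which fits the idea that a small group steering agents did the work.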

We read your emails, comments, and poll replies daily

Hit reply and say Hello – we'd love to hear from you!
Like what you're reading? Forward it to friends, and they can sign up here.

Cheers,
The AI Fire Team
