🎭 Claude’s Pretty, GPT-5’s Smart, Humans Mid
14 Years of Training vs. 14 Seconds of AI

Read time: 5 minutes
Can AI really replace weeks of expert work? OpenAI's new benchmark, built on jobs that drive $3T+ of U.S. GDP, suggests it's closer than you think. Claude is winning on looks. GPT-5 on logic. Humans? Not always first place anymore…
What's on FIRE 🔥
IN PARTNERSHIP WITH DESELECT
Build powerful Salesforce segments quickly, no coding or IT support needed.
Build powerful Salesforce segments with ease, no coding or IT support required. Accelerate your marketing campaigns using drag-and-drop tools for smart segmentation.
Empower your team to target the right audience efficiently. Try DESelect Segment now and revolutionize your marketing campaigns!
AI INSIGHTS
OpenAI just dropped a new benchmark: GDPval. It doesn’t test games, riddles, or trivia. It measures how well AI handles real-world, economically valuable tasks - the kind that drive $3T+ of U.S. GDP.
Here’s what it looks like:
44 occupations, 9 sectors (finance, law, design, engineering…)
1,320 tasks in the full set; 220-task “gold” subset is open-source
Each task = real expert work (experts average 14 years of experience)
Formats span Excel sheets, CAD files, videos, decks, images
Performance Trends:
Progress is steady: each new generation beats the last
Claude Opus 4.1 = best on aesthetics (layout, formatting)
GPT-5 = best on accuracy (calculations, following instructions)
On the gold set, 47.6% of Claude Opus 4.1's deliverables were judged as good as or better than the human expert's
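For intuition, a "matched or beat humans" number is a win-or-tie rate: the share of tasks where a grader preferred the model's deliverable or called it even. Here is a minimal Python sketch, assuming each task gets a single verdict of "win", "tie", or "loss". The function name and the sample verdicts are hypothetical, not GDPval's actual grading code or data.

```python
from collections import Counter


def win_or_tie_rate(verdicts):
    """Share of tasks where the model was judged at least as good as the expert."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one verdict")
    # Wins and ties both count toward "matched or beat".
    return (counts["win"] + counts["tie"]) / total


# Hypothetical verdicts for five tasks (illustrative only).
sample = ["win", "loss", "tie", "loss", "win"]
print(f"{win_or_tie_rate(sample):.1%}")
```

Counting ties as successes is what separates this from a plain win rate, so the two figures should not be compared directly.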
Speed & Cost:
With human review: 1.2–1.6× faster & cheaper than experts
Raw speed? Models are 90–300× faster - but quality checks still matter
Weak Spots:
Claude, Gemini, Grok: often ignore instructions / wrong formats
GPT-5: strongest on accuracy, weakest on PowerPoint/Word formatting
True “catastrophic” errors rare (~3%)
What’s Next:
More reasoning effort + better prompting = higher win rates
Open-source gold subset live now → evals.openai.com
Why it matters: Benchmarks like MMLU showed knowledge. GDPval shows economic value. Frontier models aren’t just getting smarter - they’re starting to replace weeks of expert work with deliverables judged equal (or better) by other experts.
PRESENTED BY ROKU
It’s go-time for holiday campaigns
Roku Ads Manager makes it easy to extend your Q4 campaign to performance CTV.
You can:
Easily launch self-serve CTV ads
Repurpose your social content for TV
Drive purchases directly on-screen with shoppable ads
A/B test to discover your most effective offers
The holidays only come once a year. Get started now with a $500 ad credit when you spend your first $500 today with code: ROKUADS500. Terms apply.
TODAY IN AI
AI HIGHLIGHTS
🍏 Pulse just landed in ChatGPT Pro (iOS/Android preview). Sam Altman calls it his “favorite feature so far.” It runs overnight, delivers 5-10 cards each morning, and even drafts agendas from Gmail or Calendar if you connect them. The catch: you need memory on.
🌍 A researcher dropped TinyWorlds - a minimal world-modeling codebase on GitHub. It compresses video into tokens, predicts actions between frames, and generates future frames. It’s built to be hackable, so you can fork, PR, or plug in new modules.
📈 Anthropic will triple its global headcount, with new offices in Dublin, London, Zurich, and Tokyo. Claude now powers 300k+ businesses worldwide, and revenue jumped from $1B → $5B in 8 months. Microsoft just signed to integrate Claude into Copilot.
💻 Compute shortages may push markets into auctions. That means companies could soon bid for GPU time instead of paying flat contracts - a big shift in how labs and startups budget for AI access.
⚖️ Elon Musk’s xAI filed a new lawsuit against OpenAI over alleged trade-secret theft. Court docs cite ex-employees copying code and sharing it via Signal. OpenAI called it “harassment,” but the case highlights just how cutthroat the AI talent wars have become.
💰 AI Daily Fundraising: Distyl AI raised $175M at a $1.8B valuation, backed by Lightspeed, Khosla Ventures, DST Global, Coatue, and Dell Technologies Capital. The company helps Fortune 500 firms in healthcare, telecom, insurance, and finance become AI-native enterprises.
NEW EMPOWERED AI TOOLS
📢 Scrumball runs influencer campaigns with AI agents, tapping into a 120M+ creator database
🛡️ Fakeradar gives real-time deepfake protection for video calls with one click
💻 Neutron is a proactive desktop AI assistant that helps before you even ask
🎨 Figma MCP brings your design context into IDEs and AI agents with remote access
AI QUICK HITS
🌐 Google launched an MCP Server for Data Commons, giving AI agents direct access to its datasets for faster, more reliable stats
🛒 Microsoft unveiled Marketplace, a hub with 3,000+ AI apps & agents to rival AWS
🎵 Spotify removed 75M AI spam tracks, adding new impersonation policies & AI disclosure credits
🇰🇷 South Korea pledged ₩530B ($390M) to fund local AI giants like LG, SK Telecom, and Naver to compete with OpenAI & Google
📱 YouTube Labs is testing AI hosts in Music, serving trivia and commentary for listeners
AI CHART
Meta’s FAIR team just released Code World Model (CWM) - a 32B open-weights LLM that doesn’t just read code… it learns what code does when it runs.
⚡ Why It Matters:
Open-source: weights, checkpoints, inference code - free for research.
Performance:
SWE-bench: 65.8% pass@1
LiveCodeBench: 68.6% pass@1
Math-500: 96.6%
→ Rivals bigger models, both closed (Claude) and open-weight (gpt-oss).
🛠️ New Tricks:
Neural debugger → predicts Python state line-by-line.
Agentic coding → fixes bugs end-to-end inside repos.
Reasoning mode → <think> tokens for step-by-step logic.
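To make "predicts Python state line-by-line" concrete, here is a small Python sketch of the kind of data that describes: the local variables at each executed line. It records a real trace with the standard library's sys.settrace. This only illustrates what an execution trace looks like; it is not CWM's code or training pipeline, and trace_locals and running_sum are hypothetical names.

```python
import sys


def trace_locals(func, *args):
    """Run func and record (line offset, locals snapshot) just before each line runs."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Only follow the function we were asked to trace.
        if frame.f_code is not code:
            return None
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            steps.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Restore whatever tracer was active before (e.g. a debugger).
        sys.settrace(previous)
    return result, steps


def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


result, steps = trace_locals(running_sum, 3)
for offset, local_vars in steps:
    print(offset, local_vars)
```

A model that acts as a "neural debugger" would be asked to predict snapshots like these without actually running the code.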
Bottom line: Meta just gave the open-source world a coding agent that can compete head-to-head with giants.
We read your emails, comments, and poll replies daily
Hit reply and say Hello – we'd love to hear from you!
Like what you're reading? Forward it to friends, and they can sign up here.
Cheers,
The AI Fire Team