Why You Need Different AIs for Different Jobs

Overview

This episode of the AI Daily Brief (published September 11, 2025) argues that the most effective AI users, individuals and enterprises alike, recognize that no single model is universally best, and that matching the right tool to the right task is what defines an AI power user. The host uses several concurrent news stories (Microsoft/Anthropic, Claude’s document features, coding model shifts, and Google’s Veo 3 updates) to illustrate this thesis in practice. No individual speaker name or affiliation is given beyond the show itself.

Source video: (URL not provided)


Prerequisites

  • Familiarity with major AI model providers: OpenAI (ChatGPT, GPT-4o, GPT-5, Codex), Anthropic (Claude, Claude Sonnet 4), Google (Gemini, Veo 3), Meta, and Microsoft (Copilot/Office 365)
  • Basic understanding of AI model benchmarking and the concept of model trade-offs (cost vs. performance)
  • General awareness of the enterprise AI market and cloud infrastructure providers (AWS, Google Cloud, Oracle Cloud)
  • Familiarity with AI-assisted coding tools (Claude Code, Codex) and image/video generation tools (Midjourney, Ideogram, Veo 3)
  • Understanding of concepts like ARR (Annual Recurring Revenue), contract backlog, and cloud infrastructure revenue

Main Points

1. Volkswagen’s $1.2 Billion AI Bet

  • Volkswagen announced a $1.2 billion investment in AI capabilities through the end of the decade.
  • Focus areas include AI-supported vehicle development, industrial applications, and IT infrastructure expansion.
  • The company currently operates more than 1,200 AI applications, with hundreds more in development.
  • A global AI-powered engineering application is expected to reduce vehicle development time from three years to two.
  • Volkswagen projects $5 billion in savings by 2035; the stated ambition is “no process without AI.”

2. Oracle’s Explosive Cloud Revenue Forecast

  • Oracle reported a 359% year-over-year increase in contract backlog, reaching $455 billion.
  • Four multi-billion dollar contracts were signed across three customers in a single quarter.
  • Cloud infrastructure revenue is projected to grow from ~$18 billion this year to $144 billion by 2030 — an 8x increase.
  • Analysts described the forecast as evidence of “a seismic shift in computing.”
  • Oracle stock rose 27% in after-hours trading following the announcement.
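
The growth multiple and the annual growth rate it implies can be checked with a few lines of arithmetic. This is a minimal sketch using the two revenue figures quoted above. The number of compounding years is an assumption (the notes say only "this year" and "by 2030"), so both four and five years are shown, and the function and variable names are illustrative.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


START_BILLIONS = 18.0   # approximate current-year figure quoted in the notes
END_BILLIONS = 144.0    # projected 2030 figure quoted in the notes

multiple = END_BILLIONS / START_BILLIONS

# Assumption: the forecast spans either 4 or 5 compounding years.
cagr_by_years = {
    years: implied_cagr(START_BILLIONS, END_BILLIONS, years) for years in (4, 5)
}

if __name__ == "__main__":
    print(f"growth multiple: {multiple:.1f}x")
    for years, rate in cagr_by_years.items():
        print(f"{years} years -> {rate:.1%} per year")
```

Under either assumption the forecast implies revenue growing by more than half every year, which is what makes the projection unusual for a company of this size.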

3. Google Cloud’s $106 Billion Backlog

  • Google Cloud CEO Thomas Kurian reported $106 billion in unfulfilled customer commitments.
  • Approximately $58 billion of that backlog is expected to convert to revenue within two years.
  • Google Cloud’s most recent quarterly revenue was $13.6 billion, up 32% year over year.
  • The backlog is growing faster than revenue itself, indicating accelerating demand.
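
To put the backlog in proportion, the figures above can be turned into two ratios. This sketch assumes the $13.6 billion figure is a quarterly number and annualizes it by multiplying by four. That annualization is a simplification made here for illustration, not something stated in the episode.

```python
BACKLOG_BILLIONS = 106.0            # unfulfilled commitments quoted in the notes
NEAR_TERM_BILLIONS = 58.0           # portion expected to convert within two years
QUARTERLY_REVENUE_BILLIONS = 13.6   # assumption: treated as a quarterly figure

# Share of the backlog expected to become revenue within two years.
near_term_share = NEAR_TERM_BILLIONS / BACKLOG_BILLIONS

# Simple run-rate annualization (ignores growth during the year).
annualized_revenue = QUARTERLY_REVENUE_BILLIONS * 4

# How many years of current run-rate revenue the backlog represents.
backlog_in_years = BACKLOG_BILLIONS / annualized_revenue

if __name__ == "__main__":
    print(f"near-term share of backlog: {near_term_share:.1%}")
    print(f"backlog as years of run-rate revenue: {backlog_in_years:.2f}")
```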

4. Meta’s Licensing Deals with Image Generation Startups

  • Meta signed a multi-year licensing deal with Black Forest Labs: $35 million in year one, $105 million in year two.
  • Black Forest Labs is currently generating $96 million in ARR and is projected to reach $300 million ARR next year.
  • Meta previously signed a licensing deal with Midjourney for “aesthetic technology.”
  • Black Forest Labs has partnership relationships with Adobe, Canva, Snap, and previously xAI (for Grok’s initial image generation).

5. OpenAI Reverses Course on Standard Voice Mode Deprecation

  • OpenAI had announced the retirement of standard voice mode following an expansion of advanced voice mode access.
  • User backlash — similar to the GPT-4o deprecation controversy — prompted OpenAI to reverse the decision.
  • Users reported that advanced voice mode felt “lazier” and “less smart” in some contexts, though it outperformed in domains like customer service call handling.
  • The episode raises the question of whether older models are genuinely better in some ways, or whether users simply adapt to familiar tools.

6. Apple iPhone 17 Event: AI Downplayed

  • Unlike 2024, Apple barely mentioned Apple Intelligence at its iPhone 17 launch event.
  • Subtle AI-related developments included neural accelerators embedded in each GPU core, described as “MacBook Pro levels of compute” in a smartphone — potentially enabling on-device AI models.
  • AirPods Pro 3 will feature live in-person language translation as a native capability.
  • The host interprets Apple’s restraint as a deliberate choice to build infrastructure for AI without over-promising on features.

7. Microsoft Adopting Anthropic’s Claude for Office 365 Copilot — The Core Case Study

  • Microsoft, historically an exclusive OpenAI partner for Copilot, is now integrating Anthropic’s Claude into Office 365 features.
  • Internal Microsoft teams found Claude Sonnet 4 performs better than OpenAI’s models for specific tasks: financial functions in Excel and generating aesthetically pleasing PowerPoint presentations.
  • Microsoft is willing to pay for Claude via Amazon Web Services even though OpenAI’s technology is available to them at no additional cost.
  • The host argues the primary driver is performance fit, not a negotiating tactic against OpenAI — though the hedging strategy is acknowledged.
  • This creates an opening for OpenAI to potentially launch a competing productivity suite.

8. Anthropic Launches Native Document Creation in Claude

  • Claude can now generate and export Excel spreadsheets, PowerPoint decks, Word documents, and PDFs directly from its interface.
  • The capability is powered by Claude’s computer use feature, which constructs documents in a virtual environment before delivery.
  • In a head-to-head comparison by Olivia Moore (a16z), Claude completed a full slide deck from Figma’s S-1 filing in 4.5 minutes; ChatGPT took 19 minutes and Manus took 28 minutes.
  • Ethan Mollick noted Claude generated 406 formulas from a single prompt with solid accuracy and formatting.
  • The host frames native document output as a capability that will quickly become a baseline expectation for LLMs.
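
The completion times from the head-to-head comparison above can be expressed as multiples of the fastest tool. This is a small sketch; the dictionary keys and variable names are chosen here for illustration, and the numbers are the minutes quoted in the notes.

```python
# Minutes to produce the slide deck, as quoted in the notes.
minutes = {"Claude": 4.5, "ChatGPT": 19.0, "Manus": 28.0}

# The tool with the smallest completion time.
fastest = min(minutes, key=minutes.get)

# Each tool's time as a multiple of the fastest tool's time.
relative_time = {name: t / minutes[fastest] for name, t in minutes.items()}

if __name__ == "__main__":
    for name, ratio in sorted(relative_time.items(), key=lambda kv: kv[1]):
        print(f"{name}: {ratio:.1f}x the fastest time")
```

This measures only elapsed time on one task. It says nothing about output quality, which the comparison assessed separately.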

9. Coding Model Competition: Codex vs. Claude Code

  • Usage data showed Codex sessions rising sharply while Claude Code sessions declined over the same period.
  • The primary driver cited by engineers: Codex offers strong performance relative to its cost.
  • Menlo Ventures’ July market update noted nearly half of AI programmers had upgraded to Claude Sonnet 4 at the time of its release.
  • The episode frames this as evidence that in production environments, the cost-performance ratio matters as much as raw capability.
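
One way to make the cost-performance argument concrete is to compare tools by cost per successful session rather than by raw capability. The sketch below is illustrative only: `model_a` and `model_b` are placeholders, and their success rates and prices are invented numbers, not measurements of Codex, Claude Code, or any real product.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str
    success_rate: float      # hypothetical fraction of sessions that succeed (0 to 1)
    cost_per_session: float  # hypothetical dollars per session


def cost_per_success(option: ModelOption) -> float:
    """Expected spend for each successful session."""
    return option.cost_per_session / option.success_rate


def best_value(
    options: Sequence[ModelOption], min_success_rate: float
) -> Optional[ModelOption]:
    """Cheapest option per success among those meeting the quality bar."""
    if min_success_rate <= 0:
        raise ValueError("min_success_rate must be positive")
    eligible = [o for o in options if o.success_rate >= min_success_rate]
    return min(eligible, key=cost_per_success, default=None)


# Hypothetical options: model_a is slightly stronger, model_b is much cheaper.
OPTIONS = [
    ModelOption("model_a", success_rate=0.90, cost_per_session=3.00),
    ModelOption("model_b", success_rate=0.85, cost_per_session=1.50),
]

if __name__ == "__main__":
    for bar in (0.80, 0.88):
        choice = best_value(OPTIONS, bar)
        print(f"quality bar {bar:.2f}: {choice.name if choice else 'none'}")
```

With these invented numbers the cheaper model wins whenever it clears the quality bar, and the stronger model wins only when the bar rises above what the cheaper one can deliver. That is the pattern the episode describes.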

10. Google Veo 3 Video Model: Price Cuts Targeting Professional Use

  • Veo 3 received a significant update, including 1080p resolution support, a vertical video format, and the introduction of Veo 3 Fast.
  • Price reductions: Veo 3 dropped 46% to ~$0.40/second of generation; Veo 3 Fast dropped 62% to ~$0.15/second.
  • Google framed the update explicitly as targeting “scaled production use,” not just demonstration or viral content.
  • Veo 3’s key differentiator remains integrated audio generation baked into the video model.
  • The host anticipates a wave of competing models with audio generation over the next three to six months, each better suited to different video production use cases.
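
The per-second prices above translate into clip costs, and the quoted percentage cuts imply approximate earlier prices. This sketch uses only the rounded figures in the notes, so the implied earlier prices are estimates rather than official list prices. The tier labels `standard` and `fast` and the 8-second clip length are assumptions made for illustration.

```python
# Post-cut price per generated second and the quoted percentage reduction.
TIERS = {
    "standard": {"price_per_second": 0.40, "cut": 0.46},
    "fast": {"price_per_second": 0.15, "cut": 0.62},
}


def clip_cost(price_per_second: float, seconds: float) -> float:
    """Generation cost for a clip of the given length."""
    return price_per_second * seconds


def implied_previous_price(price_per_second: float, cut: float) -> float:
    """Price before a fractional cut, derived from the post-cut price."""
    return price_per_second / (1 - cut)


CLIP_SECONDS = 8  # assumption: an illustrative clip length

costs = {
    name: clip_cost(t["price_per_second"], CLIP_SECONDS) for name, t in TIERS.items()
}
previous = {
    name: implied_previous_price(t["price_per_second"], t["cut"])
    for name, t in TIERS.items()
}

if __name__ == "__main__":
    for name in TIERS:
        print(
            f"{name}: ${costs[name]:.2f} per {CLIP_SECONDS}s clip, "
            f"implied earlier price about ${previous[name]:.2f}/second"
        )
```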

Key Concepts

  • Model fit / task specificity: The principle that different AI models have different strengths, and optimal results come from matching the model to the task rather than defaulting to a single “best” model.
  • Cost-performance ratio: The trade-off between a model’s capability and its inference cost, increasingly central to production AI deployment decisions.
  • Contract backlog: Signed but unfulfilled revenue commitments; used here as a forward-looking demand signal for cloud AI infrastructure.
  • Claude computer use: Anthropic’s capability allowing Claude to operate a virtual computing environment to perform multi-step tasks, such as constructing Office documents, before delivering results to the user.
  • Veo 3 Fast: A lower-cost, faster variant of Google’s Veo 3 video generation model, positioned for high-volume production workflows.
  • Agentic AI sessions: Interactions in which an AI model autonomously executes multi-step tasks, as tracked by coding platforms to compare tool adoption (e.g., Codex vs. Claude Code).
  • Neural accelerator (Apple): Dedicated silicon within a GPU core optimized for machine learning inference, enabling on-device AI model execution.
  • Live in-person translation (AirPods Pro 3): A feature that detects a nearby speaker using a foreign language, translates their speech for the user in near real time, and displays a translation of the user’s reply in the speaker’s language.
  • ARR (Annual Recurring Revenue): A normalized measure of recurring subscription or contract revenue, used here to assess the commercial scale of AI startups like Black Forest Labs.
  • Production era of AI: The phase beyond experimentation in which AI capabilities are deployed in reliable, scaled, cost-conscious business workflows.
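
The model-fit idea can be sketched as a simple routing table that sends each kind of task to the tool considered strongest for it. The mapping below loosely follows the episode's examples, but the task labels, model identifiers, and default are hypothetical names chosen for illustration. A real deployment would base its routes on its own evaluations and costs.

```python
# Hypothetical task-to-model routes, loosely following the episode's examples.
ROUTES = {
    "spreadsheet_finance": "anthropic_claude",
    "slide_design": "anthropic_claude",
    "coding_agent": "openai_codex",
    "video_with_audio": "google_video",
}

# Hypothetical fallback for tasks with no dedicated route.
DEFAULT_MODEL = "general_assistant"


def pick_model(task_type: str) -> str:
    """Return the routed model for a task type, or the default if unmapped."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("slide_design", "coding_agent", "email_draft"):
        print(f"{task} -> {pick_model(task)}")
```

Keeping the routes in data rather than in code reflects the episode's point that the best fit shifts over time. Changing a route is a one-line edit when a cheaper or stronger option appears.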

Summary

The central argument of this episode is that the most effective approach to AI, whether for individual power users or large enterprises like Microsoft, is to reject the notion of a single best model and instead match different AI tools to different tasks based on capability, output quality, and cost. Four converging stories illustrate the thesis. Microsoft is integrating Claude into Office 365 because it produces more aesthetically polished PowerPoint slides and handles Excel financial functions better than OpenAI’s models. Anthropic has launched native document creation that, in the comparisons cited, outperformed competitors on speed and precision. Codex is displacing Claude Code in coding workflows primarily on cost-performance grounds. Google is cutting Veo 3 prices to make video generation viable for professional production at scale. Broader headlines (Volkswagen’s sweeping AI investment, Oracle’s and Google Cloud’s massive contract backlogs, and Meta’s image generation licensing deals) reinforce the picture of an AI market entering a mature production phase in which differentiation, fit-for-purpose selection, and economic efficiency are becoming as important as raw model performance.