What 1,250 Professionals Say About Working With AI

Study Document: What 1,250 Professionals Say About Working with AI

Overview

This episode of the AI Daily Brief (dated 2025-12-05) covers two segments. The headlines segment reviews several major AI industry developments. The main segment focuses on Anthropic’s large-scale professional survey conducted via a new AI-powered interview tool, examining how working professionals actually experience AI in their daily work. The host presents the survey findings as a meaningful counterpoint to theoretical automation studies, arguing that real-world lived experience data is what the field most urgently needs. The speaker is the host of the AI Daily Brief podcast/video series; no additional affiliation is stated.

Source video URL: Not provided.


Prerequisites

  • Basic familiarity with large language models (LLMs) and frontier AI products (Claude, GPT, Gemini)
  • Awareness of ongoing public debate around AI-driven job displacement and automation
  • General understanding of survey methodology and its limitations (scale vs. depth trade-offs)
  • Familiarity with AI coding assistants and “vibe coding” as a concept
  • Basic understanding of AI benchmarking practices

Main Points

1. Headline: Google Releases Gemini 3 DeepThink Mode

  • Available exclusively to Google AI Ultra Plan subscribers (several hundred dollars per month)
  • Designed for complex math, science, and logic problems; builds on Gemini 2.5 DeepThink
  • Claims state-of-the-art benchmark results: 41% on Humanity’s Last Exam (vs. GPT-5 Pro at 30.7%); 45% on ARC-AGI-2 (more than double GPT-5 Pro’s score)
  • Operates at approximately $77 per task — high inference cost explains the premium paywall; this is the first time regular users have had access to a model at this cost tier
  • Achieves performance by exploring multiple hypotheses simultaneously before delivering a solution
  • Early user reactions: initially overloaded servers; strong results on hard coding/debugging problems; host found limited added value for business strategy questions relative to cost

2. Headline: Google Partners with Replit for Enterprise Vibe Coding

  • Multi-year partnership integrates Google Cloud infrastructure and models into Replit; Replit apps gain access to Google Cloud Marketplace
  • Replit CEO Amjad Masad acknowledges a “vibe coding hype slowdown”: early tools burned users, and revenue growth for vibe coding companies has softened
  • However, Ramp Economics Lab data shows Replit is currently #1 for new customer growth across all software vendors
  • Host’s contrarian view: vibe coding for non-technical/business users is under-penetrated and 2026 could be a major growth year for that audience, even as developer-focused vibe coding recalibrates

3. Headline: Google NeoCloud Partner FluidStack Raising $700M at $7B Valuation

  • FluidStack signed data center deals backed by Google (Google pledged to repay debt if FluidStack defaults); became an early third-party recipient of Google TPUs
  • Secured contract to build a gigawatt-capacity data center in France (part of Macron’s sovereign AI initiative)
  • Infrastructure partner for Anthropic’s $50B data center investment
  • New funding round reportedly led by Leopold Aschenbrenner’s hedge fund, Situational Awareness

4. Headline: Claude Opus 4.5 Declared to Have “Solved” CoreBench Scientific Agent Benchmark

  • CoreBench requires AI agents to reproduce scientific papers from code and data; scored on repo setup, code execution, and answering questions about results
  • Initial CoreAgent scaffold: Opus 4.5 scored 42% (below Opus 4.1’s 51%)
  • Re-run using Claude Code harness (suggested by DeepMind’s Nicholas Carlini): Opus 4.5 nearly doubled to 78%; after correcting grading errors (floating point issues, removed datasets), manual score reached 95%
  • The performance jump was unique to Opus 4.5 — Sonnet 4/4.5 saw smaller gains; Opus 4.1 regressed — suggesting the 4.5 series is particularly well-tuned to work with Claude Code, or that lower-level scaffolding instructions hinder more capable models
  • CoreBench team declared the benchmark solved; will pivot to undisclosed test questions to avoid training data contamination
  • Non-benchmark testimonials: Dan Shipper (Every) reports Opus 4.5 sustains autonomous coding without error loops; NYT’s Kevin Roose praises it for writing, brainstorming, and unexpected social pushback
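
The floating point grading issue noted above is a common failure mode for automated graders: an exact equality check rejects an answer that differs from the reference only by rounding noise. The sketch below illustrates the difference. The function names and the tolerance value are illustrative assumptions, not taken from the CoreBench grader.

```python
import math


def grade_exact(reported: float, expected: float) -> bool:
    """Strict grader: passes only when the two floats are exactly equal."""
    return reported == expected


def grade_tolerant(reported: float, expected: float, rel_tol: float = 1e-6) -> bool:
    """Tolerant grader: passes when the values agree within a relative tolerance.

    rel_tol is an illustrative assumption, not a CoreBench setting.
    """
    return math.isclose(reported, expected, rel_tol=rel_tol)


# An agent that computes 0.1 + 0.2 reproduces a reference value of 0.3
# only up to binary rounding noise.
reported = 0.1 + 0.2
expected = 0.3

print(grade_exact(reported, expected))     # False
print(grade_tolerant(reported, expected))  # True
```

A grader built on the strict check would mark a correct reproduction as a failure, which is the kind of error that manual regrading can catch.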

5. Headline: Salesforce AgentForce Adoption Drives Strong Revenue Forecast

  • Q4 revenue forecast: $11.1–$11.2B (vs. analyst estimate of $10.9B); remaining performance obligations up ~15% (vs. 10% estimate)
  • AgentForce active customer accounts grew 70% quarter over quarter; over 9,500 paying customers
  • CEO Marc Benioff publicly praised Gemini 3 and posted that “LLMs are the new disk drives: commodity infrastructure you hot swap for whoever’s cheapest and best,” signaling potential model-switching away from OpenAI
  • Host flags model commoditization as a key theme to watch

6. Headline: Meta Considering Deep Cuts to Metaverse Division

  • Bloomberg reports potential 30% budget cuts to the Metaverse group, possibly including January layoffs
  • Metaverse group has lost over $70 billion since the strategy was announced in 2021; stock jumped 5.7% on the news
  • Meta is shifting investment toward AI glasses and wearables (Reality Labs’ broader AR/VR division); Meta Ray-Bans cited as a surprise hit
  • Meta hired veteran Apple UX designer Alan Dye to lead a new creative studio within Reality Labs focused on design, fashion, and technology

7. Main Segment: Anthropic’s Survey of 1,250 Professionals

The Research Tool: Anthropic Interviewer

  • Anthropic built an AI-powered interview tool to scale qualitative research; tested it by interviewing professionals about their AI work experience
  • Addresses a gap that their prior Clio system (privacy-preserving usage analytics) could not fill: what happens after the conversation — how outputs are used, how people feel, what they envision for the future
  • Google researcher Tao Dong characterized the format as “semi-structured surveys”: predefined open-ended questions with AI-driven follow-up, combining survey scale with interview flexibility, then analyzed via AI for quantitative patterns
  • Host argues this pattern — AI scaling information gathering and analysis — will become standard practice for research across domains, enabling research projects previously impossible at this scale
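
The semi-structured format described above (predefined open-ended questions, adaptive follow-ups, then analysis for quantitative patterns) can be sketched in a few lines. This is a toy illustration under stated assumptions, not Anthropic's implementation: the question wording, the `follow_up` rule, and the keyword-based `code_themes` step are all hypothetical stand-ins for what a real system would delegate to a language model.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Interview:
    """One participant's transcript as (question, answer) pairs."""
    turns: List[Tuple[str, str]] = field(default_factory=list)


# Predefined open-ended questions (hypothetical wording).
CORE_QUESTIONS = [
    "How do you use AI in your work today?",
    "How do you feel about the role AI plays in your work?",
]


def follow_up(answer: str) -> Optional[str]:
    """Placeholder for the AI-driven follow-up step.

    A real system would ask a language model for a probe question;
    this keyword rule exists only so the sketch runs offline.
    """
    if "time" in answer.lower():
        return "Which tasks does it save you time on?"
    return None


def run_interview(answer_fn: Callable[[str], str]) -> Interview:
    """Ask each core question, then at most one follow-up per answer."""
    interview = Interview()
    for question in CORE_QUESTIONS:
        answer = answer_fn(question)
        interview.turns.append((question, answer))
        probe = follow_up(answer)
        if probe is not None:
            interview.turns.append((probe, answer_fn(probe)))
    return interview


def code_themes(interviews: List[Interview]) -> Counter:
    """Count interviews mentioning each theme (keyword stand-in for AI analysis)."""
    themes = {"saves_time": "time", "worry": "worr"}
    counts: Counter = Counter()
    for interview in interviews:
        text = " ".join(answer.lower() for _, answer in interview.turns)
        for theme, keyword in themes.items():
            if keyword in text:
                counts[theme] += 1
    return counts


# Usage: a scripted participant stands in for a live respondent.
canned = {
    CORE_QUESTIONS[0]: "It saves me time on drafting reports.",
    CORE_QUESTIONS[1]: "Mostly positive, though I worry about my career.",
    "Which tasks does it save you time on?": "Summaries and first drafts.",
}
interview = run_interview(lambda q: canned[q])
print(len(interview.turns))  # 3
print(code_themes([interview]))
```

The fixed question list is what makes responses comparable across participants, and the follow-up hook is what adds interview-style depth. Counting themes across many transcripts turns open-ended answers into percentages like those reported in the findings.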

Key Survey Findings

  • 86% of professionals reported AI saves them time
  • 65% reported satisfaction with the role AI plays in their work
  • Optimism dominates across nearly all topics; pessimism is most concentrated in career adaptation (general workforce), artist and writer displacement (creatives), and security concerns (scientists)

Finding 1: Professionals Want to Preserve Identity-Defining Tasks

  • Workers want to delegate routine tasks to AI while retaining work central to their professional identity
  • Many envision their future role as overseeing AI systems rather than performing tasks directly
  • This mirrors insider discourse about human-AI management hierarchies, but is now emerging organically from workers themselves

Finding 2: Creatives Face Stigma and Displacement Anxiety but Are Adopting AI Anyway

  • Creative professionals navigate social stigma within their communities around AI use alongside deeper concerns about economic displacement and erosion of creative identity
  • A salesperson example: colleagues reportedly react negatively to AI-generated email correspondence, perceiving it as impersonal or lazy
  • Host raises an open question: whether this stigma is a temporary transitional feeling or a persistent cultural norm
  • Despite this, creatives are turning to AI to increase productivity; designers show more frustration than filmmakers within the creative category

Finding 3: Scientists Want More from AI but Don’t Yet Trust It for Core Research

  • Scientists uniformly express a desire for AI that could generate hypotheses and design experiments
  • In practice, use is confined to peripheral tasks: writing manuscripts, debugging analysis code
  • Unlike other categories, scientists are not asking AI to automate routine tasks — they want AI partnership on their core functions but lack sufficient trust to act on it

Finding 4: Career Adaptation Is the Primary Source of Pessimism

  • Workers are actively trying to identify skills and roles that won’t be automated
  • Example: a trucking dispatcher quoted as trying to identify irreplaceable human value (“personalized human interactions”) while acknowledging uncertainty about long-term relevance
  • Host notes this has implications for the design of upskilling and retraining programs: training curricula risk becoming obsolete as models advance

Data Availability

  • Anthropic is making the full dataset publicly available on Hugging Face with participant approval, enabling independent analysis

Key Concepts

  • Gemini 3 DeepThink: Google’s most capable model variant, using multi-hypothesis exploration at inference time; available only on the premium AI Ultra subscription tier due to high per-task cost (~$77)
  • ARC-AGI-2: A benchmark measuring general reasoning capability; scored on both accuracy and cost-per-task
  • Humanity’s Last Exam: A benchmark of extremely difficult questions used to assess frontier model capability
  • Vibe Coding: The practice of building software using natural language prompts to AI tools, with little or no manual coding; currently bifurcating into developer-focused and non-technical-user-focused tracks
  • CoreBench: A scientific agent benchmark requiring AI to reproduce published research papers from code and data
  • Claude Code harness: A scaffold built around Anthropic’s Claude Code tool; found to dramatically improve Opus 4.5’s CoreBench performance
  • AgentForce: Salesforce’s AI agent product suite; currently their fastest-growing product
  • FluidStack: A NeoCloud infrastructure provider that received Google TPUs and is building large-scale AI data centers
  • Anthropic Interviewer: An AI-powered tool that conducts scalable semi-structured interviews, replacing or augmenting traditional survey and interview research methods
  • Clio: Anthropic’s prior privacy-preserving analytics system for understanding real-world Claude usage patterns within conversations
  • Semi-structured surveys: A research format (described by Google’s Tao Dong) combining predefined open-ended questions with AI-driven follow-up questions; merges the scale of surveys with the contextual depth of interviews
  • Model commoditization: The thesis that LLMs are becoming interchangeable infrastructure components, eliminating any durable competitive moat from the model itself
  • Reality Labs: Meta’s broader AR/VR division, which houses the Metaverse group alongside AI glasses (including the Meta Ray-Bans) and other wearables

Summary

This episode of the AI Daily Brief presents a range of industry developments — including Google’s DeepThink release, Replit’s enterprise vibe coding push, Opus 4.5’s benchmark performance, and Meta’s pivot away from the Metaverse — before focusing its main argument on the Anthropic Interviewer study of 1,250 professionals. The host’s central contention is that the field needs empirical, large-scale, real-world data about how AI is actually being used, not just theoretical capability assessments or small-sample studies. The Anthropic survey, enabled by an AI-powered interview tool that scales qualitative research, finds that most professionals are optimistic about AI, with 86% reporting time savings and 65% expressing satisfaction, but that meaningful anxiety persists around career adaptation, creative displacement, and — uniquely among scientists — distrust of AI for core research tasks. The host argues that the emerging pattern of using AI to conduct and analyze large-scale interviews will itself become a transformative research methodology, and calls for this type of longitudinal, real-world tracking to be made available to policymakers on a regular basis. The overall message is that AI’s future is one of significant opportunity alongside genuine disruption, and that understanding the human experience of that transition requires the kind of systematic, scaled qualitative research that AI now makes possible.