25 Things Nano Banana Pro Does That AI Couldn't Before

November 22, 2025

ai-daily-brief-podcast

Nano Banana Pro: 25 Things You Can Do That Weren’t Possible Before

Overview

This episode of the AI Daily Brief covers the release and capabilities of Nano Banana Pro, Google’s latest image generation model (officially Gemini 3 Pro Image; the original Nano Banana was Gemini 2.5 Flash Image). The speaker argues that it represents a more significant practical leap for everyday users than several major model releases from the same period, and positions it as a paradigm shift in AI image generation rather than an incremental improvement. No speaker name or affiliation is explicitly stated beyond hosting the AI Daily Brief podcast/video series.

Source Video: URL not provided. The AI Daily Brief is available on Spotify and YouTube.

Prerequisites

Basic familiarity with AI image generation models (e.g., Midjourney, DALL-E, Stable Diffusion)
General understanding of large language models (LLMs) and multimodal AI
Awareness of Google’s Gemini model family
Familiarity with concepts like prompt engineering and model benchmarking
Helpful but not required: knowledge of Magic: The Gathering card mechanics; basic understanding of video generation tools like Veo 3.1

Main Points

The Context: A Crowded Release Window

The model launched alongside GPT-5.1, Gemini 3, GPT-5.1 Pro, and GPT-5.1-Codex-Max within a two-week span
Despite the competition, Nano Banana Pro is argued to have the greatest immediate impact on practical user capabilities
The original Nano Banana (a codename that stuck) was notable for steerability and fine-grained editing; Pro extends both significantly

Two Core Capability Unlocks

Text representation: The improvement in rendering accurate, readable, aesthetically integrated text in images is described as “the single biggest jump between models” ever observed in image generation
Reasoning over image generation: Within Gemini, image generation is not a disconnected tool — users can converse with the model, reason through creative intent, and iteratively refine outputs
A third supporting factor: extremely high fidelity to edit instructions, making precise, targeted changes without degrading surrounding image quality
The speaker proposes an “unlock score” metric — measuring what new capabilities a model makes possible, rather than benchmark performance alone — as a more useful evaluation heuristic for models like this

The Meta-Theme: Visual Compression

The overarching pattern across most use cases is compressing large amounts of information into dense, accurate visual outputs
This is described as a qualitative change, not just quantitative: the model doesn’t just handle text better, it enables visual storytelling at a level previously impossible

Use Cases 1–3: Data Visualization and Accurate Charts

Deedy Das (Menlo Ventures) converted an entire NVIDIA Q3 earnings PDF into a single-page infographic with revenue, operating income, gross margin, segment performance, and risk highlights
Justine Moore (a16z) did the same for Alphabet’s Q1 earnings, producing accurate, high-density revenue and income charts
Simon Smith demonstrated bar charts scaled to correct proportional lengths — a task prior image models consistently failed at
Kashyap Shivakumar (Google DeepMind) tested GDP per capita charts and found outputs both accurate to scale and aesthetically polished
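
The proportional-scaling requirement in the bar chart examples above can be made concrete with a short sketch. This is not from the episode: the function names (`bar_lengths`, `render_bars`), the 40-character maximum, and any sample values are assumptions for illustration only. It shows the check a reader could apply to a generated chart: each bar’s length relative to the longest bar should match the ratio of the underlying values.

```python
def bar_lengths(values, max_length=40):
    """Scale non-negative values to bar lengths proportional to the largest value.

    Hypothetical helper for illustration; not part of any model or library API.
    """
    if not values:
        return []
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    largest = max(values)
    if largest == 0:
        # All values are zero, so every bar has zero length.
        return [0 for _ in values]
    # The largest value gets max_length; every other bar keeps the same ratio.
    return [round(v / largest * max_length) for v in values]


def render_bars(labels, values, max_length=40):
    """Return text lines with one '#' bar per label, scaled by bar_lengths."""
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    lengths = bar_lengths(values, max_length)
    width = max((len(label) for label in labels), default=0)
    return [
        f"{label.ljust(width)} | {'#' * n} {v}"
        for label, n, v in zip(labels, lengths, values)
    ]
```

Calling `render_bars(["A", "B"], [1, 2], max_length=10)` yields a second bar exactly twice as long as the first, which is the property earlier image models reportedly failed to preserve.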

Use Case 4: Whiteboard Summaries (The “Compression Algorithm” Use Case)

Pietro Schirano converted a 92-page Llama 3 technical PDF into a professor’s whiteboard photo
Described as “the greatest compression algorithm in human history” — hyperbolic but illustrative of the capability
Works because text handling and reasoning combine to produce genuinely summarized, visually organized content

Use Cases 5–8: Educational Infographics and Explainers

A visualization of robotics bottlenecks; a touchscreen explainer generated from a minimal prompt (“make an infographic explaining how a touchscreen works”), which produced a clean four-part visual
swyx asked the model to explain itself and received both an academic infographic and a comic strip format
Google’s Jaclyn Konzelmann created a solar system poster in the style of children’s wall art
Speaker’s own example: a construction-equipment-themed alphabet chart for a four-year-old, generated without specifying individual items per letter — the model inferred appropriate entries (e.g., asphalt paver for A, bulldozer for B)

Use Case 9: Flowcharts

Ethan Mollick prompted a deliberately over-complicated flowchart for toasting bread; model executed accurately
Practical applications in process documentation are noted as a natural extension

Use Cases 10–11: Visual Tutorials and Recipes

Callum McClark generated a step-by-step bowing guide for ITF Taekwondo from a simple natural language prompt
Chubby on X produced a cardamom tea preparation chart; Vittorio created a pasta cooking guide
Speaker notes the obvious commercial value for assembly instructions and how-to content

Use Case 12: Anatomical and Technical Drawings

The JSON Prompts account demonstrated Pokémon anatomy diagrams (Pikachu, Squirtle, Bulbasaur, Charmander) as illustrative examples of the technical drawing capability

Use Case 13: Media-to-Media Transformation

Shopify CEO Tobi Lütke converted a video of a speech into a rich visualization
Speaker indicates intent to apply the same technique to podcast transcripts

Use Case 14: Blueprints

AI for Success account demonstrated the model reading an actual blueprint, comprehending its spatial layout, and generating a final image reflecting all architectural details
Cited as evidence of genuine multimodal spatial understanding

Use Case 15: Virtual Staging and Interior Design

Justine Moore provided three furniture reference images and received a staged living room; she notes improved retention of texture and asymmetry compared with the prior model version
Alcine uploaded a floor plan and received full room-by-room design visualizations based on actual room dimensions
Speaker frames this as augmenting professional interior designers rather than replacing them, while acknowledging the scale of capability increase

Use Case 16: Combining Multiple People Into a Single Image

FOFR found the model accepts up to 14 reference images; optimal performance around 5 people
Prior models frequently blended facial features rather than compositing distinct individuals; Nano Banana Pro handles this more accurately
Multiple stylistic outputs are possible from the same reference set

Use Case 17: Precise Photo Editing

Clark Wimberly changed a man’s facial expression in a warehouse photo to convey concern — naturally, without exaggeration
Clark also swapped a White Claw can for a glass of soda with a striped straw
Prins changed Magic: The Gathering cards from red to black — the model independently understood it needed to change the mountain land art to a swamp, and correctly updated the card border pattern to match black card visual conventions (not merely recoloring)
Described as demonstrating comprehension beyond literal prompt instructions

Use Cases 18–19: Advertising, Product Shots, and Brand Assets

High-fidelity earbud advertising visuals; Hedra Labs logo on a billboard; Jacob Palsall converting product reference images into magazine-style ads
Crystal Maria generated a complete brand identity for a fictional chicken pizza company — pizza box, t-shirt, and hat — with a consistent integrated logo system, in a single prompt
Andrew Lane replicated this for a matcha energy and collagen brand
Speaker notes logos themselves still tend toward aesthetically poor outputs, consistent with the visual style of most training data

Use Case 20: Relaxed IP/Likeness Guardrails (With Caveats)

Google appears to have loosened content restrictions somewhat; users were generating accurate Star Wars and Disney logos
Speaker questions whether this will persist and frames it cautiously, but acknowledges the utility for brand-adjacent creative work

Use Case 21: Cinematic Movie Stills

PJ Ace generated photorealistic fake production stills for a Legend of Zelda movie
Archit Rathi generated Wallace and Gromit stills from multiple camera angles; describes it as “a leapfrog moment for AI filmmaking”

Use Case 22: Annotating Images to Direct Video Generation

Nick Matariz describes a workflow:
1. Generate or upload an image using Nano Banana
2. Use Nano Banana Pro to add sketch annotations describing camera movement (e.g., “crane up and look down as an aerial shot”)
3. Feed the annotated image into Veo 3.1’s frames-to-video pipeline
The annotations function as visual direction for the video model, bridging intent and output in a new way
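
The three-step workflow above can be written down as a simple ordered plan. This is a minimal sketch that makes no model or API calls: the `Step` class, the `build_annotated_video_plan` function, and the argument names are hypothetical, introduced only to show how the annotation step sits between image creation and video generation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One stage of the workflow: which tool is used and what it is asked to do."""
    tool: str
    action: str


def build_annotated_video_plan(image_source, camera_note):
    """Return the three workflow stages in order.

    Hypothetical helper: it only describes the steps, it does not run them.
    """
    if not camera_note.strip():
        raise ValueError("camera_note must describe the intended camera movement")
    return [
        Step("Nano Banana", f"generate or upload base image: {image_source}"),
        Step("Nano Banana Pro", f"add sketch annotation: {camera_note}"),
        Step("Veo 3.1", "use annotated image as input to frames-to-video"),
    ]
```

For example, `build_annotated_video_plan("uploaded photo", "crane up and look down as an aerial shot")` returns the three stages in order, with the camera direction carried by the second step rather than by a separate text prompt to the video model.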

Use Cases 23–25: Media Remixing, Physics, and Memes

Digital news articles rendered on old newsprint; contemporary logos made fluffy; children’s photos turned into movie posters
Christopher Friant applied a portrait (Sydney Sweeney) to a dodecahedron surface, demonstrating physics-aware image projection
FOFR converted a meme into a Lego construction
Speaker converted the “bass face kid” meme into a four-panel progression scale (normal → mild → intense → insane), which the model executed accurately without being given the intermediate states explicitly

The Saturated Benchmark Problem

Ethan Mollick’s long-standing test prompt — “otters on a plane using Wi-Fi” — which reliably exposed failures in prior models, is now effectively solved
Nano Banana Pro produced lab-coat-clad otters at a whiteboard explaining why previous models had struggled, with a gallery wall showing historical failures
Speaker uses this as a closing illustration that the field has entered “very new territory”

Key Concepts

Nano Banana Pro: Google’s latest image generation model (a codename; officially Gemini 3 Pro Image); notable for text fidelity, reasoning integration, and precise editing
Unlock Score: A proposed evaluation metric measuring what new practical capabilities a model makes possible, as distinct from performance on standardized benchmarks
Visual Compression: The practice of converting large volumes of textual or data-heavy information into dense, accurate, readable visual formats using the model
Fine-grained / Precise Editing: The ability to make targeted, specific changes to an image (e.g., expression, object swap, card color) while preserving unrelated elements accurately
Reasoning on Image Generation: The integration of conversational LLM reasoning with image output within a single model session, enabling iterative refinement and intent-driven generation
Reference Image Compositing: The ability to accept multiple uploaded reference images (faces, products, furniture) and accurately combine their distinct features into a single output
Multimodal Understanding: The model’s capacity to read, interpret, and act on structured inputs such as blueprints, floor plans, or existing image content — not just text prompts
Frames-to-Video Pipeline (Veo 3.1): A video generation workflow where a still image (potentially annotated) is used as an input frame to direct video output
Benchmark Saturation: The phenomenon where a previously reliable failure-revealing test prompt is solved well enough by a new model that it no longer discriminates capability

Summary

The speaker’s central argument is that Nano Banana Pro (Google’s latest image generation model) represents a qualitative shift in what AI image generation can do, driven primarily by two compounding capabilities: dramatically improved text representation within images, and the integration of language model reasoning directly into the image generation workflow. Together, these unlock a new meta-category of use, visual compression, in which large volumes of information (financial reports, technical papers, instructional content, brand systems) can be accurately, aesthetically, and densely encoded into visual outputs. The speaker surveys approximately 25 concrete examples from early adopters spanning data visualization, educational infographics, technical drawings, advertising, interior design, film production, and meme remixing, arguing that most of these were practically impossible or severely limited just days before the model’s release. Rather than evaluating the model through conventional benchmarks, the speaker advocates for an “unlock score” framework focused on newly accessible capabilities, under which Nano Banana Pro scores exceptionally high. The closing message is a direct recommendation: users with Gemini access should explore the model immediately, particularly for tasks requiring high information density in visual form.