25 Things Nano Banana Pro Does That AI Couldn't Before
NanoBanana Pro: 25 Things You Can Do That Weren’t Possible Before
Overview
This episode of AI Daily Brief covers the release and capabilities of NanoBanana Pro, Google’s latest image generation model (officially Gemini 3 Pro Image; the original NanoBanana was Gemini 2.5 Flash Image), arguing it represents a more significant practical leap for everyday users than several concurrent major model releases. The speaker positions the model as a paradigm shift in AI image generation, not merely an incremental improvement. No speaker name or affiliation is explicitly stated beyond hosting the AI Daily Brief podcast/video series.
Source Video: URL not provided. The AI Daily Brief is available on Spotify and YouTube.
Prerequisites
- Basic familiarity with AI image generation models (e.g., Midjourney, DALL-E, Stable Diffusion)
- General understanding of large language models (LLMs) and multimodal AI
- Awareness of Google’s Gemini model family
- Familiarity with concepts like prompt engineering and model benchmarking
- Helpful but not required: knowledge of Magic: The Gathering card mechanics; basic understanding of video generation tools like Veo 3.1
Main Points
The Context: A Crowded Release Window
- The model launched alongside GPT 5.1, Gemini 3, GPT 5.1 Pro, and Codex Max within a two-week span
- Despite the competition, NanoBanana Pro is argued to have the greatest immediate impact on practical user capabilities
- The original NanoBanana (codename that stuck) was notable for steerability and fine-grained editing; Pro extends this significantly
Two Core Capability Unlocks
- Text representation: The improvement in rendering accurate, readable, aesthetically integrated text in images is described as “the single biggest jump between models” ever observed in image generation
- Reasoning over image generation: Within Gemini, image generation is not a disconnected tool — users can converse with the model, reason through creative intent, and iteratively refine outputs
- A third supporting factor: extremely high fidelity to edit instructions, making precise, targeted changes without degrading surrounding image quality
- The speaker proposes an “unlock score” metric — measuring what new capabilities a model makes possible, rather than benchmark performance alone — as a more useful evaluation heuristic for models like this
The Meta-Theme: Visual Compression
- The overarching pattern across most use cases is compressing large amounts of information into dense, accurate visual outputs
- This is described as a qualitative change, not just quantitative: the model doesn’t just handle text better, it enables visual storytelling at a level previously impossible
Use Case 1–3: Data Visualization and Accurate Charts
- Deedy Das (Menlo Ventures) converted an entire NVIDIA Q3 earnings PDF into a single-page infographic with revenue, operating income, gross margin, segment performance, and risk highlights
- Justine Moore (a16z) did the same for Alphabet’s Q1 earnings, producing accurate, high-density revenue and income charts
- Simon Smith demonstrated bar charts scaled to correct proportional lengths — a task prior image models consistently failed at
- Kashyap Shivakumar (Google DeepMind) tested GDP per capita charts and found outputs both accurate to scale and aesthetically polished
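What “scaled to correct proportional lengths” means can be checked mechanically. The sketch below is an illustration only and does not come from the episode: the function names, the 5% tolerance, and the sample figures are all assumptions. It treats a chart as correct when each bar’s measured length matches its value’s share of the largest value.

```python
from math import isclose


def expected_bar_lengths(values, max_length_px):
    """Scale values so the largest one maps to max_length_px."""
    largest = max(values)
    if largest <= 0:
        raise ValueError("at least one value must be positive")
    return [max_length_px * v / largest for v in values]


def bars_are_proportional(values, measured_lengths_px, rel_tol=0.05):
    """Return True if every measured bar length matches its value's share.

    rel_tol is a hypothetical tolerance (5% here) for measurement noise.
    """
    if len(values) != len(measured_lengths_px):
        raise ValueError("need one measured length per value")
    # Use the longest measured bar as the reference scale.
    expected = expected_bar_lengths(values, max(measured_lengths_px))
    return all(
        isclose(m, e, rel_tol=rel_tol, abs_tol=1.0)
        for m, e in zip(measured_lengths_px, expected)
    )


# Made-up figures, purely for illustration.
values = [10.0, 20.0, 40.0]
good_chart = [100, 200, 400]  # lengths track the values
bad_chart = [100, 150, 400]   # middle bar is too short

print(bars_are_proportional(values, good_chart))  # True
print(bars_are_proportional(values, bad_chart))   # False
```

Measuring bar lengths from a generated image is left out here; the pixel lengths would have to come from manual inspection or a separate measurement step.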
Use Case 4: Whiteboard Summaries (The “Compression Algorithm” Use Case)
- Pietro Schirano converted a 92-page Llama 3 technical PDF into a professor’s whiteboard photo
- Described as “the greatest compression algorithm in human history” — hyperbolic but illustrative of the capability
- Works because text handling and reasoning combine to produce genuinely summarized, visually organized content
Use Cases 5–8: Educational Infographics and Explainers
- Robotics bottleneck visualizations, plus a touchscreen explainer generated from a minimal prompt (“make an infographic explaining how a touchscreen works”) that produced a clean four-part visual
- swyx asked the model to explain itself and received both an academic infographic and a comic strip version
- Google’s Jaclyn Konzelmann created a solar system poster in the style of children’s wall art
- Speaker’s own example: a construction-equipment-themed alphabet chart for a four-year-old, generated without specifying individual items per letter — the model inferred appropriate entries (e.g., asphalt paver for A, bulldozer for B)
Use Case 9: Flowcharts
- Ethan Mollick prompted a deliberately over-complicated flowchart for toasting bread; model executed accurately
- Practical applications in process documentation are noted as a natural extension
Use Cases 10–11: Visual Tutorials and Recipes
- Callum McClark generated a step-by-step bowing guide for ITF Taekwondo from a simple natural language prompt
- Chubby on X produced a cardamom tea preparation chart; Vittorio created a pasta cooking guide
- Speaker notes the obvious commercial value for assembly instructions and how-to content
Use Case 12: Anatomical and Technical Drawings
- The JSON Prompts account demonstrated Pokémon anatomy diagrams (Pikachu, Squirtle, Bulbasaur, Charmander) as illustrative examples of the technical drawing capability
Use Case 13: Media-to-Media Transformation
- Shopify CEO Tobi Lütke converted a video of a speech into a rich visualization
- Speaker indicates intent to apply the same technique to podcast transcripts
Use Case 14: Blueprints
- AI for Success account demonstrated the model reading an actual blueprint, comprehending its spatial layout, and generating a final image reflecting all architectural details
- Cited as evidence of genuine multimodal spatial understanding
Use Case 15: Virtual Staging and Interior Design
- Justine Moore provided three furniture reference images and received a staged living room; notes improved texture and asymmetry retention versus prior model version
- Alcine uploaded a floor plan and received full room-by-room design visualizations based on actual room dimensions
- Speaker frames this as augmenting professional interior designers rather than replacing them, while acknowledging the scale of capability increase
Use Case 16: Combining Multiple People Into a Single Image
- FOFR found the model accepts up to 14 reference images; optimal performance around 5 people
- Prior models frequently blended facial features rather than compositing distinct individuals; NanoBanana Pro handles this more accurately
- Multiple stylistic outputs are possible from the same reference set
Use Case 17: Precise Photo Editing
- Clark Wimberly changed a man’s facial expression in a warehouse photo to convey concern — naturally, without exaggeration
- Clark also swapped a White Claw can for a glass of soda with a striped straw
- Prins changed Magic: The Gathering cards from red to black — the model independently understood it needed to change the mountain land art to a swamp, and correctly updated the card border pattern to match black card visual conventions (not merely recoloring)
- Described as demonstrating comprehension beyond literal prompt instructions
Use Cases 18–19: Advertising, Product Shots, and Brand Assets
- High-fidelity earbud advertising visuals; Hedra Labs logo on a billboard; Jacob Palsall converting product reference images into magazine-style ads
- Crystal Maria generated a complete brand identity for a fictional chicken pizza company — pizza box, t-shirt, and hat — with a consistent integrated logo system, in a single prompt
- Andrew Lane replicated this for a matcha energy and collagen brand
- Speaker notes logos themselves still tend toward aesthetically poor outputs, consistent with the visual style of most training data
Use Case 20: Relaxed IP/Likeness Guardrails (With Caveats)
- Google appears to have loosened content restrictions somewhat; users were generating accurate Star Wars and Disney logos
- Speaker questions whether this will persist and frames it cautiously, but acknowledges the utility for brand-adjacent creative work
Use Case 21: Cinematic Movie Stills
- PJ Ace generated photorealistic fake production stills for a Legend of Zelda movie
- Archit Rathi generated Wallace and Gromit stills from multiple camera angles; describes it as “a leapfrog moment for AI filmmaking”
Use Case 22: Annotating Images to Direct Video Generation
- Nick Matariz describes a workflow:
  - Generate or upload an image using NanoBanana
  - Use NanoBanana Pro to add sketch annotations describing camera movement (e.g., “crane up and look down as an aerial shot”)
  - Feed the annotated image into Veo 3.1’s frames-to-video pipeline
- The annotations function as visual direction for the video model, bridging intent and output in a new way
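The order of operations in that workflow can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the workflow’s actual tooling: `ShotPlan`, `run_annotated_shot`, `annotate_image`, and `frames_to_video` are hypothetical names, and the two model calls are passed in as plain functions so that no real API is implied. Stub implementations stand in for the image and video models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ShotPlan:
    """One still image plus the camera direction to sketch onto it."""
    base_image: bytes
    camera_direction: str


def run_annotated_shot(
    plan: ShotPlan,
    annotate_image: Callable[[bytes, str], bytes],
    frames_to_video: Callable[[bytes], bytes],
) -> bytes:
    """Annotate the still, then hand the annotated frame to the video step.

    annotate_image and frames_to_video are hypothetical stand-ins for the
    image-editing and video-generation calls; no real API is implied.
    """
    if not plan.camera_direction.strip():
        raise ValueError("camera_direction must describe the intended move")
    annotated = annotate_image(plan.base_image, plan.camera_direction)
    return frames_to_video(annotated)


# Stub implementations so the sketch runs without any external service.
def fake_annotate(image: bytes, direction: str) -> bytes:
    return image + b"|note:" + direction.encode("utf-8")


def fake_video(frame: bytes) -> bytes:
    return b"video(" + frame + b")"


plan = ShotPlan(b"still", "crane up and look down as an aerial shot")
print(run_annotated_shot(plan, fake_annotate, fake_video))
```

Passing the two steps in as functions keeps the sketch independent of any particular SDK; in practice each stub would be replaced by whatever client call the chosen image and video tools provide.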
Use Cases 23–25: Media Remixing, Physics, and Memes
- Digital news articles rendered on old newsprint; contemporary logos made fluffy; children’s photos turned into movie posters
- Christopher Friant applied a portrait (Sydney Sweeney) to a dodecahedron surface, demonstrating physics-aware image projection
- FOFR converted a meme into a Lego construction
- Speaker converted the “bass face kid” meme into a four-panel progression scale (normal → mild → intense → insane), which the model executed accurately without being given the intermediate states explicitly
The Saturated Benchmark Problem
- Ethan Mollick’s long-standing test prompt — “otters on a plane using Wi-Fi” — which reliably exposed failures in prior models, is now effectively solved
- NanoBanana Pro produced lab-coat-clad otters at a whiteboard explaining why previous models had struggled, with a gallery wall showing historical failures
- Speaker uses this as a closing illustration that the field has entered “very new territory”
Key Concepts
- NanoBanana Pro: Google’s latest image generation model (codename; officially Gemini 3 Pro Image); notable for text fidelity, reasoning integration, and precise editing
- Unlock Score: A proposed evaluation metric measuring what new practical capabilities a model makes possible, as distinct from performance on standardized benchmarks
- Visual Compression: The practice of converting large volumes of textual or data-heavy information into dense, accurate, readable visual formats using the model
- Fine-grained / Precise Editing: The ability to make targeted, specific changes to an image (e.g., expression, object swap, card color) while preserving unrelated elements accurately
- Reasoning on Image Generation: The integration of conversational LLM reasoning with image output within a single model session, enabling iterative refinement and intent-driven generation
- Reference Image Compositing: The ability to accept multiple uploaded reference images (faces, products, furniture) and accurately combine their distinct features into a single output
- Multimodal Understanding: The model’s capacity to read, interpret, and act on structured inputs such as blueprints, floor plans, or existing image content — not just text prompts
- Frames-to-Video Pipeline (Veo 3.1): A video generation workflow where a still image (potentially annotated) is used as an input frame to direct video output
- Benchmark Saturation: The phenomenon where a previously reliable failure-revealing test prompt is solved well enough by a new model that it no longer discriminates capability
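The benchmark saturation idea can be made concrete with a small check. The sketch below is an assumption-laden illustration rather than anything proposed in the episode: the function name, the 0.95 ceiling, the model labels, and the pass rates are all invented. It treats a test prompt as saturated once the strongest model passes it at or above the ceiling, since the prompt then leaves almost no room to show further improvement.

```python
def is_saturated(pass_rates_by_model, ceiling=0.95):
    """Return True once the strongest model passes at or above the ceiling.

    pass_rates_by_model maps a model label to the fraction of attempts
    (0.0 to 1.0) on which that model handled the test prompt correctly.
    ceiling is a hypothetical cutoff, not an established standard.
    """
    if not pass_rates_by_model:
        raise ValueError("need at least one model's pass rate")
    return max(pass_rates_by_model.values()) >= ceiling


# Invented pass rates for a single test prompt, for illustration only.
before = {"older_model_a": 0.10, "older_model_b": 0.35}
after = {**before, "newer_model": 0.98}

print(is_saturated(before))  # False
print(is_saturated(after))   # True
```

Once a prompt is saturated in this sense, it can still separate old models from new ones, but it can no longer rank the newest models against each other.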
Summary
The speaker’s central argument is that NanoBanana Pro (Google’s latest image generation model) represents a qualitative shift in what AI image generation can do, driven primarily by two compounding capabilities: dramatically improved text representation within images, and the integration of language model reasoning directly into the image generation workflow. Together, these unlock a new meta-category of use — visual compression — in which large volumes of information (financial reports, technical papers, instructional content, brand systems) can be accurately, aesthetically, and densely encoded into visual outputs. The speaker surveys approximately 25 concrete examples from early adopters spanning data visualization, educational infographics, technical drawings, advertising, interior design, film production, and meme remixing, arguing that most of these were practically impossible or severely limited just days before the model’s release. Rather than evaluating the model through conventional benchmarks, the speaker advocates for an “unlock score” framework focused on newly accessible capabilities, under which NanoBanana Pro scores exceptionally high. The closing message is a direct recommendation: users with Gemini access should explore the model immediately, particularly for tasks requiring high information density in visual form.