Plain English Papers Can image models understand what we’re asking for? High-quality graphics vs high-quality understanding — which one matters more? “Imagen 3 is our best diffusion model for text-to-image generation, capable of following descriptive prompts"