Plain English Papers

"I think you're testing me": Claude 3 LLM called out creators while they tested its limits

Anthropic's new LLM told prompters it knew they were testing it.
Claude Opus has insane context and can detect needles deep in the haystack.