# Differential Transformers: LLMs work better when they ignore unimportant info

Can we train Transformers to focus more on what's important and less on irrelevant details?