Plain English Papers

Differential Transformers

LLMs work better when they ignore unimportant info

Can we train Transformers to focus more on what's important and less on irrelevant details?

Photo by Ben Wicks / Unsplash
