How can Transformers handle variable length input?

Exposing one of the most untapped properties of self-attention

Javier
4 min read · Mar 3, 2024


We all know that Transformers were designed for Natural Language Processing (NLP). In NLP you assume variable-length sequences as input: when we ask ChatGPT something, we are not worried about how many words we are typing.
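To see why this works, consider a minimal sketch in PyTorch (my own illustration, not from any particular codebase): a self-attention layer's learned weights depend only on the embedding dimension, never on the sequence length, so the exact same layer can process inputs of any length.

```python
import torch
import torch.nn as nn

# A self-attention layer: its parameters depend only on the
# embedding dimension (embed_dim), never on the sequence length.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

# Two toy inputs with different sequence lengths (5 and 12 tokens).
short_seq = torch.randn(1, 5, 64)
long_seq = torch.randn(1, 12, 64)

# The same layer handles both, with no padding or reshaping.
out_short, _ = attn(short_seq, short_seq, short_seq)
out_long, _ = attn(long_seq, long_seq, long_seq)

print(out_short.shape)  # torch.Size([1, 5, 64])
print(out_long.shape)   # torch.Size([1, 12, 64])
```

The output keeps the shape of whatever sequence goes in, because attention scores are computed pairwise between tokens rather than against a fixed-size weight matrix.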

