How can Transformers handle variable length input?

Exposing one of the most untapped properties of self-attention

Javier
4 min read · Mar 3, 2024


We all know that Transformers were designed for Natural Language Processing (NLP). In NLP you assume variable-length sequences as input: when we ask ChatGPT something, we are not worried about how many words we are typing.
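To see why this works, consider a minimal sketch in PyTorch (my own illustration, not from any particular codebase): a self-attention layer's learned weights depend only on the embedding dimension, never on the sequence length, so the exact same layer can process inputs of any length.

```python
import torch
import torch.nn as nn

# A self-attention layer: its parameters depend only on the
# embedding dimension (embed_dim), never on the sequence length.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

# Two toy inputs with different sequence lengths (5 and 12 tokens).
short_seq = torch.randn(1, 5, 64)
long_seq = torch.randn(1, 12, 64)

# The same layer handles both, with no padding or reshaping.
out_short, _ = attn(short_seq, short_seq, short_seq)
out_long, _ = attn(long_seq, long_seq, long_seq)

print(out_short.shape)  # torch.Size([1, 5, 64])
print(out_long.shape)   # torch.Size([1, 12, 64])
```

The output keeps the shape of whatever sequence goes in, because attention scores are computed pairwise between tokens rather than against a fixed-size weight matrix.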

