How can Transformers handle variable length input?
We all know that Transformers were designed for Natural Language Processing (NLP). In NLP, the input is assumed to have variable sequence length: when we ask ChatGPT something, we are not worried about how many words we are typing.
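The reason this works is that self-attention contains no parameter whose shape depends on the sequence length: the same weight matrices are applied to every token, and the attention score matrix simply grows or shrinks with the input. A minimal NumPy sketch (illustrative names, single head, no masking or positional encodings) makes this concrete:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention. X has shape (seq_len, d_model);
    seq_len can be anything, since the learned weights only act on d_model."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # (seq_len, d_model)

rng = np.random.default_rng(0)
d_model = 8
# One fixed set of weights, reused for every input length
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

for seq_len in (3, 7):
    out = self_attention(rng.normal(size=(seq_len, d_model)), Wq, Wk, Wv)
    print(out.shape)  # output length matches input length
```

The same `Wq`, `Wk`, `Wv` process a 3-token and a 7-token sequence without any change, which is exactly why a trained Transformer accepts prompts of different lengths.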