Position Encoding
Original Fixed Sinusoidal Encoding vs Learned Encoding
The original fixed sinusoidal encoding proposed in the Transformer paper ("Attention Is All You Need"):

\[PE(pos, 2i) = \sin(pos / 10000^{2i / d_{model}})\]
\[PE(pos, 2i + 1) = \cos(pos / 10000^{2i / d_{model}})\]

The hypothesis is that this encoding lets the model easily attend to tokens by relative position: for any fixed offset $k$, $PE(pos + k)$ is a linear transformation of $PE(pos)$. This follows from the angle-addition identities $\sin(a + b) = \sin a \cos b + \cos a \sin b$ and $\cos(a + b) = \cos a \cos b - \sin a \sin b$, which express each $(\sin, \cos)$ pair at position $pos + k$ as a fixed rotation of the corresponding pair at position $pos$.
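As a concrete illustration, here is a minimal PyTorch sketch (the function name, shapes, and the constants in the check are illustrative assumptions, not from the paper) that builds the table and numerically verifies the linear-transformation claim:

```python
import math
import torch

def sinusoidal_position_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal table of shape (max_len, d_model); assumes d_model is even."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
    two_i = torch.arange(0, d_model, 2, dtype=torch.float32)            # 2i = 0, 2, ...
    angles = position / torch.pow(10000.0, two_i / d_model)             # (max_len, d_model/2)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(angles)  # PE(pos, 2i)     = sin(pos / 10000^{2i/d_model})
    pe[:, 1::2] = torch.cos(angles)  # PE(pos, 2i + 1) = cos(pos / 10000^{2i/d_model})
    return pe

# Check the relative-position property: each (sin, cos) pair of PE(pos + k)
# is a rotation of the corresponding pair of PE(pos), independent of pos.
d_model = 16
pe = sinusoidal_position_encoding(max_len=128, d_model=d_model)
pos, k = 10, 7
for j in range(0, d_model, 2):
    theta = k / 10000 ** (j / d_model)  # rotation angle for this frequency pair
    rot = torch.tensor([[math.cos(theta), math.sin(theta)],
                        [-math.sin(theta), math.cos(theta)]])
    assert torch.allclose(rot @ pe[pos, j:j + 2], pe[pos + k, j:j + 2], atol=1e-5)
```

Note that the rotation matrix depends only on the offset $k$, not on $pos$, which is exactly what makes relative-position attention easy to express as a linear map.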
- The fixed encoding performs almost identically to the learned position embedding (sketched after this list).
- Note: the Transformer authors chose the fixed encoding because they hypothesized it would extrapolate better to sequence lengths unseen in the training data.
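For contrast, a minimal sketch of the learned alternative (the sizes are illustrative): position indices look up rows of a trainable table that is optimized jointly with the rest of the model.

```python
import torch
import torch.nn as nn

max_len, d_model = 512, 64
learned_pe = nn.Embedding(max_len, d_model)  # trainable position table
positions = torch.arange(max_len)            # 0, 1, ..., max_len - 1
out = learned_pe(positions)                  # (max_len, d_model); rows receive gradients
# Unlike the fixed sinusoids, this table is undefined for positions >= max_len,
# which motivated the extrapolation argument for the fixed encoding.
```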