Another Two Ways of Regularization: Label Smoothing and Weight Tying
Intro
I have been implementing a GPT model myself these days, and along the way I came across two interesting regularization techniques. One is Label Smoothing, mentioned in the Transformer paper ("Attention Is All You Need"); the other is Weight Tying, which I found in the nanoGPT codebase.
Label Smoothing
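Before going into details, here is a minimal sketch of what label smoothing looks like in PyTorch. The function name, the smoothing value of 0.1, and the tensor shapes are my own illustrative choices, not code from the paper or from nanoGPT:

import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, targets, smoothing=0.1):
    """Cross-entropy with label smoothing.

    logits:  (batch, vocab_size) unnormalized scores
    targets: (batch,) integer class indices
    """
    vocab_size = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    # Spread `smoothing` probability mass uniformly over the wrong classes,
    # and give the correct class 1 - smoothing.
    smooth_targets = torch.full_like(log_probs, smoothing / (vocab_size - 1))
    smooth_targets.scatter_(-1, targets.unsqueeze(-1), 1.0 - smoothing)
    return -(smooth_targets * log_probs).sum(dim=-1).mean()

# PyTorch also ships this built in:
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

Note that the built-in version spreads the smoothing mass over all classes (including the correct one), while the manual sketch above follows the formulation where only the wrong classes share it; the two differ only slightly in practice.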