Nano-GPT: Decoder-only Transformer

A simple GPT with multi-head self-attention that operates on character-level tokens, inspired by Andrej Karpathy's video lectures: https://github.com/karpathy/ng-video-lecture

Features

- Multi-head self-attention
- Layer normalization layers
- Skip connections
- Feed-forward layer
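
To make the features listed above concrete, here is a minimal, dependency-free sketch of a single decoder block applied to character-level tokens. It is an illustration only, not this repository's implementation. The function names (`linear`, `layer_norm`, `causal_self_attention`, `feed_forward`, `block`, `init_params`), the pre-norm ordering, the 4x feed-forward width, the ReLU activation, and the toy sizes are all assumptions made for the example. Positional embeddings, biases, learned layer-norm parameters, and training are left out.

```python
import math
import random

def linear(x, w):
    # x: vector of length n_in; w: n_in rows by n_out columns (no bias term).
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]

def layer_norm(x, eps=1e-5):
    # Zero mean, unit variance per vector; learned scale and shift are omitted.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def init_params(rng, n_embd):
    # Hypothetical parameter layout: one weight matrix per projection.
    def mat(n_in, n_out):
        return [[rng.gauss(0.0, n_in ** -0.5) for _ in range(n_out)] for _ in range(n_in)]
    return {
        "wq": mat(n_embd, n_embd), "wk": mat(n_embd, n_embd),
        "wv": mat(n_embd, n_embd), "wo": mat(n_embd, n_embd),
        "w1": mat(n_embd, 4 * n_embd), "w2": mat(4 * n_embd, n_embd),
    }

def causal_self_attention(xs, p, n_head):
    # xs: list of T vectors, each of length n_embd (divisible by n_head).
    head_size = len(xs[0]) // n_head
    q = [linear(x, p["wq"]) for x in xs]
    k = [linear(x, p["wk"]) for x in xs]
    v = [linear(x, p["wv"]) for x in xs]
    out = []
    for t in range(len(xs)):
        concat = []
        for h in range(n_head):
            lo, hi = h * head_size, (h + 1) * head_size
            # Causal mask: position t only attends to positions 0..t.
            scores = [
                sum(a * b for a, b in zip(q[t][lo:hi], k[s][lo:hi])) / math.sqrt(head_size)
                for s in range(t + 1)
            ]
            weights = softmax(scores)
            for d in range(lo, hi):
                concat.append(sum(w * v[s][d] for s, w in enumerate(weights)))
        out.append(linear(concat, p["wo"]))  # mix the concatenated heads
    return out

def feed_forward(x, p):
    hidden = [max(0.0, h) for h in linear(x, p["w1"])]  # ReLU
    return linear(hidden, p["w2"])

def block(xs, p, n_head):
    # Pre-norm ordering (an assumption): normalize, transform, then add the skip.
    attn = causal_self_attention([layer_norm(x) for x in xs], p, n_head)
    xs = [add(x, a) for x, a in zip(xs, attn)]                       # skip connection 1
    return [add(x, feed_forward(layer_norm(x), p)) for x in xs]      # skip connection 2

# Character-level tokenization: every distinct character gets an integer id.
text = "hello world"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}

def encode(s):
    return [stoi[ch] for ch in s]

rng = random.Random(0)
n_embd, n_head = 8, 2
tok_emb = [[rng.gauss(0.0, 1.0) for _ in range(n_embd)] for _ in chars]
params = init_params(rng, n_embd)
xs = [tok_emb[i] for i in encode(text)]   # one embedding vector per character
out = block(xs, params, n_head)
print(len(out), len(out[0]))              # sequence length, embedding size
```

The block keeps the input shape, so several blocks can be stacked. Because of the causal mask, the output at a given position depends only on that position and the ones before it, which is what lets a decoder-only model be trained to predict the next character.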