
Shyam's Blog
A deep dive into the internals of a small transformer model to see how it turns self-attention computations into accurate next-token predictions.
Deriving the backpropagation equations for convolutions with a symbolic computing engine. The engine outputs both symbolic representations of the equations and the metadata needed to visualize what they do.