Shyam's Blog

Beyond Self-Attention: How a Small Language Model Predicts the Next Token

A deep dive into the internals of a small transformer model to learn how it turns self-attention calculations into accurate predictions for the next token.

Using a Machine to Learn Machine Learning

Deriving the backprop equations for convolutions using a symbolic computing engine. The engine outputs both representations of the equations and the metadata needed to visualize what they do.