Shyam's Blog

Beyond Self-Attention: How a Small Language Model Predicts the Next Token

Feb 1, 2024

A deep dive into the internals of a small transformer model to learn how it turns self-attention calculations into accurate predictions for the next token.

Using a Machine to Learn Machine Learning

May 9, 2023

Deriving the backprop equations for convolutions using a symbolic computing engine. The engine outputs both representations of the equations and the metadata needed to visualize what they do.