I built an interactive tool to understand how LLMs actually learn
A few weeks ago, Andrej Karpathy published microgpt — a complete GPT model in about 200 lines of pure Python with no dependencies. It’s the latest in his series of “micro” projects that distil complex ideas down to something you can actually read and understand. It went straight to the top of Hacker News, as you’d expect.
In the comments, a user called growingswe shared an interactive blog post they’d built to visualise the code. It’s excellent — step-by-step breakdowns of the tokeniser, attention mechanism, and training loop, with live charts and clickable diagrams throughout. HN moderator dang put it in the second-chance pool and it picked up 309 points.
That got me thinking.
Why I wanted to go deeper
I studied computer science and AI at university about 20 years ago. I know the fundamentals, or at least I did. But it’s been a while since I properly engaged with the maths. I can use LLMs effectively and I can integrate them into systems, but if you’d asked me to explain exactly what cross-entropy loss is and why we use it instead of something simpler, I’d have had to be honest and say I didn’t really remember.
For a while now I’ve wanted to revisit the machinery of LLMs properly, with the goal of being able to explain the entire pipeline from raw text to generated output. Not just conceptually, but mathematically. My thinking: if I deeply understand how these models work at every level, I’m better placed to innovate — to see possibilities that aren’t obvious when you’re working at a higher level of abstraction.
So I started asking ChatGPT some questions. After a lengthy conversation working through the concepts, I realised I could build an interactive tool myself — both to solidify my own understanding and to help anyone else who wants the same kind of refresher.
The result is microLoss.
What it covers
microLoss is a 21-step interactive tutorial that walks through the complete prediction-to-learning pipeline. It uses a toy vocabulary of just four tokens (the, cat, ate, fish) so that every concept stays visually tractable. The task throughout: given the context “the cat ate ___”, predict the next token.
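To make the setup concrete, here is a minimal sketch of the toy vocabulary and the prediction task in plain Python. This is my own illustration, not code from microLoss or microgpt; the `encode`/`decode` helpers and the whitespace tokenisation are assumptions for the sake of the example.

```python
# Sketch of the toy setup: four tokens, whitespace tokenisation,
# and the next-token prediction task. Illustrative only.

vocab = ["the", "cat", "ate", "fish"]
stoi = {tok: i for i, tok in enumerate(vocab)}   # token -> id
itos = {i: tok for tok, i in stoi.items()}       # id -> token

def encode(text):
    """Split on whitespace and map each token to its integer id."""
    return [stoi[tok] for tok in text.split()]

def decode(ids):
    """Map ids back to tokens and rejoin."""
    return " ".join(itos[i] for i in ids)

context = encode("the cat ate")
# The model's whole job: given this context, assign a probability
# to each of the four vocabulary tokens as the next one.
```

With a vocabulary this small, every intermediate value (ids, logits, probabilities) fits on screen at once, which is what keeps the visualisations tractable.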
It starts with tokenisation, works through the forward pass of a tiny 864-parameter transformer, then into the core of it: how logits become probabilities via softmax, how cross-entropy loss measures prediction quality, and how backpropagation computes the gradients that improve the model. There are interactive sliders, live charts, and a small autograd engine that animates the chain rule step by step. It finishes with temperature scaling, attention, and autoregressive text generation.
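The central chain (logits → softmax → cross-entropy → gradients) can be worked through numerically in a few lines. The logit values below are made up for illustration; the real model's numbers will differ. The neat closed form of the gradient, probabilities minus the one-hot target, is exactly why softmax and cross-entropy are paired in practice.

```python
import math

vocab = ["the", "cat", "ate", "fish"]
logits = [0.5, -1.0, 0.2, 2.0]   # illustrative raw scores for each token
target = vocab.index("fish")     # the correct next token

# Softmax: exponentiate (shifted by the max for numerical stability),
# then normalise so the scores form a probability distribution.
m = max(logits)
exps = [math.exp(z - m) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Cross-entropy loss: the negative log-probability assigned to the
# correct token. Confident and right -> small loss; confident and
# wrong -> large loss.
loss = -math.log(probs[target])

# Gradient of the loss with respect to each logit simplifies to
# probs - one_hot(target): nudge the correct token's logit up,
# every other logit down, in proportion to the model's error.
grads = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
```

Backpropagation then pushes these gradients back through the transformer's layers via the chain rule, which is the part the tutorial's small autograd engine animates step by step.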
The whole thing runs in the browser and takes 20–30 minutes. Have a go.
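Of the final steps, temperature scaling is simple enough to sketch here: divide the logits by a temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. Again the numbers and the function name are my own illustration, not microLoss's code.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits/T: T < 1 sharpens the distribution, T > 1 flattens it."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.5, -1.0, 0.2, 2.0]                  # illustrative values
sharp = softmax_with_temperature(logits, 0.5)   # more peaked: favours the top token
flat = softmax_with_temperature(logits, 2.0)    # closer to uniform: more variety
```

During autoregressive generation, sampling from the sharper distribution makes output more deterministic, while the flatter one makes it more varied.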
Build something to learn something
The thing I’d most want to pass on is this: the process of building an interactive tool like this is itself the best way to learn the material. You can’t build a working visualisation of backpropagation without actually understanding the chain rule. You can’t wire up a softmax demo without confronting the maths. The gaps in your knowledge become immediately obvious when you try to make something that works.
I’d encourage anyone who wants to understand LLMs at a deeper level to try something similar. Pick a concept, build a small interactive demo, and see where you get stuck. That’s where the real learning happens.
The source code for microLoss is public and open source — Vue 3, TypeScript, and ECharts for the visualisations. Fork it, break it apart, build your own version.
If you want to talk about any of this, or about how AI works for your business at a more practical level — book a call.