About This Mini-Course
tldr: This site is about probability and information theory for machine learning. I will assume knowledge from an introductory probability course.
💡 Content
This mini-course introduces topics in probability and information theory that I believe are important for understanding machine learning.
You should come away with a solid theoretical background for several techniques used in machine learning: KL divergence, entropy, cross-entropy, max likelihood & max entropy principles, logits, softmax, and loss functions. We will also build intuition about neural nets by understanding compression, coding theory, and Kolmogorov complexity.
How to read this
Skip stuff you find boring, especially expanding boxes & footnotes. Skip boxes labeled with ⚠️ even more. Follow links, get nerdsniped, and don't feel the need to read this linearly.
This mini-course does not contain many formal theorem statements or proofs since the aim is to convey intuition in an accessible way. The downside is that some discussions are necessarily a bit imprecise. To get to the bottom of the topic, copy-paste the chapter to your favorite LLM and ask for details.
The total length of the text, including all footnotes and expand boxes, is about five chapters of Harry Potter and the Philosopher's Stone. That's up to the point where Harry learns he's a wizard (I don't promise you will feel the same).
What is assumed
I assume the probability knowledge you would pick up in a typical introductory university course.
You should be familiar with the basic language of probability theory: probabilities, distributions, random variables, independence, expectations, variance, and standard deviation.
Bayes' rule is going to be especially important.1 I also assume that you get the gist of the law of large numbers2 and maybe even the central limit theorem.3
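As a quick reminder, Bayes' rule in its standard form:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$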
Knowing some example applications of machine learning, statistics, or information theory helps a good bit in appreciating the context.
About This Project
This text was written by Vašek Rozhoň and arose from a joint project with Tom Gavenčiak.
We were thinking about open exposition problems4: just as there are open scientific problems where we haven't figured out how Nature works, there are open exposition problems where we haven't figured out the best way to convey our knowledge. We had a joint probability seminar at Charles University where we tried to work out how to teach some important, underrated topics. This text tracks some of what we did in that seminar.
The text is motivated by two open problems:
- How to adapt our teaching of computer science theory to convey more about neural networks?
- How can we use the current capabilities of LLMs to teach better, in general?
The first problem motivates the content, the second one the form.
Thanks
Huge thanks to Tom Gavenčiak (see above), as well as to my coauthors Claude, Gemini, and GPT, who helped massively in creating this mini-course.
Thanks to Richard Hladík, Petr Chmel, Vojta Rozhoň, Robert Šámal, Pepa Tkadlec, and others for feedback.
Feedback
I'd love to hear your feedback! Paste it here [todo], write a comment, or reach out to me directly.