You’ve spent the last module learning how to work with data — loading it, cleaning it, exploring it, and visualizing it. All of that was about understanding what already happened.
Machine learning takes the next step. Instead of just describing data, you build systems that learn from it — and use what they’ve learned to make predictions about things they’ve never seen before.
A spam filter that gets better at catching junk mail the more it sees. A credit scoring system that predicts whether someone will repay a loan. A recommendation engine that suggests products you’re likely to buy. None of these were explicitly programmed with rules. They were trained on data until they figured out the rules themselves.
That’s the core idea behind machine learning: instead of telling a program exactly what to do, you show it enough examples that it works it out on its own.
In traditional programming, you write the rules:
if email contains "win a prize" → mark as spam
if email contains "click here now" → mark as spam
This works for simple cases. But spam evolves. New patterns emerge that your rules don’t cover, and you have to keep updating them manually. It doesn’t scale.
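To make the limitation concrete, here is a minimal sketch of the two rules above written as Python. The function name `is_spam_by_rules` and the sample emails are made up for illustration.

```python
# Hand-written rules: the two phrases from the rules above.
SPAM_PHRASES = ["win a prize", "click here now"]


def is_spam_by_rules(email_text: str) -> bool:
    """Return True if the email contains any hard-coded spam phrase."""
    text = email_text.lower()
    return any(phrase in text for phrase in SPAM_PHRASES)


# An email that matches a rule is caught.
print(is_spam_by_rules("Click here now to WIN A PRIZE"))

# New wording that no rule covers slips through until someone adds a rule.
print(is_spam_by_rules("Claim your free reward today"))
```

The second email is exactly the kind of new pattern that forces you to go back and edit the list by hand.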
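In machine learning, you flip the process: you give the system examples that already carry the right answer, such as emails marked as spam or not spam, and it finds the patterns that separate them.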
You’re not writing the rules. You’re showing the system enough data that it learns the rules on its own.
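As a rough sketch of what "learning the rules" can look like, the toy example below scores each word by how often it appears in spam versus normal mail. The six training emails, the scoring scheme, and the names `train` and `predict` are all assumptions chosen for illustration. This is not one of the algorithms covered later in the module, just the idea in miniature.

```python
from collections import Counter

# Hypothetical labelled examples: (email text, is_spam).
TRAINING_EMAILS = [
    ("win a prize now", True),
    ("click here now to claim your prize", True),
    ("claim your free reward today", True),
    ("meeting moved to friday", False),
    ("here are the notes from today", False),
    ("lunch on friday", False),
]


def train(examples):
    """Learn a score per word: +1 for each spam email it appears in, -1 for each normal one."""
    scores = Counter()
    for text, is_spam in examples:
        for word in set(text.lower().split()):
            scores[word] += 1 if is_spam else -1
    return scores


def predict(scores, email_text):
    """Flag an email as spam when its words lean towards spam overall."""
    words = set(email_text.lower().split())
    return sum(scores[word] for word in words) > 0


word_scores = train(TRAINING_EMAILS)
print(predict(word_scores, "free prize inside"))
print(predict(word_scores, "notes for friday lunch"))
```

Nobody wrote a rule about the word "free" here. The score for it came from the examples, and adding more examples updates the scores without anyone editing the code.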
Machine learning problems fall into three broad categories depending on what kind of data you have and what you’re trying to do.
Supervised learning
You have labelled data: examples where you already know the right answer. The algorithm learns from those examples and predicts answers for new ones.
Examples:
This is the most common type of ML and where most of this module lives.
Unsupervised learning
You have data but no labels. The algorithm looks for structure and patterns on its own without being told what to find.
Examples:
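One way to picture this is grouping similar values without ever being told what the groups are. The sketch below is a simplified, one-dimensional version of k-means clustering with two groups. The spending figures and the function name `two_means` are hypothetical, and the fixed starting points are a shortcut to keep the example deterministic.

```python
def two_means(values, iterations=10):
    """Split numbers into two groups by repeatedly moving two centres to their group's mean."""
    centres = [min(values), max(values)]  # simple deterministic starting points
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for value in values:
            # Assign each value to whichever centre is closer.
            nearest = 0 if abs(value - centres[0]) <= abs(value - centres[1]) else 1
            groups[nearest].append(value)
        # Move each centre to the average of the values assigned to it.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups


# Hypothetical monthly spend per customer. No labels are provided.
monthly_spend = [12, 15, 14, 10, 95, 102, 99, 110]
centres, groups = two_means(monthly_spend)
print(centres)
print(groups)
```

The data never said "low spender" or "high spender". The two groups emerge from the numbers alone, and it is up to you to decide what they mean.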
Reinforcement learning
An agent learns by taking actions in an environment and receiving rewards or penalties. It figures out the best strategy through trial and error.
Examples:
Reinforcement learning is the most complex of the three and won’t be covered in this module — but it’s worth knowing it exists.
Since most of this module focuses on supervised learning, it’s worth understanding the two flavours it comes in:
Regression — predicting a number
The output is a continuous value somewhere on a scale.
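A minimal sketch of regression, assuming a single input feature: fit a straight line through known points with ordinary least squares, then read a prediction off the line. The floor areas and prices are invented for illustration, and `fit_line` is a hypothetical helper, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: floor area in square metres and price in thousands.
areas = [50, 60, 80, 100]
prices = [150, 180, 240, 300]

slope, intercept = fit_line(areas, prices)

# Predict the price of a 70 square metre home that was not in the data.
predicted_price = slope * 70 + intercept
print(predicted_price)
```

The prediction is a number on a continuous scale, which is what makes this a regression problem.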
Classification — predicting a category
The output is one of a fixed set of labels.
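For contrast, here is a small classification sketch. It uses a nearest-mean approach, which is a stand-in chosen for brevity rather than one of the algorithms taught in this module: learn the average feature value for each label, then give a new example the label whose average is closest. The data and the function names are assumptions.

```python
def train_nearest_mean(values, labels):
    """Learn the average feature value for each label."""
    totals, counts = {}, {}
    for value, label in zip(values, labels):
        totals[label] = totals.get(label, 0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


def classify(class_means, value):
    """Pick the label whose average is closest to the new value."""
    return min(class_means, key=lambda label: abs(class_means[label] - value))


# Hypothetical data: number of suspicious words in an email, and its label.
suspicious_word_counts = [0, 1, 0, 6, 8, 7]
labels = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]

class_means = train_nearest_mean(suspicious_word_counts, labels)
print(classify(class_means, 5))
print(classify(class_means, 1))
```

Whatever the input, the answer is always one of the two labels seen during training, never a value in between.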
The algorithms you’ll learn in this module (linear regression, logistic regression, decision trees, random forests) map directly onto these two types. Linear regression is built for regression, logistic regression is built for classification despite its name, and decision trees and random forests can do both.
The word model gets used constantly in machine learning. It’s worth being precise about what it means.
A model is the output of training an algorithm on data. It’s the thing that has learned the patterns and can now make predictions.
Think of it like this:
Once a model is trained you can use it over and over on new data without retraining. The spam filter doesn’t re-read every email it was ever shown every time a new message arrives. It uses the model — the patterns it already learned — to make a fast prediction.
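The split between training and predicting can be sketched in a few lines. In this toy example the "model" is a single learned number, a threshold, which is an assumption made to keep things small. Real models hold many more parameters, but the principle is the same. The data and function names are hypothetical.

```python
import json


def train_threshold(values, labels):
    """Training: find the midpoint between the average spam and average non-spam value."""
    spam = [v for v, is_spam in zip(values, labels) if is_spam]
    ham = [v for v, is_spam in zip(values, labels) if not is_spam]
    midpoint = (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2
    return {"threshold": midpoint}


def predict(model, value):
    """Prediction only needs the learned threshold, not the training data."""
    return value > model["threshold"]


# Hypothetical training data: suspicious words per email, and whether it was spam.
model = train_threshold([0, 2, 1, 7, 9, 8], [False, False, False, True, True, True])

saved = json.dumps(model)   # the model is just the learned parameters
loaded = json.loads(saved)  # later, reuse it without retraining

print(saved)
print([predict(loaded, count) for count in [0, 3, 6, 10]])
```

Training ran once. Every prediction after that uses only the saved threshold, which is why a trained model can respond quickly to new data.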
Machine learning sits within a broader landscape of related fields whose names often get used interchangeably but mean different things:
Artificial Intelligence (AI) — the broadest term. Any technique that allows machines to simulate human intelligence. Machine learning is one approach to AI, but not the only one.
Machine Learning (ML) — systems that learn from data. A subset of AI.
Deep Learning — a subset of ML that uses neural networks with many layers. Powers most modern AI breakthroughs — image recognition, language models, speech synthesis.
Data Science — the practice of extracting insights from data. Overlaps heavily with ML but the emphasis is on understanding and communicating findings rather than building predictive systems.
You came through data science to get here. Machine learning is the natural next step — taking what you know about data and using it to build systems that learn.
Machine learning is a large field and this module covers the foundational layer — the algorithms and concepts that everything more advanced is built on. By the end you’ll have trained real models, evaluated their performance, and understood what’s happening under the hood.
The best way to learn it is to build things. So that’s exactly what we’ll do.