Unit 2 of 2

What is an ML workflow?

Updated Jun 2026

In the data science module you learned how to load, clean, and explore data. That work doesn’t go away: it is the first step of every machine learning project. But once your data is ready, a new set of steps begins.

Every machine learning project follows roughly the same shape:

  1. Prepare the data
  2. Split it into training and test sets
  3. Train a model
  4. Make predictions
  5. Evaluate how well it did

These steps apply whether you’re building a spam filter or a house price predictor. The algorithm in the middle changes. The workflow around it stays the same.

This unit walks through that workflow end to end using a simple example. The goal isn’t to understand the algorithm deeply — that comes in later units. The goal is to recognise the pattern so that when complexity arrives, the structure feels familiar.

Features and labels

Before you can train a model you need to understand how your data is structured from a machine learning perspective.

In supervised learning every row in your dataset has two parts:

  • Features — the input columns. The information the model uses to make a prediction. Also called X.
  • Label — the output column. The thing you’re trying to predict. Also called y.

Here’s a simple student dataset:

study_hours  | attendance | previous_score || passed
-------------|------------|----------------||--------
6            | 90         | 72             || True
2            | 60         | 55             || False
8            | 95         | 88             || True
1            | 45         | 40             || False

The first three columns are features — the information available before we know the outcome. The last column is the label — what we’re trying to predict.

In code you split them like this:

X = df[["study_hours", "attendance", "previous_score"]]  # features
y = df["passed"]                                          # label

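This X and y convention is used throughout machine learning. X is capitalised because it is a table with many columns (a matrix), and y is lowercase because it is a single column (a vector). You’ll see it in most tutorials and textbooks and throughout the Scikit-learn documentation.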
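
To make the split concrete, here is a minimal sketch that types the four example rows from the table into a DataFrame by hand (in a real project df would come from a file) and then separates features from the label. The shapes show the difference: X is a table, y is a single column.

```python
import pandas as pd

# The four example rows from the student table, entered by hand for illustration.
df = pd.DataFrame({
    "study_hours": [6, 2, 8, 1],
    "attendance": [90, 60, 95, 45],
    "previous_score": [72, 55, 88, 40],
    "passed": [True, False, True, False],
})

X = df[["study_hours", "attendance", "previous_score"]]  # features: a DataFrame
y = df["passed"]                                          # label: a Series

print(X.shape)  # (4, 3): four rows, three feature columns
print(y.shape)  # (4,): one label per row
```

Note the double brackets when selecting X: a list of column names returns a DataFrame, while a single name returns a Series.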

The train/test split

Here’s a problem. If you train a model on your entire dataset and then evaluate it on that same dataset, you’re not really testing anything. The model has already seen every example — of course it performs well.

That’s like giving students the exact exam questions during class and then grading them on those same questions. The results tell you nothing about whether they actually learned anything.
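
You can see the problem directly with a small experiment. The sketch below is an illustration only: it uses synthetic random data rather than the student dataset, and it relies on calls (fit, predict, accuracy_score) that are explained later in this unit. Because the labels are pure noise, there is nothing real to learn, yet the model still looks perfect on the data it was trained on.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data: random features and random labels, so there is no real pattern.
X_seen = rng.random((200, 3))
y_seen = rng.integers(0, 2, size=200)
X_unseen = rng.random((200, 3))
y_unseen = rng.integers(0, 2, size=200)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_seen, y_seen)

# Score the model on the rows it trained on, then on rows it has never seen.
seen_accuracy = accuracy_score(y_seen, model.predict(X_seen))
unseen_accuracy = accuracy_score(y_unseen, model.predict(X_unseen))

print(f"Accuracy on data the model trained on: {seen_accuracy:.2f}")
print(f"Accuracy on data it has never seen:    {unseen_accuracy:.2f}")
```

The first score comes out perfect because an unrestricted decision tree can memorise every training row. The second should land near 0.5, which is no better than guessing. Only the second number says anything about how the model would behave on new data.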

The solution is to split your data into two parts before training:

  • Training set — the data the model learns from
  • Test set — data the model has never seen, used only for evaluation

Scikit-learn makes this one line:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

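test_size=0.2 means 20% of the data goes to the test set and 80% goes to training. That’s a common choice. (If you leave out both test_size and train_size, Scikit-learn holds back 25% for testing by default.)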

random_state=42 makes the split reproducible: every time you run the code you get the same split. Without it the rows are shuffled differently on each run, so your results change from run to run. The value 42 is arbitrary; any fixed integer works.
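
A small sketch of what reproducible means here. It splits ten hypothetical row IDs (stand-ins for a real dataset) twice with the same random_state and once without one.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten hypothetical row IDs standing in for a dataset.
rows = np.arange(10)

# Same random_state twice: both calls pick the same test rows.
_, test_a = train_test_split(rows, test_size=0.2, random_state=42)
_, test_b = train_test_split(rows, test_size=0.2, random_state=42)
print(test_a, test_b)

# No random_state: the chosen test rows can change every time you run the script.
_, test_c = train_test_split(rows, test_size=0.2)
print(test_c)
```

Run the script a few times. The first two arrays always match each other; the third will usually differ between runs.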

After the split you have four things:

  • X_train — training features
  • y_train — training labels
  • X_test — test features
  • y_test — test labels

The model only ever sees X_train and y_train during training. X_test and y_test are locked away until evaluation.
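
Here is a minimal sketch of the four pieces, assuming a made-up 10-row dataset with the same columns as the student example. The values are invented for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# A hypothetical 10-row dataset with the same columns as the student example.
df = pd.DataFrame({
    "study_hours": [6, 2, 8, 1, 5, 3, 7, 4, 9, 2],
    "attendance": [90, 60, 95, 45, 80, 65, 85, 70, 98, 50],
    "previous_score": [72, 55, 88, 40, 68, 58, 75, 62, 91, 48],
    "passed": [True, False, True, False, True, False, True, True, True, False],
})
X = df[["study_hours", "attendance", "previous_score"]]
y = df["passed"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, y_train.shape)  # (8, 3) (8,)
print(X_test.shape, y_test.shape)    # (2, 3) (2,)

# Features and labels stay aligned: both test pieces refer to the same rows.
print(X_test.index.equals(y_test.index))  # True

# No row appears in both the training set and the test set.
print(set(X_train.index).isdisjoint(X_test.index))  # True
```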

Training a model

Now you’re ready to train. For this unit we’ll use a Decision Tree — don’t worry about how it works internally, that’s covered in a later unit. Right now the focus is the workflow.

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

That’s it. Two lines.

DecisionTreeClassifier() creates the model. .fit() trains it — it reads through X_train and y_train, finds patterns, and stores what it learned internally.

This .fit(X, y) pattern is the same for almost every algorithm in Scikit-learn. Once you’ve seen it once you’ve seen it for all of them.
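
To illustrate, the sketch below trains three different Scikit-learn classifiers with exactly the same two calls. The tiny training set is made up, and the other two algorithms (k-nearest neighbours and logistic regression) appear only to show that the calling pattern doesn’t change; you don’t need to know how they work.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up training set: [study_hours, attendance] -> passed.
X_train = [[6, 90], [2, 60], [8, 95], [1, 45], [7, 85], [3, 50]]
y_train = [True, False, True, False, True, False]

models = [
    DecisionTreeClassifier(random_state=0),
    KNeighborsClassifier(n_neighbors=3),
    LogisticRegression(max_iter=1000),
]

for model in models:
    model.fit(X_train, y_train)  # the identical call for every algorithm
    # Ask each trained model about one hypothetical new student.
    print(type(model).__name__, model.predict([[5, 80]]))
```

Swapping algorithms means changing one line, the one that creates the model. Everything around it stays put.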

Making predictions

Once the model is trained you can use it to make predictions on new data:

predictions = model.predict(X_test)
print(predictions)
# e.g. [ True False  True  True False ...]  (a NumPy array, one prediction per test row)

You pass in X_test — the features the model hasn’t seen — and it returns a prediction for each row. Notice you only pass in X, not y. The model doesn’t get to see the correct answers. It has to figure them out from the features alone.
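
The same call works for rows that were never part of your dataset at all. In this sketch the model is trained on the four example rows from the student table and then asked about two hypothetical new students (their values are invented for illustration).

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

feature_columns = ["study_hours", "attendance", "previous_score"]

# The four example rows from the student table, used here as training data.
train = pd.DataFrame({
    "study_hours": [6, 2, 8, 1],
    "attendance": [90, 60, 95, 45],
    "previous_score": [72, 55, 88, 40],
    "passed": [True, False, True, False],
})

model = DecisionTreeClassifier(random_state=0)
model.fit(train[feature_columns], train["passed"])

# Two hypothetical new students: features only, no "passed" column.
new_students = pd.DataFrame({
    "study_hours": [7, 1],
    "attendance": [92, 40],
    "previous_score": [80, 35],
})

predictions = model.predict(new_students[feature_columns])
print(predictions)  # one prediction per row: True for the first student, False for the second
```

The new rows must have the same feature columns, in the same order, as the data the model was trained on.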

Evaluating the model

Now you compare the model’s predictions against the actual answers:

from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, predictions)
print(accuracy)  # e.g. 0.85

An accuracy of 0.85 means the model got 85% of the test set correct.
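
Under the hood, accuracy is just the number of correct predictions divided by the total number of predictions. A minimal sketch with a made-up 20-row test set in which the model gets 3 rows wrong:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels for a 20-row test set.
y_test = np.array([True] * 12 + [False] * 8)

# Hypothetical predictions: start from the right answers, then flip three of them.
predictions = y_test.copy()
predictions[[0, 1, 12]] = ~predictions[[0, 1, 12]]

# Fraction of rows where the prediction matches the true label: 17 / 20.
manual_accuracy = (y_test == predictions).mean()

print(manual_accuracy)                      # 0.85
print(accuracy_score(y_test, predictions))  # 0.85
```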

Accuracy is the most intuitive metric but it has limitations. Imagine a dataset where 95% of students passed. A model that predicts “passed” for everyone would score 95% accuracy without learning anything useful.
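
That scenario takes only a few lines to reproduce. The labels below are made up to match the 95% figure, and the “model” is simply an array that says “passed” for every student.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: 95 students passed, 5 did not.
y_true = np.array([True] * 95 + [False] * 5)

# A "model" that ignores the features and always predicts "passed".
always_passed = np.full(len(y_true), True)

print(accuracy_score(y_true, always_passed))  # 0.95

# How many of the 5 failing students did it correctly flag? None.
failures_caught = int((~always_passed[~y_true]).sum())
print(failures_caught)  # 0
```

A 95% score looks impressive until you notice the model never identifies a single student who failed.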

That’s why accuracy alone isn’t enough — but it’s the right place to start. Later units will introduce better evaluation tools like confusion matrices, precision, and recall.

Putting it all together

Here’s the complete workflow from start to finish:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load and prepare
df = pd.read_csv("students.csv")
X = df[["study_hours", "attendance", "previous_score"]]
y = df["passed"]

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")

Run this on a real dataset and you’ve completed your first machine learning workflow.

What just happened?

Step back and look at what those roughly twenty lines of code actually did. You loaded real data, split it so the model couldn’t cheat, trained an algorithm to find patterns in it, asked it to predict outcomes it had never seen, and measured how often it was right.

That is machine learning. Every project you build from here follows this same shape. The algorithms will get more sophisticated, the evaluation will get more rigorous, and the data will get messier — but the workflow stays the same.

In the next unit you’ll learn your first algorithm properly — Linear Regression — and understand what’s actually happening when you call .fit().