NumPy Arrays

You already know how to store a list of numbers in Python:

scores = [72, 85, 90, 61, 78]

That works fine for small tasks. But imagine you have a dataset with a million rows and you need to multiply every value by two. With a plain Python list, you’d need a loop:

doubled = [x * 2 for x in scores]

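The comprehension visits each value one at a time in Python, which gets slow on large datasets, and the code gets messier as your calculations grow. NumPy was built to solve exactly this problem. It provides a list-like data structure called an array, which stores values of a single type packed together in memory. As a result, arrays are faster, use less memory, and are designed for mathematical operations.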

With NumPy, the same operation looks like this:

import numpy as np
scores = np.array([72, 85, 90, 61, 78])
doubled = scores * 2

As you can see, there’s no loop to write: the multiplication is applied to every element of the array at once. This is the core idea behind NumPy, and you’ll see it everywhere in data science.
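To see the difference for yourself, here is a minimal sketch that builds a million-element dataset both ways, checks that the two approaches give the same results, and times each one with the standard-library timeit module. The array size and repeat count are arbitrary choices for illustration, and the timings you get will depend on your machine.

```python
import timeit

import numpy as np

# A million values, stored once as a Python list and once as a NumPy array
values_list = list(range(1_000_000))
values_array = np.arange(1_000_000)

# Loop version: Python visits each element one at a time
def double_list():
    return [x * 2 for x in values_list]

# Vectorized version: NumPy doubles every element in one call
def double_array():
    return values_array * 2

# Both approaches produce the same numbers
assert double_list() == double_array().tolist()

# Time 10 runs of each; exact numbers vary by machine
list_time = timeit.timeit(double_list, number=10)
array_time = timeit.timeit(double_array, number=10)
print(f"list comprehension: {list_time:.3f} s")
print(f"NumPy array:        {array_time:.3f} s")
```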

Creating arrays

The most common way to create a NumPy array is to pass a Python list into np.array():

scores = np.array([72, 85, 90, 61, 78])

NumPy also gives you several shortcuts for common cases:

np.zeros(5)          # [0. 0. 0. 0. 0.]
np.ones(5)           # [1. 1. 1. 1. 1.]
np.arange(0, 10, 2)  # [0 2 4 6 8]: like range(), but returns an array
np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1.]: 5 evenly spaced values from 0 to 1, both ends included
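One detail hidden in the output above: np.zeros, np.ones, and np.linspace produce floating-point arrays (note the trailing dots), while np.arange with integer arguments produces integers. The sketch below checks this with the .dtype attribute and shows the optional dtype parameter for choosing a type explicitly.

```python
import numpy as np

zeros = np.zeros(5)                  # float64 by default
int_zeros = np.zeros(5, dtype=int)   # ask for integers instead
evens = np.arange(0, 10, 2)          # integer arguments give an integer array
points = np.linspace(0, 1, 5)        # always floats

print(zeros.dtype)       # float64
print(int_zeros)         # [0 0 0 0 0]
print(evens.dtype.kind)  # i (signed integer)
print(points.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
```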

Array shape and dimensions

Arrays can have one dimension (a simple list of values), two dimensions (rows and columns, like a spreadsheet), or more.

# 1D array
a = np.array([1, 2, 3])

# 2D array
b = np.array([[1, 2, 3],
              [4, 5, 6]])

You can check the shape of any array with .shape and the number of dimensions with .ndim:

print(a.shape)  # (3,)
print(b.shape)  # (2, 3) — 2 rows, 3 columns
print(b.ndim)   # 2

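Understanding shape is important. Pandas, machine learning libraries, and most other data tools expect data in a specific shape (many machine learning libraries, for example, expect a 2D array with one row per sample and one column per feature), so knowing how to read and work with shape will save you a lot of confusion later.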
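When a tool expects a different shape than the one you have, the .reshape() method rearranges the same values into new dimensions, as long as the total number of elements (.size) stays the same. A common case is turning a 1D array into a single column. A short sketch using the arrays a and b from above:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2, 3],
              [4, 5, 6]])

print(b.size)  # 6: total number of elements (2 rows x 3 columns)

# Same six values, rearranged into 3 rows and 2 columns
print(b.reshape(3, 2))
# [[1 2]
#  [3 4]
#  [5 6]]

# -1 tells NumPy to work out that dimension from the others:
# here it turns the 1D array into a single column of shape (3, 1)
column = a.reshape(-1, 1)
print(column.shape)  # (3, 1)
print(column.ndim)   # 2
```

Asking for a shape whose sizes don't multiply to b.size, such as b.reshape(4, 2), raises a ValueError.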

Indexing and slicing

Accessing values in a NumPy array works similarly to Python lists:

scores = np.array([72, 85, 90, 61, 78])

scores[0]    # 72 — first element
scores[-1]   # 78 — last element
scores[1:4]  # [85, 90, 61] — a slice
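One way NumPy slicing differs from list slicing: slicing a list makes a copy, but slicing a NumPy array returns a view that shares memory with the original, so changing the slice changes the original array too. Call .copy() when you want an independent slice. A quick check:

```python
import numpy as np

# Python list: the slice is a separate copy
scores_list = [72, 85, 90, 61, 78]
part_list = scores_list[1:4]
part_list[0] = 0
print(scores_list)   # [72, 85, 90, 61, 78] (unchanged)

# NumPy array: the slice is a view into the same data
scores = np.array([72, 85, 90, 61, 78])
part = scores[1:4]
part[0] = 0
print(scores)        # [72  0 90 61 78] (original changed)

# .copy() gives an independent array
scores = np.array([72, 85, 90, 61, 78])
safe_part = scores[1:4].copy()
safe_part[0] = 0
print(scores)        # [72 85 90 61 78] (unchanged)
```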

For 2D arrays, you provide two indices — row first, then column:

b = np.array([[1, 2, 3],
              [4, 5, 6]])

b[0, 1]   # 2 — row 0, column 1
b[1, :]   # [4, 5, 6] — entire second row
b[:, 0]   # [1, 4] — entire first column
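Because each position accepts a slice, you can also cut out a rectangular block, and negative indices count from the end along each axis just as they do for lists. The examples below reuse the array b:

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6]])

print(b[-1, -1])    # 6: last row, last column
print(b[:, 1:])     # columns 1 and 2 of every row
# [[2 3]
#  [5 6]]
print(b[1])         # [4 5 6]: a single index selects a whole row, same as b[1, :]
```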

Vectorized operations

This is the big idea behind NumPy. Instead of looping through values one by one, you apply an operation to the entire array at once:

scores = np.array([72, 85, 90, 61, 78])

scores + 10   # [82, 95, 100, 71, 88]
scores * 2    # [144, 170, 180, 122, 156]
scores / 100  # [0.72, 0.85, 0.9, 0.61, 0.78]
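Notice that scores / 100 produced decimal values even though scores holds integers: the / operator always returns a floating-point array. If you want whole-number results, // performs floor division element by element, and % gives the remainder. A small sketch:

```python
import numpy as np

scores = np.array([72, 85, 90, 61, 78])

print((scores / 100).dtype)  # float64: true division always gives floats
print(scores // 10)          # [7 8 9 6 7]: floor division keeps integers
print(scores % 10)           # [2 5 0 1 8]: remainder after dividing by 10
```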

You can also do operations between two arrays of the same shape:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

a + b  # [5, 7, 9]
a * b  # [4, 10, 18]

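NumPy applies the operation element by element, pairing values at the same position. This is called vectorization, and it’s what makes NumPy so much faster than plain Python loops: the looping still happens, but inside NumPy’s compiled code rather than in Python.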
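To make "element by element" concrete, the sketch below computes a * b with an explicit Python loop and compares it with the vectorized result. It also shows what happens when the shapes don't line up. NumPy has rules, called broadcasting, that let some differently shaped operands combine (an array and a single number, as in scores * 2, is one case), but two 1D arrays of lengths 3 and 2 can't be paired, so NumPy raises a ValueError rather than guessing.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# What a * b does, written out as a loop over matching positions
manual = [a[i] * b[i] for i in range(len(a))]
print(manual == (a * b).tolist())  # True

# Mismatched lengths cannot be paired element by element
try:
    a + np.array([1, 2])
except ValueError as err:
    print("ValueError:", err)
```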

Aggregate functions

NumPy comes with built-in functions for summarizing an array:

scores = np.array([72, 85, 90, 61, 78])

np.sum(scores)   # 386
np.mean(scores)  # 77.2
np.min(scores)   # 61
np.max(scores)   # 90
np.std(scores)   # ≈ 10.15 (population standard deviation)
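Each of these functions is also available as a method on the array itself, so scores.sum() is equivalent to np.sum(scores). The sketch below also spells out what np.std computes: by default it is the population standard deviation (dividing by the number of values), while passing ddof=1 divides by one fewer, giving the sample standard deviation used in many statistics courses.

```python
import numpy as np

scores = np.array([72, 85, 90, 61, 78])

print(scores.sum(), scores.min(), scores.max())  # 386 61 90

# Standard deviation by hand: square root of the average squared distance from the mean
deviations = scores - scores.mean()
by_hand = np.sqrt(np.mean(deviations ** 2))
print(by_hand)                 # approximately 10.147
print(np.std(scores))          # same value: population standard deviation
print(np.std(scores, ddof=1))  # approximately 11.34: sample standard deviation (divides by n - 1)
```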

These also work on 2D arrays. Pass the axis parameter to summarize down each column (axis=0) or across each row (axis=1):

b = np.array([[1, 2, 3],
              [4, 5, 6]])

np.sum(b, axis=0)  # [5, 7, 9] — sum of each column
np.sum(b, axis=1)  # [6, 15]   — sum of each row
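A helpful way to read axis: it names the dimension that gets collapsed. With axis=0 the rows are combined, leaving one result per column; with axis=1 the columns are combined, leaving one result per row. The same parameter works for np.mean, np.min, np.max, and the other aggregate functions:

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)

col_means = np.mean(b, axis=0)  # collapses the 2 rows, result has shape (3,)
row_max = np.max(b, axis=1)     # collapses the 3 columns, result has shape (2,)

print(col_means)   # [2.5 3.5 4.5]
print(row_max)     # [3 6]
print(np.sum(b))   # 21: no axis means sum over every element
```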

Boolean masking

One of the most useful things you can do with a NumPy array is filter it based on a condition:

scores = np.array([72, 85, 90, 61, 78])

scores > 75          # [False, True, True, False, True]
scores[scores > 75]  # [85, 90, 78]

The first line produces an array of True and False values. The second uses that to return only the elements where the condition is True. This is called boolean masking.
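Because True counts as 1 and False as 0, a boolean mask is also handy for counting: summing it tells you how many elements meet the condition, and averaging it gives the fraction that do. The mask can be stored in a variable and reused:

```python
import numpy as np

scores = np.array([72, 85, 90, 61, 78])

above_75 = scores > 75       # boolean mask of True/False values
print(above_75.dtype)        # bool
print(np.sum(above_75))      # 3 scores are above 75
print(np.mean(above_75))     # 0.6, i.e. 60% of the scores
print(scores[above_75])      # [85 90 78]
```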

You can combine conditions using & (and) and | (or). Wrap each condition in its own parentheses:

scores[(scores > 70) & (scores < 90)]  # [72, 85, 78]
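The parentheses around each condition are required, because & and | bind more tightly than comparison operators like >. Python's own and and or keywords don't work here either: they try to treat a whole array as a single True/False value, and NumPy raises a ValueError. The sketch below shows | and ~ (not) alongside that error:

```python
import numpy as np

scores = np.array([72, 85, 90, 61, 78])

# | (or): scores below 65 or above 85
print(scores[(scores < 65) | (scores > 85)])  # [90 61]

# ~ (not): flip a mask to select everything that is NOT above 75
print(scores[~(scores > 75)])                 # [72 61]

# Python's `and` cannot combine arrays element by element
try:
    scores[(scores > 70) and (scores < 90)]
except ValueError as err:
    print("ValueError:", err)
```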

Boolean masking carries directly into Pandas, where you’ll use this same pattern to filter rows in a dataset. Getting comfortable with it here will pay off immediately in the next unit.

Putting it all together

Here’s a small example that brings together aggregate functions, vectorized math, and boolean masking:

import numpy as np

scores = np.array([72, 85, 90, 61, 78, 95, 55, 88])

# Summary
print("Mean:", np.mean(scores))
print("Highest:", np.max(scores))
print("Lowest:", np.min(scores))

# Bump every score up by 5
adjusted = scores + 5
print("Adjusted scores:", adjusted)

# Find everyone who passed (above 70)
passed = scores[scores > 70]
print("Passing scores:", passed)
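To take the example a step further, a mask can also report how many students passed, and the .reshape() method can organize the scores into a 2D table. The split into two groups of four below is a hypothetical arrangement chosen only to illustrate shapes and axis:

```python
import numpy as np

scores = np.array([72, 85, 90, 61, 78, 95, 55, 88])

# Count and share of passing scores (above 70), using the mask directly
passed_mask = scores > 70
print("Number passing:", np.sum(passed_mask))   # 6
print("Pass rate:", np.mean(passed_mask))       # 0.75

# Hypothetical: treat the 8 scores as 2 groups of 4 students each
groups = scores.reshape(2, 4)
print(groups.shape)               # (2, 4)
print(np.mean(groups, axis=1))    # [77. 79.]: average score of each group
```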

NumPy arrays are the foundation that Pandas, data visualization libraries, and machine learning libraries are built on. The concepts you practiced here (shapes, slicing, vectorized math, and boolean masking) will appear constantly throughout the rest of this course.

Coming soon: 0D, 1D, 2D, and 3D arrays.