Using NumPy Arrays to Calculate Pi
Overview
Teaching: 30 min
Exercises: 10 minQuestions
What are the differences between numpy arrays and lists?
Objectives
Be able to name the differences between Python lists and numpy arrays.
Understand the idea of broadcasting.
Lesson Contents
Rewriting our calculation of pi using NumPy
We will now return to our calculation of \(\pi\). Instead of using the Python Standard Library functions, we will examine how this calculation could have been made much simpler using the NumPy library.
Recall our former code for calculating \(\pi\)
n_samples = 100
num_inside = 0
for i in range(n_samples):
# Generate a random point between 0 and 1 for x.
x = random.random()
# Generate a random point between 0 and 1 for y.
y = random.random()
# Calculate the radius of this point
r = math.sqrt(x ** 2 + y ** 2)
if r < 1:
num_inside += 1
pi = 4 * (num_inside) / n_samples
To start using numpy, we will first import it.
import numpy as np
We set the same number of samples:
n_samples = 100
Next, we generate 100 random points. In our previous code, we did this using the random.random
function in a for
loop.
NumPy also has a random.random
function. You can try it out:
np.random.random()
0.627385792598836
When no argument is given to np.random.random
it behaves exactly as we are used to random.random
. It gives us one random number. However, you can also give the function a size to generate an array of random numbers.
For us, we would like a random array that is size n_samples, 2
.
random_numbers = np.random.random((n_samples, 2))
If you check this variable, you will see it is an array with 100 rows and 2 columns.
The next thing we need to do is to square and sum the x and y values. We can square all the values at once because of element-wise operations:
r_squared = random_numbers ** 2
Next, we want to sum all of the rows of r_squared. We will want to use the numpy function sum
which we learned about in the last module.
Executing
np.sum(r_squared)
will give us the value of the sum of all the elements. We want a sum of the rows, so we will want to use the axis
arugment.
To figure out the axis we want, check the shape of r_squared
r_squared.shape
(100, 2)
If we want the sum of the rows, we should use axis=1
. We expect that this sum should have 100 values, so check this after your calculation:
vals = np.sum(r_squared, axis=1)
vals.shape
(100, )
Next, we can easily count the number of points inside the circle using logical comparison with the numpy array.
points_inside = vals < 1
num_inside = np.sum(points_inside)
Finally, we estimate \(\pi\)
pi = 4 * num_inside / n_samples
Final code:
import math
import numpy as np
# Start by using 100 samples.
n_samples = 100
random_numbers = np.random.random(size=(n_samples,2))
vals = np.sum(random_numbers**2, axis=1)
vals = np.sqrt(vals)
num_inside = np.sum(vals < 1)
pi = 4 * num_inside / n_samples
Comparing Performace : NumPy vs Python Standard Library
Let’s try timing how long it takes to calculate \(\pi\) with 10 million samples for each method:
import time
import random
start = time.time()
# Start by using 100 samples.
n_samples = 1000000
num_inside = 0
for i in range(n_samples):
# Generate a random point between 0 and 1 for x.
x = random.random()
# Generate a random point between 0 and 1 for y.
y = random.random()
## Calculate the radius of this point.
r = math.sqrt(x ** 2 + y ** 2)
# if this value is less than 1, the point is inside the circle.
if r < 1:
num_inside += 1
4 * num_inside / n_samples
end = time.time()
elapsed_time = end - start
print(elapsed_time)
0.6184561252593994
Do the same thing with the method which uses NumPy:
import time
start = time.time()
n_samples = 10000000
random_numbers = np.random.random(size=(n_samples,2))
vals = np.sum(random_numbers**2, axis=1)
num_inside = np.sum(vals < 1)
pi = 4 * num_inside / n_samples
end = time.time()
elapsed_time = end - start
print(elapsed_time)
0.059281349182128906
We can see that our NumPy version is about 10 times faster than the one written in pure Python. This is a huge advantage to using NumPy - it can make your Python code much faster.
In our case, we were only waiting 5 seconds for our first calculation, so the pure Python version may not bother us that much. However, there are some cases (such as our MC simulation) which become very slow using pure Python for large systems.
For your homework, your task (it’s a bonus), is to rewrite your MC simulation to use NumPy and to test the speed up.
See the description on the next page for more information!
Key Points
NumPy arrays which are the same size use element-wise operations when added or subtracted.
NumPy uses something called broadcasting for arrays which are not the same size to allow arrays to be added or multiplied.
NumPy has extensive documentation online - you should check this out if you need to do a computation.