Using Numpy to Fasten the Monte Carlo Simulation
Numpy
Numpy Library
Numpy is not a python standard library, but it is a library that is widely used for scientific computing. It is based on the C language, which makes the NumPy Array, compared with the python list, a better type for our data analysis. Numpy could help us to make our code run much faster and more efficiently.
Numpy Array
We can do calculating operations to the Numpy Array but not the python list. For example, if we have two lists:
list_1 = [x1,y1,z1]
list_2 = [x2,y2,z2]
And what we want to do is to add each element to both lists respectively. The output could be another list, list_sum. Without Numpy Array, what we might do is write a for-loop:
for i in range(3):
list_sum.append(list_1[i] + list_2[i])
With Numpy Array, we can get rid of the for-loop:
list_sum = np.array(list_1) + np.array(list_2)
Avoiding the for-loop can help our code run faster.
Array Shape
We can always check the shape and dimensions of the array by the function:
<array>.shape
<array>.ndim
We can also reshape the array by using the function:
<array>.reshape(dim1=, dim2=, ...)
Broadcasting
The operation could happen between the array that is not in the same shape. This feature is called Broadcasting, where the smaller array is broadcasting across the larger array. We utilized this feature in our homework assignment where we want to have the distance between one particle and all the other particles (799) in the system. We enter the coord_picked (1 x 3) and the coord_new (799 x 3) to the
calculate_distance_np(coord_picked, coord_new, box_length=None)
function, which subtracts the two coordinates and gets the difference between them. Without the broadcasting feature, we might need to clone the coord_picked into a (799 x 3) array to do the operation. But, with the broadcasting feature, we can subtract a (1 x 3) array from a (799 x 3) array.
Logical Comparisons
Sometimes, we might want to keep only part of the elements in the array. We can filter them out by using the logical comparisons in the Numpy Array. For example, in the group assignment, we would like to keep the value that is smaller than the cut-off distance in the dists array.
values = dists[dists < cut_off]
Yeah! We have another feature in the Numpy Array that can help us avoid the for-loop!! That's the other reason why we should use Numpy Array to fasten the code.
Practicing
We optimized the Monte Carlo $\pi$ (pi) calculation and distance calculation during the lecture.
When the sample size comes to 10000000, the Python Standard Library method is 12.746638718658495 times longer than the Numpy method.
Homework Assignment
We modified the following functions in the MC Simulation of total energy by introducing Numpy, to make the code run faster.
calculate_distance_np
calculate_pair_energy_np
calculate_LJ_pairwise
For the function calculate_distace_np, instead of using a for loop to calculate each dimension of coordinates, we can directly subtracting-->np.square-->np.sum-->np.sqrt to get the distance between the coordinates. More importantly, as mentioned above, thanks to the broadcasting, my input can be one coordinate (1x3) and all the other coordinates (799x3), and get the output of a Numpy Array with the distances between the one coordinate and the other coordinates. So that we can avoid the for-loop in the calculate_pair_energy_np function. We used
coords_excluded.mask[picked_particle] = True
function to exclude the picked particle from all the other particles. We then used the logical comparison feature of the Numpy Array to exclude the distances that are longer than the cut-off value.
values = dists[dists < cut_off]
In the calculate_LJ_pairwise function, to make it capable with the np.array type of input, we used np.power instead of math.pow. With the power of the Numpy Array, we have now an array of LJ pairwise energies. We then simply used np.sum to get the total energy. With the optimization we did, we successfully reduced the Monte Carlo Simulation time from 21 seconds to 13 seconds.