NUMPY

Basics


Create an Array

There are several ways to create arrays, either from scratch or from existing data such as a csv file or an existing dataframe. When creating one from scratch, there are functions that fill the array with zeros, ones, or a specific value of your choosing. There is also a function that creates an array from a range of values (for example, an array from 10 up to, but not including, 50 with a step of 5). Overall it is a straightforward process.

import numpy as np

# create an array of ones with 3 rows and 4 columns
np.ones((3,4))

# create the same array but with zeros
np.zeros((3,4))

# create a 2 by 2 array filled with the value 10
np.full((2,2), 10)

# create an array of values from 10 up to (but not including) 50 with a step of 5
np.arange(10, 50, 5)  # returns [10, 15, 20, 25, 30, 35, 40, 45]

# create an array of 9 evenly spaced values from 0 to 2 (both endpoints included)
np.linspace(0, 2, 9)

# create a 5 by 5 identity matrix (both calls give the same result)
np.identity(5)
np.eye(5)
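
The paragraph above also mentions building arrays from existing data, which the snippets so far do not show. Below is a minimal sketch, assuming the data is either a nested Python list or comma-separated text. The io.StringIO object is only a stand-in for a real csv file, and the names from_list and from_csv are made up for illustration.

```python
import io

import numpy as np

# from an existing nested Python list
from_list = np.array([[1, 2, 3], [4, 5, 6]])

# from csv text; io.StringIO stands in for a file on disk (an assumption
# made so the example runs without any external file)
csv_text = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
from_csv = np.genfromtxt(csv_text, delimiter=",", skip_header=1)

print(from_list.shape)  # (2, 3)
print(from_csv.shape)   # (2, 3)
print(from_csv.dtype)   # float64, the default for genfromtxt
```

For a pandas dataframe, DataFrame.to_numpy() does the equivalent conversion.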

Broadcasting

Broadcasting is a way to apply arithmetic operations to arrays of different sizes. More specifically, the smaller array is conceptually repeated across the larger array so that the operation can be applied element by element. There are a few rules that need to be followed. Shapes are compared dimension by dimension, starting from the last (rightmost) dimension, and two dimensions are compatible if they are equal or if one of them is 1. If one array has fewer dimensions than the other, its missing leading dimensions are treated as 1. For example, (3,4) and (3,4) sized arrays can be broadcast together, and so can (3,4) and (1,4) sized arrays. But (3,4) and (1,2) sized arrays cannot, because 4 and 2 differ and neither of them is 1.

# 3 by 5 array of ones
x = np.ones((3,5))
# 3 by 1 array of fours
y = np.full((3,1), 4)

# broadcasting works because 3 = 3 and the 1 is stretched to match the 5
mult = x * y  # outputs a 3 by 5 array of 4's


# 2 by 3 by 4 array of 5's
a = np.full((2,3,4), 5)

# 3 by 4 array of ones
b = np.ones((3,4))

# 1 by 4 array of ones
c = np.ones((1,4))

# both output a 2 by 3 by 4 array of 6's
add = a + b 
add2 = a  + c  
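
The examples above only show shapes that are compatible. As a rough sketch of the failing case described in the text, the snippet below tries a (3,4) array against a (1,2) array and catches the error numpy raises. The variable names are made up for illustration.

```python
import numpy as np

x = np.ones((3, 4))

# (3,4) with (1,4): compatible, because 4 = 4 and the 1 stretches to 3
ok = x + np.ones((1, 4))
print(ok.shape)  # (3, 4)

# a 1-D array of length 4 is treated as if it had shape (1,4)
also_ok = x + np.arange(4)
print(also_ok.shape)  # (3, 4)

# (3,4) with (1,2): 4 and 2 differ and neither is 1, so this fails
try:
    x + np.ones((1, 2))
    failed = False
except ValueError as err:
    failed = True
    print("cannot broadcast:", err)
```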

Reshape

There are times when an array needs to be changed so that it is compatible with another array or matches a specific input size. Reshape does this without changing the values inside the array. The main rule is that the total number of elements after the reshape must equal the number of elements in the original array. Take a 4 by 2 array, for example. Its size is 8, so it can be reshaped into (1,8) or (2,4) but not (3,5), as that would require a size of 15. Another feature is that placing -1 in one of the dimensions tells numpy to work out that value from the other dimensions. So if x.reshape(-1, 4) is called on that (4,2) array, the -1 dimension becomes 2.

x = np.ones((4,2))
y = np.ones((5, 4))

x.reshape(8,)  # creates a 1-D array of 8 values out of x (a row vector would be shape (1,8))
x.reshape(-1, 1)  # creates a column vector (8 by 1) out of x

y.reshape(2, -1)  # creates a 2 by 10 array out of y
y.reshape(4, 5)  # creates a 4 by 5 array out of y
y.reshape(6, 4)  # raises a ValueError because the size would be 24 and not 20
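
To make the effect of reshape easier to see, here is a small sketch that uses the values 0 to 7 instead of ones, so the layout of the values is visible after each reshape. The names flat, col and wide are chosen only for this example.

```python
import numpy as np

# values 0..7 in a 4 by 2 array so the ordering is visible
x = np.arange(8).reshape(4, 2)

flat = x.reshape(8,)     # 1-D array
col = x.reshape(-1, 1)   # -1 is worked out as 8
wide = x.reshape(-1, 4)  # -1 is worked out as 2

print(flat.shape, col.shape, wide.shape)  # (8,) (8, 1) (2, 4)
print(wide)
# [[0 1 2 3]
#  [4 5 6 7]]

# (3,5) needs 15 values but x only has 8, so this fails
try:
    x.reshape(3, 5)
    reshape_failed = False
except ValueError:
    reshape_failed = True
```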

Arithmetic

There are many built-in numpy functions that make it easy to do most of the mathematical operations you may come across. The broadcasting section was shown first because these operations rely on it to decide whether the arrays are compatible for a specific operation. Below are a number of the operations, but there are more in the numpy documentation, and in most IDEs typing "numpy." and then hitting tab will show all the available functions.

import numpy as np

x = np.array([[3,1,8], [5,2,9]])
y = np.array([[2,3,1], [1,0,10]])

# operations on a single array
np.sqrt(x)  # square root of each value
np.exp(x)   # e^x of each value
np.log(x)  # natural log of each value

# operations between two arrays
np.add(x, y)  # returns [[5,4,9], [6,2,19]]
np.subtract(x, y)  # returns [[1,-2,7], [4,2,-1]]
np.multiply(x, y)  # returns [[6,3,8], [5,0,90]]

# This does the dot product between two arrays (matrices), which for 2-D arrays is matrix multiplication.
# For this, the second dimension of the first array must match the first dimension of the second array
np.dot(x, y.reshape(3,2))  # returns [[7, 90], [12, 107]]
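
The same operations can also be written with Python operators. The sketch below checks that each operator gives the same result as the function it corresponds to, and shows broadcasting with a 1-D array. It reuses the x and y arrays from above, and the other names are made up for illustration.

```python
import numpy as np

x = np.array([[3, 1, 8], [5, 2, 9]])
y = np.array([[2, 3, 1], [1, 0, 10]])

# element-wise operators match the named functions
same_add = np.array_equal(x + y, np.add(x, y))
same_mul = np.array_equal(x * y, np.multiply(x, y))

# the @ operator does matrix multiplication, like np.dot on 2-D arrays
y2 = y.reshape(3, 2)
product = x @ y2
same_dot = np.array_equal(product, np.dot(x, y2))
print(product)
# [[  7  90]
#  [ 12 107]]

# broadcasting: the length-3 array is added to every row of x
shifted = x + np.array([10, 20, 30])
print(shifted)
# [[13 21 38]
#  [15 22 39]]
```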

Slicing

Slicing numpy arrays works with bracket notation. For each dimension, the desired range of values is chosen with the syntax start:end, where start is included and end is not. Boolean conditions can also be used inside the brackets to grab the values that meet a requirement.

x = np.array([
       [  53.,    2.,    1.,  171.],
       [  30.,    2.,    1.,  165.],
       [  39.,    4.,    1.,  222.],
       [  26.,    2.,    1.,   74.],
       [   1.,    2.,    1.,  165.]
])

# grab first 2 rows
x[:2]

# grab first 2 rows and the second and third columns (indices 1 and 2)
x[:2, 1:3]

# returns a 1-D array of the values satisfying the condition
x[x > 3]
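
As a sketch of what these slices return, the snippet below repeats them on the same array and prints the results. The last line, which uses a condition on one column to select whole rows, is an extra illustration that is not in the examples above.

```python
import numpy as np

x = np.array([
    [53., 2., 1., 171.],
    [30., 2., 1., 165.],
    [39., 4., 1., 222.],
    [26., 2., 1., 74.],
    [1., 2., 1., 165.],
])

block = x[:2, 1:3]  # rows 0-1, columns 1-2
print(block)
# [[2. 1.]
#  [2. 1.]]

mask = x > 3        # boolean array with the same shape as x
picked = x[mask]    # 1-D array of the values where mask is True
print(picked)
# [ 53. 171.  30. 165.  39.   4. 222.  26.  74. 165.]

# keep only the rows whose first column is above 30
rows = x[x[:, 0] > 30]
print(rows.shape)   # (2, 4)
```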
