# Numpy

Learning Objectives:
* Students will gain understanding of the motivation for numpy ndarray objects as a computationally efficient alternative to Python lists.
* Students will practice basic indexing, slicing, concatenating, and splitting operations on numpy ndarrays.
* Students will practice basic application of universal functions, aggregation functions, and broadcasting on numpy ndarrays.

Readings before class:

* Jake VanderPlas.  [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/):
  * [Chapter 2 - Introduction to Numpy through section "Computation on Arrays: Broadcasting"](https://jakevdp.github.io/PythonDataScienceHandbook/) _If you have time to read further, you'll benefit from seeing some of the masking and sorting computation examples that follow.  In this course, we will primarily use pandas DataFrame objects for computations on data._

Before class:
* Perform the **To-Do** tasks below as you do the reading.  You are encouraged to add code blocks and play with the forms to gain understanding and comfort with them.

In class:
* We will work together on the exercises in section "In Class".

Homework after class:
* Complete the section labeled "Homework" below before the next class, when it will be collected.

In [1]:
# Place your imports here.

import numpy as np
import pandas as pd
np.random.seed(0)  # seed for reproducibility


## Motivations

One thing you may have noticed is that there are a number of ways to represent a sequence of numbers:

In [2]:
print('Python list')
# Python list
l = [1, 2, 3]
print(l, type(l))

print('--> numpy ndarray')
# Python list to numpy ndarray
a = np.asarray(l)
print(a, type(a))

print('--> pandas Series')
# numpy ndarray to pandas Series
s = pd.Series(a)
print(s, type(s))

print('--> pandas DataFrame')
# pandas Series to pandas DataFrame
df = pd.DataFrame(s)
print(df, type(df))

print('--> pandas Series')
# pandas DataFrame to pandas Series
s = df[0]
print(s, type(s))

print('--> numpy ndarray')
# pandas Series to numpy ndarray
a = np.asarray(s)
print(a, type(a))

print('--> Python list')
# numpy ndarray to Python list
l = a.tolist()
print(l, type(l))

Python list
[1, 2, 3] <class 'list'>
--> numpy ndarray
[1 2 3] <class 'numpy.ndarray'>
--> pandas Series
0    1
1    2
2    3
dtype: int64 <class 'pandas.core.series.Series'>
--> pandas DataFrame
   0
0  1
1  2
2  3 <class 'pandas.core.frame.DataFrame'>
--> pandas Series
0    1
1    2
2    3
Name: 0, dtype: int64 <class 'pandas.core.series.Series'>
--> numpy ndarray
[1 2 3] <class 'numpy.ndarray'>
--> Python list
[1, 2, 3] <class 'list'>


In this chapter and the next of VanderPlas' text, we will come to a better understanding of why each of these forms has a distinct benefit.  The short answer is that the versatility of Python lists comes at the cost of computational efficiency.  The numpy library gives us higher-performance multidimensional arrays with efficient operations over homogeneous data.  Pandas builds on top of numpy ndarrays to provide support for tabular data and versatile table operations.  Here we summarize the motivation for numpy.

Python is a dynamically typed language: the type of a variable is determined at run time by the value currently assigned to it.  Each value is therefore a reference to an object that holds not only the data but also its type, its size, and a count of the references to it (the "reference count", which lets the memory be reclaimed, or "garbage collected", when the count drops to zero).  This per-object overhead makes iterative operations through lists much slower than in other languages like C, C++, and Java.
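The per-object overhead described above can be observed directly.  The following is a minimal sketch, assuming CPython and 64-bit integers; the names `values` and `arr` are illustrative only, and the exact byte counts and timings vary by platform and Python version.

```python
import sys
import timeit

import numpy as np

n = 100_000
values = list(range(n))                 # Python list: n references to separate int objects
arr = np.arange(n, dtype=np.int64)      # ndarray: one block of raw 64-bit integers

# Each Python int is a full object (type information, reference count, value).
print('bytes for one Python int object:', sys.getsizeof(values[-1]))
# Each ndarray element is only the raw value.
print('bytes for one ndarray element:  ', arr.itemsize)

# The same computation as an explicit Python loop and as a numpy aggregation.
loop_total = sum(v * v for v in values)
numpy_total = int((arr * arr).sum())
print('results agree:', loop_total == numpy_total)

# Timings are machine-dependent, so treat them as rough indications only.
loop_time = timeit.timeit(lambda: sum(v * v for v in values), number=10)
numpy_time = timeit.timeit(lambda: (arr * arr).sum(), number=10)
print(f'loop: {loop_time:.4f} s, numpy: {numpy_time:.4f} s')
```

On most machines the numpy version should run noticeably faster, but the size of the gap depends on the hardware and the array length.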

Numpy allows a programmer to create an array holding a single type of data, so that the type information is stored once for the whole array and the values themselves are stored together as raw data.  VanderPlas provides a picture of this difference:

![ndarray versus Python list](https://jakevdp.github.io/PythonDataScienceHandbook/figures/array_vs_list.png)
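One consequence of storing a single type is worth seeing in code.  This is a small sketch with illustrative variable names: a list can mix types freely, while an ndarray has one dtype for all of its elements, so values that do not match are converted when the array is created or assigned to.

```python
import numpy as np

mixed = [1, 'two', 3.0]                  # a list may hold objects of different types
print([type(item).__name__ for item in mixed])

ints = np.array([1, 2, 3])               # all integers: an integer dtype is chosen
print(ints.dtype)

floats = np.array([1, 2.5, 3])           # one float makes every element a float
print(floats, floats.dtype)

ints[0] = 3.7                            # stored in an integer array: the fraction is dropped
print(ints)
```

The silent truncation in the last step is easy to overlook, so it is worth checking `dtype` before assigning into an existing array.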

Much of the reading covers how to perform various tasks.  The best way to learn is through practice.  Before class, in class, and after class in homework, you will exercise the core skills of the reading.  First, however, come some questions best answered by well-formed web search queries.

**To-Do: Look up and perform the following additional conversions.**

In [3]:
l = [1, 2, 3]

# Convert Python list l as directly as possible to a pandas Series s.



# Convert pandas Series s as directly as possible to a Python list.



0    1
1    2
2    3
dtype: int64 <class 'pandas.core.series.Series'>
[1, 2, 3] <class 'list'>


### Creating Arrays from Scratch

**To-Do: Initialize the following numpy arrays in the simplest way possible and print the result to verify correctness.**

In [4]:
# Create and print numpy ndarray of 20 integer zeros.



# Create and print an array filled with -24, -21, -18, ..., 18, 21, 24.



# Create and print a 4-by-4 array of normally distributed random values with mean 10 and standard deviation 5.



# Create and print an uninitialized array of 20 integers.



[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[-24 -21 -18 -15 -12  -9  -6  -3   0   3   6   9  12  15  18  21  24]
[[18.82026173 12.00078604 14.89368992 21.204466  ]
 [19.33778995  5.1136106  14.75044209  9.24321396]
 [ 9.48390574 12.05299251 10.72021786 17.27136753]
 [13.80518863 10.60837508 12.21931616 11.66837164]]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]


### Numpy Array Attributes

**To-Do:**

In [5]:
# Create and print a 2-by-3-by-4 array filled with 1.23.



# Use members of that array to print the number of dimensions, shape (size of each dimension), size (total count of values), and type.



 x ndim: 3
x shape: (2, 3, 4)
 x size: 24
x dtype: float64


### No-copy and copy views

**To-Do:**

In [6]:
a = np.arange(1, 17).reshape((4, 4))
print(a)

# Create a copy 'c' of a view of the middle two columns of array 'a'.



# Assign the first row, first column of 'c' to value 42.  Print 'c' and 'a' to show that the change to copy 'c' didn't affect original 'a'.



# Create a non-copy 'nc' of a view of the middle two rows of array 'a'.



# Use negative indexing to assign the entry in the last row and last column of 'nc' to 0.  Print 'nc' and 'a' to show that the change to no-copy slice 'nc' affected original 'a'.



[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]
[[42  3]
 [ 6  7]
 [10 11]
 [14 15]]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]
[[ 5  6  7  8]
 [ 9 10 11  0]]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11  0]
 [13 14 15 16]]


### Array Splitting

**To-Do:**

In [7]:
a = np.arange(0, 10)
print(a)
a2 = np.arange(1, 17).reshape((4, 4))
print(a2)

# Assign arrays 'b', 'c', and 'd' to be a split of 'a' at indices 3 and 7.  Print 'b', 'c', and 'd'.



# Print the result of splitting a2 into its first column and last three columns.



# Print the result of splitting a2 into its first three rows and its last row.




[0 1 2 3 4 5 6 7 8 9]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]
[0 1 2] [3 4 5 6] [7 8 9]
[[ 1]
 [ 5]
 [ 9]
 [13]]
[[ 2  3  4]
 [ 6  7  8]
 [10 11 12]
 [14 15 16]]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[[13 14 15 16]]


### Aggregation

**To-Do:**

In [8]:
a = np.array([32, 8, 2])

# Use numpy aggregation to print the sum of the values of 'a'.


# Use numpy aggregation to print the accumulated values that were partial sums on the way to the previous result ([32 40 42]).


# Use numpy aggregation to print the product of the values of 'a'.



42
[32 40 42]
512


# In Class

First, check your pre-class work above with each other.

Then, work together to complete the following exercises.

Look up and perform the following additional conversions.

In [9]:
a = np.array([1, 2, 3])

# Convert numpy ndarray a as directly as possible to a pandas DataFrame df.  Print the result and the type.



# Convert pandas DataFrame df as directly as possible to a numpy ndarray a.  Print the result and the type.



   0
0  1
1  2
2  3 <class 'pandas.core.frame.DataFrame'>
[1 2 3] <class 'numpy.ndarray'>


### Creating Arrays from Scratch

In [10]:
# Create and print numpy ndarray of 15 floating-point ones.



# Create and print an array of 10 values evenly spaced between -5 and 5 inclusive.



# Create and print a 3-by-11 array of random integers from -1 to 1 inclusive.



[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[-5.         -3.88888889 -2.77777778 -1.66666667 -0.55555556  0.55555556
  1.66666667  2.77777778  3.88888889  5.        ]
[[ 1 -1  0  0  1 -1  0  0  0 -1  1]
 [-1  1  1 -1  1 -1 -1 -1  0  0  1]
 [-1 -1  0 -1  0  1  1 -1  0  0  0]]


### Array Indexing: Accessing Single Elements

In [11]:
# Create and print a 4-by-4 numpy array with values 1 through 16 left-to-right, top-to-bottom.



# Change the entry in the second row and third column to be 20.  Print the array.



# Use negative indexing to set the entry in the second to last row and second to last column to be 0.  Print the array.



[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]
[[ 1  2  3  4]
 [ 5  6 20  8]
 [ 9 10 11 12]
 [13 14 15 16]]
[[ 1  2  3  4]
 [ 5  6 20  8]
 [ 9 10  0 12]
 [13 14 15 16]]


### Array slicing: One-dimensional subarrays

In [12]:
# Create and print an array with integer values 0 through 9 using function arange.



# Use array slicing to slice subrange indices 5 through 7 inclusive and print the slice.



# Use array slicing to slice all even index values and print the slice.



# Use a negative step with slicing to print the array reversed.  Omit any slicing values that you can.



[0 1 2 3 4 5 6 7 8 9]
[5 6 7]
[0 2 4 6 8]
[9 8 7 6 5 4 3 2 1 0]


### Reshaping Arrays

In [13]:
a = np.arange(1, 17).reshape((4, 4))
print(a)

# Reshape array 'a' to have 2 rows and 8 columns.  Print the result.



# Create a copy 'c' of the first row of reshaped 'a' and reshape it to be a single column.  Print the result.



# Do the same with copy 'c2', but using newaxis instead of reshape.




[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]
[[ 1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16]]
[[1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]]
[[1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]]


### UFuncs: Universal Functions

For each of the following, apply universal functions to perform operations efficiently.

In [14]:
a = np.arange(1, 4)
print(a)

# Print the array of reciprocals of the values of 'a'.



# Print the array formed by adding array 'a' to itself.



# Print the array of 5 to the power of each value of 'a'.




[1 2 3]
[1.         0.5        0.33333333]
[2 4 6]
[  5  25 125]


### Numpy Aggregation Functions

For each of the following, apply Numpy's fast built-in aggregation functions.

In [15]:
np.random.seed(0)
a = np.arange(0,16)
np.random.shuffle(a)
a = a.reshape((4, 4))
print(a)

# Use aggregation to print the maximum value of 'a', the maximum values of each row, and the maximum values of each column.



# Use aggregation to print the row-based index of the minimum value of 'a'.  The row-based index is the index the value would have if we reshaped the array into a single row.



# Use aggregation to print the mean of the values of 'a'.



[[ 1  6  8  9]
 [13  4  2 14]
 [10  7 15 11]
 [ 3  0  5 12]]
15
[ 9 14 15 12]
[13  7 15 14]
13
7.5


# Homework

(0) Complete the in-class exercises.  (This may be done with others beyond your assigned pairs.)

(1) Look up and perform the following additional conversions.

In [16]:
l = [1, 2, 3]

# Convert Python list l as directly as possible to a pandas DataFrame df.  Print the result and the type.



# Convert pandas DataFrame df as directly as possible to a Python list.  Print the result and the type.



(2) Creating Arrays from Scratch

In [17]:
# Create and print a 4 row, 2 column numpy array filled with the integer value 42.



# Create and print a 5-by-5 array of uniform random numbers in the range [0, 1).



# Create and print a 5-by-5 identity matrix.



(3) Array slicing: Multidimensional Arrays

In [18]:
a = np.arange(1, 17).reshape((4, 4))
print(a)

# Use array slicing to print the slice of array 'a' with values:
#[[ 6  7]
# [10 11]]



# Use array slicing to print the slice of array 'a' with values:
#[[ 5  7]
# [13 15]]



# Combine indexing and slicing to print the second row of array 'a':



# Combine indexing and slicing to print the third column of array 'a':



[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]


(4) Array concatenation

In [19]:
a = np.arange(1, 17).reshape((4, 4))
print(a)

# Print the horizontal (row) concatenation of 'a' with 'a'.



# Print the vertical (column) concatenation of 'a' with 'a'.


[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]


(5) UFuncs: Universal Functions

For each of the following, apply universal functions to perform operations efficiently.

In [20]:
a = np.arange(-0.5, 0.5, .25)
print(a)

# Print the array with absolute values of 'a' values.


# Print the array with the cosines of 'a' values.


# Print the array with the inverse tangents of 'a' values.


# Print the array with 10 raised to the powers of 'a' values.



[-0.5  -0.25  0.    0.25]


(6) Broadcasting

Use broadcasting to perform the following operation:

In [21]:
a = np.arange(1, 10)
print(a)

# Use broadcasting with 'a' and a column vector of 'a' to print the entries of a multiplication table with 9 rows and 9 columns showing products 1*1 (upper-left) through 9*9 (lower-right).



[1 2 3 4 5 6 7 8 9]


(end of homework)