NumPy
The Array
The main object that you throw around in NumPy is called a multidimensional array. Typically you store numbers in it. Each “dimension” is called an axis. For example, a single coordinate in 3D space could be stored as:
This has one axis (one dimension).
A 2D rotation transformation could be described with:
This has two axes.
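The two cases above can be sketched as follows. The values and the variable names `point` and `rotation` are illustrative assumptions, not taken from the original examples.

```python
import numpy as np

# A single point in 3D space: one axis holding three values.
point = np.array([1.0, 2.0, 3.0])
print(point.ndim)   # 1
print(point.shape)  # (3,)

# A 2D rotation by 90 degrees anticlockwise: two axes (2 rows, 2 columns).
rotation = np.array([[0.0, -1.0],
                     [1.0, 0.0]])
print(rotation.ndim)   # 2
print(rotation.shape)  # (2, 2)
```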
Creating An Array
NumPy arrays can be created with standard Python lists:
If we wanted to create a 2 axis array we could pass in a list of lists:
You can continue to nest lists within lists to create an array with any number of axes (dimensions).
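A minimal sketch of creating arrays from plain lists and nested lists. The values and variable names are made up for illustration.

```python
import numpy as np

# One axis: a flat Python list.
one_axis = np.array([1, 2, 3])

# Two axes: a list of lists (2 rows, 3 columns).
two_axes = np.array([[1, 2, 3], [4, 5, 6]])

# Three axes: lists nested three levels deep.
three_axes = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(one_axis.shape)    # (3,)
print(two_axes.shape)    # (2, 3)
print(three_axes.shape)  # (2, 2, 2)
```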
You can also create arrays with special values, such as arrays full of 1’s, arrays full of zeros, arrays full of random numbers and arrays with 1’s on the diagonal (like identity matrices).
An array of 1’s:
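One way to build these, assuming a 2 by 3 shape chosen purely for illustration. The random example uses `np.random.default_rng()`, which is one of several ways NumPy offers to generate random numbers.

```python
import numpy as np

# An array of ones with 2 rows and 3 columns (floats by default).
ones = np.ones((2, 3))

# The same shape filled with zeros.
zeros = np.zeros((2, 3))

# Random values drawn uniformly from [0, 1). The seed is arbitrary.
rng = np.random.default_rng(seed=0)
random_values = rng.random((2, 3))

print(ones)
print(zeros)
print(random_values)
```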
An array with 1’s on the diagonal:
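For example, a 3 by 3 identity matrix (the size is an arbitrary choice):

```python
import numpy as np

# Ones on the main diagonal, zeros everywhere else.
identity = np.eye(3)
print(identity)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```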
Another really useful way of creating arrays is with np.arange(). As the name suggests, it creates an array containing a range of values:
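A short sketch of `np.arange()` with illustrative arguments. Like Python's built-in `range()`, the stop value is excluded.

```python
import numpy as np

# Stop only: integers from 0 up to (but not including) 5.
first_five = np.arange(5)
print(first_five)  # [0 1 2 3 4]

# Start, stop and step.
evens = np.arange(2, 10, 2)
print(evens)  # [2 4 6 8]
```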
np.linspace() is another great array-creation tool, which creates an array of linearly spaced numbers. The following example creates 5 numbers, linearly spaced from 4.0 to 10.0:
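A sketch of the call described above. Unlike `np.arange()`, the end value is included by default.

```python
import numpy as np

# 5 evenly spaced values from 4.0 to 10.0, both endpoints included.
values = np.linspace(4.0, 10.0, num=5)
print(values)  # 4.0, 5.5, 7.0, 8.5 and 10.0
```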
Indexing And Reading/Writing
NumPy arrays have one index per axis, forming a tuple. The indices are zero-based, like all sensible languages/libraries :-D.
Reading from a 1 axis array:
Reading from a 2 axis array:
Writing to a 2-axis array:
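The three cases above might look like this. The arrays and values are invented for the example.

```python
import numpy as np

# Reading from a 1-axis array: a single index.
a = np.array([10, 20, 30])
second = a[1]
print(second)  # 20

# Reading from a 2-axis array: one index per axis (row, column).
b = np.array([[1, 2, 3],
              [4, 5, 6]])
bottom_right = b[1, 2]
print(bottom_right)  # 6

# Writing to a 2-axis array uses the same index syntax.
b[0, 1] = 99
print(b[0])  # [ 1 99  3]
```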
Doing Basic Operations With Arrays
NumPy arrays can be added element-wise with the + operator:
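A small illustration with made-up values:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Each element of a is added to the element of b at the same position.
total = a + b
print(total)  # [11 22 33]
```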
They can be multiplied element-wise with the * operator (this is the same as np.multiply):
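Using the same kind of illustrative inputs:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Element-wise product, not a matrix or dot product.
product = a * b
print(product)  # [10 40 90]

# np.multiply() gives the same result.
same = np.multiply(a, b)
```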
A dot-product of two arrays can be done with np.dot():
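For two 1-axis arrays, the dot product is the sum of the element-wise products. A sketch with example values:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# 1*10 + 2*20 + 3*30
dot_product = np.dot(a, b)
print(dot_product)  # 140
```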
The cross-product of two arrays can be done with np.cross():
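As an illustration, the cross product of the x and y unit vectors gives the z unit vector:

```python
import numpy as np

x_unit = np.array([1, 0, 0])
y_unit = np.array([0, 1, 0])

# The result is perpendicular to both inputs.
z_unit = np.cross(x_unit, y_unit)
print(z_unit)  # [0 0 1]
```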
Slicing
One of the powerful features of Numpy arrays is the simple and terse slicing syntax (which is built upon Python’s slicing syntax). A slice is when you extract just a portion of the array for further use:
Simple Slicing
Very simple slicing is really the same as indexing:
Extract the first two elements:
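A sketch of simple slicing on a 1-axis array with example values. As with Python lists, the stop index is excluded.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

# Elements at index 0 and 1 (the stop index 2 is not included).
first_two = a[0:2]
print(first_two)  # [10 20]

# The start can be omitted when it is 0.
also_first_two = a[:2]
```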
Multidimensional Slicing
Some of the real power of slicing is seen when you slice multidimensional arrays (arrays with more than 1 axis).
my_array[:, 0] tells NumPy to make a slice using all elements from the first axis (:), and only the first element from the second axis (0). An example of this slice is shown below:
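A minimal sketch of this slice. The contents of `my_array` are assumed for the example.

```python
import numpy as np

my_array = np.array([[1, 2, 3],
                     [4, 5, 6]])

# Every row (:) but only column 0.
first_column = my_array[:, 0]
print(first_column)  # [1 4]
```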
Note that : is the same as 0:<len> (the stop index is exclusive), and captures all data along that axis.
This is commonly used to extract columns from data. For example, if you had the following array of x, y pairs:
You could extract all the x values and all the y values with:
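For instance, with a hypothetical array of three (x, y) pairs:

```python
import numpy as np

# Each row is one (x, y) pair.
points = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

x = points[:, 0]  # first column: all x values
y = points[:, 1]  # second column: all y values

print(x)  # the x values 1.0, 2.0, 3.0
print(y)  # the y values 10.0, 20.0, 30.0
```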
You can also use this to extract “rows” from an array:
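Rows work the same way with the index and the colon swapped. The array below is again hypothetical.

```python
import numpy as np

points = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

# Row 0, all columns.
first_row = points[0, :]

# Trailing colons can be omitted, so this is equivalent.
also_first_row = points[0]
```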
Adding A Step
You can also add a step size while slicing NumPy arrays, just as you can when using standard Python slicing. The step size is the third argument in the slice syntax, i.e. start:stop:step.
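A few illustrative uses of the step argument:

```python
import numpy as np

a = np.arange(10)  # 0 to 9

# Every second element from index 0 up to (not including) index 10.
every_second = a[0:10:2]
print(every_second)  # [0 2 4 6 8]

# Start at index 1 and take every third element to the end.
every_third = a[1::3]
print(every_third)  # [1 4 7]

# A negative step walks backwards, which reverses the array.
reversed_a = a[::-1]
```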
Stacking
Stacking is used to join arrays along a new axis. A classic example of this would be if you had two separate arrays of x and y values, and you wanted to combine them into a single array of (x, y) coordinate pairs.
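One way to do this is with `np.stack()`. The input values are made up, and the choice of `axis=1` is what makes each row of the result an (x, y) pair.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

# Join along a new second axis so that row i is (x[i], y[i]).
pairs = np.stack((x, y), axis=1)
print(pairs)
# [[ 1 10]
#  [ 2 20]
#  [ 3 30]]
print(pairs.shape)  # (3, 2)
```

With `axis=0` instead, the result would have shape (2, 3): one row of x values and one row of y values.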
Concatenation
If you want to start off with an empty array and add values to it in a loop, you can use np.concatenate(), but you have to reshape your empty array first to make sure it has the right dimensions:
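A sketch of this pattern, assuming each loop iteration produces one row with two values. The data is invented for the example.

```python
import numpy as np

# An empty array reshaped to 0 rows and 2 columns, so rows can be appended.
points = np.array([]).reshape(0, 2)

for i in range(3):
    # Each new row must also have 2 axes: shape (1, 2).
    new_row = np.array([[i, i * i]])
    points = np.concatenate((points, new_row), axis=0)

print(points.shape)  # (3, 2)
```

Note that `np.concatenate()` copies the data on every call, so for long loops it is usually faster to collect rows in a Python list and convert once at the end.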
Reading A CSV File
You can use NumPy’s genfromtxt() function to read in CSV files and convert the data into a NumPy array:
Each line of the file becomes a row (an element along the first axis, axis 0) of the array. Each comma-separated value on a line becomes an element along the second axis (axis 1).
For example, if your CSV looked like:
The array would look like:
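The sketch below uses hypothetical CSV content, held in a string and wrapped in `StringIO` so that the example is self-contained. With a real file you would pass the file path as the first argument instead.

```python
from io import StringIO

import numpy as np

# Hypothetical CSV content: 2 lines with 3 values each.
csv_text = "1,2,3\n4,5,6\n"

data = np.genfromtxt(StringIO(csv_text), delimiter=",")
print(data)
# [[1. 2. 3.]
#  [4. 5. 6.]]
print(data.shape)  # (2, 3): 2 lines, 3 values per line
```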
Skip Header Rows
You can skip a header line/row in the CSV file by providing skip_header=1 to genfromtxt():
This is good when you have column data names in the first row (which is a common practice), e.g.:
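A sketch with a made-up header row. The column names `time` and `voltage` are assumptions for illustration only.

```python
from io import StringIO

import numpy as np

# Hypothetical CSV content whose first row holds column names.
csv_text = "time,voltage\n0.0,1.5\n0.1,1.7\n"

# skip_header=1 discards the first line before parsing.
data = np.genfromtxt(StringIO(csv_text), delimiter=",", skip_header=1)
print(data.shape)  # (2, 2)
```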
Functions
np.dot()
Dot product of two arrays.
np.eye()
Returns an array with 1’s on the diagonal and 0’s elsewhere (also known as an identity matrix).
np.ravel()
Returns a flattened array.
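A quick illustration of `np.ravel()` on a small example array:

```python
import numpy as np

grid = np.array([[1, 2],
                 [3, 4]])

# Flatten row by row into a single axis.
flat = np.ravel(grid)
print(flat)  # [1 2 3 4]
```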
Masked Arrays
NumPy has a powerful feature called a masked array. A masked array is essentially composed of two arrays, one containing the data, and another containing a mask (a boolean True or False value for each element in the data array).
Retrieving a masked element returns the special constant masked (np.ma.masked) instead of the underlying value.
Creating A Masked Array
You can use the np.ma.masked_equal() function to create a masked array from a standard array, specifying the value you want masked as the second parameter:
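In this sketch, -999 stands in for a "missing data" marker. That value, and the data itself, are assumptions chosen for illustration.

```python
import numpy as np

data = np.array([1, -999, 3, -999, 5])

# Mask every element equal to -999.
masked_data = np.ma.masked_equal(data, -999)

print(masked_data.mask)  # True wherever the value was -999

# Reading a masked element returns the np.ma.masked constant.
print(masked_data[1] is np.ma.masked)  # True

# Masked elements are ignored by calculations such as mean().
print(masked_data.mean())  # 3.0
```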
Checking If An Array Is Masked
You can check if an array is masked with np.ma.is_masked():
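A sketch with example data. Note that `np.ma.is_masked()` reports whether any element is actually masked, so a plain array gives False.

```python
import numpy as np

plain = np.array([1, 2, 3])
masked_data = np.ma.masked_equal(np.array([1, -999, 3]), -999)

has_mask = np.ma.is_masked(masked_data)
plain_has_mask = np.ma.is_masked(plain)

print(has_mask)        # True
print(plain_has_mask)  # False
```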
Removing Masked Values From An Array
You can trim down a masked array and remove all masked values by using the compressed() method of the masked array, which returns the remaining values as a one-dimensional array:
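Continuing with the same kind of illustrative data:

```python
import numpy as np

masked_data = np.ma.masked_equal(np.array([1, -999, 3, -999, 5]), -999)

# Keep only the unmasked values, as an ordinary 1-axis array.
valid = masked_data.compressed()
print(valid)  # [1 3 5]
```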
Numpy Warnings And How To Silence Them
Invalid Number
For example, running np.mean([]) using Python 3.7 and an up-to-date version of NumPy will produce the following:
Note how this is not an exception. NumPy prints a warning stating that you are trying to calculate the mean of an empty slice, as well as that there is an invalid value, but continues execution and returns nan. These warnings are usually helpful in debugging problems with the data you are providing, but in some cases you will want to silence the warnings as the data is as expected.
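The behaviour described above can be observed with the sketch below. It records the warnings with the standard `warnings` module instead of letting them print, and the exact wording of the "invalid value" message may differ between NumPy versions.

```python
import warnings

import numpy as np

# Record warnings rather than printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = np.mean([])  # no exception is raised

print(result)  # nan

for w in caught:
    print(w.category.__name__, w.message)
```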
The safest way to suppress NumPy warnings is to use the np.errstate context manager, which only changes the warning state while the context is active. However, this has some problems…
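A sketch of the `np.errstate` approach. The surrounding `warnings.catch_warnings(record=True)` block is only there to capture whatever still gets through, so it can be inspected.

```python
import warnings

import numpy as np

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore floating-point "invalid value" errors inside this block only.
    with np.errstate(invalid="ignore"):
        result = np.mean([])

messages = [str(w.message) for w in caught]
print(result)    # nan
print(messages)  # any warnings that np.errstate did not silence
```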
The problem with np.errstate is that it will silence the "RuntimeWarning: invalid value encountered in double_scalars" warning, but not the "RuntimeWarning: Mean of empty slice." warning. A better approach is to use the warnings module (which is shipped with Python), however this comes at the expense of silencing a larger group of warnings (what if a RuntimeWarning was emitted here for a different reason?):
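A minimal sketch using the standard `warnings` module. Restricting the filter to `RuntimeWarning` and keeping the block as small as possible limits, but does not remove, the risk of hiding unrelated warnings.

```python
import warnings

import numpy as np

# Filters changed inside catch_warnings() are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    result = np.mean([])  # both RuntimeWarnings are suppressed here

print(result)  # nan
```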
If you want to convert all warnings into exceptions, you can use the following code. This is particularly dangerous as it applies to all code after this call.
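The original snippet is not reproduced here, so the sketch below is an assumption: it uses `warnings.simplefilter("error")` from the standard library, which turns every warning (not only NumPy's) into an exception.

```python
import warnings

import numpy as np

# From this point on, any warning is raised as an exception.
warnings.simplefilter("error")

try:
    np.mean([])
    raised = False
except RuntimeWarning as exc:
    raised = True
    print(f"Raised: {exc}")

# Undo the global change. Note that this clears all warning filters.
warnings.resetwarnings()
```

Wrapping the same `simplefilter("error")` call in `warnings.catch_warnings()` would limit the effect to a single block.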
Divide Errors
By default, Numpy will print warnings when you attempt to divide by 0. For example, the following code:
will cause Numpy to print:
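As an illustration (the original snippet and its output are not reproduced here), dividing by an array that contains a zero triggers a RuntimeWarning about division by zero. The exact message text varies between NumPy versions.

```python
import numpy as np

numerator = np.array([1.0, 2.0])
denominator = np.array([0.0, 2.0])

# Emits a RuntimeWarning, but execution continues.
result = numerator / denominator

print(result)  # the first element is inf, the second is 1.0
```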
These warnings can be temporarily silenced with the np.errstate(divide='ignore') context manager:
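A sketch using example arrays. The warning state returns to normal as soon as the `with` block exits.

```python
import numpy as np

numerator = np.array([1.0, 2.0])
denominator = np.array([0.0, 2.0])

# No divide-by-zero warning is emitted inside this block.
with np.errstate(divide="ignore"):
    result = numerator / denominator

print(result)  # the first element is still inf
```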