Ben Gorman

where()

You can use NumPy's where() function as a vectorized form of "if array element meets condition, then x else y".

For example, given a 2-d array foo

foo = np.array([
    [1,2,3],
    [4,5,6]
])

We can create a corresponding array that contains "cat" where foo is even and "dog" where foo is odd.

np.where(foo % 2 == 0, 'cat', 'dog')
# array([['dog', 'cat', 'dog'],
#        ['cat', 'dog', 'cat']], dtype='<U3')
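
The x and y arguments don't have to be scalars. As a minimal sketch (the name signed is ours, purely for illustration), where() can also pick element-wise between two arrays that have the same shape as the condition.

```python
import numpy as np

foo = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

# Keep even values as they are and negate the odd ones.
signed = np.where(foo % 2 == 0, foo, -foo)
print(signed)
# [[-1  2 -3]
#  [ 4 -5  6]]
```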

Math Functions

sum()

Consider this 2-d array, foo.

foo = np.array(
    [[5.0, 2.0, 9.0],
     [1.0, 0.0, 2.0],
     [1.0, 7.0, 8.0]]
)

There are numerous ways to take its sum with the sum() function.

Sum all the values of foo

np.sum(foo)  # 35.0

Sum across axis 0 (column sums)

np.sum(foo, axis=0)
# array([ 7.,  9., 19.])

Sum across axis 1 (row sums)

np.sum(foo, axis=1)
# array([16.,  3., 16.])

If foo contains NaNs, sum() returns NaN

foo[0, 0] = np.nan
np.sum(foo)  # nan

There are numerous ways to exclude NaNs or treat them as 0s.

np.sum(foo, where = ~np.isnan(foo))  # 30.0

Here we use the where parameter, telling the sum() function to only include elements where ~np.isnan(foo) evaluates to True.

np.sum(np.nan_to_num(foo))  # 30.0

Here we use the nan_to_num() function, which converts NaNs to 0 (by default).

np.nansum(foo)  # 30.0

Here we use the nansum() function which treats NaNs as 0s.
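
nansum() also accepts the axis parameter, so the column and row sums shown earlier can be computed while ignoring NaNs. The sketch below rebuilds foo with the NaN already in place so that it runs on its own; col_sums and row_sums are illustrative names.

```python
import numpy as np

foo = np.array(
    [[np.nan, 2.0, 9.0],
     [1.0, 0.0, 2.0],
     [1.0, 7.0, 8.0]]
)

col_sums = np.nansum(foo, axis=0)  # column sums, treating NaN as 0
row_sums = np.nansum(foo, axis=1)  # row sums, treating NaN as 0
print(col_sums)  # [ 2.  9. 19.]
print(row_sums)  # [11.  3. 16.]
```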

Other Math Functions

Unsurprisingly, there are numerous math functions in NumPy including minimum(), maximum(), mean(), exp(), log(), floor(), and ceil() among others.
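
Here is a quick, non-exhaustive taste of a few of them on a small made-up array, vals. One thing worth noticing is that maximum() and minimum() compare two inputs element-wise, rather than reducing a single array to one number.

```python
import numpy as np

vals = np.array([1.5, -2.5, 4.0])

floored = np.floor(vals)         # round down:  1., -3.,  4.
ceiled = np.ceil(vals)           # round up:    2., -2.,  4.
clipped = np.maximum(vals, 0.0)  # element-wise max against 0: 1.5, 0., 4.
average = np.mean(vals)          # (1.5 - 2.5 + 4.0) / 3 = 1.0
```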

Truth Value Testing

all()

You can use the all() function to check if all the values in an array meet some condition.

foo = np.array([
    [np.nan,    4.4],
    [   1.0,    3.2],
    [np.nan, np.nan],
    [   0.1, np.nan]
])

Check if all the values are NaN

np.all(np.isnan(foo)) 
# False

Check if all the values in each row are NaN

np.all(np.isnan(foo), axis=1)
# array([False, False,  True, False])

Check if all the values in each column are NaN

np.all(np.isnan(foo), axis=0)
# array([False, False])

any()

You can use the any() function to check if any of the values in an array meet some condition.

foo = np.array([
    [np.nan,    4.4],
    [   1.0,    3.2],
    [np.nan, np.nan],
    [   0.1, np.nan]
])

Check if any value is NaN

np.any(np.isnan(foo)) 
# True

Check if any value in each row is NaN

np.any(np.isnan(foo), axis=1)
# array([ True, False,  True,  True])

Check if any value in each column is NaN

np.any(np.isnan(foo), axis=0)
# array([ True,  True])
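
A common use for these row-wise results is as a boolean mask. The following is just a sketch of one possible pattern (the names all_nan_rows and complete_rows are ours): drop the rows that are entirely NaN, or keep only the rows that contain no NaN at all.

```python
import numpy as np

foo = np.array([
    [np.nan,    4.4],
    [   1.0,    3.2],
    [np.nan, np.nan],
    [   0.1, np.nan]
])

all_nan_rows = np.all(np.isnan(foo), axis=1)    # True only for the third row
complete_rows = ~np.any(np.isnan(foo), axis=1)  # True only for the second row

without_empty = foo[~all_nan_rows]  # drops the all-NaN row, keeps the other three
only_complete = foo[complete_rows]  # keeps just the row [1.0, 3.2]
```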

concatenate()

You can use the concatenate() function to combine two or more arrays.

roux = np.zeros(shape = (3,2))
print(roux)
# [[0. 0.]
#  [0. 0.]
#  [0. 0.]]

gumbo = np.ones(shape = (2,2))
print(gumbo)
# [[1. 1.]
#  [1. 1.]]

Concatenate roux with a couple copies of itself row-wise.

np.concatenate((roux, roux, roux), axis=0)
# array([[0., 0.],
#        [0., 0.],
#        [0., 0.],
#        [0., 0.],
#        [0., 0.],
#        [0., 0.],
#        [0., 0.],
#        [0., 0.],
#        [0., 0.]])

Concatenate roux with a couple copies of itself column-wise.

np.concatenate((roux, roux, roux), axis=1)
# array([[0., 0., 0., 0., 0., 0.],
#        [0., 0., 0., 0., 0., 0.],
#        [0., 0., 0., 0., 0., 0.]])

Concatenate roux and gumbo row-wise.

np.concatenate((roux, gumbo), axis=0)
# array([[0., 0.],
#        [0., 0.],
#        [0., 0.],
#        [1., 1.],
#        [1., 1.]])

When you concatenate arrays, they must have the same exact shape excluding the axis along which you’re concatenating. For example, if we try to concatenate roux and gumbo column-wise, NumPy throws an error.

np.concatenate((roux, gumbo), axis = 1)
# ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 3 and the array at index 1 has size 2
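
If you'd rather check compatibility up front than catch the error, the shape rule above can be sketched in a few lines. Note that can_concatenate() is a hypothetical helper written for this example, not a NumPy function, and it assumes every array has the same number of dimensions.

```python
import numpy as np

def can_concatenate(arrays, axis):
    """Hypothetical helper: do the shapes agree everywhere except `axis`?"""
    def shape_without_axis(a):
        shape = list(a.shape)
        del shape[axis]  # ignore the axis we're concatenating along
        return shape

    first = shape_without_axis(arrays[0])
    return all(shape_without_axis(a) == first for a in arrays[1:])

roux = np.zeros(shape=(3, 2))
gumbo = np.ones(shape=(2, 2))

print(can_concatenate((roux, gumbo), axis=0))  # True
print(can_concatenate((roux, gumbo), axis=1))  # False
```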

Stacking

You can use vstack(), hstack(), and stack() to combine arrays.

vstack()

vstack() takes one argument - a sequence of arrays. You could describe its algorithm in pseudocode as

for each array in the sequence:
  if the array is 1-d:
    promote the array to 2-d by giving it a new front axis
if every array has the same shape excluding axis 0:
  concatenate the arrays along axis 0
else:
  throw an error

Visually, you could imagine vstack() as vertically stacking 1-d or 2-d arrays.
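
The pseudocode above can be sketched in Python as follows. vstack_sketch() is a name we made up; treat it as an illustration of the idea rather than NumPy's actual source code.

```python
import numpy as np

def vstack_sketch(arrays):
    """Illustrative re-creation of the vstack() steps (not NumPy's source)."""
    # atleast_2d() turns a 1-d array of shape (n,) into shape (1, n)
    promoted = [np.atleast_2d(a) for a in arrays]
    # concatenate() raises a ValueError if the shapes disagree outside axis 0
    return np.concatenate(promoted, axis=0)

foo = np.array(['a', 'b'])
baz = np.array([['e', 'f']])

result = vstack_sketch((foo, baz))
print(result)
# [['a' 'b']
#  ['e' 'f']]
```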

Examples

foo = np.array(['a', 'b'])
bar = np.array(['c', 'd'])
baz = np.array([['e', 'f']])
bingo = np.array([['g', 'h', 'i']])
np.vstack((foo, bar))
# [['a' 'b']
#  ['c' 'd']]
np.vstack((foo, bar, baz))
# [['a' 'b']
#  ['c' 'd']
#  ['e' 'f']]
np.vstack((baz, bingo))
# ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 2 and the array at index 1 has size 3

hstack()

hstack() takes one argument - a sequence of arrays. You could describe its algorithm in pseudocode as

if every array in the sequence is 1-d:
  concatenate the arrays along axis 0
else:
  if every array has the same shape excluding axis 1:
    concatenate arrays along axis 1
  else:
    throw an error

Visually, you could imagine hstack() as horizontally stacking 1-d or 2-d arrays.
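
As with vstack(), the pseudocode above translates into a short Python sketch. hstack_sketch() is a hypothetical name, and the code follows the pseudocode literally rather than reproducing NumPy's implementation.

```python
import numpy as np

def hstack_sketch(arrays):
    """Illustrative re-creation of the hstack() steps (not NumPy's source)."""
    arrays = [np.asarray(a) for a in arrays]
    if all(a.ndim == 1 for a in arrays):
        # 1-d arrays are simply joined end to end
        return np.concatenate(arrays, axis=0)
    # otherwise join side by side; concatenate() raises if shapes don't fit
    return np.concatenate(arrays, axis=1)

foo = np.array(['a', 'b'])
bar = np.array(['c', 'd'])
baz = np.array([['e', 'f']])
bingo = np.array([['g', 'h', 'i']])

flat = hstack_sketch((foo, bar))    # 1-d result with 4 elements
wide = hstack_sketch((baz, bingo))  # 2-d result with shape (1, 5)
```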

Examples

foo = np.array(['a', 'b'])
bar = np.array(['c', 'd'])
baz = np.array([['e', 'f']])
bingo = np.array([['g', 'h', 'i']])
bongo = np.array(
    [['j', 'k'],
     ['l', 'm']]
)
np.hstack((foo, bar))
# ['a' 'b' 'c' 'd']
np.hstack((baz, bingo))
# [['e' 'f' 'g' 'h' 'i']]
np.hstack((foo, bingo))
# ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)
np.hstack((bingo, bongo))
# ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 1 and the array at index 1 has size 2

stack()

stack() takes two arguments:

  1. a sequence of arrays to combine
  2. axis which tells stack() to create a new axis along which to combine the arrays.

You could describe its algorithm in pseudocode as

if every array has the same shape and axis is less than or equal to the dimensionality of the arrays:
  for each array:
    insert a new axis where specified
  concatenate the arrays along the new axis
else:
  throw an error.
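
In Python, the pseudocode above might look something like this. stack_sketch() is a made-up name and the error message is our own wording; the real stack() may differ in its details.

```python
import numpy as np

def stack_sketch(arrays, axis=0):
    """Illustrative re-creation of the stack() steps (not NumPy's source)."""
    if len({a.shape for a in arrays}) != 1:
        raise ValueError("all input arrays must have the same shape")
    # expand_dims() inserts the new axis (and raises if axis is out of bounds)
    expanded = [np.expand_dims(a, axis) for a in arrays]
    # join the arrays along the axis we just created
    return np.concatenate(expanded, axis=axis)

foo = np.array(['a', 'b'])
bar = np.array(['c', 'd'])

stacked_rows = stack_sketch((foo, bar), axis=0)  # foo and bar become rows
stacked_cols = stack_sketch((foo, bar), axis=1)  # foo and bar become columns
```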

Examples

foo = np.array(['a', 'b'])
bar = np.array(['c', 'd'])
np.stack((foo, bar), axis=0)
# array([['a', 'b'],
#        ['c', 'd']], dtype='<U1')
np.stack((foo, bar), axis=1)
# array([['a', 'c'],
#        ['b', 'd']], dtype='<U1')
np.stack((foo, bar), axis=2)
# numpy.AxisError: axis 2 is out of bounds for array of dimension 2
np.stack((foo, bar), axis=-1)
# array([['a', 'c'],
#        ['b', 'd']], dtype='<U1')

Sorting

You can use numpy’s sort() function to sort the elements of an array.

sort() takes three primary parameters:

  1. a: the array you want to sort
  2. axis: the axis along which to sort. (The default, -1, sorts along the last axis.)
  3. kind: the kind of sort you want numpy to implement. By default, numpy implements quicksort.

For example, here we make a 1-d array, foo, and then sort it in ascending order.

foo = np.array([1, 7, 3, 9, 0, 9, 1])
np.sort(foo)
# array([0, 1, 1, 3, 7, 9, 9])

Note that the original array remains unchanged.

foo
# array([1, 7, 3, 9, 0, 9, 1])

If you want to sort the values of foo in place, use the sort method of the array object.

foo.sort()
foo
# array([0, 1, 1, 3, 7, 9, 9])

Sort with NaN

If you have an array with NaN values, sort() pushes them to the end of the array.

bar = np.array([5, np.nan, 3, 11])
np.sort(bar)
# array([ 3.,  5., 11., nan])

Sort In Descending Order (Reverse Sort)

Unfortunately NumPy doesn't have a direct way of sorting arrays in descending order. However, there are multiple ways to accomplish this.

  1. Sort the array in ascending order and then reverse the result.
bar = np.array([5, np.nan, 3, 11])
np.sort(bar)[::-1]
# array([nan, 11.,  5.,  3.])
  2. Negate the array’s values, sort those in ascending order, and then negate that result.
bar = np.array([5, np.nan, 3, 11])
-np.sort(-bar)
# array([11.,  5.,  3., nan])

The main difference between these techniques is that the first method pushes NaNs to the front of the array and the second method pushes NaNs to the back. Also, the second method won’t work on strings since you can’t negate a string.
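
To illustrate that last point, here's a small sketch of the reverse technique applied to strings, and to each row of a 2-d array. The arrays fruits and grid are made up for this example.

```python
import numpy as np

# Reversing works for strings, where negation is not an option.
fruits = np.array(['pear', 'apple', 'fig'])
fruits_desc = np.sort(fruits)[::-1]
print(fruits_desc)  # ['pear' 'fig' 'apple']

# For a 2-d array, sort each row and then reverse along that same axis.
grid = np.array([
    [55, 10, 12],
    [20, 0, 33],
    [55, 92, 3]
])
rows_desc = np.sort(grid, axis=1)[:, ::-1]
print(rows_desc)
# [[55 12 10]
#  [33 20  0]
#  [92 55  3]]
```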

Sorting A Multidimensional Array

What if you wanted to sort a multidimensional array like this?

boo = np.array([
    [55, 10, 12],
    [20, 0, 33],
    [55, 92, 3]
])

In this case, you can use the axis parameter of the sort() function to specify which axis to sort along.

Sort each column of a 2-d array

np.sort(boo, axis=0) # sort along the row axis
# array([[20,  0,  3],
#        [55, 10, 12],
#        [55, 92, 33]])

Sort each row of a 2-d array

np.sort(boo, axis=1) # sort along the column axis
# array([[10, 12, 55],
#        [ 0, 20, 33],
#        [ 3, 55, 92]])

Sort the last axis of an array

np.sort(boo, axis=-1) # sort along the last axis
# array([[10, 12, 55],
#        [ 0, 20, 33],
#        [ 3, 55, 92]])

Since boo is a 2-d array, the last axis, 1, is the column axis. Thus np.sort(boo, axis=-1) is equivalent to np.sort(boo, axis=1).

Tip

When we talk about sorting along an axis, each element's position in the array remains fixed except for that axis. For example, observe the 20 in boo. When we sort along the row axis (axis 0), only its row coordinate changes (from (1,0) to (0,0)). When we sort along the column axis (axis 1), only its column coordinate changes (from (1,0) to (1,1)). That's why sorting along axis 0 does column sorts in a 2-d array and sorting along axis 1 does row sorts in a 2-d array.
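
If you'd like to verify this for yourself, one way (sketched here with argwhere(), which returns the coordinates of the elements that satisfy a condition) is to track where the 20 lands after each sort.

```python
import numpy as np

boo = np.array([
    [55, 10, 12],
    [20, 0, 33],
    [55, 92, 3]
])

where_before = np.argwhere(boo == 20)                  # [[1 0]]
where_axis0 = np.argwhere(np.sort(boo, axis=0) == 20)  # [[0 0]] row changed
where_axis1 = np.argwhere(np.sort(boo, axis=1) == 20)  # [[1 1]] column changed
```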

argsort()

argsort() works just like sort(), except it returns an array of indices indicating where each value of the sorted array is located in the original array.

Example

foo = np.array([3, 0, 10, 5])
np.argsort(foo)
# array([1, 0, 3, 2])

Here, argsort() tells us:

  • the smallest element of foo is at position 1
  • the second smallest element of foo is at position 0
  • the third smallest element of foo is at position 3
  • the fourth smallest element of foo is at position 2

If you used this array to index the original array, you’d get its sorted form (just as if you had called np.sort(foo)).

foo = np.array([3, 0, 10, 5])
idx = np.argsort(foo)
foo[idx]
# array([ 0,  3,  5, 10])
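
A related question is "what rank does each element have?" One common trick, sketched below with a different made-up array, is to call argsort() on the result of argsort(). The names order and ranks are ours.

```python
import numpy as np

foo = np.array([30, 10, 20])

order = np.argsort(foo)    # positions of the sorted values in foo: [1 2 0]
ranks = np.argsort(order)  # rank of each element of foo:           [2 0 1]

# 30 is the largest (rank 2), 10 the smallest (rank 0), 20 is in between.
```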

Sort the rows of a 2-d array according to its first column

boo = np.array([
    [55, 10, 12],
    [20, 0, 33],
    [55, 92, 3]
])

If you want to reorder the rows of boo according to the values in its first column, you can plug in the index array [1, 0, 2].

idx = np.array([1, 0, 2])
boo[idx]
# array([[20,  0, 33],
#        [55, 10, 12],
#        [55, 92,  3]])

To create the index array dynamically, simply call argsort() on the first column of boo.

idx = np.argsort(boo[:, 0])
print(idx)
# [1 0 2]
 
boo[idx]
# array([[20,  0, 33],
#        [55, 10, 12],
#        [55, 92,  3]])

Stable Sorting

The previous example raises an important question. If an array has repeated values, how do we guarantee that sorting them won't alter the order they appear in the original array? For example, given boo

boo = np.array([
    [55, 10, 12],
    [20, 0, 33],
    [55, 92, 3]
])

this

boo[[1, 0, 2]]
# array([[20,  0, 33],
#        [55, 10, 12],
#        [55, 92,  3]])

and this

boo[[1, 2, 0]]
# array([[20,  0, 33],
#        [55, 92,  3],
#        [55, 10, 12]])

are both valid sorts of boo along its first column, but only the first array retains the original order of the rows beginning with 55. A sort that preserves the original order of equal elements is known as a [stable sort](https://en.wikipedia.org/wiki/Sorting_algorithm#Stability). By default, np.sort() and np.argsort() don't use a stable sorting algorithm. If you'd like to use a stable sort, set the kind parameter equal to 'stable'.

boo[np.argsort(boo[:, 0], kind='stable')]
# array([[20,  0, 33],
#        [55, 10, 12],
#        [55, 92,  3]])
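
Stability is also what makes multi-key sorting possible. As a sketch (one approach among several), you can sort by the tie-breaking column first and then do a stable sort on the primary column. Here we order boo by its first column, breaking ties with its third column; NumPy's lexsort() produces the same ordering in a single call.

```python
import numpy as np

boo = np.array([
    [55, 10, 12],
    [20, 0, 33],
    [55, 92, 3]
])

# Step 1: order rows by the tie-breaker (third column).
by_third = boo[np.argsort(boo[:, 2], kind='stable')]
# Step 2: stable sort on the primary key keeps the step 1 order among ties.
by_first_then_third = by_third[np.argsort(by_third[:, 0], kind='stable')]
print(by_first_then_third)
# [[20  0 33]
#  [55 92  3]
#  [55 10 12]]

# lexsort() treats the LAST key as the primary one.
same_result = boo[np.lexsort((boo[:, 2], boo[:, 0]))]
```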

unique()

You can use the unique() function to get the unique elements of an array.

Example

gar = np.array(['b', 'b', 'a', 'a', 'c', 'c'])
np.unique(gar)
# array(['a', 'b', 'c'], dtype='<U1')

You may have noticed that 'b' appeared first in the input but 'a' appeared first in the output. That's because unique() returns the unique elements in sorted order.

Get unique elements in order of first occurrence

You can use return_index=True to get the index of the first occurrence of each unique element in an array.

gar = np.array(['b', 'b', 'a', 'a', 'c', 'c'])
np.unique(gar, return_index=True)
# (array(['a', 'b', 'c'], dtype='<U1'), array([2, 0, 4]))

With return_index=True, numpy returns a tuple containing

  1. the unique elements array
  2. a corresponding array with the index at which each element first occurred in the original array

In the above example, 'a' first occurred at index 2 in the original array, 'b' first occurred at index 0, and so on.

If you want to reorder the unique elements in the same order they occurred in the original array, call argsort() on the index array and use the result to reorder the unique elements array.

gar = np.array(['b', 'b', 'a', 'a', 'c', 'c'])
uniques, first_positions = np.unique(gar, return_index=True)
uniques[np.argsort(first_positions)]
# array(['b', 'a', 'c'], dtype='<U1')

unique() with counts

You can use return_counts=True to additionally return the count of each element.

np.unique(gar, return_counts=True)
# (array(['a', 'b', 'c'], dtype='<U1'), array([2, 2, 2]))
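
The two returned arrays line up element by element, which makes them easy to combine. Below is a small sketch, using a different made-up array so the counts aren't all equal, that builds a tally and finds the most frequent element. The names tally and most_common are ours.

```python
import numpy as np

gar = np.array(['b', 'a', 'b', 'c', 'b', 'a'])
values, counts = np.unique(gar, return_counts=True)

# Pair each unique value with its count.
tally = dict(zip(values.tolist(), counts.tolist()))
print(tally)  # {'a': 2, 'b': 3, 'c': 1}

# argmax() gives the position of the largest count.
most_common = values[np.argmax(counts)]
print(most_common)  # b
```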