NumPy: the Dark Side and Its Applications¶

numpy-logo

Dboy Liao

Medium GitHub LinkedIn CakeResume

Fork me on GitHub

Taipei.py March 2022¶

pybind11 array and buffer interface: A numpy user’s perspective

GitHub repo

The NumPy ndarray¶

int matrix[3][5];
  • arbitrary shape?
  • flexible reshape?
  • different data type?

no-cpp

matrix = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15]
]

來寫個 reshape 吧!

def reshape_(in_array: list, new_shape):
    # 要怎麼決定 in_array 的 shape?
    # 要怎麼檢查裡面的資料型態?
    # 要怎麼檢查 new_shape 是合理的?
    ...
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8, 9],
    [10, 11, 12, '13']
]

better-way

In [1]:
import numpy as np

array = np.arange(16, dtype=np.int8).reshape(4, 4).copy()
array
Out[1]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]], dtype=int8)
In [2]:
array.shape
Out[2]:
(4, 4)
In [3]:
array.strides
Out[3]:
(4, 1)
In [4]:
array.data
Out[4]:
<memory at 0x109d8d220>
In [5]:
array.base is None
Out[5]:
True

array-2d

array-2d-flatten

In [6]:
arr_flatten = array.ravel()
arr_flatten
Out[6]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
      dtype=int8)
# Dark Magic 
np.lib.stride_tricks

那我們先來看看用這個 Dark Magic 可以怎麼做一個自己的 reshape

In [7]:
def flat_list(ll, acc=None):
    if acc is None:
        acc = []
    for l in ll:
        if not isinstance(l, list):
            acc.append(l)
        else:
            acc = flat_list(l, acc)
    return acc
In [8]:
flat_list([[1, 2, 3], [4, 5, 6]])
Out[8]:
[1, 2, 3, 4, 5, 6]
In [9]:
flat_list([[1, 2, 3], [4, 5, 6], [1, 2, [3, 4]]])
Out[9]:
[1, 2, 3, 4, 5, 6, 1, 2, 3, 4]
In [10]:
def reshape_(in_array, new_shape):
    flat_array = flat_list(in_array)
    new_strides = []
    acc = 1
    for s in new_shape[::-1]:
        new_strides.insert(0, acc*8)
        acc *= s
    return np.lib.stride_tricks.as_strided(
        flat_array,
        shape=new_shape,
        strides=new_strides
    ).tolist()
In [11]:
reshape_(
    [
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15]
    ],
    (5, 3)
)
Out[11]:
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]
In [12]:
reshape_(
    [
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15]
    ],
    (5, 1, 1, 3)
)
Out[12]:
[[[[1, 2, 3]]],
 [[[4, 5, 6]]],
 [[[7, 8, 9]]],
 [[[10, 11, 12]]],
 [[[13, 14, 15]]]]
In [13]:
reshape_(
    [
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15]
    ],
    (5, 1, 3, 2)
)
Out[13]:
[[[[1, 2], [3, 4], [5, 6]]],
 [[[7, 8], [9, 10], [11, 12]]],
 [[[13, 14], [15, 2251799813685248], [0, 0]]],
 [[[0, 4415747056], [21, 140380210676528], [4437478368, 4409815040]]],
 [[[4437378560, 0], [140380210604576, 0], [4294967296, 0]]]]

NumPy is All You Need¶

Nature 2020 Review Paper¶

numpy-nature

source

  • strides, shape and tensor rank
  • indexing
    • basic indexing
    • advanced indexing
  • broadcasting
  • vectorization

Official Doc

numpy-memorize

better-way

array-2d-flatten

In [14]:
print("1 == 4 * 0 + 1 * 1:", 1 == 4*0 + 1*1)
print("6 == 4 * 1 + 1 * 2:", 6 == 4*1 + 1*2)
1 == 4 * 0 + 1 * 1: True
6 == 4 * 1 + 1 * 2: True
In [15]:
print("arr_flatten[1] == array[0, 1]:", arr_flatten[1] == array[0, 1])
print("arr_flatten[6] == array[1, 2]:", arr_flatten[6] == array[1, 2])
arr_flatten[1] == array[0, 1]: True
arr_flatten[6] == array[1, 2]: True
In [16]:
array.strides
Out[16]:
(4, 1)
  • linear offset: the offset of an element in the flattened array
  • $\mathbf{arr}$: a m-dims array
    • strides: $(s_0, s_1, ..., s_{m-1})$
    • shape: $(d_0, d_1, ..., d_{m-1})$
  • $e = \mathbf{arr}[i_0, i_1, ..., i_{m-1}]$, element in $\mathbf{arr}$
    • with linear offset $\text{offset}_e$

Then we have:

$$ \text{offset}_e = \sum\limits_{j=0}^{m-1} s_j \cdot i_j $$

Applications¶

Shared Memory View¶

In [17]:
cube = np.arange(3*3*3).reshape((3, 3, 3)).copy()
cube
Out[17]:
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

array-3d

array-3d-strided

In [18]:
strided_cube = cube[::2, ::2, ::2]
strided_cube
Out[18]:
array([[[ 0,  2],
        [ 6,  8]],

       [[18, 20],
        [24, 26]]])
In [19]:
# strided_cube is a view of cube
cube[0, 0, 0] = 3
strided_cube
Out[19]:
array([[[ 3,  2],
        [ 6,  8]],

       [[18, 20],
        [24, 26]]])
  • linear offset: the offset of an element in the flattened array
  • $\mathbf{arr}$: a m-dims array
    • strides: $(s_0, s_1, ..., s_{m-1})$
    • shape: $(d_0, d_1, ..., d_{m-1})$
  • $e = \mathbf{arr}[i_0, i_1, ..., i_{m-1}]$, element in $\mathbf{arr}$
    • with linear offset $\text{offset}_e$

Then we have:

$$ \text{offset}_e = \sum\limits_{j=0}^{m-1} s_j \cdot i_j $$
In [21]:
cube.ravel()
Out[21]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26])
In [22]:
cube.shape
Out[22]:
(3, 3, 3)
In [23]:
cube.strides # number of bytes
Out[23]:
(72, 24, 8)
  • $\mathbf{cube}$: a 3-dims array
    • strides: $(72, 24, 8)$
    • shape: $(3, 3, 3)$
  • $e = \mathbf{cube}[i_0, i_1, i_2]$, element in $\mathbf{cube}$
    • with linear offset $\text{offset}_e$

Then we have:

$$ \text{offset}_e = 72 \cdot i_0 + 24 \cdot i_1 + 8 \cdot i_2 $$
In [24]:
strided_cube.ravel()
Out[24]:
array([ 0,  2,  6,  8, 18, 20, 24, 26])
In [25]:
strided_cube.shape
Out[25]:
(2, 2, 2)
In [26]:
strided_cube.strides
Out[26]:
(144, 48, 16)
  • $\mathbf{strided\_cube}$: a 3-dims array
    • strides: $(144, 48, 16)$
    • shape: $(2, 2, 2)$
  • $\mathbf{strided\_cube}$$[i_0^{\prime}, i_1^{\prime}, i_2^{\prime}]$
    • with linear offset $\text{offset}_e$

Then we have:

$$ \begin{align*} \text{offset}_e &= 144 \cdot i_0^{\prime} + 48 \cdot i_1^{\prime} + 16 \cdot i_2^{\prime} \\ &= 72 \cdot (2 \cdot i_0^{\prime}) + 24 \cdot (2 \cdot i_1^{\prime}) + 8 \cdot (2 \cdot i_2^{\prime}) \\ &= 72 \cdot i_0 + 24 \cdot i_1 + 8 \cdot i_2 \end{align*} $$

array-3d-strided

In [27]:
strided_cube.data, type(strided_cube.data)
Out[27]:
(<memory at 0x1100ef1f0>, memoryview)

memoryview documentation

In [28]:
import struct

data_bytes = strided_cube.data.tobytes()
for i in range(8):
    elem = struct.unpack(
        'q',
        data_bytes[i*strided_cube.itemsize:(i+1)*strided_cube.itemsize],
    )[0]
    print(elem, end=" ")
0 2 6 8 18 20 24 26 
In [29]:
np.lib.stride_tricks.as_strided(
    cube,
    strides=(144, 48, 16),
    shape=(2, 2, 2)
)
Out[29]:
array([[[ 0,  2],
        [ 6,  8]],

       [[18, 20],
        [24, 26]]])
In [30]:
strided_cube.base is cube
Out[30]:
True
In [31]:
cube.base is None
Out[31]:
True

array-3d-not-strided

In [32]:
# advanced indexing
random_cube = cube[
    [0, 0, 0, 0, 2, 2, 2, 2],
    [0, 0, 2, 2, 0, 0, 2, 2],
    [0, 2, 1, 2, 0, 1, 0, 2]
]
random_cube.reshape((2, 2, 2))
Out[32]:
array([[[ 0,  2],
        [ 7,  8]],

       [[18, 19],
        [24, 26]]])
In [33]:
random_cube.base is None
Out[33]:
True
In [34]:
cube.ravel().base is cube
Out[34]:
True
In [35]:
cube.flatten().base is cube
Out[35]:
False

ok-then-what

Time Series: Sliding Window with Shared-Memory View¶

In [36]:
data = np.arange(20, dtype=np.int8)
data
Out[36]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19], dtype=int8)
In [37]:
windows_ = np.empty_like(data, shape=(16, 5))
for i in range(16):
    windows_[i, :] = data[i:(i+5)]
windows_
Out[37]:
array([[ 0,  1,  2,  3,  4],
       [ 1,  2,  3,  4,  5],
       [ 2,  3,  4,  5,  6],
       [ 3,  4,  5,  6,  7],
       [ 4,  5,  6,  7,  8],
       [ 5,  6,  7,  8,  9],
       [ 6,  7,  8,  9, 10],
       [ 7,  8,  9, 10, 11],
       [ 8,  9, 10, 11, 12],
       [ 9, 10, 11, 12, 13],
       [10, 11, 12, 13, 14],
       [11, 12, 13, 14, 15],
       [12, 13, 14, 15, 16],
       [13, 14, 15, 16, 17],
       [14, 15, 16, 17, 18],
       [15, 16, 17, 18, 19]], dtype=int8)

better-way

In [38]:
windows = np.lib.stride_tricks.as_strided(
    data,
    shape=(16, 5),
    strides=(1, 1)
)
In [39]:
windows
Out[39]:
array([[ 0,  1,  2,  3,  4],
       [ 1,  2,  3,  4,  5],
       [ 2,  3,  4,  5,  6],
       [ 3,  4,  5,  6,  7],
       [ 4,  5,  6,  7,  8],
       [ 5,  6,  7,  8,  9],
       [ 6,  7,  8,  9, 10],
       [ 7,  8,  9, 10, 11],
       [ 8,  9, 10, 11, 12],
       [ 9, 10, 11, 12, 13],
       [10, 11, 12, 13, 14],
       [11, 12, 13, 14, 15],
       [12, 13, 14, 15, 16],
       [13, 14, 15, 16, 17],
       [14, 15, 16, 17, 18],
       [15, 16, 17, 18, 19]], dtype=int8)
In [40]:
X = windows[:, :4]
Y = windows[:, 4]
In [41]:
X
Out[41]:
array([[ 0,  1,  2,  3],
       [ 1,  2,  3,  4],
       [ 2,  3,  4,  5],
       [ 3,  4,  5,  6],
       [ 4,  5,  6,  7],
       [ 5,  6,  7,  8],
       [ 6,  7,  8,  9],
       [ 7,  8,  9, 10],
       [ 8,  9, 10, 11],
       [ 9, 10, 11, 12],
       [10, 11, 12, 13],
       [11, 12, 13, 14],
       [12, 13, 14, 15],
       [13, 14, 15, 16],
       [14, 15, 16, 17],
       [15, 16, 17, 18]], dtype=int8)
In [42]:
Y
Out[42]:
array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
      dtype=int8)
In [43]:
X.base is windows, Y.base is windows
Out[43]:
(True, True)
In [44]:
X_ = windows[:, [0, 1, 2, 3]] # X = windows[:, :4]
In [45]:
X_.base is windows
Out[45]:
False

basic indexing v.s advanced indexing

Nested Loop and Vectorization (Broadcasting)¶

  • given an array a, which of shape (20, 5)
  • given an array b, which of shape (30, 5)
  • given an array c, which of shape (50, 1)

create an array out of shape (20, 30, 50), where out[i, j, k] is given as following

out[i, j, k] = sum([c[k] if a_ > b_ else 0.0 for a_, b_ in zip(a[i], b[j])])
In [46]:
Na, Nb, Nc = 20, 30, 50
a = np.random.rand(Na, 5)
b = np.random.rand(Nb, 5)
c = np.random.rand(Nc, 1)
In [47]:
%%time
out = np.zeros((Na, Nb, Nc), dtype=float)
for i in range(Na):
    for j in range(Nb):
        for k in range(Nc):
            out[i, j, k] = ((a[i] > b[j]) * c[k]).sum()
CPU times: user 198 ms, sys: 28.7 ms, total: 226 ms
Wall time: 199 ms

better-way

In [48]:
%%time
out_ = (
    (
        a[:, np.newaxis, np.newaxis, :] > b[np.newaxis, :, np.newaxis, :]
    ) * c[np.newaxis, np.newaxis, :, :]
).sum(axis=-1)
out_.shape
CPU times: user 2.2 ms, sys: 1.44 ms, total: 3.63 ms
Wall time: 1.93 ms
Out[48]:
(20, 30, 50)
In [49]:
np.allclose(
    out,
    out_
)
Out[49]:
True
In [50]:
print(a.shape, a[:, np.newaxis, np.newaxis, :].shape)
(20, 5) (20, 1, 1, 5)
In [51]:
a[:, np.newaxis, np.newaxis, :].strides
Out[51]:
(40, 0, 0, 8)
out = np.zeros((Na, Nb, Nc), dtype=float)
for i in range(Na):
    for j in range(Nb):
        for k in range(Nc):
            out[i, j, k] = ((a[i] > b[j]) * c[k]).sum()


out_ = (
    (
        a[:, np.newaxis, np.newaxis, :] > b[np.newaxis, :, np.newaxis, :]
    ) * c[np.newaxis, np.newaxis, :, :]
).sum(axis=-1)

Take Home Messages¶

  • the foundation data structure of numpy: ndarray
  • shared-memory view
  • indexing
  • broadcasting

NEVER write nested loops ever again

At any time, think of this picture and this talk when you want to apply nested loops on ndarrays:

better-way

What's Missing in This Talk¶

  • ndarray.flags and more other interesting attributes
  • row-major v.s column-major array
    • the memory layout of the array
    • the impact to the performance
  • sparse array

Learning Resources¶

  • https://github.com/wadetb/tinynumpy
    • pure python, numpy compliant implementation
  • https://github.com/dboyliao/numPY
    • my work
    • try to build a pure python implementation of numpy
    • failed, but valuable experience
  • https://github.com/rougier/numpy-100
    • 100 numpy exercises with solutions