pybind11 array and buffer interface: A numpy user’s perspective¶

pybind11-logo

Dboy Liao

Medium GitHub LinkedIn CakeResume

Fork me on GitHub

The Numpy ndarray¶

int matrix[3][5];
  • arbitrary shape?
  • flexible reshape?
  • different data type?
In [1]:
import numpy as np
In [2]:
array = np.arange(16, dtype=np.int8).reshape(4, 4).copy()
array
Out[2]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]], dtype=int8)
array.shape
array.strides
array.data
array.base
In [3]:
np.lib.stride_tricks
Out[3]:
<module 'numpy.lib.stride_tricks' from '/Users/dboyliao/.pyenv/versions/3.8.1/lib/python3.8/site-packages/numpy/lib/stride_tricks.py'>

array-2d

In [4]:
print(array)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

array-2d-flatten

In [5]:
arr_flatten = array.ravel()
arr_flatten
Out[5]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
      dtype=int8)
In [6]:
print("1 == 4 * 0 + 1 * 1:", 1 == 4*0 + 1*1)
print("6 == 4 * 1 + 1 * 2:", 6 == 4*1 + 1*2)
1 == 4 * 0 + 1 * 1: True
6 == 4 * 1 + 1 * 2: True
In [7]:
print("arr_flatten[1] == array[0, 1]:", arr_flatten[1] == array[0, 1])
print("arr_flatten[6] == array[1, 2]:", arr_flatten[6] == array[1, 2])
arr_flatten[1] == array[0, 1]: True
arr_flatten[6] == array[1, 2]: True
In [8]:
array.strides
Out[8]:
(4, 1)

array-2d-view

In [9]:
arr3 = array.reshape(2, 2, 2, 2)
arr3
Out[9]:
array([[[[ 0,  1],
         [ 2,  3]],

        [[ 4,  5],
         [ 6,  7]]],


       [[[ 8,  9],
         [10, 11]],

        [[12, 13],
         [14, 15]]]], dtype=int8)
In [10]:
arr3.strides
Out[10]:
(8, 4, 2, 1)
In [11]:
print("11 == 8 * 1 + 4 * 0 + 2 * 1 + 1 * 1:", 11 == 8 * 1 + 4 * 0 + 2 * 1 + 1 * 1)
11 == 8 * 1 + 4 * 0 + 2 * 1 + 1 * 1: True
In [12]:
arr_flatten[11] == arr3[1, 0, 1, 1]
Out[12]:
True
  • linear offset: the offset of an element in the flattened array
  • $\mathbf{arr}$: a m-dims array
    • strides: $(s_0, s_1, ..., s_m)$
    • shape: $(d_0, d_1, ..., d_m)$
  • $e = \mathbf{arr}[i_0, i_1, ..., i_m]$, element in $\mathbf{arr}$
    • with linear offset $\text{offset}_e$

Then we have:

$$ \text{offset}_e = \sum\limits_{j=0}^{m} s_j \cdot i_j $$

okey-then-what

Broadcasting Rule: From Scratch¶

Numpy Documentation

np-boradcasting

In [13]:
a = np.zeros((4, 3), dtype=np.int8)
a[1] = 10
a[2] = 20
a[3] = 30
b = np.array([1, 2, 3], dtype=np.int8)
In [14]:
a.shape
Out[14]:
(4, 3)
In [15]:
b.shape
Out[15]:
(3,)
In [16]:
a + b
Out[16]:
array([[ 1,  2,  3],
       [11, 12, 13],
       [21, 22, 23],
       [31, 32, 33]], dtype=int8)
In [17]:
b.strides
Out[17]:
(1,)
In [18]:
# promote b: make it of the shape as a
b_promoted = np.lib.stride_tricks.as_strided(
    b,
    shape=a.shape,
    strides=(0,)+b.strides
)
In [19]:
b_promoted.shape
Out[19]:
(4, 3)
In [20]:
b_promoted
Out[20]:
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]], dtype=int8)
In [21]:
b_promoted.strides
Out[21]:
(0, 1)

b_promoted[1, 1]

  • linear offset = 0 * 1 + 1 * 1 = 1
  • b_promoted[1, 1] == b[1] # 2

np-broadcasting-b-1

b_promoted[3, 2]

  • linear offset = 0 * 3 + 1 * 2 = 2
  • b_promoted[3, 2] == b[2] # 3

np-broadcasting-b-2

In [22]:
np.alltrue(a + b == a + b_promoted)
Out[22]:
True
In [23]:
a_promoted_lib, b_promoted_lib = np.lib.stride_tricks.broadcast_arrays(a, b)
In [24]:
b_promoted_lib.strides
Out[24]:
(0, 1)
In [25]:
# b is the owner of the underlying memory block
b.base is None
Out[25]:
True
In [26]:
# b_promoted_lib is just a shared-memory view of b
b_promoted_lib.base is b
Out[26]:
True
In [27]:
np.lib.stride_tricks.as_strided(np.array([0, 10, 20, 30], dtype=np.int8), shape=(4, 3), strides=(1, 0))
Out[27]:
array([[ 0,  0,  0],
       [10, 10, 10],
       [20, 20, 20],
       [30, 30, 30]], dtype=int8)
In [28]:
array.flatten().base is None
Out[28]:
True
In [29]:
array.ravel().base is None
Out[29]:
False

okey-then-what

Time Series: Sliding Window with Shared-Memory View¶

In [30]:
data = np.arange(20, dtype=np.int8)
data
Out[30]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19], dtype=int8)
In [31]:
windows = np.lib.stride_tricks.as_strided(
    data,
    shape=(16, 5),
    strides=(1, 1)
)
In [32]:
windows
Out[32]:
array([[ 0,  1,  2,  3,  4],
       [ 1,  2,  3,  4,  5],
       [ 2,  3,  4,  5,  6],
       [ 3,  4,  5,  6,  7],
       [ 4,  5,  6,  7,  8],
       [ 5,  6,  7,  8,  9],
       [ 6,  7,  8,  9, 10],
       [ 7,  8,  9, 10, 11],
       [ 8,  9, 10, 11, 12],
       [ 9, 10, 11, 12, 13],
       [10, 11, 12, 13, 14],
       [11, 12, 13, 14, 15],
       [12, 13, 14, 15, 16],
       [13, 14, 15, 16, 17],
       [14, 15, 16, 17, 18],
       [15, 16, 17, 18, 19]], dtype=int8)
In [33]:
X = windows[:, :4]
Y = windows[:, 4]
In [34]:
X
Out[34]:
array([[ 0,  1,  2,  3],
       [ 1,  2,  3,  4],
       [ 2,  3,  4,  5],
       [ 3,  4,  5,  6],
       [ 4,  5,  6,  7],
       [ 5,  6,  7,  8],
       [ 6,  7,  8,  9],
       [ 7,  8,  9, 10],
       [ 8,  9, 10, 11],
       [ 9, 10, 11, 12],
       [10, 11, 12, 13],
       [11, 12, 13, 14],
       [12, 13, 14, 15],
       [13, 14, 15, 16],
       [14, 15, 16, 17],
       [15, 16, 17, 18]], dtype=int8)
In [35]:
Y
Out[35]:
array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
      dtype=int8)
In [36]:
X.base is windows
Out[36]:
True
In [37]:
Y.base is windows
Out[37]:
True
In [38]:
# X = windows[:, :4]
X2 = windows[:, [0, 1, 2, 3]]
In [39]:
np.alltrue(X2 == X)
Out[39]:
True
In [40]:
X2.base is windows
Out[40]:
False
  • Python Data Model: __getitem__
    • link
  • start:end:step is a short-hand for slice object
    • has FIXED pattern (fixed start, fixed end, fixed step)
    • can be convert to a set of corresponding strides
    • shared-memory view is possible
    • NO COPY REQUIRED
  • a list of indices
    • no guarantee on the pattern of the indices contained
    • cannot be convert to a set of corresponding strides
    • COPY IS A MUST

see-c-cpp

pybind11 Buffer and Array Interface¶

Documentation

  • Good tutorial to CPython buffer protocol
    • https://jakevdp.github.io/blog/2014/05/05/introduction-to-the-python-buffer-protocol/
    • CPython Bufferinfo object: https://docs.python.org/3/c-api/buffer.html#buffer-structure
    • WARN: long and tedious C code
  • pybind11 make it easy to expose your type in C++ to Python via buffer protocol
cmake_minimum_required(VERSION 3.10)
project(python_example)

add_subdirectory(thirdparty/pybind11)

set(CMAKE_VERBOSE_MAKEFILE ON)

file(GLOB_RECURSE MYLIB_SRC "${CMAKE_SOURCE_DIR}/src/*.cpp" "${CMAKE_SOURCE_DIR}/src/*.c")
pybind11_add_module(_mylib SHARED ${MYLIB_SRC})
target_include_directories(_mylib PRIVATE include)
set_target_properties(
    _mylib
    PROPERTIES LIBRARY_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/mylib
)
In [41]:
!make lib
mkdir -p build; \
	cmake -DPYTHON_EXECUTABLE=$(pyenv which python3) -S . -B build -DCMAKE_BUILD_TYPE=Debug; \
	cmake --build build
-- pybind11 v2.9.1 
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/dboyliao/Documents/Talks/TaipeiPy/pybind11_array_buffer/build
/usr/local/Cellar/cmake/3.20.5/bin/cmake -S/Users/dboyliao/Documents/Talks/TaipeiPy/pybind11_array_buffer -B/Users/dboyliao/Documents/Talks/TaipeiPy/pybind11_array_buffer/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/local/Cellar/cmake/3.20.5/bin/cmake -E cmake_progress_start /Users/dboyliao/Documents/Talks/TaipeiPy/pybind11_array_buffer/build/CMakeFiles /Users/dboyliao/Documents/Talks/TaipeiPy/pybind11_array_buffer/build//CMakeFiles/progress.marks
/Applications/Xcode.app/Contents/Developer/usr/bin/make  -f CMakeFiles/Makefile2 all
/Applications/Xcode.app/Contents/Developer/usr/bin/make  -f CMakeFiles/_mylib.dir/build.make CMakeFiles/_mylib.dir/depend
cd /Users/dboyliao/Documents/Talks/TaipeiPy/pybind11_array_buffer/build && /usr/local/Cellar/cmake/3.20.5/bin/cmake -E cmake_depends "Unix Makefiles" /Users/dboyliao/Documents/Talks/TaipeiPy/pybind11_array_buffer /Users/dboyliao/Documents/Talks/TaipeiPy/pybind11_array_buffer /Users/dboyliao/Documents/Talks/TaipeiPy/pybind11_array_buffer/build /Users/dboyliao/Documents/Talks/TaipeiPy/pybind11_array_buffer/build /Users/dboyliao/Documents/Talks/TaipeiPy/pybind11_array_buffer/build/CMakeFiles/_mylib.dir/DependInfo.cmake --color=
Consolidate compiler generated dependencies of target _mylib
/Applications/Xcode.app/Contents/Developer/usr/bin/make  -f CMakeFiles/_mylib.dir/build.make CMakeFiles/_mylib.dir/build
make[3]: Nothing to be done for `CMakeFiles/_mylib.dir/build'.
[100%] Built target _mylib
/usr/local/Cellar/cmake/3.20.5/bin/cmake -E cmake_progress_start /Users/dboyliao/Documents/Talks/TaipeiPy/pybind11_array_buffer/build/CMakeFiles 0

pybind11::buffer_info¶

struct buffer_info {
    void *ptr;                        // ndarray.data
    py::ssize_t itemsize;             // ndarray.itemsize, element size in bytes
    std::string format;               // Python struct format 
    py::ssize_t ndim;                 // ndarray.ndim
    std::vector<py::ssize_t> shape;   // ndarray.shape
    std::vector<py::ssize_t> strides; // ndarray.strides
};
  • Python Style Struct Format
py::class_<MyMatrix>(m, "MyMatrix", py::buffer_protocol())
  .def(py::init<int, int>(), py::arg("rows"), py::arg("cols"))
  .def_buffer([](const MyMatrix &m) {
    return py::buffer_info(
        /* the memory block */
        m.data(),
        /* Size of one scalar */
        sizeof(float),
        /* string, Python struct-style format descriptor */
        py::format_descriptor<float>::format(),
        /* Number of dimensions */
        2,
        /* Buffer shape */
        {m.rows(), m.cols()},
        /* Strides (in bytes) for each index */
        {sizeof(float) * m.cols(), sizeof(float)});
  });
def create_mat(rows, cols):
    return np.array(MyMatrix(rows, cols), copy=False)
In [42]:
import mylib

mylib.create_mat(5, 5)
Out[42]:
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]], dtype=float32)

pybind11::array and pybind11::array_t¶

py::array_t<double> array_add_value(const py::array_t<double> &in_arr,
                                    double v = 1.0) {
  auto info = in_arr.request();
  double *out_buff = new double[info.size];
  auto in_ptr = (double *)info.ptr;
  for (py::ssize_t idx = 0; idx < info.size; ++idx) {
    *(out_buff + idx) = *(in_ptr + idx) + v;
  }
  auto out_info = py::buffer_info(out_buff, info.itemsize, info.format,
                                  info.ndim, info.shape, info.strides);
  return py::array_t<double>(out_info);
}
In [43]:
import mylib

print(mylib.array_add_value.__doc__)
array_add_value(input_array: numpy.ndarray[numpy.float64], value: float = 0.0) -> numpy.ndarray[numpy.float64]

add by given value

In [44]:
arr_in = np.random.rand(5)
mylib.array_add_value(arr_in, 3.1)
Out[44]:
array([3.51770958, 3.49058292, 3.3391846 , 3.2065161 , 3.4542942 ])

okey-then-what

Non-Trivial Application: Array Elements Traversal¶

  • reduce functions along axis
    • ex: mean, sum, std, ...etc
  • Let's see an example, argmin
static py::array argmin(const py::array &in_arr, int32_t axis = -1);
template <typename T>
py::buffer_info argmin_impl_(const py::array &in_arr,
                             int32_t axis = -1) noexcept;

reduce-shape

reduce-inner-outer-loop

int64_t *out_data = (int64_t *)malloc(
    sizeof(int64_t) * static_cast<size_t>(outer_size * inner_size));
T *in_data = (T *)in_info.ptr;
for (size_t outer = 0; outer < outer_size; ++outer) {
  for (size_t inner = 0; inner < inner_size; ++inner) {
    T min_value = in_data[outer * axis_size * inner_size + inner];
    int64_t min_idx = 0;
    for (size_t i = 0; i < axis_size; ++i) {
      T value = in_data[(outer * axis_size + i) * inner_size + inner];
      if (value <= min_value) {
        min_value = value;
        min_idx = static_cast<int64_t>(i);
      }
    }
    out_data[outer * inner_size + inner] = min_idx;
  }
}

Let's change from py::array_<T> to py::array!

py::array array_add_value(const py::array &in_arr, double v = 1.0) {
  auto info = in_arr.request();
  double *out_buff = new double[info.size];
  auto in_ptr = (double *)info.ptr;
  for (py::ssize_t idx = 0; idx < info.size; ++idx) {
    *(out_buff + idx) = *(in_ptr + idx) + v;
  }
  auto out_info = py::buffer_info(out_buff, info.itemsize, info.format,
                                  info.ndim, info.shape, info.strides);
  return py::array(out_info);
}
In [1]:
import mylib

print(mylib.array_add_value.__doc__)
array_add_value(input_array: numpy.ndarray, value: float = 0.0) -> numpy.ndarray

add by given value

In [2]:
arr_in = np.random.rand(5)
mylib.array_add_value(arr_in, 3.1)
Out[2]:
array([3.64004604, 3.33775629, 3.14665428, 3.20065878, 3.7822936 ])
In [3]:
mylib.array_add_value(arr_in.astype(np.float128), 3.1)
Out[3]:
array([              nan, -1.13644336e+0987,  1.68195440e-4932,
                     nan,  0.00000000e+0000], dtype=float128)

Refereces¶

  • pybind11 buffer protocol
    • test_buffers.cpp
    • test_buffers.py
  • uTensor
    • https://github.com/uTensor/uTensor