Crafting Your Own Numpy: Do More in C++ and Make It Python @ PyCon JP 2024
AnChiLiu
31 slides
Sep 28, 2024
About This Presentation
Slides for the talk "Crafting Your Own Numpy: Do More in C++ and Make It Python" at PyCon JP 2024: https://2024.pycon.jp/en/talk/XXCCQR
Slide Content
Crafting Your Own Numpy
Do More in C++ and Make It Python
An-Chi Liu
2024/09/27 at PyConJP 2024
About Me
•Open source contributor with ID @tigercosmos
•Sciwork community
•PcapPlusPlus maintainer
•Software engineer at Mujin, Inc, Tokyo
•Writing a new book (to be released at the end of the year)
•Hobbies: Snowboarding, Photography
Why a new array?
•Arrays are fundamental components everywhere
•More features on top of the array
•Better integration with other library modules
•More efficient memory management
Speed is king
Write Python for speed
•C++ is fast
•Easy Python interface and efficient C++ performance
•Python makes prototyping easy
•Use Python to drive C++ code
•Python ❤️ C++ ❤️ pybind11
•More flexible and efficient memory management
Continuing from last year's talk
PyCon APAC 2023
This year, more details about the implementation!
How do we use NumPy?
import numpy as np

# Array of integers
arr_int = np.array([1, 2, 3, 4], dtype=np.int32)
# Array of floats
arr_float = np.arange(2 * 3 * 4, dtype=np.float64)
# Array of complex numbers
arr_complex = np.arange(2 * 3 * 4, dtype=np.complex128)
[Figure: the raw buffer, where element arr[i, j, k] is located by an offset into the buffer]
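The figure on this slide is not recoverable from the text, so the following is only a minimal sketch of how an index such as arr[i, j, k] can be turned into an offset in a flat buffer. It assumes a row-major (C-order) layout and counts strides in elements rather than bytes; the helper names row_major_strides and flat_offset are made up for this illustration.

```python
def row_major_strides(shape):
    """Return per-dimension strides (in elements) for a C-order layout."""
    strides = []
    step = 1
    # The last dimension is contiguous, so walk the shape backwards.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def flat_offset(index, strides):
    """Map a multi-dimensional index to a position in the flat buffer."""
    return sum(i * s for i, s in zip(index, strides))


shape = (2, 3, 4)
buffer = list(range(2 * 3 * 4))      # flat storage holding 0..23
strides = row_major_strides(shape)   # one stride per dimension
offset = flat_offset((1, 2, 3), strides)
print(strides, offset, buffer[offset])
```

With this layout, moving one step along the first dimension skips 3 * 4 elements, which is why a single flat buffer can back an array of any shape.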
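Python binding array: pybind11 part
#include <pybind11/pybind11.h>
#include <pybind11/stl.h> // converts a Python list to std::vector

// SimpleArray is the C++ array class being bound (definition omitted).
namespace py = pybind11;
PYBIND11_MODULE(example, m) {
    py::class_<SimpleArray>(m, "SimpleArray")
        .def(py::init<size_t>())
        .def(py::init<const std::vector<double> &>())
        .def("size", &SimpleArray::size)
        // ...
        ;
}
>>> from example import SimpleArray
>>> arr = SimpleArray([1, 2, 3, 4])
>>> arr
<example.SimpleArray object at 0x1047d25b0>
>>> arr.size()
4
[Figure labels: C++ class, Python class]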
Where is the “dtype”?
•A Python array should take a “dtype” argument
•A C++ template array can accept different data types
•SimpleArray<T>, e.g. SimpleArray<int>, SimpleArray<float>
We want to have typed arrays: C++ side
#include <cstddef>
#include <vector>

template <typename T>
class SimpleArray {
public:
    SimpleArray(size_t size) : size_(size), data_(size) {}
    SimpleArray(const std::vector<T> & arr)
        : size_(arr.size()), data_(arr) {}
    // Methods ...
    T at(size_t i) const { return data_[i]; }
    size_t size() const { return size_; }
private:
    size_t size_;          // number of elements
    std::vector<T> data_;  // element storage
};
The “dtype” challenge
•Only one type for NumPy array: numpy.ndarray
>>> arr_int32 = np.array([1, 2, 3], dtype=np.int32)
>>> type(arr_int32)
<class 'numpy.ndarray'>
>>> arr_float64 = np.array([1, 2, 3], dtype=np.float64)
>>> type(arr_float64)
<class 'numpy.ndarray'>
The “dtype” challenge (cont.)
•But we have multiple Python array types,
e.g. SimpleArrayInt32, SimpleArrayFloat64
>>> arr_int32 = example.SimpleArrayInt32(10)
>>> type(arr_int32)
<class 'example.SimpleArrayInt32'>
>>> arr_float64 = example.SimpleArrayFloat64(10)
>>> type(arr_float64)
<class 'example.SimpleArrayFloat64'>
Missing SimpleArray(..., dtype="...")!
We need a new SimpleArray (in Python) that can accept a dtype argument and
internally convert it to the corresponding SimpleArrayXXX type.
How does NumPy do it?
typedef struct {
    PyObject_HEAD
    char *data;            // pointer to the raw data
    int nd;                // number of dimensions
    npy_intp *dimensions;  // shape of the array
    npy_intp *strides;     // strides for each dimension
    PyArray_Descr *descr;  // data-type descriptor
    // ...
} PyArrayObject;
typedef struct {
    PyObject_HEAD
    char kind;       // e.g. 'i' for integer
    char type;       // e.g. 'i' for int32
    char byteorder;  // e.g. '=' for native
    int type_num;    // e.g. NPY_INT32
    int elsize;      // e.g. 4
    // ...
} PyArray_Descr;
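The pattern in these structs, a single array type plus a descriptor that is inspected at runtime, can be seen without NumPy as well. As a rough analogy only (this is not how NumPy is implemented), Python's built-in memoryview is one type that carries the element format, item size, shape and strides as runtime metadata:

```python
from array import array

# Two buffers with different element types ("i" = C int, "d" = C double).
ints = memoryview(array("i", [1, 2, 3, 4]))
floats = memoryview(array("d", [1.0, 2.0, 3.0]))

# Both views have the same Python type ...
same_type = type(ints) is type(floats) is memoryview

# ... and differ only in the metadata they carry at runtime.
for view in (ints, floats):
    print(view.format, view.itemsize, view.ndim, view.shape, view.strides)
```

Here format and itemsize play roughly the role of the descriptor fields, while ndim, shape and strides correspond to nd, dimensions and strides in the array struct.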
Why not use the same approach as NumPy?
•We already had a typed array implementation before the
“dtype” requirement came up
•We want to leverage the advantages of C++
Naïve method: do it in Python
class SimpleArray:
    _data = None
    _dtype = None

    def __init__(self, size, dtype):
        self._dtype = dtype
        # Pick the compiled typed array that matches the dtype string.
        if dtype == "int32":
            self._data = SimpleArrayInt32(size)
        elif dtype == "float64":
            self._data = SimpleArrayFloat64(size)
        # else: ...

    # Every method simply forwards to the typed array.
    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __len__(self):
        return len(self._data)
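The wrapper on this slide depends on the compiled SimpleArrayInt32 and SimpleArrayFloat64 classes. To try the idea without the extension module, here is a sketch that replaces them with hypothetical stand-ins built on the standard-library array module and uses a dict lookup in place of the if/elif chain. The helper _make_typed_array, the _DTYPE_TO_CLASS table and the ValueError for unknown dtypes are assumptions of this sketch, not part of the talk.

```python
from array import array


def _make_typed_array(name, typecode):
    """Build a hypothetical stand-in for a compiled typed array class."""

    class _Typed:
        def __init__(self, size):
            self._buf = array(typecode, [0] * size)

        def __getitem__(self, key):
            return self._buf[key]

        def __setitem__(self, key, value):
            self._buf[key] = value

        def __len__(self):
            return len(self._buf)

    _Typed.__name__ = _Typed.__qualname__ = name
    return _Typed


# "i" is the platform C int (4 bytes on mainstream platforms), "d" a C double.
SimpleArrayInt32 = _make_typed_array("SimpleArrayInt32", "i")
SimpleArrayFloat64 = _make_typed_array("SimpleArrayFloat64", "d")

_DTYPE_TO_CLASS = {"int32": SimpleArrayInt32, "float64": SimpleArrayFloat64}


class SimpleArray:
    def __init__(self, size, dtype):
        try:
            typed_class = _DTYPE_TO_CLASS[dtype]
        except KeyError:
            raise ValueError(f"unsupported dtype: {dtype!r}") from None
        self._dtype = dtype
        self._data = typed_class(size)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __len__(self):
        return len(self._data)


arr = SimpleArray(4, dtype="float64")
arr[1] = 2.5
print(len(arr), arr[1], type(arr._data).__name__)
```

Even in this small form, every element access passes through an extra Python-level call before reaching the typed storage.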
The drawbacks of the Python wrapper
•We lose the help of static types from the C++ compiler
•We sacrifice the speed that C++ gains from compile-time optimization
•We increase the complexity of memory manipulation
between Python and C++
However, we can do it in C++!
Modmesh approach: C++ array
class SimpleArrayPlex
{
public:
    explicit SimpleArrayPlex(const shape_type & shape, const std::string & dtype_str);
    template <typename T>
    SimpleArrayPlex(const SimpleArray<T> & array) { /* ... */ }
    // Other constructors ...
    // No other methods (the methods of the typed arrays are used instead)
private:
    bool m_has_instance_ownership = false; /// ownership of the instance
    void * m_instance_ptr = nullptr; /// the pointer to the SimpleArray<T> instance
    DataType m_data_type = DataType::Undefined; /// the data type for array casting
};
DataType values: DataType::Int32, DataType::Float64, etc.
Modmesh approach: Pybind11 wrapper
template <typename A, typename C>
static auto execute_callback_with_typed_array(A & arrayplex, C && callback)
{
    switch (arrayplex.data_type())
    {
    case DataType::Int32:
    {
        auto typed_array =
            reinterpret_cast<SimpleArrayInt32 *>(arrayplex.mutable_instance_ptr());
        return callback(*typed_array); // call typed_array->func()
    }
    // ...
    }
}
No need to implement the API for SimpleArrayPlex; just call the APIs of the typed array!
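The dispatch idea can be tried out in plain Python. The sketch below is an analogy under stated assumptions, not the modmesh code: an untyped bytearray stands in for the void pointer, a DataType enum is the runtime tag, and memoryview.cast plays the role of reinterpret_cast. The names ArrayPlexSketch and execute_callback_with_typed_view are hypothetical.

```python
import struct
from enum import Enum


class DataType(Enum):
    Int32 = "int32"
    Float64 = "float64"


# struct-style format codes; "i" is the platform C int, "d" a C double.
_FORMAT = {DataType.Int32: "i", DataType.Float64: "d"}


def execute_callback_with_typed_view(plex, callback):
    """Reinterpret the raw bytes according to the tag, then run the callback."""
    if plex.data_type is DataType.Int32:
        return callback(memoryview(plex.raw).cast("i"))
    if plex.data_type is DataType.Float64:
        return callback(memoryview(plex.raw).cast("d"))
    raise TypeError(f"unsupported data type: {plex.data_type}")


class ArrayPlexSketch:
    def __init__(self, size, dtype_str):
        self.data_type = DataType(dtype_str)  # ValueError on unknown strings
        itemsize = struct.calcsize(_FORMAT[self.data_type])
        self.raw = bytearray(size * itemsize)  # untyped storage

    # No element logic lives here; every method goes through the dispatcher.
    def __len__(self):
        return execute_callback_with_typed_view(self, len)

    def __getitem__(self, i):
        return execute_callback_with_typed_view(self, lambda view: view[i])

    def __setitem__(self, i, value):
        def assign(view):
            view[i] = value

        execute_callback_with_typed_view(self, assign)


plex_int = ArrayPlexSketch(10, "int32")
plex_int[4] = 7
plex_float = ArrayPlexSketch(3, "float64")
plex_float[0] = 1.5
print(len(plex_int), plex_int[4], len(plex_float), plex_float[0])
```

As on the slide, the wrapper class has no per-type logic of its own: adding a method means writing one more callback, and the type switch stays in a single place.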
Modmesh approach: Python casting
•The original codebase includes typed arrays
•We've introduced the new SimpleArrayPlex
•Both types of arrays need to work seamlessly together
•Typed arrays should be able to cast to SimpleArrayPlex, and
vice versa
Modmesh approach: Python casting (cont.)
namespace pybind11::detail
{
template <> struct type_caster<modmesh::SimpleArrayInt32>
    : public type_caster_base<modmesh::SimpleArrayInt32>
{
    using base = type_caster_base<modmesh::SimpleArrayInt32>;
public:
    // Conversion from Python object to C++
    bool load(pybind11::handle src, bool convert) { /* ... */ }
    // Conversion from C++ to Python object
    static pybind11::handle cast(...) { /* ... */ }
};
} // namespace pybind11::detail
Finally, we made it
>>> arrayplex = SimpleArray(10, dtype="int32")
>>> type(arrayplex)
<class '_modmesh.SimpleArray'>
>>> arrayplex = SimpleArray(10, dtype="float64")
>>> type(arrayplex)
<class '_modmesh.SimpleArray'>
Finally, we made it (cont.)
>>> arrayplex = SimpleArray(10, dtype="int32")
>>> type(arrayplex)
<class '_modmesh.SimpleArray'>
>>> arraytyped = arrayplex.typed
>>> type(arraytyped)
<class '_modmesh.SimpleArrayInt32'>
>>> arraytyped = SimpleArrayInt32(10)
>>> type(arraytyped)
<class '_modmesh.SimpleArrayInt32'>
>>> arrayplex = arraytyped.plex
>>> type(arrayplex)
<class '_modmesh.SimpleArray'>
In both conversions, the result refers to the same data buffer.
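To make the "same data buffer" remark concrete, here is a small sketch with hypothetical stand-in classes (TypedInt32Sketch and PlexSketch are not modmesh types). It only illustrates the intended behaviour: converting between the two forms wraps the existing buffer instead of copying it, so a write through one is visible through the other.

```python
from array import array


class TypedInt32Sketch:
    """Hypothetical stand-in for a typed array wrapping a buffer."""

    def __init__(self, buffer):
        self.buffer = buffer

    @property
    def plex(self):
        # Wrap the same buffer object; no copy is made.
        return PlexSketch(self.buffer, "int32")


class PlexSketch:
    """Hypothetical stand-in for the dtype-carrying array."""

    def __init__(self, buffer, dtype):
        self.buffer = buffer
        self.dtype = dtype

    @property
    def typed(self):
        # Hand back a typed wrapper around the very same buffer.
        return TypedInt32Sketch(self.buffer)


typed = TypedInt32Sketch(array("i", [0] * 10))
plex = typed.plex
typed.buffer[4] = 42  # write through the typed array ...
print(plex.buffer[4])  # ... and read it back through the plex
```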
Finally, we made it (cont.)
•Of course, all array methods behave the same on the
typed array and the arrayplex
>>> arraytyped[4] == arrayplex[4]
True
>>> arraytyped.size == arrayplex.size
True
>>> arraytyped.stride == arrayplex.stride
True
>>> arraytyped.your_method() == arrayplex.your_method()
True