User Guide

Achieving optimal performance in GPU-centric workflows frequently requires customizing how GPU (“device”) memory is allocated.

RMM is a package that enables you to allocate device memory in a highly configurable way. For example, it enables you to allocate and use pools of GPU memory, or to use managed memory for allocations.

You can also easily configure other libraries like Numba and CuPy to use RMM for allocating device memory.

Installation

See the project README for how to install RMM.

Using RMM

There are two ways to use RMM in Python code:

  1. Using the rmm.DeviceBuffer API to explicitly create and manage device memory allocations

  2. Transparently via external libraries such as CuPy and Numba

RMM provides a MemoryResource abstraction to control how device memory is allocated in both of the above cases.

DeviceBuffers

A DeviceBuffer represents an untyped, uninitialized device memory allocation. DeviceBuffers can be created by providing the size of the allocation in bytes:

>>> import rmm
>>> buf = rmm.DeviceBuffer(size=100)

The size of the allocation and its memory address can be accessed via the .size and .ptr attributes, respectively:

>>> buf.size
100
>>> buf.ptr
140202544726016

DeviceBuffers can also be created by copying data from host memory:

>>> import rmm
>>> import numpy as np
>>> a = np.array([1, 2, 3], dtype='float64')
>>> buf = rmm.DeviceBuffer.to_device(a.view("int8"))  # to_device expects an 8-bit type or `bytes`
>>> buf.size
24

Conversely, the data underlying a DeviceBuffer can be copied to the host:

>>> np.frombuffer(buf.tobytes())
array([1., 2., 3.])
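Because to_device also accepts `bytes`, the byte-level round trip above can be reproduced on the host with only the standard library. The following is a minimal sketch, not part of RMM: it uses Python's array module in place of NumPy to show why three float64 values occupy 24 bytes. The commented-out lines mark where the RMM calls from the examples above would go.

```python
from array import array

# Three double-precision values; typecode "d" is a C double (8 bytes each).
values = array("d", [1.0, 2.0, 3.0])

# Raw bytes: the kind of 8-bit data that to_device expects.
raw = values.tobytes()
nbytes = len(raw)  # 3 values * 8 bytes per value

# With RMM and a GPU available, the copy to and from the device would be:
#   buf = rmm.DeviceBuffer.to_device(raw)
#   raw = buf.tobytes()

# Reinterpret the raw bytes as doubles again on the host.
restored = array("d")
restored.frombytes(raw)
```

Here nbytes matches the buf.size of 24 reported above, and restored holds the original three values.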

MemoryResource objects

MemoryResource objects are used to configure how device memory allocations are made by RMM.

By default, if a MemoryResource is not set explicitly, RMM uses the CudaMemoryResource, which allocates device memory with cudaMalloc.

rmm.reinitialize() provides an easy way to initialize RMM with specific memory resource options across multiple devices. See help(rmm.reinitialize) for full details.
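As a hedged illustration, the sketch below collects a set of options and passes them to rmm.reinitialize(). The keyword names used here (pool_allocator, managed_memory, initial_pool_size, maximum_pool_size, devices) are assumptions about the function's signature and should be checked against help(rmm.reinitialize). The helper name init_rmm and the chosen values are hypothetical.

```python
# Illustrative option values; verify the keyword names with help(rmm.reinitialize).
options = {
    "pool_allocator": True,       # draw allocations from a pool
    "managed_memory": False,      # use ordinary device memory, not managed memory
    "initial_pool_size": 2**30,   # 1 GiB, in bytes
    "maximum_pool_size": 2**32,   # 4 GiB, in bytes
    "devices": [0],               # device IDs to initialize
}


def init_rmm(opts):
    """Hypothetical helper: apply the given options to RMM."""
    import rmm  # imported lazily so the options can be inspected without a GPU

    rmm.reinitialize(**opts)
```

Calling init_rmm(options) on a machine with RMM installed would apply the settings. Keeping the options in a dictionary makes them easy to log or load from configuration.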

For lower-level control, the rmm.mr.set_current_device_resource() function can be used to set a different MemoryResource for the current CUDA device. For example, enabling the ManagedMemoryResource tells RMM to use cudaMallocManaged instead of cudaMalloc for allocating memory:

>>> import rmm
>>> rmm.mr.set_current_device_resource(rmm.mr.ManagedMemoryResource())

Warning: The default resource must be set for any device before allocating any device memory on that device. Setting or changing the resource after device allocations have been made can lead to unexpected behaviour or crashes.

As another example, PoolMemoryResource allows you to allocate a large “pool” of device memory up-front. Subsequent allocations will draw from this pool of already allocated memory. The example below shows how to construct a PoolMemoryResource with an initial size of 1 GiB and a maximum size of 4 GiB. The pool uses CudaMemoryResource as its underlying (“upstream”) memory resource:

>>> import rmm
>>> pool = rmm.mr.PoolMemoryResource(
...     rmm.mr.CudaMemoryResource(),
...     initial_pool_size=2**30,
...     maximum_pool_size=2**32
... )
>>> rmm.mr.set_current_device_resource(pool)

Similarly, to use a pool of managed memory:

>>> import rmm
>>> pool = rmm.mr.PoolMemoryResource(
...     rmm.mr.ManagedMemoryResource(),
...     initial_pool_size=2**30,
...     maximum_pool_size=2**32
... )
>>> rmm.mr.set_current_device_resource(pool)
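The pool sizes in the examples above are byte counts written as powers of two. A small helper can make that arithmetic explicit. The names GIB and gib below are hypothetical and not part of RMM.

```python
GIB = 2**30  # number of bytes in one gibibyte


def gib(n):
    """Return n GiB expressed in bytes."""
    return n * GIB


initial_pool_size = gib(1)  # same value as 2**30
maximum_pool_size = gib(4)  # same value as 2**32
```

These values could then be passed as initial_pool_size and maximum_pool_size when constructing the pool.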

Other MemoryResources include:

  • FixedSizeMemoryResource for allocating fixed blocks of memory

  • BinningMemoryResource for allocating blocks within specified “bin” sizes from different memory resources

MemoryResources are highly configurable and can be composed together in different ways. See help(rmm.mr) for more information.
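To make the idea of “bins” more concrete, here is a conceptual sketch in plain Python. It does not call RMM and is not a description of how BinningMemoryResource is implemented. It assumes that a request is served by the smallest bin large enough to hold it, and that a request larger than every bin falls through to the upstream resource. The name pick_bin and the bin sizes are hypothetical.

```python
from bisect import bisect_left

# Hypothetical bin sizes in bytes, sorted in ascending order.
BIN_SIZES = [256, 1024, 4096]


def pick_bin(nbytes, bin_sizes=BIN_SIZES):
    """Return the smallest bin size that fits nbytes, or None if no bin fits.

    None stands for "hand the request to the upstream resource".
    """
    index = bisect_left(bin_sizes, nbytes)
    return bin_sizes[index] if index < len(bin_sizes) else None
```

Under these assumptions, a 100-byte request maps to the 256-byte bin, a 257-byte request maps to the 1024-byte bin, and a 5000-byte request maps to no bin at all.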

Using RMM with third-party libraries

A number of libraries provide hooks to control their device allocations. RMM provides implementations of these for CuPy, Numba, and PyTorch in the rmm.allocators submodule. All of these approaches configure the library to use the current RMM memory resource for device allocations.

Using RMM with CuPy

You can configure CuPy to use RMM for memory allocations by setting the CuPy CUDA allocator to rmm.allocators.cupy.rmm_cupy_allocator:

>>> from rmm.allocators.cupy import rmm_cupy_allocator
>>> import cupy
>>> cupy.cuda.set_allocator(rmm_cupy_allocator)

Using RMM with Numba

You can configure Numba to use RMM for memory allocations using the Numba EMM Plugin.

This can be done in two ways:

  1. Setting the environment variable NUMBA_CUDA_MEMORY_MANAGER:

$ NUMBA_CUDA_MEMORY_MANAGER=rmm.allocators.numba python (args)

  2. Using the set_memory_manager() function provided by Numba:

>>> from numba import cuda
>>> from rmm.allocators.numba import RMMNumbaManager
>>> cuda.set_memory_manager(RMMNumbaManager)
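When changing the launch command is inconvenient, the environment variable from the first option can also be set from inside Python. This is a minimal sketch that assumes Numba reads the variable when it is first imported, so the assignment has to come before any import of numba.

```python
import os

# Set the variable before Numba is imported anywhere in the process
# (assumption: Numba reads its configuration at import time).
os.environ["NUMBA_CUDA_MEMORY_MANAGER"] = "rmm.allocators.numba"

# Only import Numba after the variable is in place, for example:
#   from numba import cuda
```

If Numba may already have been imported elsewhere, the explicit set_memory_manager() call shown above is the safer choice.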

Using RMM with PyTorch

You can configure PyTorch to use RMM for memory allocations by changing the current allocator:

>>> from rmm.allocators.torch import rmm_torch_allocator
>>> import torch
>>> torch.cuda.memory.change_current_allocator(rmm_torch_allocator)