Using cuML on CPU, GPU, or both#

This notebook demonstrates the CPU/GPU interoperability feature.

[1]:

import pickle
import cuml
from cuml.common.device_selection import using_device_type
from cuml.common.device_selection import set_global_device_type, get_global_device_type
from cuml.neighbors import NearestNeighbors
from cuml.manifold import UMAP
from cuml.linear_model import LinearRegression
from cuml.datasets import make_regression, make_blobs
from cuml.model_selection import train_test_split

X_blobs, y_blobs = make_blobs(n_samples=2000, n_features=20)
X_train_blobs, X_test_blobs, y_train_blobs, y_test_blobs = train_test_split(X_blobs, y_blobs, test_size=0.2, shuffle=True)

X_reg, y_reg = make_regression(n_samples=2000, n_features=20)
X_train_reg, X_test_reg, y_train_reg, y_tes_reg = train_test_split(X_reg, y_reg, test_size=0.2, shuffle=True)

Don’t have a GPU at your disposal at the moment? You can work on prototyping and run estimators in CPU-mode.

[2]:

nn = NearestNeighbors()
with using_device_type('cpu'):
    nn.fit(X_train_blobs)
    nearest_neighbors = nn.kneighbors(X_test_blobs)

Need to train your estimator with a special feature or hyperparameter only available in the paired CPU library? Initialize the cuML model with it and train on CPU.

[3]:

umap_model = UMAP(angular_rp_forest=True) # `angular_rp_forest` hyperparameter only available in UMAP library
with using_device_type('cpu'):
    umap_model.fit(X_train_blobs) # will run the UMAP library with the hyperparameter
with using_device_type('gpu'):
    transformed = umap_model.transform(X_test_blobs) # will run the cuML implementation of UMAP

[I] [14:46:20.500110] Unused keyword parameter: angular_rp_forest during cuML estimator initialization

/home/vic/mambaforge/envs/all_cuda-115_arch-x86_64/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

While ML training workflows almost always benefit from the superior speed of GPUs, small-scale applications with limited traffic and loose latency requirements may be able to perform inference on CPU. Please note that this feature would only work with models implementing pickle serialization and GPU to CPU transfers.

To train a model on GPU but deploy it on CPU : first, train the estimator on device and save it to disk

[4]:

lin_reg = LinearRegression()
with using_device_type('gpu'):
    lin_reg.fit(X_train_reg, y_train_reg)

pickle.dump(lin_reg, open("lin_reg.pkl", "wb"))
del lin_reg

Then, on the server, recover the estimator and run the inference on host.

[5]:

recovered_lin_reg = pickle.load(open("lin_reg.pkl", "rb"))
with using_device_type('cpu'):
    predictions = recovered_lin_reg.predict(X_test_reg)

The GPU/device is the default execution platform :

[6]:

initial_device_type = get_global_device_type()
print('default execution device:', initial_device_type)

default execution device: DeviceType.device

Estimators trainings and inferences inside a using_device_type context will be executed according to the execution platform selected :

[7]:

for param in ['cpu', 'host', 'gpu', 'device']:
    with using_device_type(param):
        print('using_device_type({}):'.format(param), get_global_device_type())

using_device_type(cpu): DeviceType.host
using_device_type(host): DeviceType.host
using_device_type(gpu): DeviceType.device
using_device_type(device): DeviceType.device

The execution platform can also be set at the global level from the set_global_device_type function.

[8]:

set_global_device_type('gpu')
print('new device type:', get_global_device_type())

new device type: DeviceType.device