{ "cells": [ { "cell_type": "markdown", "id": "8e5e6878", "metadata": {}, "source": [ "# Interoperability between cuDF and CuPy\n", "\n", "This notebook provides introductory examples of how you can use cuDF and CuPy together to take advantage of CuPy array functionality (such as advanced linear algebra operations)." ] }, { "cell_type": "code", "execution_count": 1, "id": "8b2d45c3", "metadata": {}, "outputs": [], "source": [ "import timeit\n", "from packaging import version\n", "\n", "import cupy as cp\n", "import cudf\n", "\n", "# CuPy 10.0.0 added cp.from_dlpack; older releases only provide cp.fromDlpack\n", "if version.parse(cp.__version__) >= version.parse(\"10.0.0\"):\n", "    cupy_from_dlpack = cp.from_dlpack\n", "else:\n", "    cupy_from_dlpack = cp.fromDlpack" ] }, { "cell_type": "markdown", "id": "e7e64b1a", "metadata": {}, "source": [ "### Converting a cuDF DataFrame to a CuPy Array\n", "\n", "If we want to convert a cuDF DataFrame to a CuPy ndarray, there are multiple ways to do it:\n", "\n", "1. We can use the [DLPack](https://github.com/dmlc/dlpack) interface.\n", "\n", "2. We can also use `DataFrame.values`.\n", "\n", "3. We can also convert via the [CUDA array interface](https://numba.readthedocs.io/en/stable/cuda/cuda_array_interface.html) by using cuDF's `to_cupy` functionality." ] }, { "cell_type": "code", "execution_count": 2, "id": "45c482ab", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "118 µs ± 77.2 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)\n", "360 µs ± 6.04 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n", "355 µs ± 722 ns per loop (mean ± std. dev. 
of 7 runs, 1,000 loops each)\n" ] } ], "source": [ "nelem = 10000\n", "df = cudf.DataFrame({'a': range(nelem),\n", "                     'b': range(500, nelem + 500),\n", "                     'c': range(1000, nelem + 1000)})\n", "\n", "%timeit arr_cupy = cupy_from_dlpack(df.to_dlpack())\n", "%timeit arr_cupy = df.values\n", "%timeit arr_cupy = df.to_cupy()" ] }, { "cell_type": "code", "execution_count": 3, "id": "a565effc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 500, 1000],\n", " [ 1, 501, 1001],\n", " [ 2, 502, 1002],\n", " ...,\n", " [ 9997, 10497, 10997],\n", " [ 9998, 10498, 10998],\n", " [ 9999, 10499, 10999]])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr_cupy = cupy_from_dlpack(df.to_dlpack())\n", "arr_cupy" ] }, { "cell_type": "markdown", "id": "0759ab29", "metadata": {}, "source": [ "### Converting a cuDF Series to a CuPy Array" ] }, { "cell_type": "markdown", "id": "4f35ffbd", "metadata": {}, "source": [ "There are also multiple ways to convert a cuDF Series to a CuPy array:\n", "\n", "1. We can pass the Series to `cupy.asarray`, because a cuDF Series exposes [`__cuda_array_interface__`](https://docs-cupy.chainer.org/en/stable/reference/interoperability.html).\n", "2. We can use the DLPack interface via `to_dlpack()`.\n", "3. We can also use `Series.values`." ] }, { "cell_type": "code", "execution_count": 4, "id": "8f97f304", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "54.4 µs ± 66 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)\n", "125 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)\n", "119 µs ± 805 ns per loop (mean ± std. dev. 
of 7 runs, 10,000 loops each)\n" ] } ], "source": [ "col = 'a'\n", "\n", "%timeit cola_cupy = cp.asarray(df[col])\n", "%timeit cola_cupy = cupy_from_dlpack(df[col].to_dlpack())\n", "%timeit cola_cupy = df[col].values" ] }, { "cell_type": "code", "execution_count": 5, "id": "f96d5676", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, ..., 9997, 9998, 9999])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cola_cupy = cp.asarray(df[col])\n", "cola_cupy" ] }, { "cell_type": "markdown", "id": "c36e5b88", "metadata": {}, "source": [ "From here, we can proceed with normal CuPy workflows, such as reshaping the array, getting the diagonal, or calculating the norm." ] }, { "cell_type": "code", "execution_count": 6, "id": "2a7ae43f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, ..., 197, 198, 199],\n", " [ 200, 201, 202, ..., 397, 398, 399],\n", " [ 400, 401, 402, ..., 597, 598, 599],\n", " ...,\n", " [9400, 9401, 9402, ..., 9597, 9598, 9599],\n", " [9600, 9601, 9602, ..., 9797, 9798, 9799],\n", " [9800, 9801, 9802, ..., 9997, 9998, 9999]])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reshaped_arr = cola_cupy.reshape(50, 200)\n", "reshaped_arr" ] }, { "cell_type": "code", "execution_count": 7, "id": "b442a30c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 201, 402, 603, 804, 1005, 1206, 1407, 1608, 1809, 2010,\n", " 2211, 2412, 2613, 2814, 3015, 3216, 3417, 3618, 3819, 4020, 4221,\n", " 4422, 4623, 4824, 5025, 5226, 5427, 5628, 5829, 6030, 6231, 6432,\n", " 6633, 6834, 7035, 7236, 7437, 7638, 7839, 8040, 8241, 8442, 8643,\n", " 8844, 9045, 9246, 9447, 9648, 9849])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reshaped_arr.diagonal()" ] }, { "cell_type": "code", "execution_count": 8, "id": "be7f4d32", "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"array(577306.967739)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cp.linalg.norm(reshaped_arr)" ] }, { "cell_type": "markdown", "id": "b353bded", "metadata": {}, "source": [ "### Converting a CuPy Array to a cuDF DataFrame\n", "\n", "We can also convert a CuPy ndarray to a cuDF DataFrame. Like before, there are multiple ways to do it:\n", "\n", "1. **Easiest;** We can directly use the `DataFrame` constructor.\n", "\n", "2. We can use CUDA array interface with the `DataFrame` constructor.\n", "\n", "3. We can also use the [dlpack](https://github.com/dmlc/dlpack) interface.\n", "\n", "For the latter two cases, we'll need to make sure that our CuPy array is Fortran contiguous in memory (if it's not already). We can either transpose the array or simply coerce it to be Fortran contiguous beforehand." ] }, { "cell_type": "code", "execution_count": 9, "id": "8887b253", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "14.3 ms ± 33.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], "source": [ "%timeit reshaped_df = cudf.DataFrame(reshaped_arr)" ] }, { "cell_type": "code", "execution_count": 10, "id": "08ec4ffa", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 190 | 191 | 192 | 193 | 194 | 195 | 196 | 197 | 198 | 199 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 190 | 191 | 192 | 193 | 194 | 195 | 196 | 197 | 198 | 199 |
| 1 | 200 | 201 | 202 | 203 | 204 | 205 | 206 | 207 | 208 | 209 | ... | 390 | 391 | 392 | 393 | 394 | 395 | 396 | 397 | 398 | 399 |
| 2 | 400 | 401 | 402 | 403 | 404 | 405 | 406 | 407 | 408 | 409 | ... | 590 | 591 | 592 | 593 | 594 | 595 | 596 | 597 | 598 | 599 |
| 3 | 600 | 601 | 602 | 603 | 604 | 605 | 606 | 607 | 608 | 609 | ... | 790 | 791 | 792 | 793 | 794 | 795 | 796 | 797 | 798 | 799 |
| 4 | 800 | 801 | 802 | 803 | 804 | 805 | 806 | 807 | 808 | 809 | ... | 990 | 991 | 992 | 993 | 994 | 995 | 996 | 997 | 998 | 999 |

5 rows × 200 columns
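
The Fortran-contiguity requirement mentioned above for the CUDA array interface and DLPack paths can be illustrated with a small sketch. It uses NumPy as a CPU stand-in so that it runs without a GPU; this substitution is an assumption made for illustration, and the name `arr` is hypothetical. CuPy provides functions of the same name (`cp.asfortranarray`, `cp.isfortran`) and the same `flags` attribute on its arrays.

```python
import numpy as np  # CPU stand-in for CuPy (assumption: lets the sketch run without a GPU)

# Same shape as reshaped_arr in this notebook: 10,000 values laid out as 50 x 200.
arr = np.arange(10000).reshape(50, 200)

# A freshly reshaped array is C-contiguous (row-major), not Fortran-contiguous.
print(arr.flags["C_CONTIGUOUS"], arr.flags["F_CONTIGUOUS"])  # True False

# Option 1: coerce to Fortran (column-major) order. This copies the data
# but keeps the shape and the values unchanged.
fortran_arr = np.asfortranarray(arr)

# Option 2: transpose. The transposed view is Fortran-contiguous without a copy,
# but rows and columns are swapped, so the shape becomes (200, 50).
transposed = arr.T

print(fortran_arr.flags["F_CONTIGUOUS"], fortran_arr.shape)  # True (50, 200)
print(transposed.flags["F_CONTIGUOUS"], transposed.shape)    # True (200, 50)
```

Coercing with `asfortranarray` is usually the safer choice when the column layout of the resulting DataFrame matters, since transposing changes which axis becomes the columns.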
|   | a0 | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 | a9 | a10 | a11 | a12 | a13 | a14 | a15 | a16 | a17 | a18 | a19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 11.308953 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | -5.241297 | 0.0 | 0.0 | 0.0 | 17.58476 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.00000 | 10.869279 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.526274 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |