Pandas Compatibility Notes

Pandas Compatibility Note

DataFrame.quantile

One notable difference from pandas arises when the DataFrame contains non-numeric types and pandas would return a Series. In that case cuDF returns a DataFrame, because a cuDF Series does not support mixed types.

Pandas Compatibility Note

DataFrame.query

One difference from pandas is that query currently only supports numeric, datetime, timedelta, or bool dtypes.
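A quick dtype check before calling query can flag columns that fall outside this list. The following is a minimal sketch, not part of the cuDF API: the helper name unsupported_query_columns is hypothetical, the sample data is made up, and the check relies on NumPy-style dtype kind codes. It uses a pandas DataFrame so that it runs without a GPU; applying it to a cuDF DataFrame assumes its dtypes expose the same kind attribute.

```python
import pandas as pd

# NumPy-style kind codes: b=bool, i/u=integer, f=float, m=timedelta, M=datetime
SUPPORTED_KINDS = set("biufmM")


def unsupported_query_columns(df):
    """Return the names of columns whose dtype is outside the supported kinds.

    Hypothetical helper for illustration only.
    """
    return [
        name
        for name, dtype in df.dtypes.items()
        if getattr(dtype, "kind", "O") not in SUPPORTED_KINDS
    ]


# Made-up sample data with one string column
df = pd.DataFrame({
    "count": [1, 2, 3],
    "flag": [True, False, True],
    "when": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "name": ["a", "b", "c"],
})

bad = unsupported_query_columns(df)
print(bad)
```

Columns reported by such a check would likely need to be filtered with a boolean mask instead of a query expression.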

Pandas Compatibility Note

DataFrame.reindex

Note: One difference from pandas is that NA, rather than NaN, is used for rows that do not match. As a side effect, the column http_status retains an integer dtype in cuDF, whereas pandas casts it to float.
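The dtype change on the pandas side can be observed directly. The sketch below uses made-up data (only the column name http_status comes from the note) and treats cuDF as optional, since importing it requires a supported GPU environment. The cuDF branch simply prints the dtype rather than asserting a value.

```python
import pandas as pd

# Made-up data; "Safari" has no matching row in the original index.
pdf = pd.DataFrame({"http_status": [200, 404]}, index=["Firefox", "Chrome"])
new_index = ["Firefox", "Safari"]

pandas_result = pdf.reindex(index=new_index)
# pandas fills the unmatched row with NaN, which forces a float column.
pandas_dtype = str(pandas_result["http_status"].dtype)
print("pandas:", pandas_dtype)

try:
    import cudf
except Exception:  # cuDF is not installed or no usable GPU is available
    cudf = None

if cudf is not None:
    gdf = cudf.from_pandas(pdf)
    # According to the note, the unmatched row becomes <NA> and the
    # column keeps its integer dtype.
    print("cuDF:", gdf.reindex(index=new_index)["http_status"].dtype)
```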

Pandas Compatibility Note

DataFrame.truncate, Series.truncate

The copy parameter is present only for API compatibility; copy=False is not supported. This method always returns a copy.
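Code that never passes copy and treats the result as independent data avoids depending on this difference. A minimal sketch with made-up values, shown with pandas so that it runs without a GPU; the assumption is that the same calls apply unchanged to a cuDF Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# Do not pass copy=...; treat the result as a separate object.
result = s.truncate(before=1, after=2)

# Modifying the truncated result leaves the original untouched.
result.iloc[0] = 99
print(s.tolist())
print(result.tolist())
```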

Pandas Compatibility Note

Note that where treats missing values as falsy, in parallel with pandas' treatment of nullable data:

>>> import cudf
>>> gsr = cudf.Series([1, 2, 3])
>>> gsr.where([True, False, cudf.NA])
0       1
1    <NA>
2    <NA>
dtype: int64
>>> gsr.where([True, False, False])
0       1
1    <NA>
2    <NA>
dtype: int64

Pandas Compatibility Note

MultiIndex.get_loc

The return type of this function may deviate from that of the pandas method. If the index is neither lexicographically sorted nor unique, a best-effort attempt is made to coerce the found indices into a slice. For example:

>>> import pandas as pd
>>> import cudf
>>> x = pd.MultiIndex.from_tuples([
...     (2, 1, 1), (1, 2, 3), (1, 2, 1),
...     (1, 1, 1), (1, 1, 1), (2, 2, 1),
... ])
>>> x.get_loc(1)
array([False,  True,  True,  True,  True, False])
>>> cudf.from_pandas(x).get_loc(1)
slice(1, 5, 1)
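Code that must work with either return type can normalize the result to integer positions. The helper below is a hypothetical sketch (positions_from_loc is not part of pandas or cuDF) and assumes the result is an integer, a slice, or a host-side boolean mask:

```python
import numpy as np


def positions_from_loc(loc, n):
    """Convert a get_loc result to an array of integer positions.

    loc: an int, a slice, or a boolean mask of length n
    n:   length of the index that produced loc
    """
    if isinstance(loc, slice):
        return np.arange(n)[loc]
    loc = np.asarray(loc)
    if loc.dtype == bool:
        return np.flatnonzero(loc)
    return np.atleast_1d(loc)  # a single integer location


# The two results from the example above map to the same positions.
mask = np.array([False, True, True, True, True, False])
print(positions_from_loc(mask, 6))
print(positions_from_loc(slice(1, 5, 1), 6))
```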

Pandas Compatibility Note

Series.reindex

Note: One difference from pandas is that NA, rather than NaN, is used for rows that do not match. As a side effect, the series retains an integer dtype in cuDF, whereas pandas casts it to float.
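For a comparable result on the pandas side, one option is the nullable Int64 extension dtype, which also fills unmatched labels with NA. This is a sketch with made-up values, not something the note itself prescribes:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
labels = [0, 1, 5]  # label 5 has no match in the original index

# Default integer dtype: the unmatched row becomes NaN, so the dtype is float.
default_result = s.reindex(labels)

# Nullable integer dtype: the unmatched row becomes <NA>, and the dtype is kept.
nullable_result = s.astype("Int64").reindex(labels)

print(default_result.dtype)
print(nullable_result.dtype)
```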

Pandas Compatibility Note

groupby.apply

cuDF's groupby.apply is limited compared to pandas. In some situations, pandas returns the group keys as part of the index, while cuDF omits them because they would be redundant. For example:

>>> import pandas as pd
>>> import cudf
>>> df = pd.DataFrame({
...     'a': [1, 1, 2, 2],
...     'b': [1, 2, 1, 2],
...     'c': [1, 2, 3, 4],
... })
>>> gdf = cudf.from_pandas(df)
>>> df.groupby('a').apply(lambda x: x.iloc[[0]])
     a  b  c
a
1 0  1  1  1
2 2  2  1  3
>>> gdf.groupby('a').apply(lambda x: x.iloc[[0]])
   a  b  c
0  1  1  1
2  2  1  3
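If the same flat index is wanted from pandas, one option is to pass group_keys=False to groupby. The sketch below reuses the data from the example above. It is a suggestion rather than something the note prescribes, and newer pandas releases may warn about, or exclude, the grouping column inside apply.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": [1, 2, 1, 2],
    "c": [1, 2, 3, 4],
})

# group_keys=False asks pandas not to prepend the group keys to the index,
# so the original row labels are kept.
flat = df.groupby("a", group_keys=False).apply(lambda x: x.iloc[[0]])
print(flat.index.tolist())
```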
