cudf.core.column.string.StringMethods.extract

StringMethods.extract(pat: str, flags: int = 0, expand: bool = True) → SeriesOrIndex

Extract capture groups in the regex pat as columns in a DataFrame.

For each subject string in the Series, extract groups from the first match of regular expression pat.
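As a rough illustration of the "first match" rule, the sketch below uses Python's standard re module on a plain list rather than cuDF itself. The helper first_match_groups is hypothetical (it is not part of cuDF), and it uses None where cuDF displays <NA>.

```python
import re

def first_match_groups(subjects, pat):
    """Hypothetical helper: groups of the first match per string,
    or one None per capture group when nothing matches."""
    regex = re.compile(pat)
    rows = []
    for s in subjects:
        m = regex.search(s)  # search() stops at the first match in s
        if m is None:
            # No match: fill every group with None (cuDF shows <NA>)
            rows.append((None,) * regex.groups)
        else:
            rows.append(m.groups())
    return rows

# 'a1b2' contains two possible matches, but only the first ('a1') is used.
rows = first_match_groups(['a1b2', 'b2', 'c3'], r'([ab])(\d)')
```

Here rows holds one tuple per subject string, which corresponds to one row per subject string in the result described below.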

Parameters

pat (str): Regular expression pattern with capturing groups.
flags (int, default 0, meaning no flags): Flags to pass through to the regex engine (e.g. re.MULTILINE).
expand (bool, default True): If True, return a DataFrame with one column per capture group. If False, return a Series/Index if there is one capture group, or a DataFrame if there are multiple capture groups.

Returns

DataFrame or Series/Index: A DataFrame with one row for each subject string, and one column for each group. If expand=False and pat has only one capture group, then return a Series/Index.
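The return rule can be restated as a small decision function. This is only a sketch of the rule as documented here, not cuDF's implementation: result_kind is a hypothetical helper, and it counts capture groups with the standard re module, which is assumed to agree with cuDF's count for simple patterns.

```python
import re

def result_kind(pat, expand=True):
    """Hypothetical helper: name the container described in the Returns text."""
    # Pattern.groups is the number of capturing groups; (?:...) is not counted.
    n_groups = re.compile(pat).groups
    if not expand and n_groups == 1:
        return "Series/Index"
    # Patterns without any capture group are outside the scope of this sketch.
    return f"DataFrame with {n_groups} column(s)"

one_group_flat = result_kind(r'[ab](\d)', expand=False)
one_group_frame = result_kind(r'[ab](\d)', expand=True)
two_groups_flat = result_kind(r'([ab])(\d)', expand=False)
```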

Notes

The flags parameter currently only supports re.DOTALL and re.MULTILINE.
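For readers unfamiliar with the two supported flags, the sketch below shows their effect using Python's standard re module. It assumes cuDF's regex engine gives these flags the same meaning; edge cases may differ, so treat it as an illustration rather than a specification. With cuDF, the same constants would be passed through the flags parameter, for example flags=re.DOTALL.

```python
import re

text = "one\ntwo"

# re.DOTALL: '.' also matches a newline character.
dot_plain = re.search(r'(e.t)', text)              # '.' cannot cross '\n'
dot_all = re.search(r'(e.t)', text, re.DOTALL)     # '.' matches '\n'

# re.MULTILINE: '^' also matches right after each newline.
line_plain = re.search(r'^(t\w+)', text)                 # '^' only at string start
line_multi = re.search(r'^(t\w+)', text, re.MULTILINE)   # '^' at each line start

# Flags are integers and can be combined with bitwise OR.
both = re.DOTALL | re.MULTILINE
combined = re.search(r'^(t\w+)', text, both)
```

Without the flags, neither pattern matches this text; with them, the captured groups are 'e\nt' and 'two' respectively.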

Examples

>>> import cudf
>>> s = cudf.Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)')
      0     1
0     a     1
1     b     2
2  <NA>  <NA>

A pattern with one group will return a DataFrame with one column if expand=True.

>>> s.str.extract(r'[ab](\d)', expand=True)
      0
0     1
1     2
2  <NA>

A pattern with one group will return a Series if expand=False.

>>> s.str.extract(r'[ab](\d)', expand=False)
0       1
1       2
2    <NA>
dtype: object