libcudf  24.04.00
cudf::hash_join Class Reference

Hash join that builds its hash table on construction and probes it in subsequent *_join member functions.

#include <join.hpp>

Public Types

using impl_type = typename cudf::detail::hash_join< cudf::hashing::detail::MurmurHash3_x86_32< cudf::hash_value_type > >
 Implementation type.
 

Public Member Functions

 hash_join (hash_join const &)=delete
 
 hash_join (hash_join &&)=delete
 
hash_join & operator= (hash_join const &)=delete
 
hash_join & operator= (hash_join &&)=delete
 
 hash_join (cudf::table_view const &build, null_equality compare_nulls, rmm::cuda_stream_view stream=cudf::get_default_stream())
 Construct a hash join object for subsequent probe calls.
 
 hash_join (cudf::table_view const &build, nullable_join has_nulls, null_equality compare_nulls, rmm::cuda_stream_view stream=cudf::get_default_stream())
 Construct a hash join object for subsequent probe calls.
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > inner_join (cudf::table_view const &probe, std::optional< std::size_t > output_size={}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) const
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > left_join (cudf::table_view const &probe, std::optional< std::size_t > output_size={}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) const
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > full_join (cudf::table_view const &probe, std::optional< std::size_t > output_size={}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) const
 
std::size_t inner_join_size (cudf::table_view const &probe, rmm::cuda_stream_view stream=cudf::get_default_stream()) const
 
std::size_t left_join_size (cudf::table_view const &probe, rmm::cuda_stream_view stream=cudf::get_default_stream()) const
 
std::size_t full_join_size (cudf::table_view const &probe, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource()) const
 

Detailed Description

Hash join that builds its hash table on construction and probes it in subsequent *_join member functions.

This class implements a hash join scheme that builds the hash table once and probes it as many times as needed (possibly in parallel).
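The build-once, probe-many scheme described above can be sketched on the CPU with the standard library alone. This is only an illustration of the idea, not libcudf's implementation: `cpu_hash_join`, its `int` keys, and the host-side `std::vector` results are all hypothetical stand-ins for the device-side types on this page.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical CPU stand-in for a build-once, probe-many hash join.
// Not libcudf code; names and types are illustrative only.
class cpu_hash_join {
 public:
  // Build phase: index every row of the build keys by its value.
  explicit cpu_hash_join(std::vector<int> const& build_keys)
  {
    for (std::size_t row = 0; row < build_keys.size(); ++row) {
      table_.emplace(build_keys[row], row);
    }
  }

  // Probe phase: const, so one object can serve many probe tables.
  // Returns [probe_indices, build_indices] for every matching row pair.
  std::pair<std::vector<std::size_t>, std::vector<std::size_t>> inner_join(
    std::vector<int> const& probe_keys) const
  {
    std::vector<std::size_t> probe_idx;
    std::vector<std::size_t> build_idx;
    for (std::size_t row = 0; row < probe_keys.size(); ++row) {
      auto const range = table_.equal_range(probe_keys[row]);
      for (auto it = range.first; it != range.second; ++it) {
        probe_idx.push_back(row);
        build_idx.push_back(it->second);
      }
    }
    return {std::move(probe_idx), std::move(build_idx)};
  }

 private:
  std::unordered_multimap<int, std::size_t> table_;
};
```

Because the probe function is const and only reads the table, the cost of the build phase is paid once and shared across every later probe.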

Definition at line 283 of file join.hpp.

Constructor & Destructor Documentation

◆ hash_join() [1/2]

cudf::hash_join::hash_join ( cudf::table_view const &  build,
null_equality  compare_nulls,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
)

Construct a hash join object for subsequent probe calls.

Note
The hash_join object must not outlive the table viewed by build, else behavior is undefined.
Parameters
build  The build table, from which the hash table is built
compare_nulls  Controls whether null join-key values should match or not
stream  CUDA stream used for device memory operations and kernel launches

◆ hash_join() [2/2]

cudf::hash_join::hash_join ( cudf::table_view const &  build,
nullable_join  has_nulls,
null_equality  compare_nulls,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
)

Construct a hash join object for subsequent probe calls.

Note
The hash_join object must not outlive the table viewed by build, else behavior is undefined.
Parameters
build  The build table, from which the hash table is built
has_nulls  Flag indicating whether any nulls exist in the build table or in any probe table that will later be used for a join
compare_nulls  Controls whether null join-key values should match or not
stream  CUDA stream used for device memory operations and kernel launches
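The effect of compare_nulls can be illustrated with a small host-side comparison function. This is a sketch under assumptions: `null_policy` is a hypothetical stand-in for `null_equality`, `std::optional<int>` models a nullable key, and the rule shown (two nulls match only under EQUAL, a null never matches a non-null value) is the usual reading of that option rather than something this page spells out.

```cpp
#include <optional>

// Hypothetical stand-in for null_equality; not the libcudf type.
enum class null_policy { EQUAL, UNEQUAL };

// Decide whether two nullable join keys match under the given policy.
bool keys_match(std::optional<int> const& a,
                std::optional<int> const& b,
                null_policy policy)
{
  if (!a.has_value() || !b.has_value()) {
    // At least one key is null: only two nulls can match, and only under EQUAL.
    return policy == null_policy::EQUAL && !a.has_value() && !b.has_value();
  }
  // Both keys are valid: compare the values.
  return *a == *b;
}
```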

Member Function Documentation

◆ full_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::hash_join::full_join ( cudf::table_view const &  probe,
std::optional< std::size_t >  output_size = {},
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
) const

Returns the row indices that can be used to construct the result of performing a full join between two tables.

See also
cudf::full_join(). Behavior is undefined if the provided output_size is smaller than the actual output size.
Parameters
probe  The probe table, from which the tuples are probed
output_size  Optional value that lets the caller specify the exact output size
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory.
Exceptions
cudf::logic_error  If the input probe table has nulls but this hash_join object was not constructed with null checking.
Returns
A pair of columns [left_indices, right_indices] that can be used to construct the result of performing a full join between two tables with build and probe as the join keys.

◆ full_join_size()

std::size_t cudf::hash_join::full_join_size ( cudf::table_view const &  probe,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
) const

Returns the exact number of matches (rows) when performing a full join with the specified probe table.

Parameters
probe  The probe table, from which the tuples are probed
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the intermediate table and columns' device memory.
Exceptions
cudf::logic_error  If the input probe table has nulls but this hash_join object was not constructed with null checking.
Returns
The exact number of output rows when performing a full join between two tables with build and probe as the join keys.
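One way to see why a full join size query may need scratch memory, unlike the inner and left variants, is the host-side sketch below. It is an assumption-laden illustration, not libcudf's algorithm: `cpu_full_join_size` is a hypothetical name, keys are plain `int` values without nulls, and the `build_row_matched` buffer stands in for whatever intermediate storage the real implementation allocates.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical CPU sketch: count the rows a full join would produce.
std::size_t cpu_full_join_size(std::vector<int> const& build_keys,
                               std::vector<int> const& probe_keys)
{
  std::unordered_multimap<int, std::size_t> index;
  for (std::size_t row = 0; row < build_keys.size(); ++row) {
    index.emplace(build_keys[row], row);
  }

  // Scratch buffer: remembers which build rows were matched by any probe row.
  std::vector<bool> build_row_matched(build_keys.size(), false);

  std::size_t total = 0;
  for (int key : probe_keys) {
    auto const range = index.equal_range(key);
    if (range.first == range.second) {
      ++total;  // an unmatched probe row still yields one output row
      continue;
    }
    for (auto it = range.first; it != range.second; ++it) {
      ++total;  // one output row per matching pair
      build_row_matched[it->second] = true;
    }
  }

  // Build rows that nothing matched each contribute one more output row.
  for (bool matched : build_row_matched) {
    if (!matched) { ++total; }
  }
  return total;
}
```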

◆ inner_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::hash_join::inner_join ( cudf::table_view const &  probe,
std::optional< std::size_t >  output_size = {},
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
) const

Returns the row indices that can be used to construct the result of performing an inner join between two tables.

See also
cudf::inner_join(). Behavior is undefined if the provided output_size is smaller than the actual output size.
Parameters
probe  The probe table, from which the tuples are probed
output_size  Optional value that lets the caller specify the exact output size
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory.
Exceptions
cudf::logic_error  If the input probe table has nulls but this hash_join object was not constructed with null checking.
Returns
A pair of columns [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables with build and probe as the join keys.

◆ inner_join_size()

std::size_t cudf::hash_join::inner_join_size ( cudf::table_view const &  probe,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
) const

Returns the exact number of matches (rows) when performing an inner join with the specified probe table.

Parameters
probe  The probe table, from which the tuples are probed
stream  CUDA stream used for device memory operations and kernel launches
Exceptions
cudf::logic_error  If the input probe table has nulls but this hash_join object was not constructed with null checking.
Returns
The exact number of output rows when performing an inner join between two tables with build and probe as the join keys.
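A size query of this kind pairs naturally with the output_size parameter of the matching join call: count first, then materialize into storage of exactly that size. The host-side sketch below illustrates that two-pass pattern only; `make_key_index`, `count_inner_matches`, and `materialize_inner_join` are hypothetical names, and nothing here is libcudf code.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

using key_index   = std::unordered_multimap<int, std::size_t>;
using index_pairs = std::pair<std::vector<std::size_t>, std::vector<std::size_t>>;

// Build phase: map each build key to its row index.
key_index make_key_index(std::vector<int> const& build_keys)
{
  key_index index;
  for (std::size_t row = 0; row < build_keys.size(); ++row) {
    index.emplace(build_keys[row], row);
  }
  return index;
}

// Pass 1: count matching pairs without producing any output.
std::size_t count_inner_matches(key_index const& index, std::vector<int> const& probe_keys)
{
  std::size_t total = 0;
  for (int key : probe_keys) { total += index.count(key); }
  return total;
}

// Pass 2: produce [probe_indices, build_indices], reserving the known size up front.
// Unlike the documented API, this sketch simply regrows if output_size is too small.
index_pairs materialize_inner_join(key_index const& index,
                                   std::vector<int> const& probe_keys,
                                   std::size_t output_size)
{
  index_pairs result;
  result.first.reserve(output_size);
  result.second.reserve(output_size);
  for (std::size_t row = 0; row < probe_keys.size(); ++row) {
    auto const range = index.equal_range(probe_keys[row]);
    for (auto it = range.first; it != range.second; ++it) {
      result.first.push_back(row);
      result.second.push_back(it->second);
    }
  }
  return result;
}
```

Knowing the size in advance lets the second pass allocate once, at the cost of probing the table twice.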

◆ left_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::hash_join::left_join ( cudf::table_view const &  probe,
std::optional< std::size_t >  output_size = {},
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
) const

Returns the row indices that can be used to construct the result of performing a left join between two tables.

See also
cudf::left_join(). Behavior is undefined if the provided output_size is smaller than the actual output size.
Parameters
probe  The probe table, from which the tuples are probed
output_size  Optional value that lets the caller specify the exact output size
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned table and columns' device memory.
Exceptions
cudf::logic_error  If the input probe table has nulls but this hash_join object was not constructed with null checking.
Returns
A pair of columns [left_indices, right_indices] that can be used to construct the result of performing a left join between two tables with build and probe as the join keys.
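The shape of a left join's index pairs can be sketched on the host as follows. Two things here are assumptions rather than statements from this page: that the probe table plays the left role, and that an unmatched left row is paired with a sentinel index. `cpu_left_join` and `no_match` are hypothetical names, and the sentinel value is chosen for the sketch only.

```cpp
#include <cstddef>
#include <limits>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sentinel marking "no matching build row" in this sketch.
constexpr std::size_t no_match = std::numeric_limits<std::size_t>::max();

// Hypothetical CPU sketch of a left join that keeps every probe row.
// Returns [probe_indices, build_indices].
std::pair<std::vector<std::size_t>, std::vector<std::size_t>> cpu_left_join(
  std::vector<int> const& build_keys, std::vector<int> const& probe_keys)
{
  std::unordered_multimap<int, std::size_t> index;
  for (std::size_t row = 0; row < build_keys.size(); ++row) {
    index.emplace(build_keys[row], row);
  }

  std::vector<std::size_t> probe_idx;
  std::vector<std::size_t> build_idx;
  for (std::size_t row = 0; row < probe_keys.size(); ++row) {
    auto const range = index.equal_range(probe_keys[row]);
    if (range.first == range.second) {
      // No match: keep the probe row and pair it with the sentinel.
      probe_idx.push_back(row);
      build_idx.push_back(no_match);
      continue;
    }
    for (auto it = range.first; it != range.second; ++it) {
      probe_idx.push_back(row);
      build_idx.push_back(it->second);
    }
  }
  return {std::move(probe_idx), std::move(build_idx)};
}
```

In this sketch the number of returned pairs is always at least the number of probe rows, since each probe row appears once per match or once with the sentinel.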

◆ left_join_size()

std::size_t cudf::hash_join::left_join_size ( cudf::table_view const &  probe,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
) const

Returns the exact number of matches (rows) when performing a left join with the specified probe table.

Parameters
probe  The probe table, from which the tuples are probed
stream  CUDA stream used for device memory operations and kernel launches
Exceptions
cudf::logic_error  If the input probe table has nulls but this hash_join object was not constructed with null checking.
Returns
The exact number of output rows when performing a left join between two tables with build and probe as the join keys.

The documentation for this class was generated from the following file: