libcudf
24.04.00
Files | |
---|---|
file | avro.hpp |
file | csv.hpp |
file | io/json.hpp |
file | orc.hpp |
file | parquet.hpp |
Enumerations | | |
---|---|---|
enum class | cudf::io::json_recovery_mode_t { FAIL, RECOVER_WITH_NULL } | Controls the error recovery behavior of the JSON parser. |
Functions | | |
---|---|---|
table_with_metadata | cudf::io::read_avro(avro_reader_options const& options, rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()) | Reads an Avro dataset into a set of columns. |
table_with_metadata | cudf::io::read_csv(csv_reader_options options, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()) | Reads a CSV dataset into a set of columns. |
table_with_metadata | cudf::io::read_json(json_reader_options options, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()) | Reads a JSON dataset into a set of columns. |
table_with_metadata | cudf::io::read_orc(orc_reader_options const& options, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()) | Reads an ORC dataset into a set of columns. |
raw_orc_statistics | cudf::io::read_raw_orc_statistics(source_info const& src_info, rmm::cuda_stream_view stream = cudf::get_default_stream()) | Reads the file-level and stripe-level statistics of an ORC dataset as raw, unparsed (Protobuf-encoded) buffers. |
parsed_orc_statistics | cudf::io::read_parsed_orc_statistics(source_info const& src_info, rmm::cuda_stream_view stream = cudf::get_default_stream()) | Reads the file-level and stripe-level statistics of an ORC dataset and returns them parsed. |
orc_metadata | cudf::io::read_orc_metadata(source_info const& src_info, rmm::cuda_stream_view stream = cudf::get_default_stream()) | Reads the metadata of an ORC dataset. |
table_with_metadata | cudf::io::read_parquet(parquet_reader_options const& options, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()) | Reads a Parquet dataset into a set of columns. |
parquet_metadata | cudf::io::read_parquet_metadata(source_info const& src_info) | Reads the metadata of a Parquet dataset. |
Variables | | |
---|---|---|
constexpr size_t | cudf::io::default_stripe_size_bytes = 64 * 1024 * 1024 | Default ORC stripe size: 64 MiB |
constexpr size_type | cudf::io::default_stripe_size_rows = 1000000 | Default maximum number of rows per ORC stripe: 1,000,000 |
constexpr size_type | cudf::io::default_row_index_stride = 10000 | Default ORC row index stride: 10,000 rows |
constexpr size_t | cudf::io::default_row_group_size_bytes = 128 * 1024 * 1024 | Default Parquet row group size: 128 MiB |
constexpr size_type | cudf::io::default_row_group_size_rows = 1000000 | Default maximum number of rows per Parquet row group: 1,000,000 |
constexpr size_t | cudf::io::default_max_page_size_bytes = 512 * 1024 | Default maximum Parquet page size: 512 KiB |
constexpr size_type | cudf::io::default_max_page_size_rows = 20000 | Default maximum number of rows per Parquet page: 20,000 |
constexpr int32_t | cudf::io::default_column_index_truncate_length = 64 | Default length to which Parquet column index values are truncated: 64 bytes |
constexpr size_t | cudf::io::default_max_dictionary_size = 1024 * 1024 | Default maximum Parquet dictionary size: 1 MiB |
constexpr size_type | cudf::io::default_max_page_fragment_size = 5000 | Default Parquet page fragment size: 5,000 rows |
enum class cudf::io::json_recovery_mode_t
Controls the error recovery behavior of the JSON parser.
Enumerator | Description |
---|---|
FAIL | Does not recover from an error when encountering an invalid format. |
RECOVER_WITH_NULL | Recovers from an error, replacing invalid records with null. |
Definition at line 60 of file io/json.hpp.
table_with_metadata cudf::io::read_avro(avro_reader_options const& options,
                                        rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Reads an Avro dataset into a set of columns.
The following code snippet demonstrates how to read a dataset from a file:
Parameters | |
---|---|
options | Settings for controlling reading behavior |
mr | Device memory resource used to allocate device memory of the table in the returned table_with_metadata |
table_with_metadata cudf::io::read_csv(csv_reader_options options,
                                       rmm::cuda_stream_view stream = cudf::get_default_stream(),
                                       rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Reads a CSV dataset into a set of columns.
The following code snippet demonstrates how to read a dataset from a file:
Parameters | |
---|---|
options | Settings for controlling reading behavior |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate device memory of the table in the returned table_with_metadata |
table_with_metadata cudf::io::read_json(json_reader_options options,
                                        rmm::cuda_stream_view stream = cudf::get_default_stream(),
                                        rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Reads a JSON dataset into a set of columns.
The following code snippet demonstrates how to read a dataset from a file:
Parameters | |
---|---|
options | Settings for controlling reading behavior |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate device memory of the table in the returned table_with_metadata |
table_with_metadata cudf::io::read_orc(orc_reader_options const& options,
                                       rmm::cuda_stream_view stream = cudf::get_default_stream(),
                                       rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Reads an ORC dataset into a set of columns.
The following code snippet demonstrates how to read a dataset from a file:
Parameters | |
---|---|
options | Settings for controlling reading behavior |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate device memory of the table in the returned table_with_metadata |
orc_metadata cudf::io::read_orc_metadata(source_info const& src_info,
                                         rmm::cuda_stream_view stream = cudf::get_default_stream())
Reads the metadata of an ORC dataset.
Parameters | |
---|---|
src_info | Dataset source |
stream | CUDA stream used for device memory operations and kernel launches |
table_with_metadata cudf::io::read_parquet(parquet_reader_options const& options,
                                           rmm::cuda_stream_view stream = cudf::get_default_stream(),
                                           rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
Reads a Parquet dataset into a set of columns.
The following code snippet demonstrates how to read a dataset from a file:
Parameters | |
---|---|
options | Settings for controlling reading behavior |
stream | CUDA stream used for device memory operations and kernel launches |
mr | Device memory resource used to allocate device memory of the table in the returned table_with_metadata |
parquet_metadata cudf::io::read_parquet_metadata(source_info const& src_info)
Reads the metadata of a Parquet dataset.
Parameters | |
---|---|
src_info | Dataset source |
parsed_orc_statistics cudf::io::read_parsed_orc_statistics(source_info const& src_info,
                                                           rmm::cuda_stream_view stream = cudf::get_default_stream())
Reads the file-level and stripe-level statistics of an ORC dataset and returns them parsed.
Parameters | |
---|---|
src_info | Dataset source |
stream | CUDA stream used for device memory operations and kernel launches |
raw_orc_statistics cudf::io::read_raw_orc_statistics(source_info const& src_info,
                                                     rmm::cuda_stream_view stream = cudf::get_default_stream())
Reads the file-level and stripe-level statistics of an ORC dataset as raw, unparsed (Protobuf-encoded) buffers.
The following code snippet demonstrates how to read statistics of a dataset from a file:
Parameters | |
---|---|
src_info | Dataset source |
stream | CUDA stream used for device memory operations and kernel launches |