Skip to content

Expose stream-ordering in scalar and avro APIs #17766

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Jan 29, 2025
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion cpp/include/cudf/io/avro.hpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020-2024, NVIDIA CORPORATION.
* Copyright (c) 2020-2025, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -208,13 +208,15 @@ class avro_reader_options_builder {
* @endcode
*
* @param options Settings for controlling reading behavior
* @param stream CUDA stream used for device memory operations and kernel launches
* @param mr Device memory resource used to allocate device memory of the table in the returned
* table_with_metadata
*
* @return The set of columns along with metadata
*/
table_with_metadata read_avro(
avro_reader_options const& options,
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref());

/** @} */ // end of group
Expand Down
25 changes: 17 additions & 8 deletions cpp/include/cudf/scalar/scalar.hpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2024, NVIDIA CORPORATION.
* Copyright (c) 2019-2025, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -177,9 +177,12 @@ class fixed_width_scalar : public scalar {
void set_value(T value, rmm::cuda_stream_view stream = cudf::get_default_stream());

/**
* @brief Explicit conversion operator to get the value of the scalar on the host.
* @brief Returns the value of the scalar on the host.
*
* @param stream CUDA stream used for device memory operations.
* @return The value of the scalar
*/
explicit operator value_type() const;
T get_value(rmm::cuda_stream_view stream = cudf::get_default_stream()) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RMM just calls this value. We often try to avoid “get” in our function names.

Suggested change
T get_value(rmm::cuda_stream_view stream = cudf::get_default_stream()) const;
T value(rmm::cuda_stream_view stream = cudf::get_default_stream()) const;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, doesn’t value already exist just below here? We may not need get_value at all. Maybe we just need to delete the conversion operator since it does not take a stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I chose to introduce another function here because value can return either host or device data depending on the type. string_scalar::value returns a cudf::string_view on the device, while fixed_point::value and fixed_width::value copy to host. I was also unsure of using fixed_point::value directly since the conversion operator returns the fixed_point_value instead of the unscaled value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to modify string_scalar::value to return a host string view since it is used in several places that directly access the scalar on device. One option is to rename this function to string_scalar::d_value and introduce another string_scalar::value that returns a std::string?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string_scalar::value() should return a cudf::string_view which is possible from host or device since it just wraps a pointer and a size held by the string_scalar. No device code is needed. Actually the stream parameter is not used at all.
To return a host std::string one would use the string_scalar::to_string() function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Bradley we should not have the get_value() functions and just keep the value() ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks David, Bradley. I've retained the value functions and got rid of the conversion operators.


/**
* @brief Get the value of the scalar.
Expand Down Expand Up @@ -386,7 +389,7 @@ class fixed_point_scalar : public scalar {
rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref());

/**
* @brief Get the value of the scalar.
* @brief Get the unscaled value of the scalar.
*
* @param stream CUDA stream used for device memory operations.
* @return The value of the scalar
Expand All @@ -403,9 +406,12 @@ class fixed_point_scalar : public scalar {
rmm::cuda_stream_view stream = cudf::get_default_stream()) const;

/**
* @brief Explicit conversion operator to get the value of the scalar on the host.
* @brief Returns the value of the scalar as decimal32, decimal64 or decimal128 on the host.
*
* @param stream CUDA stream used for device memory operations.
* @return The value of the scalar
*/
explicit operator value_type() const;
T get_value(rmm::cuda_stream_view stream = cudf::get_default_stream()) const;

/**
* @brief Returns a raw pointer to the value in device memory.
Expand Down Expand Up @@ -516,9 +522,12 @@ class string_scalar : public scalar {
rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref());

/**
* @brief Explicit conversion operator to get the value of the scalar in a host std::string.
* @brief Returns the value of the scalar in a host std::string.
*
* @param stream CUDA stream used for device memory operations.
* @return The value of the scalar
*/
explicit operator std::string() const;
std::string get_value(rmm::cuda_stream_view stream = cudf::get_default_stream()) const;

/**
* @brief Get the value of the scalar in a host std::string.
Expand Down
6 changes: 4 additions & 2 deletions cpp/src/io/functions.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -189,7 +189,9 @@ std::vector<std::unique_ptr<data_sink>> make_datasinks(sink_info const& info)

} // namespace

table_with_metadata read_avro(avro_reader_options const& options, rmm::device_async_resource_ref mr)
table_with_metadata read_avro(avro_reader_options const& options,
rmm::cuda_stream_view stream,
rmm::device_async_resource_ref mr)
{
namespace avro = cudf::io::detail::avro;

Expand All @@ -199,7 +201,7 @@ table_with_metadata read_avro(avro_reader_options const& options, rmm::device_as

CUDF_EXPECTS(datasources.size() == 1, "Only a single source is currently supported.");

return avro::read_avro(std::move(datasources[0]), options, cudf::get_default_stream(), mr);
return avro::read_avro(std::move(datasources[0]), options, stream, mr);
}

table_with_metadata read_json(json_reader_options options,
Expand Down
15 changes: 9 additions & 6 deletions cpp/src/scalar/scalar.cpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2024, NVIDIA CORPORATION.
* Copyright (c) 2019-2025, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -110,7 +110,10 @@ size_type string_scalar::size() const { return _data.size(); }

char const* string_scalar::data() const { return static_cast<char const*>(_data.data()); }

string_scalar::operator std::string() const { return this->to_string(cudf::get_default_stream()); }
std::string string_scalar::get_value(rmm::cuda_stream_view stream) const
{
return this->to_string(stream);
}

std::string string_scalar::to_string(rmm::cuda_stream_view stream) const
{
Expand Down Expand Up @@ -184,9 +187,9 @@ T fixed_point_scalar<T>::fixed_point_value(rmm::cuda_stream_view stream) const
}

template <typename T>
fixed_point_scalar<T>::operator value_type() const
T fixed_point_scalar<T>::get_value(rmm::cuda_stream_view stream) const
{
return this->fixed_point_value(cudf::get_default_stream());
return this->fixed_point_value(stream);
}

template <typename T>
Expand Down Expand Up @@ -267,9 +270,9 @@ T const* fixed_width_scalar<T>::data() const
}

template <typename T>
fixed_width_scalar<T>::operator value_type() const
T fixed_width_scalar<T>::get_value(rmm::cuda_stream_view stream) const
{
return this->value(cudf::get_default_stream());
return this->value(stream);
}

/**
Expand Down
1 change: 1 addition & 0 deletions cpp/tests/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -725,6 +725,7 @@ ConfigureTest(STREAM_REPLACE_TEST streams/replace_test.cpp STREAM_MODE testing)
ConfigureTest(STREAM_RESHAPE_TEST streams/reshape_test.cpp STREAM_MODE testing)
ConfigureTest(STREAM_ROLLING_TEST streams/rolling_test.cpp STREAM_MODE testing)
ConfigureTest(STREAM_ROUND_TEST streams/round_test.cpp STREAM_MODE testing)
ConfigureTest(STREAM_SCALAR_TEST streams/scalar_test.cpp STREAM_MODE testing)
ConfigureTest(STREAM_SEARCH_TEST streams/search_test.cpp STREAM_MODE testing)
ConfigureTest(STREAM_SORTING_TEST streams/sorting_test.cpp STREAM_MODE testing)
ConfigureTest(STREAM_STREAM_COMPACTION_TEST streams/stream_compaction_test.cpp STREAM_MODE testing)
Expand Down
6 changes: 3 additions & 3 deletions cpp/tests/binaryop/assert-binops.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2023, NVIDIA CORPORATION.
* Copyright (c) 2019-2025, NVIDIA CORPORATION.
*
* Copyright 2018-2019 BlazingDB, Inc.
* Copyright 2018 Christian Noboa Mardini <christian@blazingdb.com>
Expand Down Expand Up @@ -81,7 +81,7 @@ void ASSERT_BINOP(cudf::column_view const& out,
TypeOp&& op,
ValueComparator const& value_comparator = ValueComparator())
{
auto lhs_h = static_cast<ScalarType const&>(lhs).operator TypeLhs();
auto lhs_h = static_cast<ScalarType const&>(lhs).get_value(cudf::get_default_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refactor the tests like this:

Suggested change
auto lhs_h = static_cast<ScalarType const&>(lhs).get_value(cudf::get_default_stream());
auto lhs_h = static_cast<ScalarType const&>(lhs).value(cudf::get_default_stream());

Copy link
Contributor Author

@shrshi shrshi Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a scalar_host_value function that checks which scalar class is passed since the conversion operator in string_scalar and fixed_point_scalar called to_string and fixed_point_value respectively.

auto rhs_h = cudf::test::to_host<TypeRhs>(rhs);
auto rhs_data = rhs_h.first;
auto out_h = cudf::test::to_host<TypeOut>(out);
Expand Down Expand Up @@ -129,7 +129,7 @@ void ASSERT_BINOP(cudf::column_view const& out,
TypeOp&& op,
ValueComparator const& value_comparator = ValueComparator())
{
auto rhs_h = static_cast<ScalarType const&>(rhs).operator TypeRhs();
auto rhs_h = static_cast<ScalarType const&>(rhs).get_value(cudf::get_default_stream());
auto lhs_h = cudf::test::to_host<TypeLhs>(lhs);
auto lhs_data = lhs_h.first;
auto out_h = cudf::test::to_host<TypeOut>(out);
Expand Down
50 changes: 50 additions & 0 deletions cpp/tests/streams/scalar_test.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
/*
* Copyright (c) 2023-2025, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include <cudf_test/base_fixture.hpp>
#include <cudf_test/column_wrapper.hpp>
#include <cudf_test/default_stream.hpp>
#include <cudf_test/table_utilities.hpp>
#include <cudf_test/testing_main.hpp>
#include <cudf_test/type_lists.hpp>

#include <cudf/scalar/scalar.hpp>

template <typename T>
struct TypedScalarTest : public cudf::test::BaseFixture {};

TYPED_TEST_SUITE(TypedScalarTest, cudf::test::FixedWidthTypes);

TYPED_TEST(TypedScalarTest, DefaultValidity)
{
using Type = cudf::device_storage_type_t<TypeParam>;
Type value = static_cast<Type>(cudf::test::make_type_param_scalar<TypeParam>(7));
cudf::scalar_type_t<TypeParam> s(value);
CUDF_EXPECT_NO_THROW(s.get_value(cudf::test::get_default_stream()));

EXPECT_TRUE(s.is_valid());
EXPECT_EQ(value, s.value());
}

struct StringScalarTest : public cudf::test::BaseFixture {};

TEST_F(StringScalarTest, DefaultValidity)
{
std::string value = "test string";
auto s = cudf::string_scalar(value);
CUDF_EXPECT_NO_THROW(s.get_value(cudf::test::get_default_stream()));
EXPECT_TRUE(s.is_valid());
}
Loading