
[Enh]: cast expr in SparkLike #1743

Open
lucas-nelson-uiuc opened this issue Jan 6, 2025 · 0 comments

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

No response

Please describe the purpose of the new feature or describe the problem to solve.

I'm currently working on implementing the `cast` expression in SparkLike, and wanted to use this issue to discuss and list out the key considerations around Spark's data types.
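
For context, here's a minimal sketch of what this would enable, assuming the SparkLike backend delegates to PySpark's `Column.cast` (hypothetical usage, not the final implementation):

```python
import narwhals as nw
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
native_df = spark.createDataFrame([(1,), (2,)], ["a"])

# The Narwhals expression we want SparkLike to support ...
result = nw.from_native(native_df).with_columns(nw.col("a").cast(nw.String))

# ... and the expected native translation via pyspark's Column.cast.
expected = native_df.withColumn("a", F.col("a").cast("string"))
```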

Suggest a solution if possible.

So far I've been able to implement a handful of data types; a mapping sketch follows the first list below. However, some types cannot (yet) be implemented, and for others I'm unsure how they'd be implemented.

Able to implement (currently testing)

  • Float64, Float32, Int64, Int32, Int16, Decimal
  • String
  • Boolean
  • ArrayType (see "Unsure how to implement" below)
  • Struct, Field
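
As a concrete starting point, here's a minimal sketch of the scalar-type mapping (`narwhals_to_native_dtype` is a hypothetical helper name; Decimal is elided, and the nested types are discussed below):

```python
import narwhals as nw
from pyspark.sql import types as pyspark_types


def narwhals_to_native_dtype(dtype: nw.dtypes.DType) -> pyspark_types.DataType:
    """Map a Narwhals dtype to its closest native Spark type."""
    if dtype == nw.Float64:
        return pyspark_types.DoubleType()
    if dtype == nw.Float32:
        return pyspark_types.FloatType()
    if dtype == nw.Int64:
        return pyspark_types.LongType()
    if dtype == nw.Int32:
        return pyspark_types.IntegerType()
    if dtype == nw.Int16:
        return pyspark_types.ShortType()
    if dtype == nw.String:
        return pyspark_types.StringType()
    if dtype == nw.Boolean:
        return pyspark_types.BooleanType()
    msg = f"No native Spark equivalent for Narwhals dtype: {dtype}"
    raise TypeError(msg)
```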

Cannot (yet) implement (one possible handling is sketched after this list)

  • No native support for unsigned integers (UInt8, UInt16, UInt32, UInt64)
  • No native support for categorical types (Enum, Categorical)
  • No direct parameterization of datetime information (e.g. time zones would need to be set in/read from the Spark session configuration)
  • No native support for other dtypes (Object, Unknown)
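
For these, one option (a sketch of one possible policy, not a settled design) is to fail loudly rather than pick a lossy stand-in:

```python
import narwhals as nw

# Hypothetical guard for dtypes with no native Spark equivalent. Failing
# loudly is one policy; silently widening (e.g. UInt32 -> LongType) is
# another, with different trade-offs.
UNSUPPORTED_DTYPES = (nw.UInt8, nw.UInt16, nw.UInt32, nw.UInt64, nw.Categorical, nw.Enum)


def check_supported(dtype: nw.dtypes.DType) -> None:
    if any(dtype == unsupported for unsupported in UNSUPPORTED_DTYPES):
        msg = f"Casting to {dtype} is not supported by the SparkLike backend."
        raise NotImplementedError(msg)
```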

Unsure how to implement

  • pyspark.sql.types.StructField contains more than just name and dtype - is it worth updating narwhals.dtypes.Field to take these additional (optional) parameters to accommodate PySpark?
    • nullable: whether the field can be null (None) or not
    • metadata: additional information about the field
  • I think PySpark's ArrayType behaves like Polars' List type (at least it has no fixed-width constraint)
    • Should we map pyspark.sql.types.ArrayType to narwhals.dtypes.List? (a sketch follows this list)
    • Is there a way we could implement an Array type that aligns with Polars' fixed-width Array?
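
To make the ArrayType question concrete, here's how nested types could recurse, building on the mapping sketch above; it assumes ArrayType maps to narwhals List, that `nullable` defaults to `True`, and that `metadata` is dropped:

```python
def nested_to_native_dtype(dtype: nw.dtypes.DType) -> pyspark_types.DataType:
    # Assumption: ArrayType maps to narwhals List, since neither carries
    # a fixed-width constraint (unlike Polars' Array).
    if dtype == nw.List:
        return pyspark_types.ArrayType(nested_to_native_dtype(dtype.inner))
    if dtype == nw.Struct:
        return pyspark_types.StructType(
            [
                pyspark_types.StructField(
                    name=field.name,
                    dataType=nested_to_native_dtype(field.dtype),
                    nullable=True,  # narwhals Field has no `nullable`; assume True
                    metadata=None,  # narwhals Field has no `metadata`; drop it
                )
                for field in dtype.fields
            ]
        )
    return narwhals_to_native_dtype(dtype)  # scalar types from the sketch above
```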

Let me know what you think about the above, and feel free to add other types we could support. Thanks!

If you have tried alternatives, please describe them below.

No response

Additional information that may help us understand your needs.

No response
