We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?
No response
Please describe the purpose of the new feature or describe the problem to solve.
I'm currently working on implementing the `cast` expression in `SparkLike`, and wanted to use this issue to discuss and list out the key considerations around Spark's data types.
Suggest a solution if possible.
So far, I've been able to implement a handful of data types. However, I noticed that some types cannot (yet) be implemented, and for others I'm uncertain how they'd be implemented.
**Able to implement (currently testing)**
- `Float64`, `Float32`, `Int64`, `Int32`, `Int16`, `Decimal`
- `String`
- `Boolean`
- `ArrayType` (see comment below)
- `Struct`, `Field`
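As a rough illustration of the straightforward cases above, here's a minimal sketch of the mapping using Spark SQL's `simpleString` type names. The `NW_TO_SPARK` table and `spark_type_for` function are hypothetical stand-ins, not the actual SparkLike implementation:

```python
# Hypothetical sketch: Narwhals dtype names -> Spark SQL type names
# (the strings Spark's DataType.simpleString() produces). Illustrative
# only - the real implementation would dispatch on dtype classes.
NW_TO_SPARK = {
    "Float64": "double",
    "Float32": "float",
    "Int64": "bigint",
    "Int32": "int",
    "Int16": "smallint",
    "String": "string",
    "Boolean": "boolean",
}


def spark_type_for(nw_dtype_name: str) -> str:
    """Return the Spark SQL type name for a supported Narwhals dtype."""
    try:
        return NW_TO_SPARK[nw_dtype_name]
    except KeyError:
        raise NotImplementedError(
            f"No Spark type mapping for {nw_dtype_name!r}"
        )
```

For example, `spark_type_for("Int64")` yields `"bigint"`, and unsupported names raise rather than silently cast to something lossy.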
**Cannot (yet) implement**
- No native support for unsigned integers (`UInt8`, `UInt16`, `UInt32`, `UInt64`)
- No native support for categorical types (`Enum`, `Categorical`)
- No direct parameterization of datetime information (the time zone would need to be set in, and read from, the Spark session configuration)
- No native support for other dtypes (`Object`, `Unknown`)
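One conceivable workaround for the unsigned types (a sketch of the idea, not an agreed-upon proposal): widen each `UIntN` into the next-larger signed Spark type, which holds its full range losslessly. `UInt64` has no such container, so it would still have to raise. The function name here is hypothetical:

```python
# Hypothetical signed-widening workaround for unsigned dtypes.
# Each UIntN fits losslessly in the next-larger signed Spark type;
# UInt64 (max 2**64 - 1) exceeds bigint (max 2**63 - 1) and has no
# signed integer container in Spark.
UNSIGNED_WIDENING = {
    "UInt8": "smallint",   # 0..255 fits in int16
    "UInt16": "int",       # 0..65_535 fits in int32
    "UInt32": "bigint",    # 0..4_294_967_295 fits in int64
}


def widened_spark_type(nw_dtype_name: str) -> str:
    """Return a signed Spark type wide enough for an unsigned dtype."""
    if nw_dtype_name == "UInt64":
        raise NotImplementedError(
            "UInt64 does not fit in any signed Spark integer type"
        )
    return UNSIGNED_WIDENING[nw_dtype_name]
```

The trade-off is that a round trip would come back as the signed dtype, so the original unsigned dtype is not recoverable from the Spark schema alone.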
**Unsure how to implement**
- `pyspark.sql.types.StructField` contains more than just `name` and `dtype`. Is it worth updating `narwhals.dtypes.Field` to have these additional (optional) parameters to accommodate PySpark?
  - `nullable`: whether the field can be null (`None`) or not
  - `metadata`: additional information about the field
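If we did extend `Field`, it could stay backwards-compatible by making both parameters optional with PySpark's own defaults (`nullable=True`, empty metadata). A minimal sketch, assuming a dataclass-style `Field`; this is not the actual Narwhals class:

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical extended Field with the two extra StructField
# attributes as optional parameters, so existing two-argument
# construction keeps working unchanged.
@dataclass
class Field:
    name: str
    dtype: Any
    nullable: bool = True                 # StructField's default
    metadata: Optional[dict] = None       # arbitrary per-field info
```

Backends without a nullability concept could simply ignore the extra attributes when converting schemas.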
- I think PySpark's `ArrayType` functions like Polars' `List` type (at least, it doesn't have a width constraint).
  - Should we map `pyspark.sql.types.ArrayType` to `narwhals.dtypes.List`?
  - Is there a way we could implement an `Array` type that aligns with Polars?
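To make the asymmetry concrete: since `ArrayType` carries no size, it can only round-trip as `List`; a fixed-width `Array` could still be cast *to* Spark by dropping its width, but the width wouldn't be recoverable on the way back. A sketch over type-name strings (the string forms and both function names are illustrative stand-ins, not real Narwhals or Spark APIs):

```python
# Hypothetical sketch of the asymmetric mapping between Spark's
# unsized array<...> and Narwhals' List/Array, using type-name
# strings as stand-ins for real dtype objects.

def to_narwhals(spark_type: str) -> str:
    """Spark -> Narwhals: array<...> can only become List."""
    if spark_type.startswith("array<") and spark_type.endswith(">"):
        inner = spark_type[len("array<"):-1]
        return f"List[{inner}]"  # no width information to recover
    return spark_type


def to_spark(nw_type: str) -> str:
    """Narwhals -> Spark: a fixed-width Array drops its width."""
    if nw_type.startswith("Array[") and nw_type.endswith("]"):
        inner, _, _width = nw_type[len("Array["):-1].partition(",")
        return f"array<{inner.strip()}>"  # width is silently dropped
    return nw_type
```

So `Array[bigint, 3] -> array<bigint> -> List[bigint]` is lossy, which is an argument for mapping `ArrayType` to `List` and leaving `Array` unsupported (or validated at cast time).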
Let me know what you all think about the above. Feel free to add other types we could support. Thanks!
If you have tried alternatives, please describe them below.
No response
Additional information that may help us understand your needs.
No response