chore: move pyspark tests into main test suite #1761
base: main
Conversation
expr = plx.all_horizontal(
    *chain(predicates, (plx.col(name) == v for name, v in constraints.items()))
)
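For intuition, the `chain` call expands each keyword constraint into an equality predicate. A sketch of the equivalent expansion at the narwhals public-API level (using `nw` rather than the internal `plx` namespace; the example constraints are illustrative):

```python
import narwhals as nw

# Hypothetical expansion for constraints={"a": 1, "b": 2}: each key/value
# pair becomes an equality predicate, all combined with a horizontal AND.
expr = nw.all_horizontal(nw.col("a") == 1, nw.col("b") == 2)
```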
Needed to implement `Expr.__eq__` to get this to work. It overlaps with @EdAbati's PR
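For readers unfamiliar with the pattern: overloading `__eq__` to return an expression rather than a bool is what lets `plx.col(name) == v` build a comparison node. A minimal toy sketch of the idea, not narwhals' actual implementation:

```python
from __future__ import annotations


class Expr:
    """Toy expression node, illustrative only."""

    def __init__(self, describe: str) -> None:
        self._describe = describe

    def __eq__(self, other: object) -> Expr:  # type: ignore[override]
        # Return a new expression instead of a bool, so comparisons compose
        # into a query plan rather than evaluating eagerly.
        return Expr(f"({self._describe} == {other!r})")


def col(name: str) -> Expr:
    return Expr(f"col({name!r})")


print((col("a") == 1)._describe)  # (col('a') == 1)
```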
@@ -62,7 +62,7 @@ def test_scalar_reduction_with_columns(
     expected: dict[str, list[Any]],
     request: pytest.FixtureRequest,
 ) -> None:
-    if "duckdb" in str(constructor):
+    if "duckdb" in str(constructor) or ("pyspark" in str(constructor)):
Only one of the 5 tests is passing (leading to [XPASS(strict)]).
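For context, `XPASS(strict)` is what pytest reports when a strictly-xfailed test unexpectedly passes, which then fails the suite. A minimal reproduction (test name and reason are illustrative):

```python
import pytest


# With strict=True, an xfail-marked test that passes is reported as
# FAILED ... [XPASS(strict)] instead of being tolerated as an xpass.
@pytest.mark.xfail(strict=True, reason="expected to fail on this backend")
def test_unexpectedly_passing() -> None:
    assert 1 + 1 == 2  # passes, so pytest flags XPASS(strict)
```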
awesome @FBruzzesi - not sure what's happening with tests running in CI?
We are not installing pyspark at all, therefore... but now
ah i see - maybe
tests/conftest.py (Outdated)
if constructor == "pyspark":
    if sys.version_info < (3, 12):
        constructors.append(pyspark_lazy_constructor())
    else:
        continue
@MarcoGorelli maybe this is too much?
with pyspark 4.0.0 this would go
i think this is fine
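The gate presumably exists because pyspark releases before 4.0.0 lack Python 3.12 support; once 4.0.0 is the floor (as the comment above suggests), the version branch could collapse to a sketch like:

```python
if constructor == "pyspark":
    constructors.append(pyspark_lazy_constructor())
```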
module="pyspark", | ||
category=DeprecationWarning, | ||
) | ||
pd_df = pd.DataFrame(obj).replace({float("nan"): None}).reset_index() |
If the objects that come into these constructors are (always?) dictionaries, I think we can skip the trip through pandas and construct from a built-in Python object that Spark knows how to ingest directly (a list of dictionaries). Could be overly cautious, but Spark may infer data types differently if it is handed a pandas DataFrame rather than lists of Python objects.
Since pyspark supports a list of records, we could convert dict → list of dicts like so:
if isinstance(obj, dict):
    obj = [{k: v for k, v in zip(obj, row)} for row in zip(*obj.values())]
Or could pass in the rows & schema separately
if isinstance(obj, dict):
    df = ...createDataFrame([*zip(*obj.values())], schema=[*obj.keys()])
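A self-contained sketch of the second variant, where the `...` above stands for the (elided) Spark session; the local-session setup and sample data here are assumptions for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

obj = {"a": [1, 2, 3], "b": ["x", "y", "z"]}
if isinstance(obj, dict):
    # rows as tuples plus column names, skipping the pandas round trip
    df = spark.createDataFrame([*zip(*obj.values())], schema=[*obj.keys()])
df.show()
```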
I remember having issues with some tests, where we may need to specify the schema with column types (but I don't remember exactly what the problem was).
But if we can skip pandas here, it would be great!
I had the same thought when migrating the codebase, yet I can confirm the data types being an issue for a subset of the tests. I would say to keep it like this for now and address it eventually.
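If the dtype inference mentioned above does bite, the schema can carry explicit types instead of bare column names; a sketch reusing `spark` and `obj` from the snippet above (the specific types are illustrative):

```python
from pyspark.sql.types import LongType, StringType, StructField, StructType

# Explicit schema, so Spark does not have to infer dtypes from the rows.
schema = StructType(
    [
        StructField("a", LongType()),
        StructField("b", StringType()),
    ]
)
df = spark.createDataFrame([*zip(*obj.values())], schema=schema)
```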
Thank you very much for doing this!
-    yield session
-    session.stop()
+    register(session.stop)
TIL atexit.register, nice!
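For reference, the `atexit` pattern being praised swaps a generator-fixture teardown for an interpreter-exit hook; a minimal sketch, with the builder options assumed:

```python
from atexit import register

from pyspark.sql import SparkSession

session = SparkSession.builder.master("local[1]").getOrCreate()
# Instead of `yield session` followed by `session.stop()` in a fixture,
# register the teardown to run once, when the interpreter exits.
register(session.stop)
```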
'ignore:.*The distutils package is deprecated and slated for removal in Python 3.12:DeprecationWarning:pyspark',
'ignore:.*distutils Version classes are deprecated. Use packaging.version instead.*:DeprecationWarning:pyspark',
@MarcoGorelli I moved these back to pyproject.toml, yet targeting the `pyspark` module. Would that work for you?
TIL, nice!
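For context, module-targeted filters like these live under pytest's config in pyproject.toml, with the trailing `:pyspark` field scoping the ignore to warnings raised from the pyspark module. A sketch of the surrounding section (the section header is assumed from standard pytest configuration):

```toml
[tool.pytest.ini_options]
filterwarnings = [
    # format is action:message:category:module -- last field scopes to pyspark
    'ignore:.*The distutils package is deprecated and slated for removal in Python 3.12:DeprecationWarning:pyspark',
    'ignore:.*distutils Version classes are deprecated. Use packaging.version instead.*:DeprecationWarning:pyspark',
]
```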
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below