
chore: move pyspark tests into main test suite #1761

Merged · 20 commits into main from tests/pyspark-to-main · Jan 9, 2025

Conversation

FBruzzesi (Member):

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Comment on lines +113 to +115
expr = plx.all_horizontal(
    *chain(predicates, (plx.col(name) == v for name, v in constraints.items()))
)
FBruzzesi (Member, Author):

Needed to implement `Expr.__eq__` to get this to work. It overlaps with @EdAbati's PR
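
(For context, here is a minimal sketch of the kind of `__eq__` override being described; the class and attribute names are illustrative assumptions, not narwhals' actual internals. The key point is that the comparison returns a new expression instead of a bool, which is what lets `plx.col(name) == v` be fed into `all_horizontal`.)

```python
# Illustrative sketch only; names are assumptions, not narwhals' real code.
class Expr:
    def __init__(self, call):
        self._call = call  # maps a native frame to a native column

    def __eq__(self, other):  # type: ignore[override]
        # Return a new Expr rather than a bool, so `col("a") == 1`
        # composes with combinators such as `all_horizontal`.
        return Expr(lambda df: self._call(df) == other)


def col(name):
    return Expr(lambda df: df[name])
```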

MarcoGorelli (Member) left a comment:

awesome @FBruzzesi - not sure what's happening with the tests running in CI?

pyproject.toml (outdated review thread, resolved)
FBruzzesi (Member, Author):

> awesome @FBruzzesi - not sure what's happening with the tests running in CI?

We are not installing pyspark at all, but now --all-cpu-constructors includes pyspark. Also, I believe some Python versions (3.12 and 3.13) don't support pyspark at all.

MarcoGorelli (Member):

ah i see - maybe --all-cpu-constructors should only include those which are available?

Comment on lines 234 to 238
if constructor == "pyspark":
    if sys.version_info < (3, 12):
        constructors.append(pyspark_lazy_constructor())
    else:
        continue
FBruzzesi (Member, Author):

@MarcoGorelli maybe this is too much? πŸ™ˆ

Collaborator:

with pyspark 4.0.0 this could go away 🤞

Member:

i think this is fine πŸ‘

FBruzzesi marked this pull request as ready for review on January 8, 2025, 17:00
module="pyspark",
category=DeprecationWarning,
)
pd_df = pd.DataFrame(obj).replace({float("nan"): None}).reset_index()
camriddell (Contributor), Jan 8, 2025:

If the objects that come into these constructors are (always?) dictionaries, I think we can skip the trip through pandas and construct from a built-in Python object that Spark knows how to ingest directly (a list of dictionaries). This could be overly cautious, but Spark may infer data types differently if it is handed a pandas DataFrame rather than lists of Python objects.

Since pyspark supports a list of records, we could convert the dict → a list of dicts like so:

if isinstance(obj, dict):
    obj = [{k: v for k, v in zip(obj, row)} for row in zip(*obj.values())]

Or we could pass in the rows & schema separately:

if isinstance(obj, dict):
    df = ...createDataFrame([*zip(*obj.values())], schema=[*obj.keys()])
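
(A self-contained sketch of both suggestions, under the assumption that `obj` is a dict mapping column names to equal-length lists; `spark` is a locally created session standing in for whatever the test suite provides.)

```python
# Hedged sketch of the two options above; `obj` and `spark` are local
# stand-ins, not the PR's actual fixtures.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
obj = {"a": [1, 2, 3], "b": ["x", "y", "z"]}

# Option 1: dict of columns -> list of record dicts.
records = [dict(zip(obj, row)) for row in zip(*obj.values())]
df1 = spark.createDataFrame(records)

# Option 2: rows plus a separate list of column names as the schema.
df2 = spark.createDataFrame([*zip(*obj.values())], schema=[*obj.keys()])
```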

Collaborator:

I remember having issues with some tests, where we may need to specify the schema with column types (but I don't remember exactly what the problem was).

But if we can skip pandas here, it would be πŸ‘ŒπŸ‘ŒπŸ‘Œ

FBruzzesi (Member, Author):

I had the same thought when migrating the codebase, yet I can confirm that data types are an issue for a subset of the tests. I would say to keep it like this for now and eventually address it.
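
(Should this get revisited, one plausible way around the inference problem is an explicit schema; the following is a hypothetical sketch, not code from this PR.)

```python
# Hypothetical sketch: pin column types with an explicit StructType so
# Spark's inference (which can differ between pandas input and plain
# Python rows) is taken out of the equation.
from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()
data = {"a": [1.0, None, 3.0], "b": ["x", "y", "z"]}
schema = StructType(
    [StructField("a", DoubleType()), StructField("b", StringType())]
)
df = spark.createDataFrame([*zip(*data.values())], schema=schema)
```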

EdAbati (Collaborator) left a comment:

Thank you very much for doing this! πŸ™Œ

-    yield session
-    session.stop()
+    register(session.stop)
Collaborator:

TIL atexit.register, nice!
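
(For readers seeing it for the first time: `atexit.register` schedules a callback to run at interpreter shutdown, so the session teardown no longer needs to live after the fixture's `yield`. A rough sketch of the pattern, with the fixture name and scope assumed rather than taken from the PR:)

```python
# Rough sketch of the pattern in the diff above; fixture name and scope
# are assumptions, not the PR's exact conftest.
from atexit import register

import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark_session():
    session = SparkSession.builder.getOrCreate()
    register(session.stop)  # stop the session once, at interpreter exit
    return session
```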

narwhals/_spark_like/group_by.py (outdated review thread, resolved)

Comment on lines +168 to +169
'ignore:.*The distutils package is deprecated and slated for removal in Python 3.12:DeprecationWarning:pyspark',
'ignore:.*distutils Version classes are deprecated. Use packaging.version instead.*:DeprecationWarning:pyspark',
FBruzzesi (Member, Author):

@MarcoGorelli I moved these back to pyproject.toml, but targeting the pyspark module. Would that work for you?
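
(For context: pytest filterwarnings entries follow Python's warnings-filter syntax, `action:message:category:module`, so the trailing `pyspark` above scopes each filter to warnings raised from the pyspark module. In pyproject.toml the entries sit roughly like this; a sketch of the structure, not the PR's exact file:)

```toml
[tool.pytest.ini_options]
filterwarnings = [
    'ignore:.*The distutils package is deprecated and slated for removal in Python 3.12:DeprecationWarning:pyspark',
    'ignore:.*distutils Version classes are deprecated. Use packaging.version instead.*:DeprecationWarning:pyspark',
]
```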

MarcoGorelli (Member):

TIL

nice!

FBruzzesi requested a review from MarcoGorelli on January 9, 2025, 15:54
MarcoGorelli (Member) left a comment:

thanks @FBruzzesi

non-test changes look good

I'm about to go out, so I didn't finish reading through all the changes in the tests folder, but if you checked them and there are no rogue changes, feel free to merge

nice one! πŸ™Œ

FBruzzesi (Member, Author):

> non-test changes look good
>
> I'm about to go out, so I didn't finish reading through all the changes in the tests folder, but if you checked them and there are no rogue changes, feel free to merge
>
> nice one! 🙌

Thanks Marco! Aside from CI time increasing significantly when pyspark runs (maybe we could skip the Windows one and run pyspark on Ubuntu only), I don't see a big risk in merging now.

It is such a better developer experience to add features with tests already there πŸ™ˆ

FBruzzesi merged commit 20eb53b into main on Jan 9, 2025
23 checks passed
FBruzzesi deleted the tests/pyspark-to-main branch on January 9, 2025, 16:54
EdAbati (Collaborator) commented Jan 9, 2025:

Yeees thank you @FBruzzesi πŸ₯³πŸ₯³πŸ₯³

MarcoGorelli (Member):

> maybe we could skip the Windows one and run pyspark on Ubuntu only

yes, 👍 to this, Windows is already really slow to run...


Successfully merging this pull request may close these issues:
  • [Enh]: Move spark tests and constructor into main test suite

4 participants