python[minor] update evaluate to be concurrent #1345

isahers1 · 2024-12-19T02:46:47Z

No description provided.

hinthornw · 2024-12-19T18:18:20Z

python/langsmith/evaluation/_arunner.py

@@ -642,6 +651,61 @@ async def astart(self) -> _AsyncExperimentManager:
            upload_results=self._upload_results,
        )

+    async def awith_predictions_and_evaluators(


We could probably do something similar to what we do in the sync version to avoid duplicating logic here (basically, share a semaphore).
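
As a rough illustration of the shared-semaphore idea, here is a minimal sketch rather than the actual runner code. The names `run_pipeline`, `_limited`, `predict`, and `evaluate` are hypothetical; the only point is that both stages acquire the same `asyncio.Semaphore`, so one limit bounds all in-flight work and neither stage needs its own limiter.

```python
import asyncio


async def _limited(semaphore, coro_fn, *args):
    # Every stage goes through the same semaphore (hypothetical helper).
    async with semaphore:
        return await coro_fn(*args)


async def run_pipeline(inputs, predict, evaluate, max_concurrency=2):
    # One semaphore shared by prediction and evaluation.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def process(item):
        prediction = await _limited(semaphore, predict, item)
        score = await _limited(semaphore, evaluate, item, prediction)
        return item, prediction, score

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(process(item) for item in inputs))
```

Because each item releases the semaphore between its two stages, an evaluation for one example can start while predictions for other examples are still pending.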

baskaryan · 2024-12-23T14:51:27Z

python/langsmith/evaluation/_arunner.py

+        evaluators = _resolve_evaluators(evaluators)
+
+        if not hasattr(self, "_evaluator_executor"):
+            self._evaluator_executor = cf.ThreadPoolExecutor(max_workers=4)


ooc where's the 4 come from?

I copied the value from _ascore - not really sure beyond that

baskaryan · 2024-12-23T14:53:20Z

python/langsmith/evaluation/_arunner.py

                    )
+                    async with lock:


Could we just return the selected_results from _run_single_evaluator and construct the eval_results after the asyncio.gather, to avoid needing the lock?

should be fixed, but someone should check I did it correctly
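
For anyone checking the fix, the suggestion amounts to something like the sketch below. It is not the code in this PR: the signature of `_run_single_evaluator` and the `run_evaluators` wrapper are assumptions made for illustration. Each coroutine returns its own results, and the combined list is built once after `asyncio.gather`, so no shared list is mutated concurrently and no lock is needed.

```python
import asyncio


async def _run_single_evaluator(evaluator, run):
    # Return this evaluator's results instead of appending to shared state.
    # (Hypothetical signature: `evaluator` is an async callable returning a list.)
    return await evaluator(run)


async def run_evaluators(evaluators, run):
    per_evaluator = await asyncio.gather(
        *(_run_single_evaluator(evaluator, run) for evaluator in evaluators)
    )
    # gather preserves argument order, so flattening here is deterministic.
    eval_results = [result for results in per_evaluator for result in results]
    return eval_results
```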

baskaryan · 2024-12-23T14:53:48Z

python/langsmith/evaluation/evaluator.py

+                        {
+                            name: {
+                                "presigned_url": value["presigned_url"],
+                                "reader": io.BytesIO(value["reader"].getvalue()),


would love @agola11's input on this bit

agola11 · 2025-01-10T01:31:42Z

python/langsmith/evaluation/_arunner.py

+                new_attachments[name] = {
+                    "presigned_url": attachment["presigned_url"],
+                    "reader": io.BytesIO(
+                        self._attachment_raw_data_dict[str(example.id) + name]


you're sure this doesn't copy the bytes?

No, you are correct. io.BytesIO copies the underlying bytes. This is wrong, I am working on a fix rn.

Ehh, actually I am going to walk back my statement: based on testing, I don't think io.BytesIO copies the data.
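
Separately from the copy question, the property the diff relies on can be checked directly: two `io.BytesIO` readers built from the same `bytes` object keep independent read positions. The sketch below shows only that. Whether the interpreter copies the buffer up front is an implementation detail that this example does not measure, and the variable names are illustrative.

```python
import io

raw = b"attachment-bytes"

# Two readers over the same immutable bytes object.
reader_a = io.BytesIO(raw)
reader_b = io.BytesIO(raw)

# Reading from one reader advances only that reader's cursor.
first_chunk = reader_a.read(10)
position_b = reader_b.tell()
```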

isahers1 · 2025-01-14T20:47:25Z

python/tests/unit_tests/evaluation/test_runner.py

@@ -617,7 +617,6 @@ def summary_eval_outputs_reference(outputs, reference_outputs):
        tolerance = 3
        assert total_slow < tolerance
        assert total_quick > (SPLIT_SIZE * NUM_REPETITIONS - 1) - tolerance
-        assert any([d > 1 for d in deltas])


@hinthornw I made this change to pass CI, but I would appreciate your review.

It's basically meant to test that we aren't iterating as two phases predict -> evaluate but instead doing generate -> evaluate as a continuous, eager stream.

I don't think we should remove this test
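
One way to keep a check like this without depending on wall-clock deltas is to assert on event order instead. The sketch below uses a different technique from the removed assertion and is only an illustration: `eager_pipeline` and `is_interleaved` are hypothetical names, and the log format is an assumption.

```python
import asyncio


async def eager_pipeline(items, predict, evaluate, log):
    # Each item is evaluated as soon as its own prediction is ready.
    async def process(item):
        prediction = await predict(item)
        log.append(("predict", item))
        score = await evaluate(prediction)
        log.append(("evaluate", item))
        return score

    return await asyncio.gather(*(process(item) for item in items))


def is_interleaved(log):
    # True if some evaluation finished before the last prediction finished,
    # i.e. the run was not two strict phases (predict all, then evaluate all).
    last_predict = max(i for i, (stage, _) in enumerate(log) if stage == "predict")
    first_evaluate = min(i for i, (stage, _) in enumerate(log) if stage == "evaluate")
    return first_evaluate < last_predict
```

A test can make one prediction wait on an `asyncio.Event` that is set by another item's evaluator, which forces the ordering deterministically instead of relying on sleeps.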

hinthornw · 2025-01-16T01:00:11Z

python/langsmith/evaluation/_runner.py

+        return schemas.Example(
+            id=example.id,
+            created_at=example.created_at,
+            dataset_id=example.dataset_id,


Seems likely that we'll forget to update this when we add a new field - if we haven't added a test for this in the previous version, would like one
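
A test along these lines might look like the sketch below. It uses a small dataclass as a hypothetical stand-in for `schemas.Example` (the real model would need its own way of enumerating fields), and `copy_example` and `uncopied_fields` are illustrative names. The idea is to populate every field with a non-default value and fail if the copy drops any of them, so a newly added field that the copy forgets shows up immediately.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Example:  # hypothetical stand-in for schemas.Example
    id: str
    created_at: str
    dataset_id: str
    inputs: Optional[dict] = None


def copy_example(example):
    # Field-by-field copy, in the same style as the diff above.
    return Example(
        id=example.id,
        created_at=example.created_at,
        dataset_id=example.dataset_id,
        inputs=example.inputs,
    )


def uncopied_fields(original, copied):
    # Names of fields whose values differ between the original and the copy.
    return [
        f.name
        for f in fields(original)
        if getattr(original, f.name) != getattr(copied, f.name)
    ]
```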

hinthornw · 2025-01-16T01:01:08Z

python/langsmith/evaluation/_arunner.py

-            manager = await manager.awith_summary_evaluators(summary_evaluators)
+            if evaluators:
+                # Run predictions and evaluations in a single pipeline
+                manager = await manager.awith_predictions_and_evaluators(


If predictions are streamed out, do we need a separate method?

draft

696e4bf

hinthornw reviewed Dec 19, 2024

baskaryan added 2 commits December 23, 2024 09:40

fmt

a558981

fmt

117e8c6

baskaryan reviewed Dec 23, 2024

isahers1 added 4 commits December 23, 2024 09:03

bagatur comments

3c8424c

fix test

2bf063f

fmt

982750c

fmt

c9071a4

agola11 reviewed Jan 10, 2025

isahers1 added 3 commits January 10, 2025 09:58

ankush comment

41cd64f

fmt

628c3c3

edits

f00e2f6

agola11 approved these changes Jan 14, 2025

isahers1 and others added 4 commits January 14, 2025 07:32

Merge branch 'main' into isaac/evaluateconcurrent

86b44a9

fmt

bb30da1

test fix

92cb47a

test fix

1652e01

isahers1 commented Jan 14, 2025

hinthornw reviewed Jan 16, 2025

baskaryan changed the title from "[DRAFT] update evaluate to be concurrent" to "python[minor] update evaluate to be concurrent" Jan 21, 2025

baskaryan changed the base branch from main to py-version-0.3.0 January 21, 2025 03:30

baskaryan added 5 commits January 20, 2025 19:32

merge

700868a

undo

4d7c2e1

bump time limit

8252fe4

loosen

9364932

loosen

a81b4df

baskaryan added 4 commits January 20, 2025 21:42

loosen

8a6a7e2

update

303b2c1

fmt

3892b23

update

3cfe340

baskaryan merged commit b812149 into py-version-0.3.0 Jan 21, 2025
5 checks passed

baskaryan deleted the isaac/evaluateconcurrent branch January 21, 2025 06:10
