fix(py): Rename output_keys to reference_output_keys #1499

Open · wants to merge 4 commits into base: main
50 changes: 31 additions & 19 deletions python/langsmith/testing/_internal.py
@@ -1,4 +1,4 @@
from __future__ import annotations

GitHub Actions / benchmark — benchmark results (annotation on python/langsmith/testing/_internal.py):

create_5_000_run_trees: Mean +- std dev: 680 ms +- 64 ms
create_10_000_run_trees: Mean +- std dev: 1.35 sec +- 0.13 sec
create_20_000_run_trees: Mean +- std dev: 2.68 sec +- 0.17 sec
dumps_class_nested_py_branch_and_leaf_200x400: Mean +- std dev: 714 us +- 17 us
dumps_class_nested_py_leaf_50x100: Mean +- std dev: 25.1 ms +- 0.4 ms
dumps_class_nested_py_leaf_100x200: Mean +- std dev: 104 ms +- 3 ms
dumps_dataclass_nested_50x100: Mean +- std dev: 25.2 ms +- 0.2 ms
dumps_pydantic_nested_50x100: Mean +- std dev: 72.1 ms +- 16.4 ms (warning: this result may be unstable; the standard deviation is 23% of the mean)
dumps_pydanticv1_nested_50x100: Mean +- std dev: 197 ms +- 3 ms

GitHub Actions / benchmark — comparison against main:

| Benchmark                                     | main     | changes                |
|-----------------------------------------------|----------|------------------------|
| dumps_pydanticv1_nested_50x100                | 221 ms   | 197 ms: 1.12x faster   |
| create_10_000_run_trees                       | 1.42 sec | 1.35 sec: 1.05x faster |
| create_20_000_run_trees                       | 2.78 sec | 2.68 sec: 1.03x faster |
| dumps_class_nested_py_leaf_50x100             | 25.7 ms  | 25.1 ms: 1.02x faster  |
| dumps_dataclass_nested_50x100                 | 25.7 ms  | 25.2 ms: 1.02x faster  |
| dumps_class_nested_py_leaf_100x200            | 105 ms   | 104 ms: 1.01x faster   |
| dumps_class_nested_py_branch_and_leaf_200x400 | 715 us   | 714 us: 1.00x faster   |
| create_5_000_run_trees                        | 674 ms   | 680 ms: 1.01x slower   |
| dumps_pydantic_nested_50x100                  | 68.6 ms  | 72.1 ms: 1.05x slower  |
| Geometric mean                                | (ref)    | 1.02x faster           |

import atexit
import contextlib
@@ -69,6 +69,7 @@
*,
id: Optional[uuid.UUID] = None,
output_keys: Optional[Sequence[str]] = None,
reference_output_keys: Optional[Sequence[str]] = None,
client: Optional[ls_client.Client] = None,
test_suite_name: Optional[str] = None,
) -> Callable[[Callable], Callable]: ...
@@ -86,9 +87,13 @@
- id (Optional[uuid.UUID]): A unique identifier for the test case. If not
provided, an ID will be generated based on the test function's module
and name.
- output_keys (Optional[Sequence[str]]): A list of keys to be considered as
the output keys for the test case. These keys will be extracted from the
test function's inputs and stored as the expected outputs.
- output_keys (Optional[Sequence[str]], deprecated): Use
"reference_output_keys" instead. A list of keys to be considered as the
output keys for the test case.
- reference_output_keys (Optional[Sequence[str]]): A list of keys to be
considered as the reference output keys for the test case. These keys
will be extracted from the test function's inputs and stored as the
expected outputs.
- client (Optional[ls_client.Client]): An instance of the LangSmith client
to be used for communication with the LangSmith service. If not provided,
a default client will be used.
@@ -238,7 +243,7 @@
import pytest


@pytest.mark.langsmith(output_keys=["expected"])
@pytest.mark.langsmith(reference_output_keys=["expected"])
@pytest.mark.parametrize(
"a, b, expected",
[
@@ -266,7 +271,7 @@
assert 3 * 4 == 12

By default, all test inputs are saved as "inputs" to a dataset.
You can specify the `output_keys` argument to persist those keys
You can specify the `reference_output_keys` argument to persist those keys
within the dataset's "outputs" fields.

.. code-block:: python
@@ -279,7 +284,7 @@
return "input"


@pytest.mark.langsmith(output_keys=["expected_output"])
@pytest.mark.langsmith(reference_output_keys=["expected_output"])
def test_with_expected_output(some_input: str, expected_output: str):
assert expected_output in some_input

@@ -297,9 +302,18 @@
test_openai_says_hello()
test_addition_with_multiple_inputs(1, 2, 3)
"""
if "output_keys" in kwargs:
warnings.warn(
"The `output_keys` keyword argument is deprecated. "
"Please use `reference_output_keys` instead.",
DeprecationWarning,
)
reference_output_keys = kwargs.pop("output_keys")
else:
reference_output_keys = kwargs.pop("reference_output_keys", None)
langtest_extra = _UTExtra(
id=kwargs.pop("id", None),
output_keys=kwargs.pop("output_keys", None),
reference_output_keys=reference_output_keys,
client=kwargs.pop("client", None),
test_suite_name=kwargs.pop("test_suite_name", None),
cache=ls_utils.get_cache_dir(kwargs.pop("cache", None)),
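For reviewers, a minimal usage sketch of the shim above (test names are illustrative, not part of this diff), mirroring the docstring examples: the new kwarg is the preferred spelling, while the deprecated one is still accepted and forwarded to reference_output_keys with a DeprecationWarning.

    import pytest

    # Preferred spelling after this PR:
    @pytest.mark.langsmith(reference_output_keys=["expected"])
    @pytest.mark.parametrize("a, b, expected", [(1, 2, 3)])
    def test_addition_new(a: int, b: int, expected: int) -> None:
        assert a + b == expected


    # Deprecated spelling: still accepted by the shim above, which warns and
    # forwards the value to reference_output_keys.
    @pytest.mark.langsmith(output_keys=["expected"])
    @pytest.mark.parametrize("a, b, expected", [(1, 2, 3)])
    def test_addition_old(a: int, b: int, expected: int) -> None:
        assert a + b == expected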
@@ -691,6 +705,8 @@
self.pytest_plugin = pytest_plugin
self.pytest_nodeid = pytest_nodeid
self._logged_reference_outputs: Optional[dict] = None
self.inputs = inputs
self.reference_outputs = reference_outputs

if pytest_plugin and pytest_nodeid:
pytest_plugin.add_process_to_test_suite(
@@ -787,7 +803,7 @@
class _UTExtra(TypedDict, total=False):
client: Optional[ls_client.Client]
id: Optional[uuid.UUID]
output_keys: Optional[Sequence[str]]
reference_output_keys: Optional[Sequence[str]]
test_suite_name: Optional[str]
cache: Optional[str]

@@ -808,19 +824,19 @@
**kwargs: Any,
) -> _TestCase:
client = langtest_extra["client"] or rt.get_cached_client()
output_keys = langtest_extra["output_keys"]
reference_output_keys = langtest_extra["reference_output_keys"]
signature = inspect.signature(func)
inputs = rh._get_inputs_safe(signature, *args, **kwargs) or None
outputs = None
if output_keys:
if reference_output_keys:
outputs = {}
if not inputs:
msg = (
"'output_keys' should only be specified when marked test function has "
"input arguments."
"`reference_output_keys` should only be specified when the "
"marked test function has input arguments."
)
raise ValueError(msg)
for k in output_keys:
for k in reference_output_keys:
outputs[k] = inputs.pop(k, None)
test_suite = _LangSmithTestSuite.from_test(
client, func, langtest_extra.get("test_suite_name")
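To make the extraction above concrete, a small sketch with illustrative values (not taken from the diff): each reference key is popped out of the captured inputs and stored as an expected output, and supplying reference_output_keys on a test with no input arguments raises the ValueError shown.

    # Illustrative values only; mirrors the pop loop above.
    inputs = {"question": "What is 2 + 2?", "expected": "4"}
    reference_output_keys = ["expected"]

    outputs = {}
    for k in reference_output_keys:
        outputs[k] = inputs.pop(k, None)

    assert inputs == {"question": "What is 2 + 2?"}
    assert outputs == {"expected": "4"}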
@@ -866,16 +882,14 @@
langtest_extra=langtest_extra,
)
_TEST_CASE.set(test_case)
func_sig = inspect.signature(func)
func_inputs = rh._get_inputs_safe(func_sig, *test_args, **test_kwargs)

def _test():
test_case.start_time()
with rh.trace(
name=getattr(func, "__name__", "Test"),
run_id=test_case.run_id,
reference_example_id=test_case.example_id,
inputs=func_inputs,
inputs=test_case.inputs,
project_name=test_case.test_suite.name,
exceptions_to_handle=(SkipException,),
_end_on_exit=False,
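Context for the inputs=test_case.inputs change here and in the async variant below: the test case already carries the inputs with the reference keys popped out, so reusing them avoids a second signature inspection and keeps reference keys out of the traced inputs. A sketch with assumed values:

    # Assumed values for illustration; with reference_output_keys=["expected"]
    # and a call test_fn(a=1, b=2, expected=3):
    raw_call_inputs = {"a": 1, "b": 2, "expected": 3}   # what re-inspecting the call would yield
    case_inputs = {"a": 1, "b": 2}                      # test_case.inputs: reference keys removed
    reference_outputs = {"expected": 3}                 # stored as the example's expected outputs

    assert {**case_inputs, **reference_outputs} == raw_call_inputs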
@@ -936,16 +950,14 @@
langtest_extra=langtest_extra,
)
_TEST_CASE.set(test_case)
func_sig = inspect.signature(func)
func_inputs = rh._get_inputs_safe(func_sig, *test_args, **test_kwargs)

async def _test():
test_case.start_time()
with rh.trace(
name=getattr(func, "__name__", "Test"),
run_id=test_case.run_id,
reference_example_id=test_case.example_id,
inputs=func_inputs,
inputs=test_case.inputs,
project_name=test_case.test_suite.name,
exceptions_to_handle=(SkipException,),
_end_on_exit=False,
4 changes: 2 additions & 2 deletions python/tests/evaluation/test_decorator.py
@@ -59,7 +59,7 @@ async def test_openai_says_hello():
reason="LANGSMITH_TRACING environment variable not set",
)
@pytest.mark.xfail(reason="Test failure output case")
@pytest.mark.langsmith(output_keys=["expected"])
@pytest.mark.langsmith(reference_output_keys=["expected"])
@pytest.mark.parametrize(
"a, b, expected",
[
@@ -98,7 +98,7 @@ def reference_outputs() -> int:
not os.getenv("LANGSMITH_TRACING"),
reason="LANGSMITH_TRACING environment variable not set",
)
@pytest.mark.langsmith(output_keys=["reference_outputs"])
@pytest.mark.langsmith(reference_output_keys=["reference_outputs"])
def test_fixture(inputs: int, reference_outputs: int):
result = 2 * inputs
t.log_outputs({"d": result})
Expand Down