feat(python): Adds Git tags to most recent dataset version #1432

Merged
merged 62 commits into from
Jan 21, 2025
Commits
ed71173
rfc: manually set test case inputs/outputs
baskaryan Dec 30, 2024
80cbf66
fmt
baskaryan Dec 31, 2024
b33f75a
fmt
baskaryan Jan 2, 2025
37f43cb
fmt
baskaryan Jan 2, 2025
cf82fd6
Merge branch 'main' into bagatur/rfc_set_test_vals
baskaryan Jan 2, 2025
80d4205
Merge branch 'main' into bagatur/rfc_set_test_vals
baskaryan Jan 3, 2025
fa8882f
Merge branch 'main' into bagatur/rfc_set_test_vals
baskaryan Jan 3, 2025
6b999b4
fmt
baskaryan Jan 3, 2025
cbbf3a3
fmt
baskaryan Jan 3, 2025
a61d7d0
fmt
baskaryan Jan 3, 2025
cf37a91
fmt
baskaryan Jan 3, 2025
c9addf0
fmt
baskaryan Jan 3, 2025
d814ec5
Merge branch 'main' into bagatur/rfc_set_test_vals
baskaryan Jan 6, 2025
8700d4b
fmt
baskaryan Jan 6, 2025
376a645
fmt
baskaryan Jan 6, 2025
81e41f4
Merge branch 'main' into bagatur/rfc_set_test_vals
baskaryan Jan 8, 2025
d5d4ebb
rc release
baskaryan Jan 8, 2025
6940ace
Merge branch 'main' into bagatur/rfc_set_test_vals
baskaryan Jan 8, 2025
d3ed9c6
Merge branch 'main' into bagatur/rfc_set_test_vals
baskaryan Jan 8, 2025
bd72391
add better error messaging
isahers1 Jan 8, 2025
542d76e
Merge branch 'main' into bagatur/rfc_set_test_vals
baskaryan Jan 8, 2025
aaf41c9
fmt
baskaryan Jan 8, 2025
d7beea8
wait example updates
baskaryan Jan 8, 2025
c4de666
Merge branch 'bagatur/rfc_set_test_vals' of github.com:langchain-ai/l…
baskaryan Jan 8, 2025
44897c7
fmt
baskaryan Jan 9, 2025
2192c09
fmt
baskaryan Jan 9, 2025
7b433a2
fmt
baskaryan Jan 9, 2025
48670e3
rc2
baskaryan Jan 9, 2025
78eeabf
fmt
baskaryan Jan 9, 2025
e1c9d7c
fmt
baskaryan Jan 9, 2025
400e38b
ptyest plugin
baskaryan Jan 10, 2025
9250bbf
Merge branch 'main' into bagatur/rfc_set_test_vals
baskaryan Jan 10, 2025
414ef69
Merge branch 'main' into bagatur/rfc_set_test_vals
baskaryan Jan 10, 2025
1446452
fmt
baskaryan Jan 11, 2025
790617d
Merge branch 'main' into bagatur/rfc_set_test_vals
baskaryan Jan 11, 2025
37ceca0
fmt
baskaryan Jan 13, 2025
1fca587
rc6
baskaryan Jan 13, 2025
88e6f63
fmt
baskaryan Jan 13, 2025
43509c2
Merge branch 'main' into bagatur/rfc_set_test_vals
baskaryan Jan 14, 2025
233ae72
update table
baskaryan Jan 14, 2025
996e21f
fmt
baskaryan Jan 15, 2025
d79bec9
Merge branch 'main' into bagatur/rfc_set_test_vals
baskaryan Jan 15, 2025
99069fb
group by test suite
baskaryan Jan 15, 2025
6076c29
fmt
baskaryan Jan 15, 2025
d4d1695
fmt
baskaryan Jan 15, 2025
441a8b1
fmt
baskaryan Jan 15, 2025
05dbf16
fmt
baskaryan Jan 15, 2025
82cd582
rc7
baskaryan Jan 15, 2025
e56a68f
fmt
baskaryan Jan 15, 2025
d049218
update api ref
baskaryan Jan 16, 2025
e400c93
fix LANGSMITH_TEST_TRACKING=false
baskaryan Jan 16, 2025
5752078
fmt
baskaryan Jan 16, 2025
8f336f4
fix entrypoint
baskaryan Jan 16, 2025
a5ad0ef
rm script
baskaryan Jan 17, 2025
6aa0062
rename plugin
baskaryan Jan 17, 2025
65dbe26
rename output plugin
baskaryan Jan 17, 2025
3773cb8
rc12
baskaryan Jan 17, 2025
4faad45
Adds Git tags to most recent dataset version
jacoblee93 Jan 18, 2025
92ee5b7
Format
jacoblee93 Jan 18, 2025
001ff31
merge
baskaryan Jan 21, 2025
df2128c
fix
baskaryan Jan 21, 2025
22bc541
cr
baskaryan Jan 21, 2025
1 change: 1 addition & 0 deletions python/langsmith/client.py
@@ -1,4 +1,4 @@
"""Client for interacting with the LangSmith API.

Check notice on line 1 in python/langsmith/client.py (GitHub Actions / benchmark)

Benchmark results

WARNING: the benchmark result may be unstable
* the standard deviation (92.1 ms) is 13% of the mean (693 ms)
Try to rerun the benchmark with more runs, values and/or loops. Run 'python -m pyperf system tune' command to reduce the system jitter. Use pyperf stats, pyperf dump and pyperf hist to analyze results. Use --quiet option to hide these warnings.
create_5_000_run_trees: Mean +- std dev: 693 ms +- 92 ms
create_10_000_run_trees: Mean +- std dev: 1.38 sec +- 0.09 sec
create_20_000_run_trees: Mean +- std dev: 2.74 sec +- 0.20 sec
dumps_class_nested_py_branch_and_leaf_200x400: Mean +- std dev: 715 us +- 5 us
dumps_class_nested_py_leaf_50x100: Mean +- std dev: 25.7 ms +- 0.4 ms
dumps_class_nested_py_leaf_100x200: Mean +- std dev: 108 ms +- 7 ms
dumps_dataclass_nested_50x100: Mean +- std dev: 26.2 ms +- 0.4 ms
WARNING: the benchmark result may be unstable
* the standard deviation (21.9 ms) is 27% of the mean (82.6 ms)
dumps_pydantic_nested_50x100: Mean +- std dev: 82.6 ms +- 21.9 ms
dumps_pydanticv1_nested_50x100: Mean +- std dev: 204 ms +- 3 ms

Comparison against main

+-----------------------------------------------+----------+------------------------+
| Benchmark                                     | main     | changes                |
+===============================================+==========+========================+
| dumps_pydanticv1_nested_50x100                | 214 ms   | 204 ms: 1.05x faster   |
+-----------------------------------------------+----------+------------------------+
| dumps_class_nested_py_branch_and_leaf_200x400 | 707 us   | 715 us: 1.01x slower   |
+-----------------------------------------------+----------+------------------------+
| dumps_class_nested_py_leaf_50x100             | 25.3 ms  | 25.7 ms: 1.01x slower  |
+-----------------------------------------------+----------+------------------------+
| dumps_dataclass_nested_50x100                 | 25.4 ms  | 26.2 ms: 1.03x slower  |
+-----------------------------------------------+----------+------------------------+
| dumps_class_nested_py_leaf_100x200            | 104 ms   | 108 ms: 1.03x slower   |
+-----------------------------------------------+----------+------------------------+
| create_5_000_run_trees                        | 665 ms   | 693 ms: 1.04x slower   |
+-----------------------------------------------+----------+------------------------+
| create_20_000_run_trees                       | 2.59 sec | 2.74 sec: 1.06x slower |
+-----------------------------------------------+----------+------------------------+
| create_10_000_run_trees                       | 1.29 sec | 1.38 sec: 1.07x slower |
+-----------------------------------------------+----------+------------------------+
| dumps_pydantic_nested_50x100                  | 67.6 ms  | 82.6 ms: 1.22x slower  |
+-----------------------------------------------+----------+------------------------+
| Geometric mean                                | (ref)    | 1.05x slower           |
+-----------------------------------------------+----------+------------------------+

Use the client to customize API keys / workspace connections, SSL certs,
etc. for tracing.
@@ -4698,6 +4698,7 @@
             split=split,
             attachments_operations=attachments_operations,
         )
+        example = {k: v for k, v in example.items() if v is not None}
         response = self.request_with_retries(
             "PATCH",
             f"/examples/{_as_uuid(example_id, 'example_id')}",
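The added line strips keys whose value is `None` from the payload before the PATCH request is sent, so fields the caller left unset are omitted from the request body instead of being sent as explicit nulls. A minimal sketch of that filtering step in isolation; the helper name `drop_none` and the sample payload are hypothetical and not part of the client:

```python
from typing import Any, Dict


def drop_none(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of payload without the keys whose value is None."""
    return {k: v for k, v in payload.items() if v is not None}


# Hypothetical update payload: only inputs and split were supplied.
example = {"inputs": {"q": "hi"}, "outputs": None, "metadata": None, "split": "train"}
body = drop_none(example)
print(body)
```

Only `None` is filtered. Falsy values such as `{}` or `0` are kept, so an intentionally empty field is still sent.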
91 changes: 66 additions & 25 deletions python/langsmith/testing/_internal.py
@@ -429,7 +429,7 @@ def _start_experiment(


 def _get_example_id(
-    func: Callable, inputs: dict, suite_id: uuid.UUID
+    func: Callable, inputs: Optional[dict], suite_id: uuid.UUID
 ) -> Tuple[uuid.UUID, str]:
     try:
         file_path = str(Path(inspect.getfile(func)).relative_to(Path.cwd()))
@@ -447,16 +447,30 @@ def _get_example_id(

 def _end_tests(test_suite: _LangSmithTestSuite):
     git_info = ls_env.get_git_info() or {}
+    test_suite.shutdown()
+    dataset_version = test_suite.get_version()
+    dataset_id = test_suite._dataset.id
     test_suite.client.update_project(
         test_suite.experiment_id,
         end_time=datetime.datetime.now(datetime.timezone.utc),
         metadata={
             **git_info,
-            "dataset_version": test_suite.get_version(),
+            "dataset_version": dataset_version,
             "revision_id": ls_env.get_langchain_env_var_metadata().get("revision_id"),
         },
     )
-    test_suite.shutdown()
+    if dataset_version and git_info["commit"] is not None:
+        test_suite.client.update_dataset_tag(
+            dataset_id=dataset_id,
+            as_of=dataset_version,
+            tag=f'git:commit:{git_info["commit"]}',
+        )
+    if dataset_version and git_info["branch"] is not None:
+        test_suite.client.update_dataset_tag(
+            dataset_id=dataset_id,
+            as_of=dataset_version,
+            tag=f'git:branch:{git_info["branch"]}',
+        )


 VT = TypeVar("VT", bound=Optional[dict])
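One thing to watch in the hunk above: `git_info` falls back to `{}` when no Git information is available, and in that case `git_info["commit"]` raises `KeyError` before the `is not None` check is evaluated. A minimal sketch of the same tag construction with `.get()` lookups; the helper name `git_tags` is hypothetical, and only the `git:commit:` / `git:branch:` tag formats are taken from the diff:

```python
import datetime
from typing import Dict, List, Optional


def git_tags(
    git_info: Dict[str, Optional[str]],
    dataset_version: Optional[datetime.datetime],
) -> List[str]:
    """Build the dataset tags for a version; tolerate missing Git keys."""
    if not dataset_version:
        return []
    tags = []
    for key in ("commit", "branch"):
        value = git_info.get(key)  # .get() avoids KeyError on an empty dict
        if value is not None:
            tags.append(f"git:{key}:{value}")
    return tags


version = datetime.datetime(2025, 1, 21, tzinfo=datetime.timezone.utc)
print(git_tags({"commit": "abc123", "branch": "main"}, version))
print(git_tags({}, version))
```

Each returned tag would then be passed to `update_dataset_tag` together with `dataset_id` and `as_of=dataset_version`, as in the diff.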
@@ -485,6 +499,7 @@ def __init__(
         self._version: Optional[datetime.datetime] = None
         self._executor = ls_utils.ContextThreadPoolExecutor(max_workers=1)
         self._example_futures: dict[ID_TYPE, list[Future]] = defaultdict(list)
+        self._example_modified_at: dict[ID_TYPE, datetime.datetime] = {}
         atexit.register(_end_tests, self)

     @property
@@ -521,14 +536,20 @@ def from_test(
     def name(self):
         return self._experiment.name

-    def update_version(self, version: datetime.datetime) -> None:
+    def update_version(self, version: datetime.datetime, example_id: uuid.UUID) -> None:
         with self._lock:
             if self._version is None or version > self._version:
                 self._version = version
+            self._example_modified_at[example_id] = version

-    def get_version(self) -> Optional[datetime.datetime]:
+    def get_version(
+        self, example_id: Optional[uuid.UUID] = None
+    ) -> Optional[datetime.datetime]:
         with self._lock:
-            return self._version
+            if not example_id:
+                return self._version
+            else:
+                return self._example_modified_at[example_id]

     def submit_result(
         self,
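The two methods above keep a suite-wide "latest" version alongside a per-example timestamp, both guarded by the same lock. A standalone sketch of that bookkeeping follows. The class name `VersionTracker` is hypothetical, and it assumes the per-example timestamp is recorded on every call rather than only when the suite-wide version advances:

```python
import datetime
import threading
import uuid
from typing import Dict, Optional


class VersionTracker:
    """Tracks the newest modified_at overall and the latest one per example."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._version: Optional[datetime.datetime] = None
        self._example_modified_at: Dict[uuid.UUID, datetime.datetime] = {}

    def update_version(self, version: datetime.datetime, example_id: uuid.UUID) -> None:
        with self._lock:
            # The suite-wide version only moves forward.
            if self._version is None or version > self._version:
                self._version = version
            # The per-example entry is always overwritten.
            self._example_modified_at[example_id] = version

    def get_version(
        self, example_id: Optional[uuid.UUID] = None
    ) -> Optional[datetime.datetime]:
        with self._lock:
            if not example_id:
                return self._version
            # Raises KeyError for an example that was never recorded.
            return self._example_modified_at[example_id]


tracker = VersionTracker()
e1, e2 = uuid.uuid4(), uuid.uuid4()
t1 = datetime.datetime(2025, 1, 20, tzinfo=datetime.timezone.utc)
t2 = datetime.datetime(2025, 1, 21, tzinfo=datetime.timezone.utc)
tracker.update_version(t2, e2)
tracker.update_version(t1, e1)  # older timestamp: suite-wide version stays at t2
```

As in the diff, looking up an example that was never synced raises `KeyError`, so callers are expected to wait for the example update first.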
@@ -585,34 +606,39 @@ def _sync_example(
         outputs: Optional[dict],
         metadata: Optional[dict],
     ) -> None:
-        inputs_ = _serde_example_values(inputs) if inputs else inputs
-        outputs_ = _serde_example_values(outputs) if outputs else outputs
+        inputs = _serde_example_values(inputs)
+        outputs = _serde_example_values(outputs)
         try:
             example = self.client.read_example(example_id=example_id)
         except ls_utils.LangSmithNotFoundError:
             example = self.client.create_example(
                 example_id=example_id,
-                inputs=inputs_,
-                outputs=outputs_,
+                inputs=inputs,
+                outputs=outputs,
                 dataset_id=self.id,
                 metadata=metadata,
                 created_at=self._experiment.start_time,
             )
+            modified_at = example.modified_at
         else:
             if (
-                inputs_ != example.inputs
-                or outputs_ != example.outputs
+                (inputs is not None and inputs != example.inputs)
+                or (outputs is not None and outputs != example.outputs)
+                or (metadata is not None and metadata != example.metadata)
                 or str(example.dataset_id) != str(self.id)
             ):
-                self.client.update_example(
+                response = self.client.update_example(
                     example_id=example.id,
-                    inputs=inputs_,
-                    outputs=outputs_,
+                    inputs=inputs,
+                    outputs=outputs,
                     metadata=metadata,
                     dataset_id=self.id,
                 )
-        if example.modified_at:
-            self.update_version(example.modified_at)
+                modified_at = datetime.datetime.fromisoformat(response["modified_at"])
+            else:
+                modified_at = example.modified_at
+        if modified_at:
+            self.update_version(modified_at, example_id=example_id)

     def _submit_feedback(
         self,
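The new condition treats `None` as "not provided": a field triggers an update only when a value was supplied and it differs from what is stored. A minimal sketch of that decision as a pure function; `StoredExample` and `needs_update` are hypothetical stand-ins for the example object and the inline condition:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredExample:
    """Hypothetical stand-in for the example returned by read_example."""

    inputs: Optional[dict]
    outputs: Optional[dict]
    metadata: Optional[dict]
    dataset_id: str


def needs_update(
    stored: StoredExample,
    dataset_id: str,
    inputs: Optional[dict] = None,
    outputs: Optional[dict] = None,
    metadata: Optional[dict] = None,
) -> bool:
    """True if a supplied field differs or the example sits in another dataset."""
    return bool(
        (inputs is not None and inputs != stored.inputs)
        or (outputs is not None and outputs != stored.outputs)
        or (metadata is not None and metadata != stored.metadata)
        or str(stored.dataset_id) != str(dataset_id)
    )


stored = StoredExample(inputs={"x": 1}, outputs={"y": 2}, metadata=None, dataset_id="d1")
print(needs_update(stored, "d1"))                   # nothing supplied
print(needs_update(stored, "d1", inputs={"x": 2}))  # supplied and different
```

This works together with the `client.py` change, which drops `None` fields from the PATCH body, so a test that sets only outputs leaves the stored inputs untouched.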
@@ -645,27 +671,33 @@ def wait_example_updates(self, example_id: ID_TYPE):
             self._example_futures[example_id].pop().result()

     def end_run(
-        self, run_tree, example_id, outputs, pytest_plugin=None, pytest_nodeid=None
+        self,
+        run_tree,
+        example_id,
+        outputs,
+        end_time,
+        pytest_plugin=None,
+        pytest_nodeid=None,
     ) -> Future:
         return self._executor.submit(
             self._end_run,
             run_tree=run_tree,
             example_id=example_id,
             outputs=outputs,
+            end_time=end_time,
             pytest_plugin=pytest_plugin,
             pytest_nodeid=pytest_nodeid,
         )

     def _end_run(
-        self, run_tree, example_id, outputs, pytest_plugin, pytest_nodeid
+        self, run_tree, example_id, outputs, end_time, pytest_plugin, pytest_nodeid
     ) -> None:
-        # TODO: remove this hack so that run durations are correct
         # Ensure example is fully updated
         self.wait_example_updates(example_id)
         # Ensure that run end time is after example modified at.
-        end_time = cast(
-            datetime.datetime, self.client.read_example(example_id).modified_at
-        ) + datetime.timedelta(seconds=0.01)
+        # TODO: remove this hack so that run durations are correct
+        example_modified_at = self.get_version(example_id=example_id)
+        end_time = max(example_modified_at, end_time)
         run_tree.end(outputs=outputs, end_time=end_time)
         run_tree.patch()
         pytest_plugin.update_process_status(pytest_nodeid, {"logged": True})
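Instead of re-reading the example and adding 10 ms, the run's end time is now clamped so it is never earlier than the example's recorded `modified_at`. A small sketch of that clamp; the function name `clamp_end_time` and the `None` guard are additions for illustration, since the diff calls `max` directly:

```python
import datetime
from typing import Optional


def clamp_end_time(
    end_time: datetime.datetime,
    example_modified_at: Optional[datetime.datetime],
) -> datetime.datetime:
    """Return end_time, moved forward to example_modified_at if that is later."""
    if example_modified_at is None:
        # max(None, datetime) would raise TypeError, so fall back to end_time.
        return end_time
    return max(example_modified_at, end_time)


utc = datetime.timezone.utc
measured = datetime.datetime(2025, 1, 21, 12, 0, 0, tzinfo=utc)
modified = datetime.datetime(2025, 1, 21, 12, 0, 5, tzinfo=utc)
print(clamp_end_time(measured, modified))
```

Both values need to be timezone-aware (or both naive); comparing an aware and a naive `datetime` raises `TypeError`. The diff captures `end_time` with `datetime.timezone.utc`, which keeps the comparison valid as long as `modified_at` is aware as well.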
@@ -748,10 +780,12 @@ def end_time(self) -> None:
     def end_run(self, run_tree, outputs: Any) -> Future:
         if not (outputs is None or isinstance(outputs, dict)):
             outputs = {"output": outputs}
+        end_time = datetime.datetime.now(datetime.timezone.utc)
         return self.test_suite.end_run(
             run_tree,
             self.example_id,
             outputs,
+            end_time=end_time,
             pytest_plugin=self.pytest_plugin,
             pytest_nodeid=self.pytest_nodeid,
         )
@@ -786,9 +820,16 @@ def _create_test_case(
     client = langtest_extra["client"] or rt.get_cached_client()
     output_keys = langtest_extra["output_keys"]
     signature = inspect.signature(func)
-    inputs: dict = rh._get_inputs_safe(signature, *args, **kwargs)
-    outputs = {}
+    inputs = rh._get_inputs_safe(signature, *args, **kwargs) or None
+    outputs = None
     if output_keys:
+        outputs = {}
+        if not inputs:
+            msg = (
+                "'output_keys' should only be specified when marked test function has "
+                "input arguments."
+            )
+            raise ValueError(msg)
         for k in output_keys:
             outputs[k] = inputs.pop(k, None)
     test_suite = _LangSmithTestSuite.from_test(
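After this change, a test function with no arguments yields `inputs = None` and `outputs = None`, and `output_keys` is rejected unless there are arguments to take the outputs from. A self-contained sketch of that splitting logic; the function name `split_inputs_outputs` is hypothetical, and the error message is copied from the diff:

```python
from typing import Any, Dict, List, Optional, Tuple


def split_inputs_outputs(
    bound_args: Optional[Dict[str, Any]],
    output_keys: Optional[List[str]],
) -> Tuple[Optional[Dict[str, Any]], Optional[Dict[str, Any]]]:
    """Move the arguments named in output_keys from inputs into outputs."""
    # An empty argument dict is normalised to None, mirroring `... or None`.
    inputs = dict(bound_args) if bound_args else None
    outputs = None
    if output_keys:
        if not inputs:
            msg = (
                "'output_keys' should only be specified when marked test function has "
                "input arguments."
            )
            raise ValueError(msg)
        outputs = {k: inputs.pop(k, None) for k in output_keys}
    return inputs, outputs


print(split_inputs_outputs({"question": "2+2?", "expected": "4"}, ["expected"]))
print(split_inputs_outputs({}, None))
```

A key listed in `output_keys` that is not among the arguments ends up in `outputs` with the value `None`, which matches the `inputs.pop(k, None)` call in the diff.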