Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOC]: Update improper usage of $contains operator in example code and add test to validate it #4096

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

hesreallyhim
Copy link

@hesreallyhim hesreallyhim commented Mar 27, 2025

Description of changes

In various files in the codebase, the string: {$contains: {"text": "hello"}} was used as an example of a where_document query. This is ill-formed for two reasons: (i) $contains does not appear in quotation marks (OK in JS, not in python); (ii) the structure does not conform to the WhereDocument Type: $contains cannot be the key for a value that is a dictionary of two strings. See:

WhereDocumentOperator = Union[
    Literal["$contains"], Literal["$not_contains"], LogicalOperator
]
WhereDocument = Dict[WhereDocumentOperator, Union[str, List["WhereDocument"]]]

in base_types.py

Regrettably, this snippet was referenced in the API docs for langchain-chroma, which is what led me to find and make this fix.

Summarize the changes made by this PR.

  • Replace occurrences of {$contains: {"text": "hello"}} with {"$contains": "hello"} in any comments or documentation.
  • Add a failing test for {"$contains": {"text": "hello"}}

Test plan

How are these changes tested?

  • I added a test (failure) case for the ill-formed {$contains: {"text": "hello"}}. After setting up my basic environment, I was able to run the unit test, and see that it passed. However, I've been struggling to get a full development environment setup, so I have not been able to fully run all the tests.
  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?

hesreallyhim and others added 4 commits March 26, 2025 18:57
* Fix incorrect example of `where_document` in `docs/docs.trychroma.com/markdoc/content/reference/python/collection.md` which does not conform to `WhereDocument` type
Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@@ -156,7 +156,7 @@ export class Collection {
* limit: 10,
* offset: 0,
* include: ["embeddings", "metadatas", "documents"],
* whereDocument: { $contains: "value" },
* whereDocument: { "$contains": "value" },
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strictly speaking, this is not necessary, but it seems to follow the stylistic conventions of the current file, so I made this change

@hesreallyhim hesreallyhim changed the title Patch 1 DOCS: Update improper usage of $contains operator in example code and add test to validate it Mar 27, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes the incorrect example usage of the "$contains" operator in documentation and code comments and adds a negative unit test to validate that the ill-formed input raises a ValueError.

  • Corrects documentation examples in Python, JavaScript, and async API endpoints to use the valid format {"$contains": "hello"} instead of {$contains: {"text": "hello"}}.
  • Adds a failing test case in the Python test suite to ensure inappropriate where_document queries are rejected.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
chromadb/test/test_api.py Added tests ensuring that improper where_document queries raise a ValueError
clients/js/packages/chromadb-core/src/Collection.ts Updated documentation comment examples to enclose $contains in quotes
docs/docs.trychroma.com/markdoc/content/reference/js/collection.md Updated example usage of whereDocument to the correct format
docs/docs.trychroma.com/markdoc/content/reference/python/collection.md Corrected the example for where_document to show {"$contains": "hello"}
chromadb/api/models/AsyncCollection.py Adjusted docstring examples for the where_document parameter to the proper format
chromadb/api/models/Collection.py Made similar updates in docstrings for the where_document parameter

@hesreallyhim hesreallyhim changed the title DOCS: Update improper usage of $contains operator in example code and add test to validate it [DOC]: Update improper usage of $contains operator in example code and add test to validate it Mar 27, 2025
@hesreallyhim
Copy link
Author

btw it looks like the workflow keeps failing even after I updated the title to conform to the convention, I'm not sure if it's flakey or I need to do something. Thanks.

Copy link
Contributor

@jairad26 jairad26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR! Will take a look at the failing title, if it happens again you can push an empty commit and it'll resolve, the header looks fine now

@hesreallyhim
Copy link
Author

oh ok yeah i'll try that, i think it said it wouldn't run until there was another commit, but i noticed a few other PR's were hanging, so I wasn't sure if the CI/CD was flakey

Copy link

vercel bot commented Mar 27, 2025

Deployment failed with the following error:

The provided GitHub repository does not contain the requested branch or commit reference. Please ensure the repository is not empty.

@hesreallyhim
Copy link
Author

@itaismith @jairad26 sorry i'm not sure what the status is regarding the build, I don't want this to just idle here, lmk if you need anything from me.

* Add a complementary failing test for `$not_contains`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants