Feature/add document summary to ingestion #1573

Merged
merged 4 commits into dev-minor
Nov 11, 2024

Conversation

emrgnt-cmplxty
Contributor

@emrgnt-cmplxty emrgnt-cmplxty commented Nov 11, 2024

Important

Adds document summary generation to ingestion process and refactors search settings to SearchSettings.

  • Behavior:
    • Introduces document summary generation during ingestion in ingestion_service.py and ingestion_workflow.py.
    • Adds augment_document_info() to generate summaries using LLM and store embeddings.
    • Updates ingestion status to include AUGMENTING.
  • Search Settings:
    • Renames VectorSearchSettings and DocumentSearchSettings to SearchSettings across multiple files.
    • Updates search methods to use SearchSettings in retrieval_service.py, retrieval_router.py, and vector_search_pipe.py.
  • Database:
    • Modifies PostgresDocumentHandler to include summary and summary_embedding fields.
    • Adds full-text and semantic search capabilities in document.py.
  • Configuration:
    • Updates configuration files to include document summary settings.
    • Adds default_summary.yaml for summary prompt configuration.
  • Misc:
    • Refactors search pipelines and pipes to accommodate new search settings.
    • Updates API models and responses to reflect changes in document search results.
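For readers skimming the Behavior bullets, here is a rough sketch of what an augmentation step of this shape could look like. It is not the code in ingestion_service.py: only the names augment_document_info, AUGMENTING, summary, and summary_embedding come from this PR. The DocumentInfo dataclass, the other status values, and the summarize/embed callables are assumptions made so the example runs on its own without an LLM or embedding provider.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Awaitable, Callable, List, Optional


class IngestionStatus(str, Enum):
    # Only AUGMENTING is named in the PR; the other members are illustrative.
    PARSING = "parsing"
    AUGMENTING = "augmenting"
    SUCCESS = "success"


@dataclass
class DocumentInfo:
    # Hypothetical stand-in for the real document record.
    id: str
    status: IngestionStatus = IngestionStatus.PARSING
    summary: Optional[str] = None
    summary_embedding: Optional[List[float]] = None


async def augment_document_info(
    info: DocumentInfo,
    text: str,
    summarize: Callable[[str], Awaitable[str]],
    embed: Callable[[str], Awaitable[List[float]]],
) -> DocumentInfo:
    """Mark the document as AUGMENTING, then attach a summary and its embedding."""
    info.status = IngestionStatus.AUGMENTING
    info.summary = await summarize(text)
    info.summary_embedding = await embed(info.summary)
    return info


async def fake_summarize(text: str) -> str:
    # Placeholder for an LLM call: keep only the first sentence.
    return text.split(".")[0]


async def fake_embed(text: str) -> List[float]:
    # Placeholder for an embedding model: a one-dimensional "vector".
    return [float(len(text))]


def run_demo() -> DocumentInfo:
    doc = DocumentInfo(id="doc-1")
    return asyncio.run(
        augment_document_info(
            doc, "First sentence. Second sentence.", fake_summarize, fake_embed
        )
    )
```

Passing the summarizer and embedder in as callables keeps the step testable without network access. Whether the real service injects its providers this way is not stated in the PR.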

This description was created by Ellipsis for 39c5fae. It will automatically update as commits are pushed.

@emrgnt-cmplxty emrgnt-cmplxty marked this pull request as ready for review November 11, 2024 23:24
@emrgnt-cmplxty emrgnt-cmplxty merged commit c3a0273 into dev-minor Nov 11, 2024
13 of 15 checks passed
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Reviewed everything up to 39c5fae in 1 minute and 17 seconds

More details
  • Looked at 1959 lines of code in 32 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. py/shared/abstractions/search.py:184
  • Draft comment:
    The SearchSettings class combines functionalities of both VectorSearchSettings and DocumentSearchSettings. Ensure that this change is well-documented to avoid confusion. Also, consider removing or refactoring the filters field since it's marked as deprecated but still used in the code.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The PR involves renaming VectorSearchSettings and DocumentSearchSettings to SearchSettings. This change is consistent across the codebase, but there are some potential issues with backward compatibility and clarity. The new SearchSettings class combines functionalities of both previous classes, which might lead to confusion if not documented properly. Additionally, the filters field is marked as deprecated, but it's still being used in the code. This could lead to confusion for developers using this codebase.
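One possible way to address the deprecated-field concern in the draft comment is to keep accepting filters but warn and forward it to a replacement field. The sketch below is an assumption-heavy illustration, not the class in py/shared/abstractions/search.py: it uses a plain dataclass, and every field name other than filters (including search_filters) is hypothetical.

```python
import warnings
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SearchSettings:
    # Hypothetical fields standing in for the merged vector/document settings.
    use_semantic_search: bool = True
    use_fulltext_search: bool = False
    limit: int = 10
    # Hypothetical replacement for the deprecated `filters` field.
    search_filters: Dict[str, Any] = field(default_factory=dict)
    # Deprecated: still accepted so existing callers keep working.
    filters: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        if self.filters is not None:
            warnings.warn(
                "`filters` is deprecated; use `search_filters` instead",
                DeprecationWarning,
                stacklevel=3,
            )
            # Forward the old value only if the new field was not set explicitly.
            if not self.search_filters:
                self.search_filters = dict(self.filters)
```

With this pattern, SearchSettings(filters={"owner": "me"}) still works but emits a DeprecationWarning, and downstream code only ever needs to read search_filters.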

Workflow ID: wflow_yjRDsMIvzCXNEsvb



emrgnt-cmplxty added a commit that referenced this pull request Nov 13, 2024
* add ad-hoc rerank implementation to embedding, add async rerank (#1572)

* add HF defaults

* Feature/add document summary to ingestion (#1573)

* adds document summary to ingestion pipeline

* cleanup impl

* new hybrid document search

* implement hybrid document search

* Feature/add document summary to ingestion (#1575)

* adds document summary to ingestion pipeline

* cleanup impl

* new hybrid document search

* implement hybrid document search

* add migration script

* make the summary change non-breaking (#1576)

* make the summary change non-breaking

* rollbk

* up

* Feature/tweak downgrade logic (#1577)

* tweak downgrade

* fix js sdk

* fix js sdk

* fix upgrade logic

* up
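
The "hybrid document search" commits above combine full-text and semantic results, but this page does not say how the two rankings are merged. One common technique is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of what document.py does. The function name and the constant k = 60 are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1); higher totals rank first, ties break by ID.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a full-text query and a summary-embedding query.
full_text_hits = ["a", "b", "c"]
semantic_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([full_text_hits, semantic_hits])
```

Here "b" and "a" appear in both lists and so rank ahead of "d" and "c", which each appear in only one.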