Release 3.1 #1011
* Feature/remove extra r2r abstraction (#996) * moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * removes an unnecessary abstraction * sync changes --------- Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com> * first commit * move towards orchestration * tweaks * check in working ingestion * move * kg enrichment * update future, postgres compose * hatchetize ingestion pipeline * ready for prime time * finish --------- Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
* add update files workflow * rm ingestion pipeline
* add update files workflow * rm ingestion pipeline * v0 restructure orch
* add update files workflow * rm ingestion pipeline * v0 restructure orch * kg orchestration * finish kg orchestration * update service * merge * cleanups
👍 Looks good to me! Reviewed everything up to 79eac6f in 1 minute and 44 seconds
More details
- Looked at 7014 lines of code in 100 files
- Skipped 1 files when reviewing.
- Skipped posting 7 drafted comments based on config settings.
1. py/core/main/api/management_router.py:5
- Draft comment:
The import for `OrchestrationProvider` is unused and can be removed to clean up the code.
- Reason this comment was not posted:
Confidence changes required: 10%
The code in `py/core/main/api/management_router.py` has a redundant import statement for `OrchestrationProvider`. This import is not used anywhere in the file, and it should be removed to clean up the code.
2. py/core/main/api/management_router.py:79
- Draft comment:
The `auth_user` dependency is repeated multiple times. Consider refactoring to a common method or decorator to reduce redundancy.
- Reason this comment was not posted:
Confidence changes required: 20%
In `py/core/main/api/management_router.py`, the `auth_user` dependency is repeated multiple times. It might be beneficial to refactor this to a common method or decorator to reduce redundancy and improve maintainability.
3. py/core/main/api/management_router.py:87
- Draft comment:
The `auth_user` dependency is repeated multiple times. Consider refactoring to a common method or decorator to reduce redundancy.
- Reason this comment was not posted:
Confidence changes required: 20%
In `py/core/main/api/management_router.py`, the `auth_user` dependency is repeated multiple times. It might be beneficial to refactor this to a common method or decorator to reduce redundancy and improve maintainability.
4. py/core/main/api/management_router.py:97
- Draft comment:
The `auth_user` dependency is repeated multiple times. Consider refactoring to a common method or decorator to reduce redundancy.
- Reason this comment was not posted:
Confidence changes required: 20%
In `py/core/main/api/management_router.py`, the `auth_user` dependency is repeated multiple times. It might be beneficial to refactor this to a common method or decorator to reduce redundancy and improve maintainability.
5. py/core/main/api/management_router.py:105
- Draft comment:
The `auth_user` dependency is repeated multiple times. Consider refactoring to a common method or decorator to reduce redundancy.
- Reason this comment was not posted:
Confidence changes required: 20%
In `py/core/main/api/management_router.py`, the `auth_user` dependency is repeated multiple times. It might be beneficial to refactor this to a common method or decorator to reduce redundancy and improve maintainability.
6. py/core/main/api/management_router.py:127
- Draft comment:
The `auth_user` dependency is repeated multiple times. Consider refactoring to a common method or decorator to reduce redundancy.
- Reason this comment was not posted:
Confidence changes required: 20%
In `py/core/main/api/management_router.py`, the `auth_user` dependency is repeated multiple times. It might be beneficial to refactor this to a common method or decorator to reduce redundancy and improve maintainability.
7. py/core/main/api/management_router.py:142
- Draft comment:
The `auth_user` dependency is repeated multiple times. Consider refactoring to a common method or decorator to reduce redundancy.
- Reason this comment was not posted:
Confidence changes required: 20%
In `py/core/main/api/management_router.py`, the `auth_user` dependency is repeated multiple times. It might be beneficial to refactor this to a common method or decorator to reduce redundancy and improve maintainability.
Workflow ID: wflow_wAuH97VuRlszJ3Tj
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
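The draft comments above suggest factoring the repeated `auth_user` handling into a shared method or decorator. A minimal, framework-free sketch of the decorator variant follows. `User`, `require_superuser`, the superuser check itself, and both handlers are hypothetical names chosen for illustration; none of them are taken from `management_router.py`.

```python
from dataclasses import dataclass
from functools import wraps


@dataclass
class User:
    # Hypothetical stand-in for the authenticated user object.
    name: str
    is_superuser: bool = False


def require_superuser(handler):
    """Run the wrapped handler only when `auth_user` is a superuser."""

    @wraps(handler)
    def wrapper(*args, auth_user: User, **kwargs):
        # The check lives in one place instead of being repeated per handler.
        if not auth_user.is_superuser:
            raise PermissionError(f"{auth_user.name} is not a superuser")
        return handler(*args, auth_user=auth_user, **kwargs)

    return wrapper


@require_superuser
def server_stats(*, auth_user: User) -> dict:
    # Hypothetical handler body.
    return {"requested_by": auth_user.name}


@require_superuser
def app_settings(*, auth_user: User) -> dict:
    # Hypothetical handler body.
    return {"requested_by": auth_user.name, "settings": {}}
```

With this shape, adding the check to another handler is a one-line decoration rather than a repeated block inside each function.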
* moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * Add support for CharacterTextSplitter (#986) * Add support for CharacterTextSplitter Allows R2R client to override the text splitter. Example: ```python ingestion_response = client.ingest_files( file_paths=[file_path], metadatas=metadata, # optionally override chunking settings at runtime chunking_settings={ "provider": "r2r", "method": "character", "extra_fields": { "separator": "---" }, } ) ``` * fixup! Add support for CharacterTextSplitter * fixup! fixup! Add support for CharacterTextSplitter * Patch/ollama base cli (#992) * Dev (#990) * moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * Add support for CharacterTextSplitter (#986) * Add support for CharacterTextSplitter Allows R2R client to override the text splitter. Example: ```python ingestion_response = client.ingest_files( file_paths=[file_path], metadatas=metadata, # optionally override chunking settings at runtime chunking_settings={ "provider": "r2r", "method": "character", "extra_fields": { "separator": "---" }, } ) ``` * fixup! Add support for CharacterTextSplitter * fixup! fixup! Add support for CharacterTextSplitter --------- Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com> Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com> * fix ollama cli --------- Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com> Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com> * Ingestion refactor (#991) * fix test (#993) * Increase Neo4j memory limits, add GDS plugin, and update LLM concurrency limit to 256. 
* Update ingestion sample file, disable KG node extraction pipe, add community processing in clustering, and enhance graph clustering queries. * Update runners (#1007) * Refactor KG clustering process to simplify community processing and enhance entity-triple retrieval from Neo4j. * Refactor Neo4j configuration for memory settings and update graph clustering logic in the KG provider. * Fix pipeline by enabling node extraction and refactor community processing logic in KGClusteringPipe. * hatchet works * throw error if you run global search before enrichment * Fix communities in local search * turn off node desc embedding * fix rag endpoint * Increase hatchet msg size * Update ingestion.py * Refactor and clean up code formatting * modified workflow * Add graph creation functionality * Refactor KG parameters and logging. * review * up --------- Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com> Co-authored-by: emrgnt-cmplxty <owen@algofi.org> Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com> Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
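The `chunking_settings` override in the commit message above selects a character-based splitter with `"separator": "---"`. As a rough illustration of what splitting on a fixed separator means, here is a plain-Python sketch. It is an assumption about the general technique, not R2R's `CharacterTextSplitter` implementation, and `split_on_separator` is a hypothetical helper name.

```python
def split_on_separator(text: str, separator: str = "---") -> list[str]:
    """Split `text` on `separator`, dropping empty or whitespace-only chunks."""
    return [chunk.strip() for chunk in text.split(separator) if chunk.strip()]


# Example input with two separator lines between three sections.
document = "Intro paragraph.\n---\nSecond section.\n---\nThird section."
chunks = split_on_separator(document)
```

A real splitter would typically also enforce chunk size and overlap limits; this sketch only shows the separator step.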
* add update files workflow * rm ingestion pipeline * v0 restructure orch * kg orchestration * finish kg orchestration * update service * merge * cleanups * add hatchet api key setup * cleanup * add hatchet api key setup (#1037) * add hatchet api key setup * cleanup * fix merge * cleanups
* Update runners (#1007) * Check in logs --------- Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
* Pull in subnet and graph PR * Add in templates
* dockerfile * Update ingestion file with new sample URL and enhance unstructured chunking configuration and error handling. * clean up * clean up dockerfile * up * Update sample file and clean code * Add hatchet-sdk dependency in project. * Update providers to include local option.
* Draft of file provider * Some cleanup * Regenearte lock * Stream it * Use document_id as primary key * Pydantic v2 * File provider finished
Skipped PR review on 39a4326 because no changed files had a supported extension. If you think this was in error, please contact us and we'll fix it right away. Generated with ❤️ by ellipsis.dev
👍 Looks good to me! Incremental review on 6fde6e6 in 22 seconds
More details
- Looked at 72 lines of code in 4 files
- Skipped 0 files when reviewing.
- Skipped posting 0 drafted comments based on config settings.
Workflow ID: wflow_JwpAjCpEzN8KnhoN
* Fix * Fix parsing pipeline * working
* improve documentation * fix unstr * add ingestion
👍 Looks good to me! Incremental review on c5bbdb3 in 26 seconds
More details
- Looked at 13 lines of code in 1 files
- Skipped 0 files when reviewing.
- Skipped posting 1 drafted comments based on config settings.
1. py/compose.yaml:106
- Draft comment:
Verify that the production image 'ragtoriches/prod:main-unstructured' is compatible with the current setup and includes all necessary updates and configurations.
- Reason this comment was not posted:
Confidence changes required: 50%
The change in the image reference for the 'r2r' service from a local image to a production image is significant. This change should be verified to ensure that the production image is compatible with the current setup and that it includes all necessary updates and configurations.
Workflow ID: wflow_fYDnP1l7jGdWYwIs
👍 Looks good to me! Incremental review on e0cb7e6 in 34 seconds
More details
- Looked at 415 lines of code in 10 files
- Skipped 0 files when reviewing.
- Skipped posting 4 drafted comments based on config settings.
1. py/Dockerfile.unstructured:36
- Draft comment:
Ensure that removing the `COPY --from=builder /usr/share/tesseract-ocr /usr/share/tesseract-ocr` line does not affect the application's functionality if Tesseract OCR is required at runtime.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment starts with 'Ensure that...', which suggests it might not be useful according to the rules. It is speculative, as it asks the author to verify the impact of the change rather than pointing out a definite issue. The comment does not suggest a specific code change or improvement.
The comment could be relevant if the removal of Tesseract OCR files is likely to cause a runtime issue, but it doesn't provide a definite issue or solution.
The rules clearly state not to make speculative comments or ask the author to ensure behavior, so the comment should be removed.
Remove the comment as it is speculative and asks the author to ensure behavior, which is against the rules.
2. py/core/base/providers/chunking.py:87
- Draft comment:
Ensure that the `strategy` attribute in `UnstructuredChunkingConfig` matches the intended configuration in the TOML files (e.g., `hi_res` instead of `auto`).
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment asks to ensure that the 'strategy' attribute matches the intended configuration, which is speculative and not actionable. It doesn't point out a definite issue with the code. The rules specify not to ask the author to confirm intentions or ensure behavior, which this comment does.
I might be overlooking the importance of ensuring the 'strategy' attribute is correct, but the rules clearly state not to make speculative comments or ask for confirmation of intentions.
Even if ensuring the 'strategy' is correct is important, the comment is speculative and not actionable, which violates the rules.
The comment should be deleted because it is speculative and not actionable, asking the author to ensure something rather than pointing out a definite issue.
3. py/core/main/hatchet/ingestion_workflow.py:22
- Draft comment:
Consider storing `context.workflow_input()` in a variable to avoid multiple calls and improve readability.
- Reason this comment was not posted:
Confidence changes required: 50%
The `parse_file` method in `ingestion_workflow.py` uses `context.workflow_input()` multiple times to access the same data. This can be optimized by storing the result in a variable and reusing it.
4. py/core/providers/parsing/unstructured_parsing.py:87
- Draft comment:
Remove `print` statements or replace them with logging for better production practices.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is about print statements that were not changed in the diff. According to the rules, comments should not be made on unchanged code. Therefore, this comment should be removed.
I might be missing the context where the print statements are relevant to the changes made, but the rules clearly state not to comment on unchanged code.
The rules are clear about not commenting on unchanged code, so the context of the print statements is irrelevant in this case.
Remove the comment because it addresses code that was not changed in the diff.
Workflow ID: wflow_uJ4aL807skZAXq8J
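Draft comment 3 above proposes reading `context.workflow_input()` once and reusing the result. The sketch below illustrates that pattern with a stand-in context object. `FakeContext`, the `document_id` and `file_name` keys, and the body of `parse_file` are assumptions made for the example and do not reproduce the code in `ingestion_workflow.py`.

```python
class FakeContext:
    """Hypothetical stand-in for the workflow context; counts accessor calls."""

    def __init__(self, payload: dict):
        self._payload = payload
        self.calls = 0

    def workflow_input(self) -> dict:
        self.calls += 1
        return self._payload


def parse_file(context) -> dict:
    # Read the workflow input once and reuse the local variable,
    # instead of calling context.workflow_input() for every field.
    data = context.workflow_input()
    return {
        "document_id": data["document_id"],
        "file_name": data["file_name"],
    }


ctx = FakeContext({"document_id": "doc-1", "file_name": "report.pdf"})
result = parse_file(ctx)
```

Here `ctx.calls` ends up at 1 regardless of how many fields `parse_file` reads from the input.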
* Move to self.execute_query * Check in push * Check in * Get file provider running * Actually use file provider * Final touches
👍 Looks good to me! Incremental review on 91fcf70 in 30 seconds
More details
- Looked at 39 lines of code in 2 files
- Skipped 0 files when reviewing.
- Skipped posting 2 drafted comments based on config settings.
1. py/r2r.toml:11
- Draft comment:
Ensure that the change in chunking provider and method is reflected in the codebase where these configurations are used. This might affect document processing logic.
- Reason this comment was not posted:
Confidence changes required: 50%
The change in the chunking provider and method in `r2r.toml` should be reflected in the codebase to ensure consistency. This change might affect how documents are processed, so it's important to verify that the code using these configurations is updated accordingly.
2. py/compose.yaml:106
- Draft comment:
Avoid using hardcoded passwords. Consider parameterizing the Neo4j password or managing it securely.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is not about a change made in the diff, as it refers to a hardcoded password issue that was not altered in the current changes. According to the rules, comments should only be made on lines that were changed in the diff.
I might be missing the context where the password is indirectly affected by the change, but the rules are clear about commenting only on changed lines.
The rules are strict about not commenting on unchanged lines, so even if the comment is valid, it should be removed if it doesn't pertain to a change in the diff.
Remove the comment because it is not about a change made in the diff.
Workflow ID: wflow_Vxi2AGFA9rlpkG3V
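Draft comment 2 above recommends parameterizing the Neo4j password rather than hardcoding it. One common way to do that on the application side is to read the secret from the environment and fail loudly when it is missing. The sketch below assumes a variable named `NEO4J_PASSWORD` and a helper named `neo4j_password`; both are hypothetical and not taken from the repository.

```python
import os
from typing import Mapping, Optional


def neo4j_password(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Neo4j password from the environment, or raise if unset."""
    # `env` is injectable so the lookup can be exercised without touching
    # the real process environment.
    env = os.environ if env is None else env
    password = env.get("NEO4J_PASSWORD")  # hypothetical variable name
    if not password:
        raise RuntimeError("NEO4J_PASSWORD is not set")
    return password
```

Raising on a missing value avoids silently falling back to a default credential.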
* fix unstr err * tweak * by_title default * cleanups * checkin * merge * Graph docs (#1058) * Add document chunks and enrich graph endpoints. * up * Add KG creation and enrichment responses * up * Remove duplicate UnstructuredChunkingConfig entry. * cleanup docs * up --------- Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
* fix unstr err * tweak * by_title default * cleanups * Add document chunks and enrich graph endpoints. * up * Add KG creation and enrichment responses * checkin * merge * up * Remove duplicate UnstructuredChunkingConfig entry. * Remove unused kg_search settings. * Refactor knowledge graph settings handling. * Update image and clean up logs. --------- Co-authored-by: emrgnt-cmplxty <owen@algofi.org>
👍 Looks good to me! Incremental review on 1dc5043 in 19 seconds
More details
- Looked at 20 lines of code in 1 files
- Skipped 0 files when reviewing.
- Skipped posting 1 drafted comments based on config settings.
1. py/core/main/hatchet/ingestion_workflow.py:22
- Draft comment:
Ensure that necessary logging is in place for monitoring and debugging, as print statements have been removed.
- Reason this comment was not posted:
Confidence changes required: 50%
The print statements for debugging purposes have been removed, which is a good practice for production code. However, it's important to ensure that any necessary logging is in place for monitoring and debugging purposes.
Workflow ID: wflow_hWu5fU1EZggKcfF0
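The draft comment above concerns replacing removed `print` statements with proper logging. A minimal sketch using the standard `logging` module is shown below. The logger name `ingestion_workflow` and the `parse_step` function are hypothetical and only illustrate the pattern.

```python
import logging

# Module-level logger; the name is a hypothetical choice for this sketch.
logger = logging.getLogger("ingestion_workflow")


def parse_step(document_id: str) -> str:
    # Lazy %-style formatting: the message is only built if INFO is enabled.
    logger.info("Parsing document %s", document_id)
    return document_id


# Applications usually configure handlers and levels once at startup.
logging.basicConfig(level=logging.INFO)
```

Unlike `print`, the log level and destination can then be changed through configuration without editing the workflow code.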
👍 Looks good to me! Incremental review on 8178d76 in 50 seconds
More details
- Looked at 14 lines of code in 1 files
- Skipped 0 files when reviewing.
- Skipped posting 1 drafted comments based on config settings.
1. py/compose.yaml:106
- Draft comment:
Avoid logging sensitive information like tokens to the console. This can be a security risk if logs are accessible.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment does not seem to be directly related to the change made in the diff, which is the modification of the image line for the 'r2r' service. The comment is more of a general security advice rather than pointing out a specific issue caused by the change. According to the rules, comments should only be made if there is a clear code change required, and this comment does not seem to meet that criterion.
I might be missing the context in which the comment was made, as it could be related to a broader change not visible in the diff. However, based on the information provided, it seems unrelated to the specific change in the diff.
The rules clearly state that comments should only be made if they are directly related to a change in the diff. Since this comment is not related to the specific change, it should be removed.
Remove the comment as it is not directly related to the change made in the diff and does not indicate a specific code change required.
Workflow ID: wflow_mMStO1UE1QuaTdCO
* fix unstr err * tweak * by_title default * cleanups * Add document chunks and enrich graph endpoints. * up * Add KG creation and enrichment responses * checkin * merge * up * Remove duplicate UnstructuredChunkingConfig entry. * Remove unused kg_search settings. * Refactor knowledge graph settings handling. * Update image and clean up logs. * Implement fallback parsing mechanism * Fallback parser * Refactor code for readability and formatting * Refactor and enhance media parsers * Update response types in router. * Remove telemetry and add logging * Refactor logging format in parsers * Refactor image and movie parsers * Fix formatting in movie_parser.py * Remove debug logging statements * Remove debug logging for chunking config * Rename debug option to build. --------- Co-authored-by: emrgnt-cmplxty <owen@algofi.org>
👍 Looks good to me! Incremental review on 41be1a6 in 32 seconds
More details
- Looked at 33 lines of code in 1 files
- Skipped 0 files when reviewing.
- Skipped posting 1 drafted comments based on config settings.
1. py/core/main/api/restructure_router.py:110
- Draft comment:
`Json` is not a valid type for request bodies in FastAPI. Use `KGEnrichmentSettings` directly as it is a Pydantic model.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is addressing a potential issue with the use of 'Json' in the 'enrich_graph' function, which is part of the changes in the diff. If 'Json' is indeed not a valid type for request bodies in FastAPI, this would require a code change.
I might be missing the context of whether 'Json' is actually supported by FastAPI or if there is a specific reason it is used here. The comment assumes 'Json' is incorrect without providing evidence.
The comment is likely based on a common understanding of FastAPI's type handling, and if 'Json' is not typically used, the comment is valid.
The comment should be kept as it addresses a potential issue with the use of 'Json' in the changed code, which may require a code change.
Workflow ID: wflow_gJsfrsd5IBmMut3Z
* ready for merge * fix agent
* ready for merge * fix agent * fix import
* ready for merge * fix agent * fix import
* Fix fallback parsing * Fix * Compose * up
* add orchestration docs * docs iteration * iterate * add images * add images
* add orchestration docs * docs iteration * iterate * add images * add images * run pre-commit * reclean
Summary:
Refactored R2R system, introduced Hatchet orchestration, updated configurations, aligned test cases, and renamed response classes.
Key points:
- `py/core/providers/orchestration/hatchet.py`.
- `chunking_settings` to `chunking_config` in `js/sdk/src/r2rClient.ts`.
- `py/cli/commands/ingestion.py`.
- `--exclude-hatchet` option in `py/cli/commands/server.py`.
- `py/cli/utils/docker_utils.py`.
- `py/compose.hatchet.yaml` for Hatchet service configuration.
- `DocumentStatus` to `IngestionStatus` and `RestructureStatus` in multiple files.
- `asearch` with `search` in `py/core/agent/rag.py`.
- `R2RSerializable` class for serialization in `py/core/base/abstractions/base.py`.
- `Document` class to handle base64 encoding in `py/core/base/abstractions/document.py`.
- `py/core/main/services`.
- `py/pyproject.toml` to require Python 3.10+.
- `py/tests/test_end_to_end.py` to align with new ingestion logic.
- `py/core/main/hatchet/ingestion_workflow.py`.
- `py/compose.yaml` to use `ragtoriches/prod` image for `r2r` service.
- `KGCreationResponse` to `WrappedKGCreationResponse` in `py/core/main/api/restructure_router.py`.
- `KGEnrichmentResponse` to `WrappedKGEnrichmentResponse` in `py/core/main/api/restructure_router.py`.
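One of the key points above mentions base64 handling in the `Document` class. As an illustration of why that matters when a payload must be JSON-serializable (for example, when passed through an orchestration layer), here is a small sketch. `SketchDocument` and its `to_dict`/`from_dict` methods are hypothetical and do not mirror the class in `py/core/base/abstractions/document.py`.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class SketchDocument:
    # Hypothetical, simplified model holding raw file bytes.
    id: str
    data: bytes

    def to_dict(self) -> dict:
        # Raw bytes are not JSON-serializable, so encode them as base64 text.
        return {
            "id": self.id,
            "data": base64.b64encode(self.data).decode("ascii"),
        }

    @classmethod
    def from_dict(cls, payload: dict) -> "SketchDocument":
        # Reverse the encoding to recover the original bytes.
        return cls(id=payload["id"], data=base64.b64decode(payload["data"]))


doc = SketchDocument(id="doc-1", data=b"hello")
wire = json.dumps(doc.to_dict())
restored = SketchDocument.from_dict(json.loads(wire))
```

The round trip through `json.dumps` and `json.loads` returns a document equal to the original.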