Release 3.1 #1011

Merged
merged 42 commits into from
Sep 6, 2024

Contributor

@emrgnt-cmplxty emrgnt-cmplxty commented Aug 30, 2024

🚀 This description was created by Ellipsis for commit 41be1a6

Summary:

Refactored R2R system, introduced Hatchet orchestration, updated configurations, aligned test cases, and renamed response classes.

Key points:

  • Refactored R2R system with variable renaming and updated configurations.
  • Introduced Hatchet orchestration in py/core/providers/orchestration/hatchet.py.
  • Updated chunking_settings to chunking_config in js/sdk/src/r2rClient.ts.
  • Reordered URLs in py/cli/commands/ingestion.py.
  • Added --exclude-hatchet option in py/cli/commands/server.py.
  • Updated Docker setup to include Hatchet in py/cli/utils/docker_utils.py.
  • Introduced py/compose.hatchet.yaml for Hatchet service configuration.
  • Refactored DocumentStatus to IngestionStatus and RestructureStatus in multiple files.
  • Replaced asearch with search in py/core/agent/rag.py.
  • Added R2RSerializable class for serialization in py/core/base/abstractions/base.py.
  • Updated Document class to handle base64 encoding in py/core/base/abstractions/document.py.
  • Refactored ingestion and restructuring logic in py/core/main/services.
  • Updated py/pyproject.toml to require Python 3.10+.
  • Modified test cases in py/tests/test_end_to_end.py to align with new ingestion logic.
  • Removed debug print statements in py/core/main/hatchet/ingestion_workflow.py.
  • Updated py/compose.yaml to use ragtoriches/prod image for r2r service.
  • Renamed KGCreationResponse to WrappedKGCreationResponse in py/core/main/api/restructure_router.py.
  • Renamed KGEnrichmentResponse to WrappedKGEnrichmentResponse in py/core/main/api/restructure_router.py.

Generated with ❤️ by ellipsis.dev
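
The summary mentions that the Document class now handles base64 encoding. A minimal sketch of that general technique follows, assuming the goal is to make raw bytes safe to pass through JSON payloads. `DocumentSketch` and its fields are hypothetical and are not taken from py/core/base/abstractions/document.py.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class DocumentSketch:
    """Hypothetical stand-in for a document that carries raw bytes."""

    id: str
    data: bytes

    def to_json(self) -> str:
        # Bytes are not JSON-serializable, so encode them as base64 text.
        encoded = base64.b64encode(self.data).decode("ascii")
        return json.dumps({"id": self.id, "data": encoded})

    @classmethod
    def from_json(cls, payload: str) -> "DocumentSketch":
        # Reverse the encoding to recover the original bytes.
        raw = json.loads(payload)
        return cls(id=raw["id"], data=base64.b64decode(raw["data"]))
```

Round-tripping through `to_json` and `from_json` should return an equal object, which is the property a workflow payload would rely on.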

emrgnt-cmplxty and others added 4 commits August 29, 2024 17:14
* Feature/remove extra r2r abstraction (#996)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* update ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* removes an unnecessary abstraction

* sync changes

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>

* first commit

* move towards orchestration

* tweaks

* check in working ingestion

* move

* kg enrichment

* update future, postgres compose

* hatchetize ingestion pipeline

* ready for prime time

* finish

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
* add update files workflow

* rm ingestion pipeline
* add update files workflow

* rm ingestion pipeline

* v0 restructure orch
* add update files workflow

* rm ingestion pipeline

* v0 restructure orch

* kg orchestration

* finish kg orchestration

* update service

* merge

* cleanups
@emrgnt-cmplxty emrgnt-cmplxty marked this pull request as ready for review September 4, 2024 02:03

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Reviewed everything up to 79eac6f in 1 minute and 44 seconds

More details
  • Looked at 7014 lines of code in 100 files
  • Skipped 1 file when reviewing.
  • Skipped posting 7 drafted comments based on config settings.
1. py/core/main/api/management_router.py:5
  • Draft comment:
    The import for OrchestrationProvider is unused and can be removed to clean up the code.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The code in py/core/main/api/management_router.py has a redundant import statement for OrchestrationProvider. This import is not used anywhere in the file, and it should be removed to clean up the code.
2. py/core/main/api/management_router.py:79
  • Draft comment:
    The auth_user dependency is repeated multiple times. Consider refactoring to a common method or decorator to reduce redundancy.
  • Reason this comment was not posted:
    Confidence changes required: 20%
    In py/core/main/api/management_router.py, the auth_user dependency is repeated multiple times. It might be beneficial to refactor this to a common method or decorator to reduce redundancy and improve maintainability.
3. py/core/main/api/management_router.py:87
  • Draft comment:
    The auth_user dependency is repeated multiple times. Consider refactoring to a common method or decorator to reduce redundancy.
  • Reason this comment was not posted:
    Confidence changes required: 20%
    In py/core/main/api/management_router.py, the auth_user dependency is repeated multiple times. It might be beneficial to refactor this to a common method or decorator to reduce redundancy and improve maintainability.
4. py/core/main/api/management_router.py:97
  • Draft comment:
    The auth_user dependency is repeated multiple times. Consider refactoring to a common method or decorator to reduce redundancy.
  • Reason this comment was not posted:
    Confidence changes required: 20%
    In py/core/main/api/management_router.py, the auth_user dependency is repeated multiple times. It might be beneficial to refactor this to a common method or decorator to reduce redundancy and improve maintainability.
5. py/core/main/api/management_router.py:105
  • Draft comment:
    The auth_user dependency is repeated multiple times. Consider refactoring to a common method or decorator to reduce redundancy.
  • Reason this comment was not posted:
    Confidence changes required: 20%
    In py/core/main/api/management_router.py, the auth_user dependency is repeated multiple times. It might be beneficial to refactor this to a common method or decorator to reduce redundancy and improve maintainability.
6. py/core/main/api/management_router.py:127
  • Draft comment:
    The auth_user dependency is repeated multiple times. Consider refactoring to a common method or decorator to reduce redundancy.
  • Reason this comment was not posted:
    Confidence changes required: 20%
    In py/core/main/api/management_router.py, the auth_user dependency is repeated multiple times. It might be beneficial to refactor this to a common method or decorator to reduce redundancy and improve maintainability.
7. py/core/main/api/management_router.py:142
  • Draft comment:
    The auth_user dependency is repeated multiple times. Consider refactoring to a common method or decorator to reduce redundancy.
  • Reason this comment was not posted:
    Confidence changes required: 20%
    In py/core/main/api/management_router.py, the auth_user dependency is repeated multiple times. It might be beneficial to refactor this to a common method or decorator to reduce redundancy and improve maintainability.

Workflow ID: wflow_wAuH97VuRlszJ3Tj


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* update ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* Add support for CharacterTextSplitter (#986)

* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---",
        },
    },
)
```
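
To illustrate what a character-based split on a literal separator does, here is a rough sketch under simplifying assumptions. It is not the CharacterTextSplitter implementation: `split_on_separator` is a hypothetical helper, and it ignores chunk size and overlap, which a real splitter would normally also handle.

```python
def split_on_separator(text: str, separator: str = "---") -> list[str]:
    """Split text on a literal separator and drop empty chunks."""
    chunks = []
    for piece in text.split(separator):
        cleaned = piece.strip()
        # Skip pieces that are empty once surrounding whitespace is removed.
        if cleaned:
            chunks.append(cleaned)
    return chunks
```

With the `"---"` separator from the example above, a file whose sections are divided by `---` lines would yield one chunk per section.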

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter

* Patch/ollama base cli (#992)

* Dev (#990)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* update ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* Add support for CharacterTextSplitter (#986)

* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---",
        },
    },
)
```

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com>

* fix ollama cli

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com>

* Ingestion refactor (#991)

* fix test (#993)

* Increase Neo4j memory limits, add GDS plugin, and update LLM concurrency limit to 256.

* Update ingestion sample file, disable KG node extraction pipe, add community processing in clustering, and enhance graph clustering queries.

* Update runners (#1007)

* Refactor KG clustering process to simplify community processing and enhance entity-triple retrieval from Neo4j.

* Refactor Neo4j configuration for memory settings and update graph clustering logic in the KG provider.

* Fix pipeline by enabling node extraction and refactor community processing logic in KGClusteringPipe.

* hatchet works

* throw error if you run global search before enrichment

* Fix communities in local search

* turn off node desc embedding

* fix rag endpoint

* Increase hatchet msg size

* Update ingestion.py

* Refactor and clean up code formatting

* modified workflow

* Add graph creation functionality

* Refactor KG parameters and logging.

* review

* up

---------

Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>
Co-authored-by: emrgnt-cmplxty <owen@algofi.org>
Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>

vercel bot commented Sep 4, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| recommendation_platform | ⬜️ Ignored (Inspect) | | | Sep 6, 2024 6:15pm |

emrgnt-cmplxty and others added 11 commits September 3, 2024 19:53
* add update files workflow

* rm ingestion pipeline

* v0 restructure orch

* kg orchestration

* finish kg orchestration

* update service

* merge

* cleanups

* add hatchet api key setup

* cleanup

* add hatchet api key setup (#1037)

* add hatchet api key setup

* cleanup

* fix merge

* cleanups
* Update runners (#1007)

* Check in logs

---------

Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
* Pull in subnet and graph PR

* Add in templates
* dockerfile

* Update ingestion file with new sample URL and enhance unstructured chunking configuration and error handling.

* clean up

* clean up dockerfile

* up

* Update sample file and clean code

* Add hatchet-sdk dependency in project.

* Update providers to include local option.
* Draft of file provider

* Some cleanup

* Regenerate lock

* Stream it

* Use document_id as primary key

* Pydantic v2

* File provider finished

ellipsis-dev bot commented Sep 4, 2024

Skipped PR review on 39a4326 because no changed files had a supported extension. If you think this was in error, please contact us and we'll fix it right away.


Generated with ❤️ by ellipsis.dev


@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on 6fde6e6 in 22 seconds

More details
  • Looked at 72 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_JwpAjCpEzN8KnhoN


* Fix

* Fix parsing pipeline

* working
* improve documentation

* fix unstr

* add ingestion

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on c5bbdb3 in 26 seconds

More details
  • Looked at 13 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. py/compose.yaml:106
  • Draft comment:
    Verify that the production image 'ragtoriches/prod:main-unstructured' is compatible with the current setup and includes all necessary updates and configurations.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The change in the image reference for the 'r2r' service from a local image to a production image is significant. This change should be verified to ensure that the production image is compatible with the current setup and that it includes all necessary updates and configurations.

Workflow ID: wflow_fYDnP1l7jGdWYwIs



@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on e0cb7e6 in 34 seconds

More details
  • Looked at 415 lines of code in 10 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. py/Dockerfile.unstructured:36
  • Draft comment:
    Ensure that removing the COPY --from=builder /usr/share/tesseract-ocr /usr/share/tesseract-ocr line does not affect the application's functionality if Tesseract OCR is required at runtime.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment starts with 'Ensure that...', which suggests it might not be useful according to the rules. It is speculative, as it asks the author to verify the impact of the change rather than pointing out a definite issue. The comment does not suggest a specific code change or improvement.
    The comment could be relevant if the removal of Tesseract OCR files is likely to cause a runtime issue, but it doesn't provide a definite issue or solution.
    The rules clearly state not to make speculative comments or ask the author to ensure behavior, so the comment should be removed.
    Remove the comment as it is speculative and asks the author to ensure behavior, which is against the rules.
2. py/core/base/providers/chunking.py:87
  • Draft comment:
    Ensure that the strategy attribute in UnstructuredChunkingConfig matches the intended configuration in the TOML files (e.g., hi_res instead of auto).
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment asks to ensure that the 'strategy' attribute matches the intended configuration, which is speculative and not actionable. It doesn't point out a definite issue with the code. The rules specify not to ask the author to confirm intentions or ensure behavior, which this comment does.
    I might be overlooking the importance of ensuring the 'strategy' attribute is correct, but the rules clearly state not to make speculative comments or ask for confirmation of intentions.
    Even if ensuring the 'strategy' is correct is important, the comment is speculative and not actionable, which violates the rules.
    The comment should be deleted because it is speculative and not actionable, asking the author to ensure something rather than pointing out a definite issue.
3. py/core/main/hatchet/ingestion_workflow.py:22
  • Draft comment:
    Consider storing context.workflow_input() in a variable to avoid multiple calls and improve readability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The parse_file method in ingestion_workflow.py uses context.workflow_input() multiple times to access the same data. This can be optimized by storing the result in a variable and reusing it.
4. py/core/providers/parsing/unstructured_parsing.py:87
  • Draft comment:
    Remove print statements or replace them with logging for better production practices.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is about print statements that were not changed in the diff. According to the rules, comments should not be made on unchanged code. Therefore, this comment should be removed.
    I might be missing the context where the print statements are relevant to the changes made, but the rules clearly state not to comment on unchanged code.
    The rules are clear about not commenting on unchanged code, so the context of the print statements is irrelevant in this case.
    Remove the comment because it addresses code that was not changed in the diff.
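
Draft comment 3 above suggests reading `context.workflow_input()` once and reusing the result. A small sketch of that pattern follows. `FakeContext` is a hypothetical stand-in for the workflow context, and the `request`, `document_id`, and `file_name` keys are assumptions made for illustration, not the actual payload shape in ingestion_workflow.py.

```python
class FakeContext:
    """Hypothetical stand-in for a workflow context; counts input reads."""

    def __init__(self, payload: dict):
        self._payload = payload
        self.calls = 0

    def workflow_input(self) -> dict:
        self.calls += 1
        return self._payload


def parse_file(context) -> tuple[str, str]:
    # Read the workflow input once, then reuse the local variable.
    workflow_input = context.workflow_input()
    request = workflow_input["request"]
    return request["document_id"], request["file_name"]
```

Binding the input to a local name keeps the step body readable and guarantees every field is taken from the same snapshot of the input.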

Workflow ID: wflow_uJ4aL807skZAXq8J


* Move to self.execute_query

* Check in push

* Check in

* Get file provider running

* Actually use file provider

* Final touches

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on 91fcf70 in 30 seconds

More details
  • Looked at 39 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. py/r2r.toml:11
  • Draft comment:
    Ensure that the change in chunking provider and method is reflected in the codebase where these configurations are used. This might affect document processing logic.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The change in the chunking provider and method in r2r.toml should be reflected in the codebase to ensure consistency. This change might affect how documents are processed, so it's important to verify that the code using these configurations is updated accordingly.
2. py/compose.yaml:106
  • Draft comment:
    Avoid using hardcoded passwords. Consider parameterizing the Neo4j password or managing it securely.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is not about a change made in the diff, as it refers to a hardcoded password issue that was not altered in the current changes. According to the rules, comments should only be made on lines that were changed in the diff.
    I might be missing the context where the password is indirectly affected by the change, but the rules are clear about commenting only on changed lines.
    The rules are strict about not commenting on unchanged lines, so even if the comment is valid, it should be removed if it doesn't pertain to a change in the diff.
    Remove the comment because it is not about a change made in the diff.

Workflow ID: wflow_Vxi2AGFA9rlpkG3V


emrgnt-cmplxty and others added 5 commits September 5, 2024 14:37
* fix unstr err

* tweak

* by_title default

* cleanups

* checkin

* merge

* Graph docs (#1058)

* Add document chunks and enrich graph endpoints.

* up

* Add KG creation and enrichment responses

* up

* Remove duplicate UnstructuredChunkingConfig entry.

* cleanup docs

* up

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
* fix unstr err

* tweak

* by_title default

* cleanups

* Add document chunks and enrich graph endpoints.

* up

* Add KG creation and enrichment responses

* checkin

* merge

* up

* Remove duplicate UnstructuredChunkingConfig entry.

* Remove unused kg_search settings.

* Refactor knowledge graph settings handling.

* Update image and clean up logs.

---------

Co-authored-by: emrgnt-cmplxty <owen@algofi.org>

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on 1dc5043 in 19 seconds

More details
  • Looked at 20 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. py/core/main/hatchet/ingestion_workflow.py:22
  • Draft comment:
    Ensure that necessary logging is in place for monitoring and debugging, as print statements have been removed.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The print statements for debugging purposes have been removed, which is a good practice for production code. However, it's important to ensure that any necessary logging is in place for monitoring and debugging purposes.
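
As a sketch of what replacing debug prints with logging could look like, the example below uses the standard `logging` module. The logger name, the `parse_file_step` function, and the returned fields are hypothetical and are not taken from ingestion_workflow.py.

```python
import logging

# Module-level logger; handlers and levels are configured by the application.
logger = logging.getLogger("ingestion_workflow_sketch")


def parse_file_step(document_id: str) -> dict:
    """Hypothetical workflow step that logs instead of printing."""
    logger.info("Parsing document %s", document_id)
    result = {"document_id": document_id, "status": "parsed"}
    # Verbose details go to DEBUG so they can be switched off in production.
    logger.debug("Parse result: %s", result)
    return result
```

Unlike `print`, these messages can be filtered by level and routed to whatever handler the deployment configures.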

Workflow ID: wflow_hWu5fU1EZggKcfF0



@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on 8178d76 in 50 seconds

More details
  • Looked at 14 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. py/compose.yaml:106
  • Draft comment:
    Avoid logging sensitive information like tokens to the console. This can be a security risk if logs are accessible.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment does not seem to be directly related to the change made in the diff, which is the modification of the image line for the 'r2r' service. The comment is more of a general security advice rather than pointing out a specific issue caused by the change. According to the rules, comments should only be made if there is a clear code change required, and this comment does not seem to meet that criterion.
    I might be missing the context in which the comment was made, as it could be related to a broader change not visible in the diff. However, based on the information provided, it seems unrelated to the specific change in the diff.
    The rules clearly state that comments should only be made if they are directly related to a change in the diff. Since this comment is not related to the specific change, it should be removed.
    Remove the comment as it is not directly related to the change made in the diff and does not indicate a specific code change required.

Workflow ID: wflow_mMStO1UE1QuaTdCO


* fix unstr err

* tweak

* by_title default

* cleanups

* Add document chunks and enrich graph endpoints.

* up

* Add KG creation and enrichment responses

* checkin

* merge

* up

* Remove duplicate UnstructuredChunkingConfig entry.

* Remove unused kg_search settings.

* Refactor knowledge graph settings handling.

* Update image and clean up logs.

* Implement fallback parsing mechanism

* Fallback parser

* Refactor code for readability and formatting

* Refactor and enhance media parsers

* Update response types in router.

* Remove telemetry and add logging

* Refactor logging format in parsers

* Refactor image and movie parsers

* Fix formatting in movie_parser.py

* Remove debug logging statements

* Remove debug logging for chunking config

* Rename debug option to build.

---------

Co-authored-by: emrgnt-cmplxty <owen@algofi.org>

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on 41be1a6 in 32 seconds

More details
  • Looked at 33 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. py/core/main/api/restructure_router.py:110
  • Draft comment:
    Json is not a valid type for request bodies in FastAPI. Use KGEnrichmentSettings directly as it is a Pydantic model.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is addressing a potential issue with the use of 'Json' in the 'enrich_graph' function, which is part of the changes in the diff. If 'Json' is indeed not a valid type for request bodies in FastAPI, this would require a code change.
    I might be missing the context of whether 'Json' is actually supported by FastAPI or if there is a specific reason it is used here. The comment assumes 'Json' is incorrect without providing evidence.
    The comment is likely based on a common understanding of FastAPI's type handling, and if 'Json' is not typically used, the comment is valid.
    The comment should be kept as it addresses a potential issue with the use of 'Json' in the changed code, which may require a code change.

Workflow ID: wflow_gJsfrsd5IBmMut3Z


emrgnt-cmplxty and others added 8 commits September 5, 2024 19:25
* ready for merge

* fix agent
* ready for merge

* fix agent

* fix import
* ready for merge

* fix agent

* fix import
* Fix fallback parsing

* Fix

* Compose

* up
* add orchestration docs

* docs iteration

* iterate

* add images

* add images
* add orchestration docs

* docs iteration

* iterate

* add images

* add images

* run pre-commit

* reclean
@emrgnt-cmplxty emrgnt-cmplxty changed the title from "Dev" to "Release 3.1" Sep 6, 2024
@emrgnt-cmplxty emrgnt-cmplxty merged commit 51d2582 into main Sep 6, 2024
5 of 8 checks passed
3 participants