refactor: Implement unified serialization function #6044

Merged
merged 26 commits into from
Feb 3, 2025

Conversation

ogabrielluiz
Contributor

Introduce a unified serialization method for various data types, improving consistency and maintainability. Enhance Pinecone integration to utilize VectorStore and handle import errors gracefully. Add comprehensive tests for the new serialization functions.
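
A minimal sketch of what a unified serialization entry point could look like, using only the standard library. This is not the code from this PR: the function body, the limit values, and the `"..."` list marker are assumptions for illustration. Only the names `serialize`, `MAX_TEXT_LENGTH`, and `MAX_ITEMS_LENGTH` come from the commit messages in this thread.

```python
from datetime import datetime, timezone
from decimal import Decimal
from enum import Enum
from typing import Any, Optional
from uuid import UUID

# Assumed values; the PR names these constants but this page does not show their values.
MAX_TEXT_LENGTH = 20
MAX_ITEMS_LENGTH = 3


def serialize(
    obj: Any,
    max_length: Optional[int] = MAX_TEXT_LENGTH,
    max_items: Optional[int] = MAX_ITEMS_LENGTH,
) -> Any:
    """Convert obj into JSON-friendly builtins, truncating long strings and lists."""
    # Enums are checked first because IntEnum and str-based enums are also int/str.
    if isinstance(obj, Enum):
        return serialize(obj.value, max_length, max_items)
    if obj is None or isinstance(obj, (bool, int, float)):
        return obj
    if isinstance(obj, bytes):
        obj = obj.decode("utf-8", errors="ignore")
    if isinstance(obj, str):
        if max_length is not None and len(obj) > max_length:
            return obj[:max_length] + "..."
        return obj
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return float(obj)
    if isinstance(obj, UUID):
        return str(obj)
    if isinstance(obj, dict):
        return {key: serialize(value, max_length, max_items) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        items = list(obj)
        truncated = max_items is not None and len(items) > max_items
        if truncated:
            items = items[:max_items]
        result = [serialize(item, max_length, max_items) for item in items]
        if truncated:
            result.append("...")  # assumed marker for dropped items
        return result
    # Unsupported types are returned unchanged in this sketch.
    return obj
```

A single entry point like this lets callers replace per-model serialization helpers with one call, for example `serialize(payload, max_items=10)`.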

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jan 31, 2025
@github-actions github-actions bot added the refactor Maintenance tasks and housekeeping label Jan 31, 2025
@github-actions github-actions bot added refactor Maintenance tasks and housekeeping and removed refactor Maintenance tasks and housekeeping labels Jan 31, 2025
…tems_length for improved handling of outputs, logs, messages, and artifacts
@github-actions github-actions bot added refactor Maintenance tasks and housekeeping and removed refactor Maintenance tasks and housekeeping labels Jan 31, 2025
@edwinjosechittilappilly
Collaborator

Good one.
Will this also affect `serialize_message` in schema.py in tracing?

codeflash-ai bot added a commit that referenced this pull request Jan 31, 2025
…factor-serialization`)

Here is the refactored and optimized Python program.



### Changes and Optimizations.
1. **Logging Type Identification (`get_type` function)**.
  - Simplified `get_type` function using `isinstance`.

2. **Message Retrieval (`get_message` function)**.
  - Simplified the `get_message` function using straightforward if-elif clauses.

3. **Building Output Logs (`build_output_logs` function)**.
  - Refactored loops to avoid redundant checks and reduced operations.

4. **Serialization Function**.
  - Removed redundant checks and used direct fallbacks for common serialization patterns.

These changes streamline the code, minimize redundant operations, and make the code more readable while maintaining functionality and improving performance.
@github-actions github-actions bot added refactor Maintenance tasks and housekeeping and removed refactor Maintenance tasks and housekeeping labels Jan 31, 2025
@edwinjosechittilappilly
Collaborator

@ogabrielluiz
I was planning to make a few changes to `serialize_message` to fix:

Details: Error serializing vertex build response: Error serializing to JSON: PydanticSerializationError: Error calling function serialize_message: ValueError: [TypeError("'numpy.int64' object is not iterable"), TypeError('vars() argument must have dict attribute')]

Not sure whether the current PR fixes this error.
If not, we can add support for it in another PR.
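
For context, one common way to avoid this class of error is to convert NumPy scalars to Python builtins before JSON encoding. The sketch below is an assumption about how that could be done, not the fix that landed in this PR. The helper names `_default` and `dumps_safe` are hypothetical. It relies on `np.generic` being the base class of NumPy scalar types and on `.item()` returning the equivalent Python scalar, and it treats NumPy as optional.

```python
import json

try:
    import numpy as np  # optional dependency in this sketch
except ImportError:
    np = None


def _default(value):
    """Fallback for json.dumps: unwrap NumPy scalars, reject everything else."""
    if np is not None and isinstance(value, np.generic):
        # np.int64(3).item() yields the plain Python int 3
        return value.item()
    msg = f"Object of type {type(value).__name__} is not JSON serializable"
    raise TypeError(msg)


def dumps_safe(payload):
    """Serialize payload to JSON, converting NumPy scalars along the way."""
    return json.dumps(payload, default=_default)
```

Raising `TypeError` for unknown types keeps the failure explicit instead of falling through to `vars()` or iteration on an object that supports neither.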

@github-actions github-actions bot removed the refactor Maintenance tasks and housekeeping label Jan 31, 2025
codeflash-ai bot added a commit that referenced this pull request Feb 3, 2025
…actor-serialization`)

Certainly! Here's a more efficient version of the given program. The primary optimization performed here is removing the redundant `.apply()` call and directly truncating values in a more performant way.



### Changes Made.
1. **Removed redundant `apply` calls**: In the original code, there were nested `apply` calls which can be very slow on larger DataFrames. The new implementation converts the DataFrame to a list of dictionaries first and then truncates the values if needed.
2. **Optimized truncation logic**: Applied truncation directly while iterating over the dictionary after conversion from a DataFrame. This reduces overhead and improves readability.

These changes should enhance the runtime performance of the serialization process, especially for larger DataFrames.
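
The approach described above (convert to a list of dictionaries first, then truncate while iterating) can be sketched without pandas. In this sketch, `records` stands in for the output of `DataFrame.to_dict(orient="records")`. The function names, default limits, and ellipsis suffix are assumptions, not the code from the dependent PR.

```python
from typing import Any, Dict, List


def truncate_value(value: Any, max_length: int) -> Any:
    """Shorten long strings and mark the cut with an ellipsis; pass other values through."""
    if isinstance(value, str) and len(value) > max_length:
        return value[:max_length] + "..."
    return value


def serialize_records(
    records: List[Dict[str, Any]],
    max_length: int = 10,
    max_items: int = 2,
) -> List[Dict[str, Any]]:
    """Limit the number of rows, then truncate each cell in a single pass."""
    limited = records[:max_items]
    return [
        {column: truncate_value(value, max_length) for column, value in row.items()}
        for row in limited
    ]
```

Slicing before truncating means cells in dropped rows are never touched, which is where most of the saving comes from on large inputs.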
Contributor

codeflash-ai bot commented Feb 3, 2025

⚡️ Codeflash found optimizations for this PR

📄 123% (1.23x) speedup for _serialize_dataframe in src/backend/base/langflow/serialization/serialization.py

⏱️ Runtime : 23.9 milliseconds → 10.7 milliseconds (best of 141 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch refactor-serialization).

codeflash-ai bot added a commit that referenced this pull request Feb 3, 2025
…or-serialization`)

Certainly! Here is a more optimized version of the program.



Changes made.
1. Replaced the `apply` method with dictionary comprehension. This avoids creating an intermediate Series, which can be an expensive operation.
2. Moved `_truncate_value` outside of the main function to keep the main function concise and focused.
Contributor

codeflash-ai bot commented Feb 3, 2025

⚡️ Codeflash found optimizations for this PR

📄 234% (2.34x) speedup for _serialize_series in src/backend/base/langflow/serialization/serialization.py

⏱️ Runtime : 12.4 milliseconds → 3.72 milliseconds (best of 356 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch refactor-serialization).

@ogabrielluiz ogabrielluiz dismissed edwinjosechittilappilly’s stale review February 3, 2025 12:19

The PR won't merge if tests fail.

@github-actions github-actions bot added refactor Maintenance tasks and housekeeping and removed refactor Maintenance tasks and housekeeping labels Feb 3, 2025
Contributor

@anovazzi1 anovazzi1 left a comment

lgtm

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 3, 2025
@github-actions github-actions bot added refactor Maintenance tasks and housekeeping and removed refactor Maintenance tasks and housekeeping labels Feb 3, 2025
@ogabrielluiz ogabrielluiz added lgtm This PR has been approved by a maintainer and removed lgtm This PR has been approved by a maintainer labels Feb 3, 2025
@ogabrielluiz ogabrielluiz added this pull request to the merge queue Feb 3, 2025
Merged via the queue into main with commit c73070c Feb 3, 2025
43 of 44 checks passed
@ogabrielluiz ogabrielluiz deleted the refactor-serialization branch February 3, 2025 15:22
ogabrielluiz pushed a commit that referenced this pull request Feb 3, 2025
…or-serialization`)

github-merge-queue bot pushed a commit that referenced this pull request Feb 3, 2025
…`refactor-serialization`) (#6079)

* feat: Implement serialization functions for various data types and add a unified serialize method

* optimize conditional

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* fix: Update string and list truncation to include ellipsis for clarity

* ⚡️ Speed up function `_serialize_series` by 234% in PR #6044 (`refactor-serialization`)
Certainly! Here is a more optimized version of the program.



Changes made.
1. Replaced the `apply` method with dictionary comprehension. This avoids creating an intermediate Series, which can be an expensive operation.
2. Moved `_truncate_value` outside of the main function to keep the main function concise and focused.

* refactor: Remove unused `_truncate_value` function from serialization module

---------

Co-authored-by: Gabriel Luiz Freitas Almeida <gabriel@langflow.org>
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Contributor

codeflash-ai bot commented Feb 3, 2025

This PR is now faster! 🚀 @ogabrielluiz accepted my optimizations from:

github-merge-queue bot pushed a commit that referenced this pull request Feb 3, 2025
…(`refactor-serialization`) (#6078)

* feat: Implement serialization functions for various data types and add a unified serialize method

* feat: Enhance serialization by adding support for primitive types, enums, and generic types

* fix: Update Pinecone integration to use VectorStore and handle import errors gracefully

* test: Add hypothesis-based tests for serialization functions across various data types

* refactor: Replace custom serialization logic with unified serialize function for consistency and maintainability

* refactor: Replace recursive serialization function with unified serialize method for improved clarity and maintainability

* refactor: Replace custom serialization logic with unified serialize function for improved consistency and clarity

* refactor: Enhance serialization logic by adding instance handling and streamlining type checks

* refactor: Remove custom dictionary serialization from ResultDataResponse for streamlined handling

* refactor: Enhance serialization in ResultDataResponse by adding max_items_length for improved handling of outputs, logs, messages, and artifacts

* refactor: Move MAX_ITEMS_LENGTH and MAX_TEXT_LENGTH constants to serialization module for better organization

* refactor: Simplify message serialization in Log model by utilizing unified serialize function

* refactor: Remove unnecessary pytest marker from TestSerializationHypothesis class

* optimize _serialize_bytes

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* feat: Add support for numpy integer type serialization

* feat: Enhance serialization with support for pandas and numpy types

* test: Add comprehensive serialization tests for numpy and pandas types

* fix: Update _serialize_dispatcher to return string representation for unsupported types

* fix: Update _serialize_dispatcher to return the object directly instead of its string representation

* optimize conditional

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* optimize length check

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* fix: Update string and list truncation to include ellipsis for clarity

* ⚡️ Speed up function `_serialize_dataframe` by 123% in PR #6044 (`refactor-serialization`)
Certainly! Here's a more efficient version of the given program. The primary optimization performed here is removing the redundant `.apply()` call and directly truncating values in a more performant way.



### Changes Made.
1. **Removed redundant `apply` calls**: In the original code, there were nested `apply` calls which can be very slow on larger DataFrames. The new implementation converts the DataFrame to a list of dictionaries first and then truncates the values if needed.
2. **Optimized truncation logic**: Applied truncation directly while iterating over the dictionary after conversion from a DataFrame. This reduces overhead and improves readability.

These changes should enhance the runtime performance of the serialization process, especially for larger DataFrames.

---------

Co-authored-by: Gabriel Luiz Freitas Almeida <gabriel@langflow.org>
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
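
One of the commits above updates the Pinecone integration to handle import errors gracefully. A minimal sketch of that general pattern follows. It is an assumption about the technique, not the code from this PR, and `import_optional` and its parameters are hypothetical names.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import a module by name, raising an ImportError with an actionable hint if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        msg = f"Could not import '{module_name}'. {install_hint}"
        raise ImportError(msg) from exc
```

Deferring the import to call time means a component that depends on an optional package can still be loaded and listed, and the user sees the install hint only when the component is actually used.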
Contributor

codeflash-ai bot commented Feb 3, 2025

This PR is now faster! 🚀 @ogabrielluiz accepted my optimizations from:

Labels
lgtm This PR has been approved by a maintainer refactor Maintenance tasks and housekeeping size:XL This PR changes 500-999 lines, ignoring generated files.
5 participants