⚡️ Speed up method `AstraDBVectorStoreComponent.reset_database_list` by 155% in PR #6028 (`PlaygroundPage`) #6159

codeflash-ai · 2025-02-06T15:57:57Z

⚡️ This pull request contains optimizations for PR #6028

If you approve this dependent PR, these changes will be merged into the original PR branch PlaygroundPage.

This PR will be automatically closed if the original PR is merged.

📄 155% speedup (2.55x as fast) for `AstraDBVectorStoreComponent.reset_database_list` in `src/backend/base/langflow/components/vectorstores/astradb.py`

⏱️ Runtime : 543 microseconds → 213 microseconds (best of 41 runs)
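The reported percentage matches the time saved relative to the optimized runtime. The plain runtime ratio is shown alongside for comparison:

```latex
\frac{t_{\text{old}} - t_{\text{new}}}{t_{\text{new}}} = \frac{543 - 213}{213} \approx 1.55 \;(155\%),
\qquad
\frac{t_{\text{old}}}{t_{\text{new}}} = \frac{543}{213} \approx 2.55
```

In other words, on this benchmark the optimized method runs roughly 2.55 times as fast as the original.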

📝 Explanation and details

This optimization focuses on removing redundant steps to improve performance.

Initializing the database options involves fetching the database list and then transforming it. This can be optimized by removing intermediate steps and unnecessary loops.

Optimized Code.

Explanation of Changes.

1. Refactored `_initialize_database_options` and combined steps.
   - Merged parsing of the database information directly into the `_get_and_parse_database_list` method.
   - Collected `names` and `metadata` in a single loop over the database list instead of multiple list comprehensions, to reduce overhead.
2. Single fetch-and-parse method.
   - Created `_get_and_parse_database_list`, which fetches and parses the database information in one place so the error handling is not duplicated.
3. Efficiently set `build_config`.
   - Updated `build_config` directly with the names and metadata returned by the parsing method.

Together, these changes remove intermediate steps and extra passes over the database list when building the options.
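The single-pass parse and the direct `build_config` update described above can be sketched as follows. This is a minimal illustration, not the component's actual code: `parse_database_list` and `reset_options` are hypothetical names, and the `build_config` shape (`options`, `options_metadata`, `value` under `api_endpoint`) is taken from the generated tests below.

```python
from typing import Any


def parse_database_list(
    database_list: dict[str, dict[str, Any]],
) -> tuple[list[str], list[dict[str, Any]]]:
    """Split a {name: info} mapping into parallel name and metadata lists in one pass."""
    names: list[str] = []
    metadata: list[dict[str, Any]] = []
    for name, info in database_list.items():
        names.append(name)
        # Keep only the two fields the options need; a missing key raises KeyError.
        metadata.append({"collections": info["collections"], "api_endpoint": info["api_endpoint"]})
    return names, metadata


def reset_options(
    build_config: dict[str, Any], database_list: dict[str, dict[str, Any]]
) -> dict[str, Any]:
    """Write the parsed names and metadata straight into build_config["api_endpoint"]."""
    names, metadata = parse_database_list(database_list)
    build_config["api_endpoint"]["options"] = names
    build_config["api_endpoint"]["options_metadata"] = metadata
    return build_config
```

Because `reset_options` mutates and returns the same dict, callers can use either the return value or the original `build_config`. Unlike the component, which the tests expect to raise `ValueError` for a missing key, this sketch lets the `KeyError` propagate.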

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 12 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	undefined

🌀 Generated Regression Tests Details

import pytest  # used for our unit tests
# function to test
from langflow.base.vectorstores.model import LCVectorStoreComponent
from langflow.components.vectorstores.astradb import \
    AstraDBVectorStoreComponent

# unit tests

# Mock class to override get_database_list method
class MockAstraDBVectorStoreComponent(AstraDBVectorStoreComponent):
    def __init__(self, mock_data):
        self.mock_data = mock_data

    def get_database_list(self):
        return self.mock_data

# Basic Functionality
def test_reset_database_list_multiple_databases():
    component = MockAstraDBVectorStoreComponent({
        "db1": {"collections": ["col1", "col2"], "api_endpoint": "endpoint1"},
        "db2": {"collections": ["col3"], "api_endpoint": "endpoint2"}
    })
    build_config = {"api_endpoint": {"options": [], "options_metadata": [], "value": ""}}
    expected_config = {
        "api_endpoint": {
            "options": ["db1", "db2"],
            "options_metadata": [
                {"collections": ["col1", "col2"], "api_endpoint": "endpoint1"},
                {"collections": ["col3"], "api_endpoint": "endpoint2"}
            ],
            "value": ""
        }
    }
    codeflash_output = component.reset_database_list(build_config)

def test_reset_database_list_single_database():
    component = MockAstraDBVectorStoreComponent({
        "db1": {"collections": ["col1"], "api_endpoint": "endpoint1"}
    })
    build_config = {"api_endpoint": {"options": [], "options_metadata": [], "value": ""}}
    expected_config = {
        "api_endpoint": {
            "options": ["db1"],
            "options_metadata": [
                {"collections": ["col1"], "api_endpoint": "endpoint1"}
            ],
            "value": ""
        }
    }
    codeflash_output = component.reset_database_list(build_config)

# Edge Cases
def test_reset_database_list_empty_database_list():
    component = MockAstraDBVectorStoreComponent({})
    build_config = {"api_endpoint": {"options": [], "options_metadata": [], "value": ""}}
    expected_config = {
        "api_endpoint": {
            "options": [],
            "options_metadata": [],
            "value": ""
        }
    }
    codeflash_output = component.reset_database_list(build_config)

def test_reset_database_list_missing_keys():
    component = MockAstraDBVectorStoreComponent({
        "db1": {"collections": ["col1"]}  # Missing "api_endpoint"
    })
    build_config = {"api_endpoint": {"options": [], "options_metadata": [], "value": ""}}
    with pytest.raises(ValueError):
        component.reset_database_list(build_config)


def test_reset_database_list_non_dict_build_config():
    component = MockAstraDBVectorStoreComponent({
        "db1": {"collections": ["col1"], "api_endpoint": "endpoint1"}
    })
    build_config = ["api_endpoint", {"options": [], "options_metadata": [], "value": ""}]
    with pytest.raises(TypeError):
        component.reset_database_list(build_config)


def test_reset_database_list_exception_in_initialize_database_options():
    class ExceptionMockAstraDBVectorStoreComponent(AstraDBVectorStoreComponent):
        def get_database_list(self):
            raise Exception("Test Exception")

    component = ExceptionMockAstraDBVectorStoreComponent()
    build_config = {"api_endpoint": {"options": [], "options_metadata": [], "value": ""}}
    with pytest.raises(ValueError, match="Error fetching database options: Test Exception"):
        component.reset_database_list(build_config)

# Large Scale Test Cases
def test_reset_database_list_large_number_of_databases():
    large_data = {f"db{i}": {"collections": [f"col{i}"], "api_endpoint": f"endpoint{i}"} for i in range(1000)}
    component = MockAstraDBVectorStoreComponent(large_data)
    build_config = {"api_endpoint": {"options": [], "options_metadata": [], "value": ""}}
    expected_config = {
        "api_endpoint": {
            "options": [f"db{i}" for i in range(1000)],
            "options_metadata": [
                {"collections": [f"col{i}"], "api_endpoint": f"endpoint{i}"} for i in range(1000)
            ],
            "value": ""
        }
    }
    codeflash_output = component.reset_database_list(build_config)

# Boundary Conditions
def test_reset_database_list_max_length_database_name():
    max_length_name = "a" * 255
    component = MockAstraDBVectorStoreComponent({
        max_length_name: {"collections": ["col1"], "api_endpoint": "endpoint1"}
    })
    build_config = {"api_endpoint": {"options": [], "options_metadata": [], "value": ""}}
    expected_config = {
        "api_endpoint": {
            "options": [max_length_name],
            "options_metadata": [
                {"collections": ["col1"], "api_endpoint": "endpoint1"}
            ],
            "value": ""
        }
    }
    codeflash_output = component.reset_database_list(build_config)

def test_reset_database_list_max_length_collections_and_api_endpoint():
    max_length_value = "a" * 255
    component = MockAstraDBVectorStoreComponent({
        "db1": {"collections": [max_length_value], "api_endpoint": max_length_value}
    })
    build_config = {"api_endpoint": {"options": [], "options_metadata": [], "value": ""}}
    expected_config = {
        "api_endpoint": {
            "options": ["db1"],
            "options_metadata": [
                {"collections": [max_length_value], "api_endpoint": max_length_value}
            ],
            "value": ""
        }
    }
    codeflash_output = component.reset_database_list(build_config)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
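Several generated tests (for example `test_reset_database_list_exception_in_initialize_database_options`) expect a failure inside `get_database_list` to surface as a `ValueError` whose message starts with `Error fetching database options: `. A minimal sketch of that wrap-and-chain pattern, with `fetch_database_options` as a hypothetical helper rather than the component's actual code:

```python
from typing import Any, Callable


def fetch_database_options(get_database_list: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    """Call a fetcher and re-raise any failure as a ValueError with context."""
    try:
        return get_database_list()
    except Exception as e:
        # `from e` keeps the original exception as __cause__ for debugging.
        msg = f"Error fetching database options: {e}"
        raise ValueError(msg) from e
```

Chaining with `from e` means the traceback still shows the original error (for example a network failure) underneath the friendlier message.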

import pytest  # used for our unit tests
# function to test
from langflow.base.vectorstores.model import LCVectorStoreComponent
from langflow.components.vectorstores.astradb import \
    AstraDBVectorStoreComponent


# unit tests
class TestAstraDBVectorStoreComponent:
    @pytest.fixture
    def component(self, mocker):
        # Create an instance of the component and mock get_database_list method
        component = AstraDBVectorStoreComponent()
        mocker.patch.object(component, 'get_database_list')
        return component


codeflash-ai · 2025-02-06T16:10:34Z

src/backend/base/langflow/components/vectorstores/astradb.py

+            names = []
+            metadata = []
+            for name, info in database_list.items():
+                names.append(name)
+                metadata.append(
+                    {
+                        "collections": info["collections"],
+                        "api_endpoint": info["api_endpoint"],
+                    }
+                )
+            return names, metadata


Suggested change

-            names = []
-            metadata = []
-            for name, info in database_list.items():
-                names.append(name)
-                metadata.append(
-                    {
-                        "collections": info["collections"],
-                        "api_endpoint": info["api_endpoint"],
-                    }
-                )
-            return names, metadata
+            names = list(database_list.keys())
+            metadata = [
+                {"collections": info["collections"], "api_endpoint": info["api_endpoint"]}
+                for info in database_list.values()
+            ]
+        except ValueError as ve:
+            raise ValueError(f"Error fetching database options: {ve}") from ve
+            raise ValueError(f"An unexpected error occurred: {e}") from e
+    @staticmethod
+    def _get_db_info(db, client, token, env_string):
+        api_endpoint = f"https://{db.info.id}-{db.info.region}.apps.astra{env_string}.datastax.com"
+        try:
+            collections_count = len(
+                list(
+                    client.get_database(
+                        api_endpoint=api_endpoint, token=token, keyspace=db.info.keyspace
+                    ).list_collection_names(keyspace=db.info.keyspace)
+                )
+            )
+            return db.info.name, {"api_endpoint": api_endpoint, "collections": collections_count}
+        except Exception:
+            return db.info.name, None
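The suggestion replaces the explicit loop with `list(database_list.keys())` and a comprehension over `database_list.values()`. That is safe because dicts preserve insertion order (a language guarantee since Python 3.7), so names and metadata stay aligned. A rough way to confirm both forms agree and compare them locally follows; the function names and sample data are made up, and timings will vary by machine.

```python
import timeit
from typing import Any


def parse_with_loop(database_list: dict[str, dict[str, Any]]):
    # Original form: one explicit loop appending to two lists.
    names, metadata = [], []
    for name, info in database_list.items():
        names.append(name)
        metadata.append({"collections": info["collections"], "api_endpoint": info["api_endpoint"]})
    return names, metadata


def parse_with_comprehension(database_list: dict[str, dict[str, Any]]):
    # Suggested form: keys copied directly, metadata built by a list comprehension.
    names = list(database_list.keys())
    metadata = [
        {"collections": info["collections"], "api_endpoint": info["api_endpoint"]}
        for info in database_list.values()
    ]
    return names, metadata


if __name__ == "__main__":
    sample = {f"db{i}": {"collections": i, "api_endpoint": f"https://db{i}.example"} for i in range(1000)}
    # Both forms must produce identical output before timing them.
    assert parse_with_loop(sample) == parse_with_comprehension(sample)
    for fn in (parse_with_loop, parse_with_comprehension):
        seconds = timeit.timeit(lambda: fn(sample), number=200)
        print(f"{fn.__name__}: {seconds * 1000 / 200:.3f} ms per call")
```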

codeflash-ai · 2025-02-06T16:10:37Z

⚡️ Codeflash found optimizations for this PR

📄 20% speedup (1.20x as fast) for `AstraDBVectorStoreComponent._get_and_parse_database_list` in `src/backend/base/langflow/components/vectorstores/astradb.py`

⏱️ Runtime : 625 microseconds → 523 microseconds (best of 56 runs)

📝 Explanation and details

This optimization targets how the database list is collected and parsed:

- Parallel processing: use concurrency to speed up building the API endpoints and querying the collection names.
- Fewer redundant operations: avoid unnecessary list transformations and repeated dict lookups.
- Error handling: replace broad exception handling with more specific exceptions.

Key Improvements.

1. Concurrent processing.
   - Used `ThreadPoolExecutor` from the `concurrent.futures` module to parallelize collecting database information.
   - Each database's information is fetched in a separate thread.
2. Improved error handling.
   - Catches and raises more specific exceptions where appropriate, giving clearer error messages.

These changes should reduce the time needed to retrieve and process the database list, especially when there are many databases, because the per-database requests no longer run one after another.
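A minimal sketch of the concurrent fetch described above, assuming the per-database work is I/O-bound (network calls), which is where threads help despite the GIL. `DbInfo`, `build_endpoint`, `fetch_one`, `fetch_all`, and the `count_collections` callback are hypothetical; the callback stands in for the `client.get_database(...).list_collection_names(...)` call in the suggested diff, and `max_workers=8` is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DbInfo:
    # Hypothetical stand-in for the `db.info` object used in the diff.
    id: str
    region: str
    name: str


def build_endpoint(info: DbInfo, env_string: str = "") -> str:
    # Same URL pattern as `_get_db_info` in the suggested change.
    return f"https://{info.id}-{info.region}.apps.astra{env_string}.datastax.com"


def fetch_one(info: DbInfo, count_collections: Callable[[str], int]) -> tuple[str, Optional[dict]]:
    endpoint = build_endpoint(info)
    try:
        return info.name, {"api_endpoint": endpoint, "collections": count_collections(endpoint)}
    except Exception:
        # Mirror the diff: a failing database is reported as None instead of aborting the whole list.
        return info.name, None


def fetch_all(
    infos: list[DbInfo], count_collections: Callable[[str], int], max_workers: int = 8
) -> dict[str, dict]:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map runs fetch_one concurrently but yields results in input order.
        results = executor.map(lambda info: fetch_one(info, count_collections), infos)
        # Drop databases whose lookup failed.
        return {name: data for name, data in results if data is not None}
```

For example, `fetch_all([DbInfo("abc", "us-east1", "db1")], lambda endpoint: 3)` returns `{"db1": {"api_endpoint": "https://abc-us-east1.apps.astra.datastax.com", "collections": 3}}`.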

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 19 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	undefined

🌀 Generated Regression Tests Details

import pytest  # used for our unit tests
# function to test
from langflow.base.vectorstores.model import LCVectorStoreComponent
from langflow.components.vectorstores.astradb import \
    AstraDBVectorStoreComponent


# unit tests
class MockAstraDBVectorStoreComponent(AstraDBVectorStoreComponent):
    def __init__(self, mock_data):
        self.mock_data = mock_data

    def get_database_list(self):
        if isinstance(self.mock_data, Exception):
            raise self.mock_data
        return self.mock_data


def test_single_database_entry():
    component = MockAstraDBVectorStoreComponent({"db1": {"collections": ["col1"], "api_endpoint": "http://endpoint1"}})
    names, metadata = component._get_and_parse_database_list()


def test_multiple_database_entries():
    component = MockAstraDBVectorStoreComponent({
        "db1": {"collections": ["col1"], "api_endpoint": "http://endpoint1"},
        "db2": {"collections": ["col2"], "api_endpoint": "http://endpoint2"}
    })
    names, metadata = component._get_and_parse_database_list()


def test_empty_database_list():
    component = MockAstraDBVectorStoreComponent({})
    names, metadata = component._get_and_parse_database_list()


def test_missing_collections():
    component = MockAstraDBVectorStoreComponent({"db1": {"api_endpoint": "http://endpoint1"}})
    with pytest.raises(ValueError):
        component._get_and_parse_database_list()


def test_missing_api_endpoint():
    component = MockAstraDBVectorStoreComponent({"db1": {"collections": ["col1"]}})
    with pytest.raises(ValueError):
        component._get_and_parse_database_list()


def test_non_dict_database_list():
    component = MockAstraDBVectorStoreComponent(["db1", "db2"])
    with pytest.raises(ValueError):
        component._get_and_parse_database_list()


def test_non_dict_database_info():
    component = MockAstraDBVectorStoreComponent({"db1": ["col1", "col2"]})
    with pytest.raises(ValueError):
        component._get_and_parse_database_list()


def test_database_name_empty_string():
    component = MockAstraDBVectorStoreComponent({"": {"collections": ["col1"], "api_endpoint": "http://endpoint1"}})
    names, metadata = component._get_and_parse_database_list()


def test_empty_collections_list():
    component = MockAstraDBVectorStoreComponent({"db1": {"collections": [], "api_endpoint": "http://endpoint1"}})
    names, metadata = component._get_and_parse_database_list()


def test_exception_in_get_database_list():
    component = MockAstraDBVectorStoreComponent(Exception("Test exception"))
    with pytest.raises(ValueError, match="Error fetching database options: Test exception"):
        component._get_and_parse_database_list()


def test_large_number_of_databases():
    large_data = {f"db{i}": {"collections": [f"col{i}"], "api_endpoint": f"http://endpoint{i}"} for i in range(1000)}
    component = MockAstraDBVectorStoreComponent(large_data)
    names, metadata = component._get_and_parse_database_list()


def test_special_characters_in_names():
    component = MockAstraDBVectorStoreComponent({"db@1!": {"collections": ["col1"], "api_endpoint": "http://endpoint1"}})
    names, metadata = component._get_and_parse_database_list()


def test_special_characters_in_collections_and_api_endpoint():
    component = MockAstraDBVectorStoreComponent({"db1": {"collections": ["col@1!"], "api_endpoint": "http://end@point!1"}})
    names, metadata = component._get_and_parse_database_list()


def test_mixed_valid_and_invalid_entries():
    component = MockAstraDBVectorStoreComponent({
        "db1": {"collections": ["col1"], "api_endpoint": "http://endpoint1"},
        "db2": {"collections": ["col2"]}
    })
    with pytest.raises(ValueError):
        component._get_and_parse_database_list()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest  # used for our unit tests
# function to test
from langflow.base.vectorstores.model import LCVectorStoreComponent
from langflow.components.vectorstores.astradb import \
    AstraDBVectorStoreComponent

# unit tests

def test_standard_input(mocker):
    # Mock the get_database_list method to return a standard input
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={
        "db1": {"collections": ["col1", "col2"], "api_endpoint": "http://api1.example.com"},
        "db2": {"collections": ["col3"], "api_endpoint": "http://api2.example.com"}
    })
    component = AstraDBVectorStoreComponent()
    names, metadata = component._get_and_parse_database_list()

def test_empty_database_list(mocker):
    # Mock the get_database_list method to return an empty dictionary
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={})
    component = AstraDBVectorStoreComponent()
    names, metadata = component._get_and_parse_database_list()

def test_single_database_entry(mocker):
    # Mock the get_database_list method to return a single database entry
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={
        "db1": {"collections": ["col1", "col2"], "api_endpoint": "http://api1.example.com"}
    })
    component = AstraDBVectorStoreComponent()
    names, metadata = component._get_and_parse_database_list()

def test_missing_keys_in_database_info(mocker):
    # Mock the get_database_list method to return entries with missing keys
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={
        "db1": {"collections": ["col1", "col2"]},  # Missing api_endpoint
        "db2": {"api_endpoint": "http://api2.example.com"}  # Missing collections
    })
    component = AstraDBVectorStoreComponent()
    with pytest.raises(ValueError, match="Error fetching database options"):
        component._get_and_parse_database_list()

def test_invalid_database_list_structure(mocker):
    # Mock the get_database_list method to return an invalid structure
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value=["db1", "db2"])
    component = AstraDBVectorStoreComponent()
    with pytest.raises(ValueError, match="Error fetching database options"):
        component._get_and_parse_database_list()

def test_exception_during_fetching(mocker):
    # Mock the get_database_list method to raise an exception
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', side_effect=Exception("Network error"))
    component = AstraDBVectorStoreComponent()
    with pytest.raises(ValueError, match="Error fetching database options: Network error"):
        component._get_and_parse_database_list()

def test_large_number_of_databases(mocker):
    # Mock the get_database_list method to return a large number of database entries
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={
        f"db{i}": {"collections": [f"col{i}"], "api_endpoint": f"http://api{i}.example.com"} for i in range(1000)
    })
    component = AstraDBVectorStoreComponent()
    names, metadata = component._get_and_parse_database_list()
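    for i in range(1000):
        assert names[i] == f"db{i}"
        assert metadata[i] == {"collections": [f"col{i}"], "api_endpoint": f"http://api{i}.example.com"}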

def test_nested_structures_in_database_info(mocker):
    # Mock the get_database_list method to return nested structures in database info
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={
        "db1": {"collections": [{"name": "col1", "type": "type1"}, {"name": "col2", "type": "type2"}], "api_endpoint": "http://api1.example.com"}
    })
    component = AstraDBVectorStoreComponent()
    names, metadata = component._get_and_parse_database_list()

def test_performance_with_large_data_samples(mocker):
    # Mock the get_database_list method to return a very large number of database entries
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={
        f"db{i}": {"collections": [f"col{i}"], "api_endpoint": f"http://api{i}.example.com"} for i in range(1000)
    })
    component = AstraDBVectorStoreComponent()
    names, metadata = component._get_and_parse_database_list()
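    for i in range(1000):
        assert names[i] == f"db{i}"
        assert metadata[i] == {"collections": [f"col{i}"], "api_endpoint": f"http://api{i}.example.com"}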
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) Feb 6, 2025

codeflash-ai bot mentioned this pull request Feb 6, 2025: feat: configure and update PlaygroundPage #6028 (Draft)

dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) and enhancement (New feature or request) labels Feb 6, 2025

[autofix.ci] apply automated fixes (commit ecd5e27)

