⚡️ Speed up method `AstraDBVectorStoreComponent.reset_database_list` by 155% in PR #6028 (`PlaygroundPage`) #6159
base: PlaygroundPage
…by 155% in PR #6028 (`PlaygroundPage`)

Certainly! I'll provide an optimized version of the given Python program. The primary focus will be on reducing redundancies and improving the performance.

First, notice that the initialization of database options involves fetching the database list and subsequently performing manipulations on it. We can optimize this by reducing intermediate steps and avoiding unnecessary looping.

### Optimized Code.

### Explanation of Changes.

1. **Refactored `_initialize_database_options` and Combined Steps**.
   - Merged parsing of database information directly within the `_get_and_parse_database_list` method.
   - Collected `names` and `metadata` in a single loop over the database list, avoiding the need for multiple list comprehensions, which reduces overhead.
2. **Singular Data Fetch and Parse Method**.
   - Created a method `_get_and_parse_database_list` that encapsulates the functionality of fetching and parsing database information, reducing redundant error handling.
3. **Efficiently Set `build_config`**.
   - Directly updated `build_config` with names and metadata obtained from the parsing function.

This approach ensures the program handles the data more efficiently with less overhead and minimizes unnecessary looping structures.
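The three changes described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's actual code: `SketchComponent`, the stubbed `get_database_list`, and the `build_config` layout (`"database_name"`, `"options"`, `"options_metadata"`) are assumptions made up for the example.

```python
class SketchComponent:
    """Hypothetical stand-in for the component; not the real Langflow class."""

    def __init__(self, databases):
        self._databases = databases

    def get_database_list(self):
        # Stub: the real method would query Astra DB; here we return canned data.
        return self._databases

    def _get_and_parse_database_list(self):
        # Fetch once, then build both lists in a single pass over the result.
        try:
            database_list = self.get_database_list()
            names = []
            metadata = []
            for name, info in database_list.items():
                names.append(name)
                metadata.append(
                    {"collections": info["collections"], "api_endpoint": info["api_endpoint"]}
                )
        except Exception as e:
            # Error handling lives in one place instead of being repeated per caller.
            raise ValueError(f"Error fetching database options: {e}") from e
        return names, metadata

    def reset_database_list(self, build_config):
        # Assumed build_config layout, for illustration only.
        names, metadata = self._get_and_parse_database_list()
        build_config["database_name"]["options"] = names
        build_config["database_name"]["options_metadata"] = metadata
        return build_config


component = SketchComponent({"db1": {"collections": 2, "api_endpoint": "https://example.invalid"}})
config = component.reset_database_list({"database_name": {}})
print(config["database_name"]["options"])
```

In this sketch a malformed entry (for example a missing `"collections"` key) surfaces as a `ValueError` from the single fetch-and-parse method, so `reset_database_list` itself needs no error handling of its own.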
names = []
metadata = []
for name, info in database_list.items():
    names.append(name)
    metadata.append(
        {
            "collections": info["collections"],
            "api_endpoint": info["api_endpoint"],
        }
    )
return names, metadata
Suggested change:
- names = []
- metadata = []
- for name, info in database_list.items():
-     names.append(name)
-     metadata.append(
-         {
-             "collections": info["collections"],
-             "api_endpoint": info["api_endpoint"],
-         }
-     )
- return names, metadata
+ names = list(database_list.keys())
+ metadata = [
+     {"collections": info["collections"], "api_endpoint": info["api_endpoint"]}
+     for info in database_list.values()
+ ]
except ValueError as ve:
    raise ValueError(f"Error fetching database options: {ve}") from ve
    raise ValueError(f"An unexpected error occurred: {e}") from e
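Whether the comprehension-based form is actually faster than the single loop likely depends on the interpreter version and input size, so it is worth measuring rather than assuming. A minimal benchmark sketch using `timeit` follows; the synthetic `database_list`, its size, and the function names `with_loop` and `with_comprehensions` are made up for illustration.

```python
import timeit

# Synthetic input shaped like the mapping in the diff; sizes and values are made up.
database_list = {
    f"db{i}": {"collections": i, "api_endpoint": f"https://example.invalid/{i}"}
    for i in range(1000)
}


def with_loop(dbs):
    # Single explicit loop, as in the code under review.
    names = []
    metadata = []
    for name, info in dbs.items():
        names.append(name)
        metadata.append(
            {"collections": info["collections"], "api_endpoint": info["api_endpoint"]}
        )
    return names, metadata


def with_comprehensions(dbs):
    # keys() plus one list comprehension, as in the suggested change.
    names = list(dbs.keys())
    metadata = [
        {"collections": info["collections"], "api_endpoint": info["api_endpoint"]}
        for info in dbs.values()
    ]
    return names, metadata


# Both variants should produce identical results before timing is meaningful.
same_output = with_loop(database_list) == with_comprehensions(database_list)

# Take the minimum of several repeats to reduce noise from other processes.
loop_time = min(timeit.repeat(lambda: with_loop(database_list), number=200, repeat=5))
comp_time = min(timeit.repeat(lambda: with_comprehensions(database_list), number=200, repeat=5))

print(f"same output: {same_output}")
print(f"loop: {loop_time:.4f}s  comprehensions: {comp_time:.4f}s")
```

The timings will differ from machine to machine; the equality check is the part that should hold everywhere.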
@staticmethod
def _get_db_info(db, client, token, env_string):
    api_endpoint = f"https://{db.info.id}-{db.info.region}.apps.astra{env_string}.datastax.com"
    try:
        collections_count = len(
            list(
                client.get_database(
                    api_endpoint=api_endpoint, token=token, keyspace=db.info.keyspace
                ).list_collection_names(keyspace=db.info.keyspace)
            )
        )
        return db.info.name, {"api_endpoint": api_endpoint, "collections": collections_count}
    except Exception:
        return db.info.name, None
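To make the return shape of `_get_db_info` concrete, here is a small sketch that reproduces its logic as a plain function and exercises it with fake objects. `FakeDatabase`, `FakeClient`, `FailingClient`, and the sample `db` values are hypothetical stand-ins, not part of any real client library.

```python
from types import SimpleNamespace


def get_db_info(db, client, token, env_string):
    # Same logic as the static method above, reproduced as a plain function.
    api_endpoint = f"https://{db.info.id}-{db.info.region}.apps.astra{env_string}.datastax.com"
    try:
        database = client.get_database(
            api_endpoint=api_endpoint, token=token, keyspace=db.info.keyspace
        )
        collections_count = len(list(database.list_collection_names(keyspace=db.info.keyspace)))
        return db.info.name, {"api_endpoint": api_endpoint, "collections": collections_count}
    except Exception:
        # Any failure is reported as (name, None) rather than raised.
        return db.info.name, None


class FakeDatabase:
    """Hypothetical stand-in for the object returned by client.get_database."""

    def list_collection_names(self, keyspace):
        return ["col1", "col2"]


class FakeClient:
    """Hypothetical client whose lookup succeeds."""

    def get_database(self, api_endpoint, token, keyspace):
        return FakeDatabase()


class FailingClient:
    """Hypothetical client whose lookup fails."""

    def get_database(self, api_endpoint, token, keyspace):
        raise RuntimeError("unreachable")


db = SimpleNamespace(info=SimpleNamespace(id="abc", region="us-east1", keyspace="ks", name="db1"))

ok = get_db_info(db, FakeClient(), token="placeholder", env_string="")
failed = get_db_info(db, FailingClient(), token="placeholder", env_string="")
print(ok)
print(failed)
```

Because a failed lookup yields `None` as the second element, a caller that later reads `info["collections"]` would presumably need to skip or filter those entries first.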
⚡️ Codeflash found optimizations for this PR
📄 20% (0.20x) speedup for

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 19 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | undefined |
🌀 Generated Regression Tests Details
import pytest  # used for our unit tests
# function to test
from langflow.base.vectorstores.model import LCVectorStoreComponent
from langflow.components.vectorstores.astradb import \
    AstraDBVectorStoreComponent

# unit tests

class MockAstraDBVectorStoreComponent(AstraDBVectorStoreComponent):
    def __init__(self, mock_data):
        self.mock_data = mock_data

    def get_database_list(self):
        if isinstance(self.mock_data, Exception):
            raise self.mock_data
        return self.mock_data

def test_single_database_entry():
    component = MockAstraDBVectorStoreComponent({"db1": {"collections": ["col1"], "api_endpoint": "http://endpoint1"}})
    names, metadata = component._get_and_parse_database_list()

def test_multiple_database_entries():
    component = MockAstraDBVectorStoreComponent({
        "db1": {"collections": ["col1"], "api_endpoint": "http://endpoint1"},
        "db2": {"collections": ["col2"], "api_endpoint": "http://endpoint2"}
    })
    names, metadata = component._get_and_parse_database_list()

def test_empty_database_list():
    component = MockAstraDBVectorStoreComponent({})
    names, metadata = component._get_and_parse_database_list()

def test_missing_collections():
    component = MockAstraDBVectorStoreComponent({"db1": {"api_endpoint": "http://endpoint1"}})
    with pytest.raises(ValueError):
        component._get_and_parse_database_list()

def test_missing_api_endpoint():
    component = MockAstraDBVectorStoreComponent({"db1": {"collections": ["col1"]}})
    with pytest.raises(ValueError):
        component._get_and_parse_database_list()

def test_non_dict_database_list():
    component = MockAstraDBVectorStoreComponent(["db1", "db2"])
    with pytest.raises(ValueError):
        component._get_and_parse_database_list()

def test_non_dict_database_info():
    component = MockAstraDBVectorStoreComponent({"db1": ["col1", "col2"]})
    with pytest.raises(ValueError):
        component._get_and_parse_database_list()

def test_database_name_empty_string():
    component = MockAstraDBVectorStoreComponent({"": {"collections": ["col1"], "api_endpoint": "http://endpoint1"}})
    names, metadata = component._get_and_parse_database_list()

def test_empty_collections_list():
    component = MockAstraDBVectorStoreComponent({"db1": {"collections": [], "api_endpoint": "http://endpoint1"}})
    names, metadata = component._get_and_parse_database_list()

def test_exception_in_get_database_list():
    component = MockAstraDBVectorStoreComponent(Exception("Test exception"))
    with pytest.raises(ValueError, match="Error fetching database options: Test exception"):
        component._get_and_parse_database_list()

def test_large_number_of_databases():
    large_data = {f"db{i}": {"collections": [f"col{i}"], "api_endpoint": f"http://endpoint{i}"} for i in range(1000)}
    component = MockAstraDBVectorStoreComponent(large_data)
    names, metadata = component._get_and_parse_database_list()

def test_special_characters_in_names():
    component = MockAstraDBVectorStoreComponent({"db@1!": {"collections": ["col1"], "api_endpoint": "http://endpoint1"}})
    names, metadata = component._get_and_parse_database_list()

def test_special_characters_in_collections_and_api_endpoint():
    component = MockAstraDBVectorStoreComponent({"db1": {"collections": ["col@1!"], "api_endpoint": "http://end@point!1"}})
    names, metadata = component._get_and_parse_database_list()

def test_mixed_valid_and_invalid_entries():
    component = MockAstraDBVectorStoreComponent({
        "db1": {"collections": ["col1"], "api_endpoint": "http://endpoint1"},
        "db2": {"collections": ["col2"]}
    })
    with pytest.raises(ValueError):
        component._get_and_parse_database_list()

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest  # used for our unit tests
# function to test
from langflow.base.vectorstores.model import LCVectorStoreComponent
from langflow.components.vectorstores.astradb import \
    AstraDBVectorStoreComponent

# unit tests

def test_standard_input(mocker):
    # Mock the get_database_list method to return a standard input
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={
        "db1": {"collections": ["col1", "col2"], "api_endpoint": "http://api1.example.com"},
        "db2": {"collections": ["col3"], "api_endpoint": "http://api2.example.com"}
    })
    component = AstraDBVectorStoreComponent()
    names, metadata = component._get_and_parse_database_list()

def test_empty_database_list(mocker):
    # Mock the get_database_list method to return an empty dictionary
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={})
    component = AstraDBVectorStoreComponent()
    names, metadata = component._get_and_parse_database_list()

def test_single_database_entry(mocker):
    # Mock the get_database_list method to return a single database entry
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={
        "db1": {"collections": ["col1", "col2"], "api_endpoint": "http://api1.example.com"}
    })
    component = AstraDBVectorStoreComponent()
    names, metadata = component._get_and_parse_database_list()

def test_missing_keys_in_database_info(mocker):
    # Mock the get_database_list method to return entries with missing keys
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={
        "db1": {"collections": ["col1", "col2"]},  # Missing api_endpoint
        "db2": {"api_endpoint": "http://api2.example.com"}  # Missing collections
    })
    component = AstraDBVectorStoreComponent()
    with pytest.raises(ValueError, match="Error fetching database options"):
        component._get_and_parse_database_list()

def test_invalid_database_list_structure(mocker):
    # Mock the get_database_list method to return an invalid structure
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value=["db1", "db2"])
    component = AstraDBVectorStoreComponent()
    with pytest.raises(ValueError, match="Error fetching database options"):
        component._get_and_parse_database_list()

def test_exception_during_fetching(mocker):
    # Mock the get_database_list method to raise an exception
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', side_effect=Exception("Network error"))
    component = AstraDBVectorStoreComponent()
    with pytest.raises(ValueError, match="Error fetching database options: Network error"):
        component._get_and_parse_database_list()

def test_large_number_of_databases(mocker):
    # Mock the get_database_list method to return a large number of database entries
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={
        f"db{i}": {"collections": [f"col{i}"], "api_endpoint": f"http://api{i}.example.com"} for i in range(1000)
    })
    component = AstraDBVectorStoreComponent()
    names, metadata = component._get_and_parse_database_list()
    for i in range(1000):
        pass

def test_nested_structures_in_database_info(mocker):
    # Mock the get_database_list method to return nested structures in database info
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={
        "db1": {"collections": [{"name": "col1", "type": "type1"}, {"name": "col2", "type": "type2"}], "api_endpoint": "http://api1.example.com"}
    })
    component = AstraDBVectorStoreComponent()
    names, metadata = component._get_and_parse_database_list()

def test_performance_with_large_data_samples(mocker):
    # Mock the get_database_list method to return a very large number of database entries
    mocker.patch.object(AstraDBVectorStoreComponent, 'get_database_list', return_value={
        f"db{i}": {"collections": [f"col{i}"], "api_endpoint": f"http://api{i}.example.com"} for i in range(1000)
    })
    component = AstraDBVectorStoreComponent()
    names, metadata = component._get_and_parse_database_list()
    for i in range(1000):
        pass

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
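The tests above are listed without explicit assertions on the returned values. As a rough sketch of what such checks could look like, the example below uses the standard library's `unittest.mock.patch.object` (which the `mocker` fixture wraps) against a stand-in class. `StandInComponent` and the `check_*` helpers are hypothetical, and the parsing and error-wrapping logic is an assumption based on the snippets and expected messages shown in this PR, not the real Langflow implementation.

```python
from unittest import mock


class StandInComponent:
    """Hypothetical stand-in; the real class lives in langflow and is not imported here."""

    def get_database_list(self):
        raise NotImplementedError("patched in each check")

    def _get_and_parse_database_list(self):
        try:
            database_list = self.get_database_list()
            names = list(database_list.keys())
            metadata = [
                {"collections": info["collections"], "api_endpoint": info["api_endpoint"]}
                for info in database_list.values()
            ]
        except Exception as e:
            raise ValueError(f"Error fetching database options: {e}") from e
        return names, metadata


def check_standard_input():
    data = {"db1": {"collections": ["col1"], "api_endpoint": "http://api1.example.com"}}
    # return_value makes the patched method hand back canned data.
    with mock.patch.object(StandInComponent, "get_database_list", return_value=data):
        names, metadata = StandInComponent()._get_and_parse_database_list()
    assert names == ["db1"]
    assert metadata == [data["db1"]]
    return names, metadata


def check_fetch_failure():
    # side_effect makes the patched method raise instead of returning.
    with mock.patch.object(
        StandInComponent, "get_database_list", side_effect=Exception("Network error")
    ):
        try:
            StandInComponent()._get_and_parse_database_list()
        except ValueError as err:
            return str(err)
    raise AssertionError("expected ValueError")


names, metadata = check_standard_input()
message = check_fetch_failure()
print(names, metadata, message)
```

Asserting on both `names` and `metadata` is what would let a regression test catch an optimization that changes ordering or drops a key, rather than only confirming that the call does not raise.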
⚡️ This pull request contains optimizations for PR #6028
If you approve this dependent PR, these changes will be merged into the original PR branch `PlaygroundPage`.
📄 155% (1.55x) speedup for `AstraDBVectorStoreComponent.reset_database_list` in `src/backend/base/langflow/components/vectorstores/astradb.py`
⏱️ Runtime: 543 microseconds → 213 microseconds (best of 41 runs)
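The reported 155% figure appears consistent with computing the speedup as (original − optimized) / optimized. That formula is an assumption inferred from the two runtimes, not something the report states; a quick check:

```python
original_us = 543   # runtime before, taken from the report
optimized_us = 213  # runtime after, taken from the report

# Assumed definition: extra throughput relative to the optimized runtime.
speedup = (original_us - optimized_us) / optimized_us
# Plain ratio of the two runtimes, for comparison.
ratio = original_us / optimized_us

print(f"speedup: {speedup:.0%} ({speedup:.2f}x)")
print(f"runtime ratio: {ratio:.2f}x")
```

Under this reading, "1.55x" denotes the relative gain, while the optimized method runs in roughly 2.55 times less time than the original.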