
Additional asv tests #2185

Open · wants to merge 27 commits into master

Conversation


@grusev (Collaborator) commented Feb 18, 2025

Reference Issues/PRs

More tests migrated to asv, running against S3, with a new framework that supports LMDB, Amazon S3 and other storage backends:

  • batch tests
  • modification tests - update, append, delete
  • query benchmarks
  • finalized test data

ASV run on S3 without errors here: https://github.com/man-group/ArcticDB/actions/runs/13414624562/job/37472454850
New run after all modifications: https://github.com/man-group/ArcticDB/actions/runs/13703208873

What does this implement or fix?

Change Type (Required)

  • Patch (Bug fix or non-breaking improvement)
  • Minor (New feature, but backward compatible)
  • Major (Breaking changes)
  • Cherry pick

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@github-actions github-actions bot added patch Small change, should increase patch version and removed patch Small change, should increase patch version labels Feb 19, 2025
@grusev grusev marked this pull request as ready for review February 19, 2025 11:53
@github-actions github-actions bot added patch Small change, should increase patch version and removed patch Small change, should increase patch version labels Feb 20, 2025
@@ -664,6 +665,322 @@ def clear_symbols_cache(self):
lib._nvs.version_store._clear_symbol_list_keys()
Collaborator:

I have encountered a few confusing things, not related to this PR itself but to the previous one. They are small, but I think it would be good to address them in this PR while we're still at it. Adding them as replies here, as I can't add a comment on a non-edited part :/

Collaborator:

The get_library_names helper is a bit confusing to use. In a few places we have code like
self.get_library_names(suffix)[0], and it's unclear what the [0] stands for.

What do you think about instead passing an argument saying whether you want a modifiable or a persistent library? E.g. I think this would be more readable:

self.get_library_name(suffix, lib_type=LibraryType.PERM)
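A minimal sketch of the suggested call shape, assuming a hypothetical LibraryType enum and name template (the real prefix lives in the framework):

```python
from enum import Enum

class LibraryType(Enum):
    PERSISTENT = "PERM"
    MODIFIABLE = "MOD"

def get_library_name(lib_type: LibraryType, suffix: str) -> str:
    # Hypothetical naming scheme; the point is that the caller states
    # explicitly whether it wants the persistent or modifiable library.
    return f"{lib_type.value}_lib_{suffix}"
```

A call like get_library_name(LibraryType.PERSISTENT, "basic") then reads unambiguously, with no magic [0] indexing.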

Collaborator (Author):

I like the idea! It makes a lot of sense and will be clearer.

Collaborator:

I see you've added new functions get_library_name and get_modifiable_library_name while keeping the old one.

I meant replacing the original get_library_name with one that takes an argument of a new enum type. That should simplify this.

return next


class GeneralSetupOfLibrariesWithSymbols(EnvConfigurationBase):
Collaborator:

I personally find the names of all these environment configurations a bit confusing.

All of them start with General, which I think just makes the names longer.

Also, they live inside environment_setup, so when importing environment_setup.General... it's already clear that it is for a setup; maybe let's also drop the Setup from the naming.

So what do you think about the following renames:
GeneralSetupLibraryWithSymbols -> SingleLibrary
GeneralSetupSymblsVersionsSnapshots -> LibrariesWithVersionAndSnapshots
GeneralUseCaseNoSetup -> NoSetup
GeneralAppendSetup -> LibraryWithAppendData
GeneralSetupOfLibrariesWithSymbols -> LibrariesPerNumSymbols, to indicate that each library is for a specific number of symbols

I haven't thought too much about the names, but it would be good for a name to convey, e.g., that this class maintains many libraries and that different libraries hold different numbers of symbols.

(list_rows, list_cols) = self._get_symbol_bounds()

for num_symbols in self._params[self.param_index_num_symbols]:
lib = self.get_library(num_symbols)
Collaborator:

This will raise if the library does not exist. Should we have a try/except that returns False if anything in the check raises?
I think that holds for all check_ok steps, so maybe the try/except can live in setup_environment?

Collaborator:

I think you missed this one?

start = time.time()
for sym_num in range(symbols_number):
for row in list_rows:
for col in list_cols:
Collaborator:

This looks like surprising behavior to me. Won't we generate num_symbols * num_rows * num_cols instead of just num_symbols?
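The concern can be checked in isolation; with hypothetical parameter values, the triple loop runs num_symbols * num_rows * num_cols times:

```python
# Hypothetical values mirroring the loop structure in the setup code.
symbols_number = 2
list_rows = [1000, 2000]
list_cols = [10, 20]

iterations = 0
for sym_num in range(symbols_number):
    for row in list_rows:
        for col in list_cols:
            iterations += 1  # one symbol written per innermost pass

# 2 * 2 * 2 = 8 innermost passes, not symbols_number = 2
```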

Collaborator:

You missed this?

@IvoDD (Collaborator) left a comment:

You had a few missed comments.

A few larger notes:

  • As discussed offline with Georgi, we should simplify this framework quite a bit, as it feels overengineered. But we can probably leave that as a separate task so these new benchmarks get merged.
  • Could you please verify that all of the new benchmarks you created run in CI and work? I think there are some bugs in them and they would just fail.
  • I think all of the real benchmarks are copy-pasted from the previous ones. Is the plan to remove the old ones?

"""
Returns a date range selecting last X% of rows of dataframe
pass percents as 0.0-1.0
"""
Collaborator:

This function is probably useful in general; should we move it to utils.py?

self.df: pd.DataFrame = self.setup_env.generate_dataframe(num_rows, self.setup_env.default_number_cols)

# Construct read request with date_range
self.date_range = self.get_last_x_percent_date_range(num_rows, 0.05)
Collaborator:

What is the intention of this date_range? It should be a tuple of (start, end) dates, but I think get_last_x_percent_date_range returns a list of equally spaced dates?
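A sketch of a variant that returns the (start, end) tuple ArcticDB's date_range read parameter expects (function name and signature are assumptions; the index is assumed sorted):

```python
import pandas as pd

def last_x_percent_date_range(df: pd.DataFrame, percent: float):
    """Return (start, end) timestamps covering the last `percent` of rows.

    `percent` is a fraction in (0.0, 1.0].
    """
    num_rows = len(df)
    start_row = max(0, num_rows - int(num_rows * percent))
    return (df.index[start_row], df.index[-1])
```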

Collaborator:

I'm actually surprised this benchmark even works. Have you tried running it?

ideally should use process id as identification for symbol, library etc.

Typical use is to generate 4 or more dataframes. With first you will initiate
write to a symbol and with next you can do appends poping out from list
Collaborator:

Typo: poping.

assert len(sequence_df_list) > 0
start = sequence_df_list[0].head(1).index.array[0]
last = sequence_df_list[-1].tail(1).index.array[0]
return (start, last)
@poodlewars (Collaborator) commented Mar 7, 2025:

Won't df.index[0], df.index[-1] give you these?
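Both forms agree for any non-empty frame; a quick check with a hypothetical frame:

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=10, freq="D")
df = pd.DataFrame({"v": range(10)}, index=idx)

# Form used in the PR:
start_orig = df.head(1).index.array[0]
last_orig = df.tail(1).index.array[0]

# Simpler equivalent suggested in review:
start_simple = df.index[0]
last_simple = df.index[-1]
```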

self.__init_time_number = TimestampNumber.from_timestamp(initial_timestamp, self._frequency)
return self

def set_frquency(self, freq):
Collaborator:

Typo: frquency.

"""
Sets up multiple libraries, each containing specified number of symbols
and each symbols having specified row numbers (col numbers can be defined also)
Accepts ASV params:"
Collaborator:

typo :"

else:
start = start_timestamp

df = DFGenerator.generate_wide_dataframe(num_rows=number_rows, num_cols=self.__number_cols,
Collaborator:

It would be good to make sure these generate a good variety of strings, including unicode strings, which we have to handle very differently.
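One way to mix unicode into the generated strings (the pools, ratio, and function name are assumptions; any generator that injects non-ASCII code points would do):

```python
import random
import string

ASCII_POOL = string.ascii_letters + string.digits
UNICODE_POOL = "абвгдежз日本語语言αβγδεζéüñßøå"

def random_string(length, unicode_fraction=0.3, rng=random):
    """Draw each string from either an ASCII or a unicode pool, so the
    differing string-handling code paths both get exercised."""
    pool = UNICODE_POOL if rng.random() < unicode_fraction else ASCII_POOL
    return "".join(rng.choice(pool) for _ in range(length))
```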

assert setup.check_ok()

# Changing parameters should trigger not-ok for the setup
setup.set_with_metadata_for_each_version()
setup.set_with_snapshot_for_each_version()
setup.set_params([2, 3])
assert not setup.check_ok()

@classmethod
def test_setup_multiple_libs_with_symbols0(cls):
Collaborator:

Please use a more meaningful name than test_setup_multiple_libs_with_symbols0.

@poodlewars (Collaborator) left a comment:

Just parking this review for now while you and Ivo discuss the design.


assert not setup.check_ok()
setup.setup_environment()
#assert setup.check_ok()
Collaborator:

Commented-out code.

ln = setup.get_library_name(LibraryType.PERSISTENT, num_syms)
lib = ac.get_library(ln)
symbols_list = lib.list_symbols()
setup.logger().info(symbols_list)
Collaborator:

How big are the symbol lists? This log output could be huge.

for num_syms in params[0]: # the symbols list
ln = setup.get_library_name(LibraryType.PERSISTENT, num_syms)
lib = ac.get_library(ln)
symbols_list = lib.list_symbols()
Collaborator:

Remember that making this call will compact the symbol list cache, so may affect any benchmarks of symbol list performance

assert setup.has_library(num_syms)
lib = setup.get_library(num_syms)
symbol = setup.get_symbol_name(num_syms - 1, rows, cols)
assert lib.read(symbol).data is not None
Collaborator:

This assertion seems quite slow and expensive for what it tells us. If you have to have it, you should at least call head rather than reading the entire symbol.


class AWSQueryBuilderFunctions:
"""
This is the same test as :class:`LocalQueryBuilderFunctions`
Collaborator:

Is there really no way to share the code between them, for example by pulling out a subclass?
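One way the shared body could be pulled out (names hypothetical): a base class holds the benchmark methods and each subclass binds only the storage-specific bits:

```python
class QueryBuilderFunctionsBase:
    """Shared benchmark bodies; subclasses pick the storage backend."""
    storage = None  # overridden by subclasses

    def setup(self):
        # Storage-specific setup would go here, keyed off self.storage.
        self.connection = f"connected:{self.storage}"

    def time_filter_query(self):
        # The shared benchmark body runs unchanged for every backend.
        return f"{self.storage}:filter"

class LocalQueryBuilderFunctions(QueryBuilderFunctionsBase):
    storage = "lmdb"

class AWSQueryBuilderFunctions(QueryBuilderFunctionsBase):
    storage = "s3"
```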

AWS30kColsWideDFLargeAppendDataModify.number)


class LMDB30kColsWideDFLargeAppendDataModify(AWSLargeAppendDataModify):
Collaborator:

This looks very odd, why is there an LMDB class inheriting from an AWS class?

grusev added a commit that referenced this pull request Mar 7, 2025
#### Reference Issues/PRs
<!--Example: Fixes #1234. See also #3456.-->

The test has a special fixture to set up the needed environment (not
inside the test itself, as that slows execution extremely under memray).
It creates a library with library options predefined by the test, then adds a
symbol and creates many versions and a snapshot for each version. The
dataframes created grow in size when the library is dynamic.

The test is executed 2 times with different library options for
segment size, dynamic type on/off, and encoding version.

It covers head and tail executions with different parameters over
different types of versions/snapshots of a symbol.

NOTE: utils.py is not part of this PR. It is part of
#2185 but is included here to reuse
code, as the other PR is not yet merged.

Additional notes:
For why the Linux threshold is so high, see:
3.11 -
https://github.com/man-group/ArcticDB/actions/runs/13517989453/job/37771476992?pr=2199
3.9 -
https://github.com/man-group/ArcticDB/actions/runs/13517989453/job/37771214446
Massive "leaks" that should not be considered leaks can be addressed by
filtering; see:
https://github.com/man-group/ArcticDB/actions/runs/13517989379/job/37770659554?pr=2199

Overall, an issue was raised to address flakiness and to have stress tests run
on a debug build, perhaps with only a single Python version on only 3 runners
with 3 different OSes. That would make it possible to filter out frames we
know are OK, as in this example, and reduce time for the other functional tests.

#### What does this implement or fix?

## Change Type (Required)
- [x] **Patch** (Bug fix or non-breaking improvement)
- [ ] **Minor** (New feature, but backward compatible)
- [ ] **Major** (Breaking changes)
- [ ] **Cherry pick**

#### Any other comments?

#### Checklist

<details>
  <summary>
   Checklist for code changes...
  </summary>
 
- [ ] Have you updated the relevant docstrings, documentation and
copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's
features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error
messages](https://docs.arcticdb.io/error_messages/)?
 - [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in
autogenerated release notes?
</details>


---------

Co-authored-by: Georgi Rusev <Georgi Rusev>
Labels: patch (Small change, should increase patch version)
Projects: none yet
3 participants