-
Notifications
You must be signed in to change notification settings - Fork 84
Refactor Test Framework Architecture: Responsibility Segregation, Configuration, and Clarity #733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Test Framework Architecture: Responsibility Segregation, Configuration, and Clarity #733
Conversation
|
This needs rebase |
For now I'm focusing on improving the existing Python tests. The Rust conversion doesn't have much approval at the moment, so it'll be on hold for now. |
73dd553 to
f7fe352
Compare
Rebased |
Davidson-Souza
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partial review, will continue tomorrow (TM)
af27ece to
3440b46
Compare
…g Improvements, and Test Restart Refactoring e87a4d2 feat(functional): validate daemon startup and log errors on failure (moisesPomilio) 26804ae fix(test): improve port detection logic and reduce timeout (moisesPomilio) 60fe445 refactor(test): simplify restart test to use a single node (moisesPomilio) Pull request description: ### Description and Notes A validation step was added after node startup to ensure the daemon actually started successfully. After executing the startup command, the workflow waits for 1 second to verify if the node is running. If the daemon fails to start(such as due to misconfigured flags or command errors) the process is stopped, a relevant error message is logged (including the output of the command and the affected node), and an error is returned. This provides much clearer feedback for developers writing or running tests and makes it easier to detect incorrect flags or startup commands. The port detection logic was also improved. Previously, the test only read the last line of the log within a specified timeout, which could cause failures if the output timing was off, resulting in missed port detection and unstable CI runs. Now, all log lines are read and checked, significantly reducing the chance of missing the port assignment. Since port information is always printed at the beginning of the logs, this change does not impact performance. Note: This is not a definitive solution to the architectural issue behind port detection; a more comprehensive fix is planned for PR #733. Additionally, the restart test was refactored for better clarity and relevance. Instead of starting two nodes, stopping them, and checking their data directories (which did not accurately reflect a restart scenario), the test now starts a node, stops it, and then starts it again, verifying that restart works correctly and that stopping the node does not corrupt files. ### How to verify the changes you have done? - Start a node with both valid and intentionally invalid flags or commands. Confirm that, if the daemon fails to start, a clear log error is shown with the command output and node information. - Run the functional test suite and observe improved reliability in port detection; verify that logs are parsed correctly and the port is assigned even with slight output timing differences. - Review the restart test logic: ensure it starts a node, stops it, and starts the same node again, checking for correct restart behavior and file integrity. ACKs for top commit: jaoleal: ACK e87a4d2 j-moreno-c-r: ACK e87a4d2 Davidson-Souza: ACK e87a4d2 Tree-SHA512: 3f266552b84bf0df0a1decfd828c8600f90afaa2d0c070ffe4e6e0b7bb09cdcd7f661cdeadfd03b1c68d6fb036b2b519da28798a8e6d238deff12a368abdf800
|
Needs rebase |
3440b46 to
eb485f9
Compare
|
Rebased |
53d5ff3 to
d3301f7
Compare
|
The adjustments I was making to the |
d3301f7 to
0102b58
Compare
0102b58 to
4fdba69
Compare
|
Needs rebase |
4fdba69 to
240c7ab
Compare
|
Rebased |
Davidson-Souza
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the new organization. The tests are much easier to reason about, and I think it will really pay off as we go for more complex scenarios.
Just a few nits from me
ba517ae to
ab69cdb
Compare
tests/test_framework/__init__.py
Outdated
|
|
||
| @staticmethod | ||
| def get_available_random_port(start: int, end: int = 65535): | ||
| def get_available_random_port_by_range(start: int, end: int): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still got a transient error here. It happens like once every 30 executions on my machine (yes, I ran this hundreds of times, looking very good!). If the two tests are executed at the same time, the rng is seeded with system clock, so it will return the same port. And since they run at the same time, no one had the chance to bind to the port, so the test in 395 won't catch it.
We can easily solve it by seeding the rng with the test name. Perhaps testname + system time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, looking at the logs it's more like once every 5 runs on a faster-ish machine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! I had already seen this error but didn't investigate it further. I made the fix and on my machine the error apparently stopped happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran 128dd59 and still breaks. Wft is wrong with this RNG?
Perhaps run_node could re-randomize the port each attempt? Fyi I run this:
while true; do just test-functional-run || break; doneThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps
run_nodecould re-randomize the port each attempt? Fyi I run this:
I modified the architecture to move the default config creation to the node side. The node can now also have a static_value variable to indicate that values should be static. So when the node is in the initialization process and static_value is false, it changes its config values if there's an error. When the config is successful, it sets static_value to true to prevent an already initialized node from having its values modified if it gets restarted.
d9f880c to
f8c5862
Compare
|
f8c5862 I modified the architecture so that a node using default configs can have its config modified during initialization. This happens if the node fails to start, allowing it to change the config, especially the port, which solves the port conflict issue identified in this PR. However, after it's initialized, it's marked as |
Is this happening when he restarts |
|
With the following diff things got way more stable. It still breaks sometimes, but way less. Perhaps we should go ahead with this one and investigate a little further diff --git a/crates/floresta-wire/src/p2p_wire/node/conn.rs b/crates/floresta-wire/src/p2p_wire/node/conn.rs
index 25c40f6..de47ab2 100644
--- a/crates/floresta-wire/src/p2p_wire/node/conn.rs
+++ b/crates/floresta-wire/src/p2p_wire/node/conn.rs
@@ -561,6 +561,9 @@ where
&mut self,
required_service: ServiceFlags,
) -> Result<(), WireError> {
+ // try to connect with manually added peers
+ self.maybe_open_connection_with_added_peers()?;
+
// If the user passes in a `--connect` cli argument, we only connect with
// that particular peer.
if self.fixed_peer.is_some() && !self.peers.is_empty() {
@@ -573,9 +576,6 @@ where
let needs_utreexo = required_service.has(service_flags::UTREEXO.into());
self.maybe_use_hardcoded_addresses(needs_utreexo);
- // try to connect with manually added peers
- self.maybe_open_connection_with_added_peers()?;
-
let connection_kind = ConnectionKind::Regular(required_service);
if self.peers.len() < T::MAX_OUTGOING_PEERS {
self.create_connection(connection_kind)?;
diff --git a/tests/floresta-cli/getbestblockhash.py b/tests/floresta-cli/getbestblockhash.py
index b31f819..0886679 100644
--- a/tests/floresta-cli/getbestblockhash.py
+++ b/tests/floresta-cli/getbestblockhash.py
@@ -57,9 +57,7 @@ class GetBestblockhashTest(FlorestaTestFramework):
self.log("=== Connect floresta to utreexod")
utreexod_url = self.utreexod.p2p_url
- self.florestad.rpc.addnode(
- node=utreexod_url, command="onetry", v2transport=False
- )
+ self.florestad.rpc.addnode(node=utreexod_url, command="add", v2transport=False)
self.log("=== Waiting for floresta to connect to utreexod...")
self.wait_for_peers_connections(self.florestad, self.utreexod)
diff --git a/tests/floresta-cli/getblockcount.py b/tests/floresta-cli/getblockcount.py
index b72f3ad..b838103 100644
--- a/tests/floresta-cli/getblockcount.py
+++ b/tests/floresta-cli/getblockcount.py
@@ -71,14 +71,14 @@ class GetBlockCountTest(FlorestaTestFramework):
self.log("=== Connect floresta to utreexod")
utreexod_url = self.utreexod.p2p_url
- self.florestad.rpc.addnode(utreexod_url, command="onetry", v2transport=False)
+ self.florestad.rpc.addnode(utreexod_url, command="add", v2transport=False)
self.log("=== Waiting for floresta to connect to utreexod...")
self.wait_for_peers_connections(self.florestad, self.utreexod)
self.log("=== Connect bitcoind to utreexod")
bitcoind_url = self.bitcoind.p2p_url
- self.bitcoind.rpc.addnode(utreexod_url, command="onetry", v2transport=False)
+ self.bitcoind.rpc.addnode(utreexod_url, command="add", v2transport=False)
self.log("=== Waiting for bitcoind to connect to utreexod...")
self.wait_for_peers_connections(self.bitcoind, self.utreexod)
diff --git a/tests/floresta-cli/getblockhash.py b/tests/floresta-cli/getblockhash.py
index a7debf8..dea6efc 100644
--- a/tests/floresta-cli/getblockhash.py
+++ b/tests/floresta-cli/getblockhash.py
@@ -52,13 +52,13 @@ class GetBlockhashTest(FlorestaTestFramework):
self.log("=== Connect floresta to utreexod")
utreexod_url = self.utreexod.p2p_url
- self.florestad.rpc.addnode(utreexod_url, command="onetry", v2transport=False)
+ self.florestad.rpc.addnode(utreexod_url, command="add", v2transport=False)
self.log("=== Waiting for floresta to connect to utreexod...")
self.wait_for_peers_connections(self.florestad, self.utreexod)
self.log("=== Connect bitcoind to utreexod")
- self.bitcoind.rpc.addnode(utreexod_url, command="onetry", v2transport=False)
+ self.bitcoind.rpc.addnode(utreexod_url, command="add", v2transport=False)
self.log("=== Waiting for bitcoind to connect to utreexod...")
self.wait_for_peers_connections(self.bitcoind, self.utreexod)
diff --git a/tests/floresta-cli/gettxout.py b/tests/floresta-cli/gettxout.py
index de63d93..8f23c4c 100644
--- a/tests/floresta-cli/gettxout.py
+++ b/tests/floresta-cli/gettxout.py
@@ -70,13 +70,13 @@ class GetTxoutTest(FlorestaTestFramework):
self.log("=== Connect floresta to utreexod")
utreexod_url = self.utreexod.p2p_url
- self.florestad.rpc.addnode(utreexod_url, command="onetry", v2transport=False)
+ self.florestad.rpc.addnode(utreexod_url, command="add", v2transport=False)
self.log("=== Waiting for floresta to connect to utreexod...")
self.wait_for_peers_connections(self.florestad, self.utreexod)
self.log("=== Connect bitcoind to utreexod")
- self.bitcoind.rpc.addnode(utreexod_url, command="onetry", v2transport=False)
+ self.bitcoind.rpc.addnode(utreexod_url, command="add", v2transport=False)
self.log("=== Waiting for bitcoind to connect to utreexod...")
self.wait_for_peers_connections(self.bitcoind, self.utreexod)
diff --git a/tests/floresta-cli/ping.py b/tests/floresta-cli/ping.py
index d8ac84f..b78f99a 100644
--- a/tests/floresta-cli/ping.py
+++ b/tests/floresta-cli/ping.py
@@ -21,7 +21,7 @@ class PingTest(FlorestaTestFramework):
self.run_node(self.bitcoind)
# Connect floresta to bitcoind
- self.florestad.rpc.addnode(self.bitcoind.p2p_url, "onetry")
+ self.florestad.rpc.addnode(self.bitcoind.p2p_url, "add")
time.sleep(1)
diff --git a/tests/florestad/reorg-chain.py b/tests/florestad/reorg-chain.py
index 2162c81..d37e917 100644
--- a/tests/florestad/reorg-chain.py
+++ b/tests/florestad/reorg-chain.py
@@ -41,7 +41,7 @@ class ChainReorgTest(FlorestaTestFramework):
self.log("=== Connect floresta to utreexod")
self.florestad.rpc.addnode(
- self.utreexod.p2p_url, command="onetry", v2transport=False
+ self.utreexod.p2p_url, command="add", v2transport=False
)
self.log("=== Waiting for floresta to connect to utreexod.rpc...") |
e6a0729 to
7a6ac6e
Compare
|
7a6ac6e I added the |
|
It's way more reliable now. I did get a crash, but after hundreds of runs. I'll review ASAP, but I think it's pretty much ready |
tests/floresta-cli/disconnectnode.py
Outdated
| ) | ||
| # Call `disconnectnode` with an invalid `node_address` (wrong port). | ||
| node_address = f"127.0.0.1:{self.bitcoind.get_port('p2p') + 1}" | ||
| node_address = f"{self.bitcoind.p2p_url}1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could generate invalid ports. The test needs a valid IP/port that doesn't have any associated peer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted it to the way it was before; for this part I only split the new information we have to obtain the port and the IP separately.
tests/floresta-cli/disconnectnode.py
Outdated
| ) | ||
| # Call `disconnectnode` with an invalid `node_address` (wrong IP address: 127.0.0.2). | ||
| node_address = f"127.0.0.2:{self.bitcoind.get_port('p2p')}" | ||
| node_address = f"1{self.bitcoind.p2p_url}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can keep the same ip as in the original test. This IP is invalid (the test requires a valid ip that doesn't have an associated peer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted to the original state to keep the test as it was.
tests/floresta-cli/disconnectnode.py
Outdated
| ) | ||
| # Call `disconnectnode` with an invalid `node_address` (wrong IP address). | ||
| node_address = f"127.0.0:{self.bitcoind.get_port('p2p')}" | ||
| node_address = f"1.{self.bitcoind.p2p_url}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted to the original state to keep the test as it was.
This refactor introduces a new architecture for the test framework, separating responsibilities across classes to improve maintainability and clarity: - Daemon classes (`BitcoinDaemon`, `FlorestaDaemon`, `UtreexoDaemon`) are responsible for instantiating and configuring nodes, managing the necessary flags and settings to run a node. - RPC classes (`BitcoinRPC`, `FlorestaRPC`, `UtreexoRPC`) handle only RPC-related functionality, such as managing connections and performing requests. - The `Node` class is now responsible for controlling the instantiation of a node by assembling the required components (Daemon and RPC) and managing their interactions. This reorganization also fixes a bug in port handling by explicitly defining RPC and P2P ports in the configurations, which are passed directly to the `Node` class. Additional changes: - Introduced `ConfigP2P`, `ConfigRPC`, `ConfigElectrum`, and `ConfigTls` classes for structured configuration handling. - Simplified `BaseDaemon` class and `BaseRPC` class by removing the metaclass and enforcing method implementations through abstract methods. - Added the ability to update the node's configurations (RPC, P2P, Electrum) dynamically when `static_values` is set to False. This ensures that if a node fails to start due to issues like port conflicts, the configurations can be regenerated and retried automatically.
Moved RPC methods that are common across `UtreexoRPC`, `FlorestaRPC`, and `BitcoinRPC` to the `BaseRPC` class to reduce code duplication and centralize shared functionality.
Removed `TestSslFailInitialization` from `florestad/tls-fail.py` as it does not test Floresta's behavior. The test only validates `ElectrumClient` connection handling, which is unrelated to Floresta's functionality.
The `Node` class now initializes an `ElectrumClient` instance internally, simplifying its usage and removing the need to instantiate it externally.
- Moved Node class and NodeType enum to a new file `node.py` for better organization. - Introduced Utility class in `util.py` for common helper functions, including directory and port management. - Updated all test files to import Node and Utility from their new locations. - Replaced direct calls to utility functions in the test framework with calls to the new Utility class methods.
7a6ac6e to
1eeb017
Compare
…ork` and `Node` - Introduced the `connect_nodes` method in `FlorestaTestFramework`: - Responsible for connecting two peers to each other. - Verifies if the peers are successfully connected. - Added the `connect_node` method to the `Node` class: - Allows the current node to connect to a specified peer. - Improved the `wait_for_peers_connections` method: - For the first 10 attempts, it uses a 1-second sleep between retries. - After 10 attempts, it switches to a 2-second sleep to allow more time for peers to process connections. - Added logging to display the connection status of each peer, making it easier to debug connection issues.
The updated logic now verifies that all nodes are at the same block height and includes a specific check to confirm that Floresta has completed its Initial Block Download (IBD) process.
…n node startup - Added a check to ensure that a node's RPC interface is ready to receive requests after startup. - Specifically for Floresta nodes, added a check to verify that the Electrum server is ready to handle requests when starting for the first time.
Reordered the call to `maybe_open_connection_with_added_peers` to ensure it is the first operation performed during connection setup. This change ensures that the node attempts to connect to manually added peers before performing other validations or attempting connections with other peers.
1eeb017 to
e4aa3d3
Compare
|
e4aa3d3 I added the suggestions from the reviews. |
Davidson-Souza
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK e4aa3d3
Phew, that was plenty of work! Good job @moisesPompilio! Now let's go to #742 🫡
|
ACK e4aa3d3 But i couldn't really attest that the changes presented here does improve on stability, I had to make a few workarounds to make it work on my end, who happens to be very slow. Apparently the problem is regarding sleeps around the code where they dont match the required time for the operating nodes to do certain task and when ending the sleep the nodes arent synced. Im trying to come with a fix for that but just some sketchs. (Also not sure if the fix are present in another parallel pr so ill watch after it too.) Overall, the changes makes sense and improves the framework. |
Is the issue caused by sleeps in general, or specifically by sleeps tied to blockchain synchronization (for example, during block mining)? |

Description and Notes
The test framework for Floresta functional tests has been restructured to better segregate responsibilities and simplify the overall architecture. Previously, logic for node instantiation, configuration, and RPC handling was scattered across multiple code locations, making maintenance and understanding the code flow difficult.
Key architectural changes:
BitcoinDaemon,FlorestaDaemon,UtreexoDaemon) handle node instantiation, startup, and are solely responsible for managing all relevant configuration flags (network, RPC, P2P, Electrum, etc.), including connection URLs and port assignments.BitcoinRPC,FlorestaRPC,UtreexoRPC) are now focused exclusively on RPC communications. Shared RPC functionality common to all node types was centralized in theBaseRPCclass to reduce duplication and ease maintenance.Nodeclass orchestrates node instantiation, assembling Daemon and RPC as needed, and now handles its own internalElectrumClientinitialization for easier Electrum interactions inside tests.New configuration classes (
ConfigP2P,ConfigRPC,ConfigElectrum,ConfigTls) provide structured and customizable handling of different node aspects. For example, custom P2P configuration is now as simple as providing a specific configuration instance, improving test readability and extensibility.Additional improvements:
tls-fail.pywas removed as it did not test Floresta functionality, but only ElectrumClient connection handling, which is out of scope for this test suite.A significant bug was addressed by centrally setting node ports in the configuration stage, removing the need to wait for nodes to start in order to detect port usage. P2P, RPC, and Electrum ports are now set in advance and known to all test framework components, eliminating port detection issues.
How to verify the changes you have done?
tls-fail.py) is no longer present.BaseRPCclass, reducing repetition and simplifying common method updates.