-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spanner: Refactor the test fixture. #1425
Comments
Some aspects that we might want to discuss (that I can think off right now)
My thoughts on these: Resource deletion: Keeping resources after tests are done allows using them for diagnosing purposes when tests fail. But it also means that we need to delete stale resources before starting a new test run, and that if we are not very good at that, we hit quotas. Initializing test runs also takes longer because of all the deletions. I'm happy to reconsider this approach and delete temporary resources on test teardown, or maybe on fixture teardown. Fixed vs. temporary resources: Thinking particularly of backups, but also of some DB operations. They take (or at least used to) a long time (some are still being skipped, see #1300 ), so if we can avoid creating temporary resources in those cases, that's preferable. The aim should be to keep running times as low as possible. Test execution ordering: I'm still opposed to executing tests in a certain order, not just because of testing best practices, but also because there's no guarantee that users will execute samples in a given order. We need to make sure that samples work on their own. Splitting the current fixture: Seems like the instance is a good place to do that, as in, each fixture works with a different DB instance? But this is just intuition. |
In addition to the above, I would also like to add the following to the list of discussion points:
Resource deletionWhile it can be useful to keep a resource around after a test run for diagnostic purposes, many of the sample tests are so simple that a failure will normally be easy to diagnose without the resource as well, especially if the single test is quick to run from a development machine. I would say that an approach that prefers deletion of resources after a run is preferable, unless the resource in question has some attributes that make it different from the most common resources, or if we know that it takes a long time to create. In the latter case, it might be more efficient to diagnose a test failure by looking at the resource that was left behind, than by rerunning the test. Also, the deletion should normally be the responsibility of the code that creates is. So my suggestion would be something like:
Fixed resourcesMost types of resources are now quickly created by Cloud Spanner when the resource is small/empty, but backups can still be slow. Tests that need one or more backups should therefore prefer using fixed resources over creating temporary ones. Other resources can be created quickly. Some example times (the below tests use uniquely generated ids for the resources, as Cloud Spanner will otherwise throttle requests to create a new resource with the same name as a recently dropped resource):
Based on the above, I would suggest that test cases should prefer temporary resources for all resource types, except for backups. Test execution orderI completely agree with the above, the tests should whenever possible be independent of execution order, and independent of all other tests. Splitting the fixtureBased on all the above, I would suggest that:
The above would make the tests completely independent of each other, which makes it easier to execute them individually (and also to execute them on the emulator). Parallel executionThe above execution times are (except for extreme circumstances) not affected very much if multiple operations are executed in parallel on different databases. That means that an op-in option for executing (simple) tests in parallel would further reduce the total execution times of the sample tests. Tests that are set up to always create/drop their own temporary database can safely be executed in parallel with other tests that use the same strategy. |
+1 to chatting about Parallel execution and dropping Test exection order since we are in agreement there. Fixed resources (shared across test runs):
Resource deletion and Splitting the fixture (mixing these two as there is much overlap)
Parallel executionI'm fine with allowing parallel execution. If I remember correctly we had some resource overlapped, which should be taken care of once we've agreed on a new approach here, and also because we were hitting some quotas (requests per minute kind of quotas). We might need to fine tune it but that's fine. For reference: https://xunit.net/docs/running-tests-in-parallel |
(@olavloite now I could add you on the assignee list as well, weird why I couldn't at first) |
@amanda-tarafa Thanks for the elaborate points. I think we agree on most (all?) of the above. I'll take a detailed look Thursday or Friday this week. The first half of this week is a little bit hectic. |
Great! No rush at all on my part. This is a nice to have (do) but tests are running pretty consistently so it's fine that we take time to do all this. |
I totally agree with almost all of the above, so I'll just focus on a couple of worries that I have regarding a couple of points (plus one point where my rationale was indeed not really clear):
Yeah, I'm a little bit too used to the Java test environment, where there is no separate fixture, and the setup is part of the test class itself. I think in most cases we mean the same, but my terminology is a little off. The biggest difference is your idea on a shared query fixture, which I really like. I have a couple of small worries/exceptions (see below).
I think that this will work very well for many of the query samples, but probably not all. There are two (maybe three) categories that I worry about:
All the above can easily be worked around by using the 'data modification' fixture instead of the query fixture for the specific sample test that would include any of the above. So my suggestion would be:
I agree that we should try to stick with the strategy that the deletion happens on fixture teardown (i.e. on instance deletion) instead of after each test, as long as that is possible. We might however be closer to that point than we think. The current samples directory contains 79 sample files, and there is a hard limit of max 100 databases per instance on Cloud Spanner. Assuming that many of the samples can use the shared query fixture, we should be good for now, but it is something that we should keep in mind if the number of (non-query) samples continues to grow.
Sorry, I should have included my rationale in that point. The reasoning is that we generally will not keep resources around for inspection in case of test failure, as most resources are quick and easy to create anyways. That means that debugging a failed test is normally easy to do by just running it again. That is not necessarily true for tests that create for example backups, and for those cases it might be useful to keep the resource around after the test has executed, and only delete it at a later time. One additional point regarding the deletion of backups: Cloud Spanner does not allow deleting instances that contain one or more backups. The fixture teardown must therefore first delete all backups that are stored on an instance before it can delete the instance. |
Yes, totally agree. We'll always need to support exceptions for the general case.
Yes, I'm also worried that we might be almost there. I wouldn't like to have to rewrite things again in a few months. Since the limit is per instance, I was wondering if we could just group some other tests in a similar way as will do queries, even if it's not for reusing one database, but just for avoiding the limit. For instance, we can group all the admin tests in one fixture, etc.
Ah yes, I see. I think I'm now leaning towards a cleaner test strategy here and just try and delete everything at teardown. But also, I think that at teardown, we should continue to attempt to delete stale resources, which will allow us to always delete on a best effort basis, and not fail the tests if cleanup failed. So, I think it'll be easier to switch from one strategy to the other if we need to, basically, we will start by doing
Yes, I think we do this right now. |
The existing test fixture is getting unmanageable in both the amount of time it takes to initialize, and the amount of unrelated code it's holding. Also, some decisions that were made at the beginning os Spanner test rewritting might have not been the best, in hindsight, and may be reconsider now.
The general proposal is to have several specialized fixtures, maybe all deriving from a base class.
Let's use this issue to discuss some options before starting any work.
@olavloite FYI (I can't add you as an assignee)
The text was updated successfully, but these errors were encountered: