Sync 1.29.1 changes #33
Merged
Conversation
To support specific use cases (currently mainly gpcopy), gpbackup should place its timestamp+pid lockfile in the backup directory instead of in /tmp when the correct flags are provided. This facilitates the highly parallel metadata-only backups that gpcopy uses when dumping metadata. The flags required to elicit this behavior are --metadata-only, --no-history, and --backup-dir. A sketch of the path selection follows.
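A minimal sketch of that path selection, assuming a hypothetical BackupFlags struct and lockfile naming scheme (gpbackup's real flag wiring and file names may differ):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// BackupFlags is a hypothetical stand-in for gpbackup's parsed CLI options.
type BackupFlags struct {
	MetadataOnly bool
	NoHistory    bool
	BackupDir    string
}

// lockFilePath places the timestamp+pid lockfile in the backup directory
// only when all three flags are set; otherwise it falls back to /tmp.
func lockFilePath(f BackupFlags, timestamp string, pid int) string {
	name := fmt.Sprintf("gpbackup_%s.%d.lock", timestamp, pid) // illustrative name
	if f.MetadataOnly && f.NoHistory && f.BackupDir != "" {
		return filepath.Join(f.BackupDir, name)
	}
	return filepath.Join("/tmp", name)
}

func main() {
	f := BackupFlags{MetadataOnly: true, NoHistory: true, BackupDir: "/data/backups"}
	fmt.Println(lockFilePath(f, "20230726093300", 12345))
}
```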
We're updating our CI to validate against the newest version of datadomain. Update vault vars to point to the new instances.
This test relies on a flag that was only added in gpbackup 1.28.0, so skip it if we're testing with an older version.
The releng team changed some "rhel" artifacts to "el" to reflect a more general approach to RHEL binary compatibility. Update our CI to pull from the correct locations.
We test most of our plugin functionality in the gpbackup CI, so add a test to our plugin suite to cover the delete_replica function added to our ddboost plugin.
We have historically written out history records only at the end of the process, and in many failure cases this means no record at all is available. This commit restructures the code to write out the full history entry very early on, but leaves Status as "In Progress" until the cleanup phase, where it is updated to show whether the backup succeeded or failed.
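A rough Go sketch of the restructured flow, with a hypothetical BackupHistoryEntry type and writeHistory helper standing in for gpbackup's actual history package:

```go
package main

import "fmt"

// BackupHistoryEntry is a simplified stand-in for gpbackup's history record.
type BackupHistoryEntry struct {
	Timestamp string
	Status    string
}

const (
	StatusInProgress = "In Progress"
	StatusSuccess    = "Success"
	StatusFailure    = "Failure"
)

func runBackup() (err error) {
	entry := &BackupHistoryEntry{Timestamp: "20230726093300", Status: StatusInProgress}
	writeHistory(entry) // written early, so a record exists even if we crash

	defer func() {
		// Cleanup phase: flip the status based on how the backup ended.
		if err != nil {
			entry.Status = StatusFailure
		} else {
			entry.Status = StatusSuccess
		}
		writeHistory(entry)
	}()

	// ... perform the backup ...
	return nil
}

func writeHistory(e *BackupHistoryEntry) {
	fmt.Printf("history: %s -> %s\n", e.Timestamp, e.Status)
}

func main() { _ = runBackup() }
```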
Our current cleanup mechanisms rely on a recover() call to pick up FatalOnError. However, this mechanism does not work if the panic is raised from a separate goroutine. As such, our cleanup has some gaps that need to be plugged. We identify four cases where a panic can be invoked from a goroutine. In these cases we set up a channel for bubbling the panics up to the main process, and invoke gplog.Fatal() at the earliest opportunity. We structure the defers such that waitgroups are still marked Done() even in case of a panic, so that we keep the deadlock protection they offer. In the future we will avoid invoking panic() from within goroutines, so that this handling will not be necessary.
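A self-contained sketch of the panic-bubbling pattern described above; the channel name and fatal handling are illustrative, and a real implementation would call gplog.Fatal() instead of printing:

```go
package main

import (
	"fmt"
	"sync"
)

// doWork spawns a goroutine whose panic cannot be caught by a recover()
// in the main goroutine, so it forwards the panic over a channel instead.
func doWork(panicChan chan<- interface{}) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		// Deferred first, so it runs last: Done() is still called even on
		// panic, preserving the deadlock protection the WaitGroup offers.
		defer wg.Done()
		defer func() {
			if r := recover(); r != nil {
				panicChan <- r // bubble the panic up to the main goroutine
			}
		}()
		panic("fatal error inside goroutine")
	}()
	wg.Wait()
}

func main() {
	panicChan := make(chan interface{}, 1)
	doWork(panicChan)
	// Check for a bubbled panic at the earliest opportunity; a real
	// implementation would invoke gplog.Fatal() here.
	select {
	case r := <-panicChan:
		fmt.Println("fatal:", r)
	default:
	}
}
```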
GPDB7 has removed support for QuickLZ. Any tables backed up with QuickLZ will cause an error on restore unless this GUC is set. The GUC allows the server to silently fall back to zstd, allowing restores to succeed.
Previously our CI plugin tests called no-ops for the cleanup step. Replace these with calls to delete_directory where possible. An additional cleanup job will be scheduled to run occasionally to handle things that are not easily cleaned up from plugin storage in this manner.
Currently our plugin jobs leave behind a lot of data and don't clean it up at all. The prior commit added some in-test cleanup, but it still leaves behind a fair bit. Add a CI job that runs weekly and drops all contents from our plugin storage. Note that this requires the job to be in a pipeline that sits around for a week, which may not be the case for short-lived dev pipelines. It might be worth making a skinny pipeline to leave sitting in the DP Concourse so that the DP storage also gets cleaned automatically.
A previous change to this code restructured how we track which tables are being backed up, and their status. This led to a case where, if a piped copy command failed, the error was captured but never displayed. Instead we'd spin forever on an "Unknown" state in our sync.Map, causing permanent hangs. This commit restores the original behavior: if a pipe fails, the backup of the table is marked "Complete" in the sync.Map, and the error is handled correctly downstream.
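A compressed illustration of the fix, using a sync.Map and hypothetical state constants; the real code tracks per-table backup state across worker goroutines:

```go
package main

import (
	"fmt"
	"sync"
)

const (
	Unknown  = iota // worker has not reported a final state yet
	Complete        // table finished, successfully or with a captured error
)

func main() {
	var tableStates sync.Map
	errChan := make(chan error, 1)
	oid := uint32(16384) // hypothetical table OID

	tableStates.Store(oid, Unknown)

	// Simulate a piped COPY command failing for this table.
	go func() {
		err := fmt.Errorf("COPY pipe closed unexpectedly")
		// Mark the table Complete even on failure, so the poller below
		// does not spin forever on an Unknown state, then hand the error off.
		tableStates.Store(oid, Complete)
		errChan <- err
	}()

	// Downstream: wait for a final state, then handle any captured error.
	for {
		if state, _ := tableStates.Load(oid); state == Complete {
			break
		}
	}
	fmt.Println("backup error:", <-errChan)
}
```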
Given the issues with this function misbehaving, it needs better coverage, so this commit adds some basic test coverage of it. We also tweak how FilePathInfo is structured so that it is simpler to mock out where data is copied to during backup.
If tables included in a backup set inherit from, or are inherited by, other tables that are not in the backup set, gpbackup should ignore those inheritance links when constructing CREATE TABLE statements for the included tables. This commit also adds a --no-inherit flag, which causes filtered backups to ignore child tables of included tables, preserving the old behavior.
The syntax and output of "go test -cover" have changed since this script was originally written, so this commit updates those to maintain the nicely formatted script output we expect. It also adds the rest of the packages with unit test files to the list of those for which coverage is generated.
Some of our CI systems have a different "not found" error for this test than others. Expect a more general message to keep CI happy.
The recent local-lockfile changes neglect to consider that the target directory may not exist before gpbackup runs, since the lock is taken before gpbackup would normally create the backup-dir directories. In this case, make the required directory on the coordinator so that the lockfile can be taken.
Gppkg is deprecated and has been removed from GPDB7 CI images. Our CI needs to be updated to use the new version, which is kept in a separate repo. This commit adds logic to build GPDB7 gppkgs with the new gppkg version, and also conditionally checks for those gppkgs when running our GPDB7 test jobs in CI. Also, we pin the ccp_src version for now; a recent change broke testing on Ubuntu 18.04, and we still need to test on that.
If we're performing a normal restore of a backup taken with a version below 1.26.0, the SegmentCount value is missing from the config file and so is set to 0. This can cause problems down the line, so this commit sets it equal to the current cluster size and adds some tests to catch similar issues in the future.
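A sketch of the backfill logic, with a simplified BackupConfig standing in for the real config structure:

```go
package main

import "fmt"

// BackupConfig is a simplified stand-in for the backup's config file contents.
type BackupConfig struct {
	SegmentCount int
}

// fixupSegmentCount backfills SegmentCount for backups taken before 1.26.0,
// where the field is absent from the config file and unmarshals to zero.
func fixupSegmentCount(config *BackupConfig, currentClusterSize int) {
	if config.SegmentCount == 0 {
		config.SegmentCount = currentClusterSize
	}
}

func main() {
	config := &BackupConfig{} // pre-1.26.0 backup: SegmentCount missing
	fixupSegmentCount(config, 3)
	fmt.Println("SegmentCount:", config.SegmentCount)
}
```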
Previous changes in support of distributed snapshots for backing up data introduced bugs in how we handle errors when piping copy commands to the server. A prior attempt at fixing this was insufficient, so here we overhaul how the goroutines doing distributed data backup report errors, and how those errors are handled. Some changes are also necessary to our integration test suite to support testing of this code path: the distributed code path is only invoked if there are multiple connections to the cluster available, so we update to run all our tests with two connections open.
A recent change missed this one; add it.
It seems that Concourse does not enforce isolation between tasks in the same job. This meant the build_binaries tasks had a bug where the go_components.tar.gz file pulled in for each platform was non-deterministic, and we were getting GLIBC version errors because the gppkgs job was, for example, building the RHEL7 gppkg with the RHEL8 binaries. This commit refactors our build_binaries step so that there is a separate job for each platform, and the correct binaries land where expected.
The recent gppkg changes missed a step when updating the regression pipeline. It requires the input to be called out in the task yaml for it to appear in the docker filesystem.
We've had issues in our CI with dummy_seclabel not getting installed, which causes several downstream tests to fail, but only after they've run for a long time. Since we don't have complete control over gpconfig, try to bypass it. It only tweaks the postgresql.conf files, which is simple enough to do locally by invoking ssh.
This test has been a bit flaky in our CI. This is being addressed in two ways. First, we introduce the FlakeAttempts test decorator to automatically retry and reduce red jobs. Second, we restructure the transaction rollback/commit to use defer, so that subsequent tests aren't broken by a leftover orphan transaction.
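A sketch of the defer-based rollback pattern, using database/sql for illustration (the actual tests use gpbackup's own connection helpers):

```go
package main

import (
	"database/sql"
	"fmt"
)

// runTestTransaction shows the defer-based cleanup: even if the test body
// fails or panics partway through, the transaction is rolled back rather
// than left orphaned to break subsequent tests.
func runTestTransaction(db *sql.DB) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	// Rollback after a successful Commit is a harmless no-op (ErrTxDone),
	// so deferring it unconditionally is safe.
	defer tx.Rollback()

	if _, err := tx.Exec("CREATE TABLE flaky_test_table(i int)"); err != nil {
		return err // the deferred Rollback fires
	}
	return tx.Commit()
}

func main() {
	// A real test would pass a *sql.DB opened against the test cluster.
	fmt.Println("transaction helper defined:", runTestTransaction != nil)
}
```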
The timeout added when doing restore can occasionally cause failures of otherwise successful restores, in cases where the COPY commands from the server lag too far behind the writer opening its half of the pipe. A full RCA of the causes and implications is not complete, but for now a smarter timeout resolves the errors seen in CI. The scheme put in place is an exponential backoff, starting at 10ms and scaling up to about 5m27s, for a total of 15 retries and about 12 minutes. Logging of the retry delay begins after the 10s wait, to help keep users informed of what's happening.
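A sketch of that backoff schedule, assuming the delay for retry n is the 10ms base shifted left n times; the retry count and thresholds match the description above, but the helper names are illustrative:

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay  = 10 * time.Millisecond
	maxRetries = 15
)

// delayForRetry returns the wait before the nth retry (1-based): the base
// delay doubled each attempt, so retry 15 waits 10ms << 15, about 5m28s,
// and the waits sum to roughly 11 minutes.
func delayForRetry(n int) time.Duration {
	return baseDelay << uint(n)
}

func main() {
	var total time.Duration
	for n := 1; n <= maxRetries; n++ {
		d := delayForRetry(n)
		total += d
		// Stay quiet for the short waits; once delays pass ~10s, log them
		// so users know why the restore appears to be paused.
		if d > 10*time.Second {
			fmt.Printf("retry %d: waiting %v before reopening pipe\n", n, d)
		}
	}
	fmt.Printf("max delay %v, cumulative wait %v\n", delayForRetry(maxRetries), total)
}
```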
This catalog column was not present upstream and was not very useful, so it was removed from the catalog in GP7. The grammar is still accepted in GP7 but ignored.
Also fix the pipeline output path to use the current directory rather than a hardcoded location that may not be accurate.
These are supposed to go inside the spawned goroutines, not outside. Move them.
In GPDB7 gp_toolkit was refactored to be an extension. Some of our extension tests trip over this change, so this commit updates them as needed.
Add more table rows to ensure stats get generated properly and update expected values as needed.
When including a given table in the filter, we now include that table's parent tables (and their parents, and so on), that table's children, and any other parents of those children, such that all INHERITS statements in any related tables are satisfied. This is a tweak to the behavior introduced in b290ade, so that we don't follow *every* inheritance link and collect any "sibling" tables of the originally-included table that do not share common children with it.
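A map-based sketch of that traversal; gpbackup derives these links from pg_inherits rather than in-memory maps, so the function and map names here are illustrative:

```go
package main

import "fmt"

// inheritanceClosure expands an included table to the set of tables whose
// INHERITS clauses must all be satisfiable: ancestors transitively, children
// transitively, and any other parents of those children. Siblings of the
// starting table that share no common children with it are not collected.
func inheritanceClosure(start string, parents, children map[string][]string) map[string]bool {
	included := map[string]bool{}
	var addAncestors func(t string)
	addAncestors = func(t string) {
		if included[t] {
			return
		}
		included[t] = true
		for _, p := range parents[t] {
			addAncestors(p)
		}
	}
	var addDescendants func(t string)
	addDescendants = func(t string) {
		for _, c := range children[t] {
			if !included[c] {
				// A child's CREATE TABLE names *all* of its parents, so pull
				// in the child's other parents (and their ancestors) too.
				addAncestors(c)
				addDescendants(c)
			}
		}
	}
	addAncestors(start)
	addDescendants(start)
	return included
}

func main() {
	parents := map[string][]string{"child": {"included", "otherparent"}, "included": {"grandparent"}}
	children := map[string][]string{"included": {"child"}, "otherparent": {"child"}}
	fmt.Println(inheritanceClosure("included", parents, children))
}
```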
In a scenario where a user is restoring a backup in a read-only backup directory, gprestore will currently fail at the end because it cannot write the restore report file to that directory. This is a legitimate use case, so we should treat this as a warning rather than an error.
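A minimal sketch of demoting that write failure to a warning; a real implementation would go through gplog rather than printing to stderr:

```go
package main

import (
	"fmt"
	"os"
)

// writeReport demotes a report-write failure to a warning: restoring from a
// read-only backup directory is legitimate, and the restore itself succeeded.
func writeReport(path string, contents []byte) {
	if err := os.WriteFile(path, contents, 0644); err != nil {
		// A real implementation would use gplog.Warn here.
		fmt.Fprintf(os.Stderr, "Warning: unable to write restore report file %s: %v\n", path, err)
	}
}

func main() {
	writeReport("/read-only-backup-dir/gprestore_report", []byte("restore succeeded"))
}
```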
We frequently see spurious failures when a single test fails and leaves a role or other object around and thus causes many subsequent tests to fail, so this commit adds DROP statements for the most common offenders to make debugging failed tests much easier.
The CPU_HARD_QUOTA_LIMIT and CPU_SOFT_PRIORITY parameters for resource groups have been respectively renamed to CPU_MAX_PERCENT and CPU_WEIGHT, with no change in underlying functionality, so this commit updates related code accordingly.
The gp_toolkit extension should be installed by default in GPDB 7, but some tests are failing intermittently because it seems not to exist for them. For now, we'll create it manually before running tests to avoid the issue.
GPDB 7 does not currently support distributing by enum columns, so we remove the table with an enum column from the set of objects we use for the end-to-end plugin tests in 7.
As a follow-up to 0896ede: we don't want to fail a restore at the end because an error file can't be written when there is a legitimate reason for that, such as restoring a backup on a read-only filesystem, so we treat that as a warning rather than an error.
Tests failed because of greenplum-db#692. Let's wait for a reaction; otherwise we'll repack it ourselves. Alternatively, we can sync 1.28.1 first and mask the problem.
InnerLife0 previously approved these changes on Jul 26, 2023
Implement --report-dir option for gprestore. The new option allows creating report files in a directory different from --backup-dir. The main code changes were made to the filepath class: it now takes the new option as the default path for reports, with a fallback to the content dir. Unit and end_to_end tests were updated to demonstrate the behavior. Cherry-picked-from: ac8731e to reapply above 8bdcb48 and 7346404
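A sketch of the fallback described above, with illustrative names rather than the actual FilePathInfo fields:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// reportPath mimics the new --report-dir option: reports go to the report
// directory when one is given, falling back to the content dir otherwise.
func reportPath(reportDir, contentDir, timestamp string) string {
	dir := contentDir
	if reportDir != "" {
		dir = reportDir
	}
	return filepath.Join(dir, fmt.Sprintf("gprestore_%s_report", timestamp))
}

func main() {
	contentDir := "/data/backups/gpseg-1/backups/20230726/20230726093300"
	fmt.Println(reportPath("", contentDir, "20230726093300"))             // fallback
	fmt.Println(reportPath("/tmp/reports", contentDir, "20230726093300")) // --report-dir set
}
```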
Stolb27 dismissed InnerLife0's stale review on July 26, 2023 09:33: the merge-base changed after approval.
InnerLife0 approved these changes on Jul 26, 2023
dnskvlnk approved these changes on Jul 27, 2023