Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convertFlappingRows(): fix foreign key error history -> flapping_history #554

Merged
merged 3 commits into from
Jul 31, 2023

Conversation

Al2Klimov
Copy link
Member

@Al2Klimov Al2Klimov commented Dec 19, 2022

Don't INSERT IGNORE everything, then UPSERT everything. Instead INSERT IGNORE flapping_history, UPSERT flapping_history and finally INSERT IGNORE history.

Refactor the converters so they can signal the caller to behave not as usual
i.e. not to INSERT IGNORE everything, then UPSERT everything.
Don't INSERT IGNORE everything, then UPSERT everything. Instead INSERT IGNORE
flapping_history, UPSERT flapping_history and finally INSERT IGNORE history.
@Al2Klimov Al2Klimov changed the title migrateOneType(): allow multiple stages of INSERT IGNORE, UPSERT convertFlappingRows(): fix foreign key error history -> flapping_history Dec 19, 2022
@Al2Klimov Al2Klimov marked this pull request as ready for review January 2, 2023 11:21
@Al2Klimov Al2Klimov removed their assignment Jan 2, 2023
@julianbrost
Copy link
Contributor

How can this be reproduced/tested?

@Al2Klimov
Copy link
Member Author

The only explanation for the second commit I found so far is:

You've lost the flapping start event somehow.

@julianbrost
Copy link
Contributor

Didn't trigger the bug with the following flapping history in the IDO for me (1000 should be start and 1001 be end, so the sequence is end, start, end, end on the same object):

MariaDB [icingaido]> select * from icinga_flappinghistory order by event_time;
+--------------------+-------------+---------------------+-----------------+------------+-------------+---------------+-----------+----------------------+---------------+----------------+--------------+---------------------+--------------------+
| flappinghistory_id | instance_id | event_time          | event_time_usec | event_type | reason_type | flapping_type | object_id | percent_state_change | low_threshold | high_threshold | comment_time | internal_comment_id | endpoint_object_id |
+--------------------+-------------+---------------------+-----------------+------------+-------------+---------------+-----------+----------------------+---------------+----------------+--------------+---------------------+--------------------+
|                  3 |           1 | 2021-10-26 14:36:48 |          865855 |       1001 |           1 |             1 |       408 |                   21 |            25 |             30 | NULL         |                   0 |                239 |
|                  4 |           1 | 2022-05-30 08:30:43 |          746691 |       1000 |           0 |             1 |       408 |                 32.5 |            25 |             30 | NULL         |                   0 |                239 |
|                  5 |           1 | 2022-06-07 09:06:13 |          434294 |       1001 |           1 |             1 |       408 |                   21 |            25 |             30 | NULL         |                   0 |                239 |
|                  7 |           1 | 2022-11-14 10:09:03 |          573515 |       1001 |           1 |             1 |       408 |                 22.3 |            25 |             30 | NULL         |                   0 |                239 |
+--------------------+-------------+---------------------+-----------------+------------+-------------+---------------+-----------+----------------------+---------------+----------------+--------------+---------------------+--------------------+

This generated a single row in flapping_history (and the corresponding two rows in history), where the timestamps are 2022-05-30 08:30:43 and 2022-06-07 09:06:13, i.e. the flapping period from flappinghistory_id 4 to 5, so it looks like the ones without a start have just been ignored.

idb_migration=# select * from flapping_history;
-[ RECORD 1 ]--------------+-------------------------------------------
id                         | \xdceb3d4282ee4f5b72bc66547570db556237ee8c
environment_id             | \x6cc4013ece4ffdbb275a4a763460507d17203146
endpoint_id                | 
object_type                | service
host_id                    | \x2bea57085609682477bde443bff3e4081558cd15
service_id                 | \x52cb639f147b0f429b3c3a18fb19529d50c81816
start_time                 | 1653899443746
end_time                   | 1654592773434
percent_state_change_start | 32.5
percent_state_change_end   | 21
flapping_threshold_low     | 25
flapping_threshold_high    | 30

@Al2Klimov
Copy link
Member Author

With or without my changes here? If without, what happens with and vice-versa? Did you compare with latest master or the merge base?

@julianbrost
Copy link
Contributor

I tested the current master branch.

@julianbrost julianbrost added this to the 1.2.0 milestone Jul 26, 2023
@Al2Klimov
Copy link
Member Author

What you described above IS the bug. The foreign key error does this due to INSERT IGNORE.

@julianbrost
Copy link
Contributor

What exactly is the bug? #553 mentions an explicit foreign key error being shown, which I didn't see.

(I just noticed I tested PostgreSQL while the bug report mentions MariaDB and shows a corresponding error message, so I'll retry with that, maybe that makes a difference, but I doubt it.)

@julianbrost
Copy link
Contributor

Key to reproducing the issue: set ido.from to a time while a checkable was flapping (see #553 (comment) for more details).

The error goes away with this PR (I didn't check much more so far though).

Comment on lines 430 to 457
for _, op := range []struct {
kind string
data [][]contracts.Entity
streamer func(context.Context, <-chan contracts.Entity, ...icingadb.OnSuccess[contracts.Entity]) error
}{{"INSERT IGNORE", inserts, idb.CreateIgnoreStreamed}, {"UPSERT", upserts, idb.UpsertStreamed}} {
for _, table := range op.data {
if len(table) < 1 {
continue
}

ch := make(chan contracts.Entity, len(table))
for _, row := range table {
ch <- row
}

close(ch)

if err := op.streamer(context.Background(), ch); err != nil {
log.With("backend", "Icinga DB", "op", op.kind, "table", utils.TableName(table[0])).
Fatalf("%+v", errors.Wrap(err, "can't perform DML"))
for _, stage := range stages {
for _, op := range []struct {
kind string
data [][]contracts.Entity
streamer func(context.Context, <-chan contracts.Entity, ...icingadb.OnSuccess[contracts.Entity]) error
}{
{"INSERT IGNORE", stage.inserts, idb.CreateIgnoreStreamed},
{"UPSERT", stage.upserts, idb.UpsertStreamed},
} {
for _, table := range op.data {
if len(table) < 1 {
continue
}

ch := make(chan contracts.Entity, len(table))
for _, row := range table {
ch <- row
}

close(ch)

if err := op.streamer(context.Background(), ch); err != nil {
log.With("backend", "Icinga DB", "op", op.kind, "table", utils.TableName(table[0])).
Fatalf("%+v", errors.Wrap(err, "can't perform DML"))
}
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop triggered my "I think I can make it look nicer" instinct, I played around in my editor a bit and ended up in a state where I could basically type git commit, so please have a look at 5bdaaa3 and let me know what you think.

(It should basically do the same as before, but I'm still waiting for the filling cache stage with the current code, so consider that commit untested for now.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not bad on first sight, feel free to make a PR into this one (to be rebase-and-merged to avoid additional merge commits).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or into master w/ this one included. I'd prefer this if your complaint doesn't actually block anything here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preferred way would be that if you approve of that change, you just manually push my commit to this PR (so more similar to the typical "include proposed changes from a review" workflow).

Not bad on first sight, feel free to make a PR into this one (to be rebase-and-merged to avoid additional merge commits).

That would leave a weird merged PR that wasn't actually merged into any "productive" branch, i.e. show up in "merged PRs without milestone" searches and so on.

... or into master w/ this one included. I'd prefer this if your complaint doesn't actually block anything here.

That would mean some merge/approval with "but actually I didn't look at this code in detail because (hopefully) it will be replaced anyways".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preferred way would be that if you approve of that change, you just manually push my commit to this PR (so more similar to the typical "include proposed changes from a review" workflow).

Yeees... but... this isn't just a typo, you know?

That would leave a weird merged PR that wasn't actually merged into any "productive" branch, i.e. show up in "merged PRs without milestone" searches and so on.

No, I think it wouldn't, e.g.

is on the 2.14 milestone, but not in the changelog. I think such and buildfix PRs can also be in the changelog in the future, possibly starting with

which is admittedly a bad example as it touches an out-of-changelog file, but you get the point.

That would mean some merge/approval with "but actually I didn't look at this code in detail because (hopefully) it will be replaced anyways".

Huh? You didn't look at this code in detail, and/but replaced it?

Anyway.

If you really don't want an additional PR, feel free to push on top of my branch, I have just a few non-critical things about the unit tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeees... but... this isn't just a typo, you know?

Reviews often include things way more complex than typos.

That would leave a weird merged PR that wasn't actually merged into any "productive" branch, i.e. show up in "merged PRs without milestone" searches and so on.

No, I think it wouldn't, e.g.

is on the 2.14 milestone, but not in the changelog.

Well, I already didn't like that PR at that time but you merged it faster than I could say please don't. The main reason I don't like this is that typically, if you see that a PR is merged, you expect that the change was included in the software. But that's only the case if you merge it into master/support/*. If it's merged into another branch of some PR, and that PR is discarded, the other PR would still show as merged even though the change never made it into the software.

Huh? You didn't look at this code in detail, and/but replaced it?

I replaced it with simpler code that does what I'd expect to happen there and what I hope your code does. If my code does something different than your code, please tell me.

Anyway.

If you really don't want an additional PR, feel free to push on top of my branch, I have just a few non-critical things about the unit tests.

Is that anything complex that would need the review tools from GH PRs? I mean I can create a PR to discuss it, I'd just want to avoid having it show up as merged in the end.

pkg/utils/utils_test.go Outdated Show resolved Hide resolved
})

t.Run("Empty", func(t *testing.T) {
ch := ChanFromSlice([]int{})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice []T{}!=nil. But could this change in theory so we should use make([]T,0,1) as in TestBinary_MarshalJSON, just to be sure to cover both nil and empty?

Wait! I think we should use make([]T,0,1) anyway as it's the absolute corner case of an empty slice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But could this change in theory

Could it? Looks to be part of the language specification rather than a current implementation detail:

Note that the zero value for a slice or map type is not the same as an initialized but empty value of the same type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly (luckily!) you're right.

Wait! I think we should use make([]T,0,1) anyway as it's the absolute corner case of an empty slice.

But I think we should do this anyway as cap([]int{})==0 in contrast to make(). Just to provoke the tested code a bit (which is one of the purposes of unit tests).

pkg/utils/utils_test.go Show resolved Hide resolved
This commit simplifies the `icingaDbOutputStage` type to contain only one
entity slice to be insert/upsert. This allows to simplify the handling in
`migrateOneType()` by removing nested loops.

Additionally, a bit of code inside that function is outsourced into a new
`utils.ChanFromSlice()` function. This makes the body of the loop over the
insert/upsert operation (the loop using the `op` variable) simple enough so
that it can just be unrolled which saves the inline struct and slice definition
for that loop.
Copy link
Member Author

@Al2Klimov Al2Klimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing critical left. Change as above or merge if you're happy with it.

@julianbrost julianbrost merged commit 336ee4a into master Jul 31, 2023
31 checks passed
@julianbrost julianbrost deleted the 553 branch July 31, 2023 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

icingadb-migrate: can't insert because key constraint fails "fk_history_flapping_history"
2 participants