Abort HA Realization Logic After Timeout #800

Open
wants to merge 6 commits into
base: main

Commits on Oct 25, 2024

  1. icingadb: Unify select cases for derived contexts

    The main loop's select cases for hactx.Done() and ctx.Done() were unified,
    as hactx is derived from ctx. With separate cases, a closed ctx could be
    missed because select might choose the hactx case instead.
    oxzi committed Oct 25, 2024
    a587870
  2. HA: Increase log level for heartbeats from the future

    Timing issues may be the root of future failures. Thus, it is important
    to be aware if the timing seems to be out of sync.
    oxzi committed Oct 25, 2024
    3f69f98
  3. HA: Deferred SQL Transaction Rollback

    Each transaction is created within the retryable function, but this
    function may be exited prematurely before committing. A deferred
    rollback ensures that the transaction will be rolled back and cleaned up
    in this case, or will be a noop when performed after the commit.
    oxzi committed Oct 25, 2024
    f881920
  4. HA: Insert environment within retryable function

    The HA.insertEnvironment() method was inlined into the retryable
    function to use the deadlined context. Previously, it was called within
    HA.realize() without the passed context, so it might have blocked
    afterwards.
    oxzi committed Oct 25, 2024
    dd0ca8f
  5. HA/Heartbeat: Use last message's timestamp

    Since the retryable HA function may be executed a few times before
    succeeding, the inserted heartbeat value may already be outdated. The
    heartbeat logic was slightly altered to always use the latest heartbeat
    time value.
    oxzi committed Oct 25, 2024
    c2d8bd6
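One way to picture this change is a retry loop that re-reads the most recent heartbeat time on every attempt instead of capturing it once up front. The sketch below is an assumption-laden illustration, not Icinga DB's code: `lastBeat`, `retryWithLatest`, and `errTransient` are hypothetical names, and timestamps are plain millisecond integers.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// errTransient marks a failure that is worth retrying (hypothetical).
var errTransient = errors.New("transient database error")

// lastBeat holds the timestamp of the most recent heartbeat in milliseconds.
type lastBeat struct{ ms atomic.Int64 }

func (l *lastBeat) Set(ms int64) { l.ms.Store(ms) }
func (l *lastBeat) Get() int64   { return l.ms.Load() }

// retryWithLatest calls insert up to attempts times. Each attempt reads the
// current heartbeat value, so a retry never writes a stale timestamp.
func retryWithLatest(attempts int, lb *lastBeat, insert func(heartbeatMs int64) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = insert(lb.Get()); err == nil {
			return nil
		}
	}
	return err
}
```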
  6. HA: Abort Transaction Commit after Timeout

    A strange HA behavior was reported in #787, resulting in both instances
    being active.
    
    The logs contained an entry of the previous active instance exiting the
    HA.realize() method successfully after 1m9s. This, however, should not
    be possible as the method's context is deadlined to a minute after the
    heartbeat was received.
    
    However, as it turns out, executing COMMIT on a database transaction is
    not bound to the transaction's context, allowing it to outlive the
    deadline. To mitigate this, another context watch was introduced. Doing
    so allows this instance to hand over directly, while the other instance
    can now take over due to the expired heartbeat in the database.
    oxzi committed Oct 25, 2024
    8b95d25