Aperture v0.8.0

@hdkshingala hdkshingala released this 31 Oct 09:59
77ac2c2

Changelog

List of Aperture PRs merged since the 0.7.0 release. For the full list of changes, see list of changes

Revamp workload and flux meter metrics and labels (#843)

Description of change

  • New label attribute_found in FluxMeter to denote if the attribute on
    which the flux meter is based was found in the access log/span
  • Removed label decision_type on summary workload_latency_ms since
    it is now emitted only if response was received.
  • New counter workload_requests_total to measure the workload
    decisions count since the summary does not take into account the
    scenarios where response is not received e.g. rejects or connection
    resets.
  • A new column response_received on OLAP Flow events to denote the
    case when response is not received.
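The split between the counter and the summary can be sketched as follows. This is a minimal in-memory illustration, not Aperture's implementation: the `workloadMetrics` type, its fields, the `record` method, and the decision type values `"accepted"` and `"rejected"` are all hypothetical, and keying the counter by decision type is an assumption.

```go
package main

import "fmt"

// workloadMetrics is a hypothetical in-memory stand-in for the two metrics.
type workloadMetrics struct {
	requestsTotal map[string]int // plays the role of workload_requests_total
	latenciesMs   []float64      // plays the role of workload_latency_ms
}

func newWorkloadMetrics() *workloadMetrics {
	return &workloadMetrics{requestsTotal: make(map[string]int)}
}

// record counts every decision, but observes latency only when a
// response was actually received.
func (m *workloadMetrics) record(decisionType string, responseReceived bool, latencyMs float64) {
	m.requestsTotal[decisionType]++
	if responseReceived {
		m.latenciesMs = append(m.latenciesMs, latencyMs)
	}
}

func main() {
	m := newWorkloadMetrics()
	m.record("accepted", true, 42.0) // response received
	m.record("rejected", false, 0)   // rejected, no response
	m.record("accepted", false, 0)   // connection reset, no response
	fmt.Println("requests:", m.requestsTotal, "latency samples:", len(m.latenciesMs))
}
```

The counter sees all three requests, while the latency series holds a single sample. That gap is what the new counter is meant to cover.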

Ignore negative workload latency (#839)

Issue

  • Workload latency in case of Envoy is calculated as:
    workload_latency = response_latency - aperture_latency
  • Workload latency can become negative in case of a connection reset.
  • If the connection is aborted by the client or the server, Envoy
    immediately terminates the connection for the other endpoint.
  • In the access log, the status code is set to 0 and response_latency
    is set to zero.
  • If the Authz call to the Aperture Agent had succeeded for this
    request, then aperture_latency is greater than zero.
    • This would lead the workload_latency to be computed as negative.

Fix

  • Ignore negative workload latency, i.e. don't populate the workload
    latency column
  • Publish Prometheus metrics for flux-meter or workload latency only if
    the metric column is found
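The fix can be sketched roughly as below. The formula is the one from the issue description; the function name `workloadLatency` and the `ok` return value are hypothetical and only show how a caller might decide whether to populate the column.

```go
package main

import "fmt"

// workloadLatency returns response latency minus Aperture latency (both in ms).
// ok is false when the result is negative, which signals that the workload
// latency column should not be populated.
func workloadLatency(responseLatencyMs, apertureLatencyMs float64) (latencyMs float64, ok bool) {
	latency := responseLatencyMs - apertureLatencyMs
	if latency < 0 {
		return 0, false
	}
	return latency, true
}

func main() {
	// Normal request: 120 ms in total, 5 ms spent in the Authz call.
	if v, ok := workloadLatency(120, 5); ok {
		fmt.Printf("workload_latency_ms=%.1f\n", v)
	}
	// Connection reset: response_latency is logged as 0, Authz took 5 ms.
	if _, ok := workloadLatency(0, 5); !ok {
		fmt.Println("negative latency ignored; column not populated")
	}
}
```

Returning a separate `ok` flag instead of clamping to zero keeps bogus zero samples out of the latency metrics.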

TickInfo in LoadDecision (#836)

Description of change

  • Put TickInfo in LoadDecision to re-trigger fill-rate evaluation at
    the Agent.

Re-structure protos (#831)

Fix telemetry labels propagation (#835)

Description of change

This fixes a regression introduced in #828.

Dynamic Telemetry Flow Labels were added before labels filtering, which
led them to be incorrectly filtered out.
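The ordering issue can be illustrated with a small sketch. Everything here is hypothetical (the function names, the allow-list representation, and the example label keys) and only shows why dynamic labels have to be added after filtering, not how Aperture implements it.

```go
package main

import "fmt"

// filterLabels keeps only labels whose keys are in the allow list.
func filterLabels(labels map[string]string, allowed map[string]bool) map[string]string {
	out := make(map[string]string)
	for k, v := range labels {
		if allowed[k] {
			out[k] = v
		}
	}
	return out
}

// buildLabels filters the static labels first and only then adds the
// dynamic flow labels, so the filter cannot drop them.
func buildLabels(static, dynamic map[string]string, allowed map[string]bool) map[string]string {
	out := filterLabels(static, allowed)
	for k, v := range dynamic {
		out[k] = v
	}
	return out
}

func main() {
	static := map[string]string{"aperture.response_status": "OK", "internal.debug": "1"}
	dynamic := map[string]string{"user_type": "guest"} // not in the allow list
	allowed := map[string]bool{"aperture.response_status": true}
	fmt.Println(buildLabels(static, dynamic, allowed))
}
```

If the dynamic labels were merged in before `filterLabels` ran, `user_type` would be dropped because it is not in the allow list.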

Bump OTEL to 0.63.0 (#834)

Description of change

Bumps OTEL and FN OTEL to 0.63.0. This removes the Istio 1.15
compatibility hack, as it is now included in upstream OTEL.

Response status in telemetry (#828)

Description of change

This introduces the aperture.response_status column in telemetry. It
mirrors the implementation of the response_status label for metrics.
This also extends the above logic to treat 1xx, 2xx, and 3xx codes
as OK instead of only 2xx codes.
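A minimal sketch of the classification described above. The function name `responseStatus` is hypothetical, and the non-OK value `"Error"` is an assumption, since the notes only name the OK case.

```go
package main

import "fmt"

// responseStatus maps an HTTP status code to a coarse status value.
// 1xx, 2xx, and 3xx codes count as OK; everything else (including 0,
// which Envoy logs on connection reset) is treated as an error here.
// The "Error" string is an assumed value, not taken from the release notes.
func responseStatus(statusCode int) string {
	if statusCode >= 100 && statusCode < 400 {
		return "OK"
	}
	return "Error"
}

func main() {
	for _, code := range []int{101, 200, 302, 404, 503, 0} {
		fmt.Printf("%d -> %s\n", code, responseStatus(code))
	}
}
```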

Besides this, some cleanup is done:

  1. The above logic is moved from FluxMeter to the OTEL package. This
    changes the FluxMeter interface!
  2. A lot of logic is moved from metricsprocessor to
    metricsprocessor/internal for better visibility and easier separation
    of functions which are called directly in metricsprocessor and helpers.
  3. The above made creating unit tests much easier, so this PR also
    includes some.

Ref: fluxninja/cloud#6788

Dry run mode for Load Actuator (#826)

Description of change

  • Dry run mode for Load Actuator. No traffic can get dropped by the
    Load Actuator in this mode. Useful for observing the behavior of the
    Load Actuator without any disruptions.
  • Load Actuator has a new pass-through mode.
  • Default to pass-through mode when the multiplier is invalid and also
    when there is no decision available at the Agent, including during
    initialization.
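The rules above can be summarized in a short sketch. The `loadDecision` type, its `LoadMultiplier` field, and the `mayDropTraffic` function are hypothetical, and treating NaN, infinite, or negative multipliers as "invalid" is an assumption; the release notes do not define what makes a multiplier invalid.

```go
package main

import (
	"fmt"
	"math"
)

// loadDecision is a hypothetical stand-in for the decision received by the Agent.
type loadDecision struct {
	LoadMultiplier float64
}

// mayDropTraffic reports whether the actuator is allowed to drop traffic.
// It returns false (pass through) in dry run mode, when no decision is
// available yet, or when the multiplier is invalid.
func mayDropTraffic(d *loadDecision, dryRun bool) bool {
	if dryRun || d == nil {
		return false
	}
	m := d.LoadMultiplier
	// Assumed definition of "invalid": NaN, infinite, or negative.
	if math.IsNaN(m) || math.IsInf(m, 0) || m < 0 {
		return false
	}
	return true
}

func main() {
	valid := &loadDecision{LoadMultiplier: 0.8}
	fmt.Println("valid decision:", mayDropTraffic(valid, false))
	fmt.Println("dry run:", mayDropTraffic(valid, true))
	fmt.Println("no decision yet:", mayDropTraffic(nil, false))
	fmt.Println("invalid multiplier:", mayDropTraffic(&loadDecision{LoadMultiplier: math.NaN()}, false))
}
```

Defaulting to pass-through in every uncertain case means a missing or malformed decision can never cause requests to be rejected.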

Rollup based on metrics (#821)

Closes: GH-515

docs: playground doc updates (#819)

Description of change

  • Moved demo_app to playground
  • Added more details to playground documentation
  • Bumped Istio and other tools