@mjh1 mjh1 commented Oct 13, 2025

Since these two pieces of code were doing similar things, move capacity reporting into db_discovery. More importantly, this also addresses the issue we currently have where we only track idle containers in the metrics, because we were getting an OrchestratorCapped error if the O was busy.

@github-actions github-actions bot added go Pull requests that update Go code AI Issues and PR related to the AI-video branch. labels Oct 13, 2025

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 28.98551% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 31.74015%. Comparing base (1bc682a) to head (040ebde).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| monitor/census.go | 3.33333% | 29 Missing ⚠️ |
| discovery/db_discovery.go | 48.71795% | 19 Missing and 1 partial ⚠️ |
Additional details and impacted files

@@                 Coverage Diff                 @@
##              master       #3778         +/-   ##
===================================================
+ Coverage   31.68993%   31.74015%   +0.05022%     
===================================================
  Files            158         158                 
  Lines          47564       47536         -28     
===================================================
+ Hits           15073       15088         +15     
+ Misses         31603       31560         -43     
  Partials         888         888                 
| Files with missing lines | Coverage Δ |
|---|---|
| cmd/livepeer/starter/flags.go | 0.00000% <ø> (ø) |
| cmd/livepeer/starter/starter.go | 22.02618% <ø> (-0.01254%) ⬇️ |
| core/livepeernode.go | 60.22099% <ø> (ø) |
| discovery/discovery.go | 82.72727% <ø> (+11.40652%) ⬆️ |
| server/ai_session.go | 6.63265% <ø> (-0.57665%) ⬇️ |
| discovery/db_discovery.go | 66.93333% <48.71795%> (-2.20614%) ⬇️ |
| monitor/census.go | 59.58702% <3.33333%> (-0.35363%) ⬇️ |

... and 3 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7dff847...040ebde.

@mjh1 mjh1 marked this pull request as ready for review October 15, 2025 19:23
@mjh1 mjh1 changed the title Test out db discovery Move the capacity reporting into db_discovery Oct 15, 2025
@mjh1 mjh1 requested review from ad-astra-video and leszko October 15, 2025 19:28
Comment on lines +396 to +402
// Only update network capabilities every 25 minutes
if time.Since(dbo.lastNetworkCapabilitiesReported) >= networkCapabilitiesReportingInterval {
	// Save network capabilities in LivepeerNode
	dbo.node.UpdateNetworkCapabilities(orchNetworkCapabilities)

	dbo.lastNetworkCapabilitiesReported = time.Now()
}
Contributor

Why don't we always send network capabilities?

Contributor Author

@mjh1 mjh1 Oct 17, 2025

I wanted to leave the existing behaviour alone, so we're still only sending to Kafka every 25 mins. wdyt @ad-astra-video? Going from 25 mins to 10 seconds seems wrong :)

cfg.LiveAIAuthWebhookURL = fs.String("liveAIAuthWebhookUrl", "", "Live AI RTMP authentication webhook URL")
cfg.LivePaymentInterval = fs.Duration("livePaymentInterval", *cfg.LivePaymentInterval, "Interval to pay process Gateway <> Orchestrator Payments for Live AI Video")
cfg.LiveOutSegmentTimeout = fs.Duration("liveOutSegmentTimeout", *cfg.LiveOutSegmentTimeout, "Timeout duration to wait the output segment to be available in the Live AI pipeline; defaults to no timeout")
cfg.LiveAICapRefreshModels = fs.String("liveAICapRefreshModels", "", "Comma separated list of models to periodically fetch capacity for. Leave unset to switch off periodic refresh.")
Contributor

Just a note that we need to be careful when deploying this change: if this flag is still configured in infra, the gateway will fail to start because the flag no longer exists.

Contributor Author

yeah it's a pain, i guess maybe i could leave it in but not use it, then come along after the prod deploy and remove it
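
A rough sketch of the two-step removal suggested above, assuming flags are parsed with the standard library `flag` package (the real gateway's parsing setup may differ). `newFlagSet` and `flagWasSet` are hypothetical helpers and the model names are placeholders. It shows that parsing fails once the flag is unregistered, while keeping it registered as an ignored no-op lets old configs start and still allows a warning.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// newFlagSet builds a FlagSet for this sketch (hypothetical helper). When
// keepDeprecated is true, the removed option stays registered but is ignored.
func newFlagSet(keepDeprecated bool) *flag.FlagSet {
	fs := flag.NewFlagSet("gateway", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output
	fs.String("liveAIAuthWebhookUrl", "", "Live AI RTMP authentication webhook URL")
	if keepDeprecated {
		fs.String("liveAICapRefreshModels", "", "DEPRECATED: ignored, to be removed after the prod deploy")
	}
	return fs
}

// flagWasSet reports whether name was explicitly passed on the command line.
// FlagSet.Visit only visits flags that have been set.
func flagWasSet(fs *flag.FlagSet, name string) bool {
	set := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == name {
			set = true
		}
	})
	return set
}

func main() {
	// Arguments as an existing deployment might still pass them.
	args := []string{"-liveAICapRefreshModels=modelA,modelB"}

	// Flag removed outright: parsing returns an error, so startup would fail.
	err := newFlagSet(false).Parse(args)
	fmt.Println("flag removed, parse error:", err)

	// Flag kept as a no-op: parsing succeeds and we can warn operators.
	fs := newFlagSet(true)
	err = fs.Parse(args)
	fmt.Println("flag kept, parse error:", err)
	if flagWasSet(fs, "liveAICapRefreshModels") {
		fmt.Println("warning: -liveAICapRefreshModels is deprecated and ignored")
	}
}
```

Once infra no longer passes the option, the deprecated registration can be dropped in a follow-up change.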
