docs(metrics): automate usage metrics collection and publish it in the docs site #3495
base: main
- Create Go script to query GitHub Code Search API for usage metrics
- Add CSV storage for historical data (all versions from v0.13.0 to v0.39.0)
- Integrate usage metrics dashboard into MkDocs documentation
- Add interactive charts (trend, version comparison, distribution)
- Create GitHub Actions workflow for automated weekly collection
- Support manual workflow trigger with custom version queries

Co-authored-by: mdelapenya <951580+mdelapenya@users.noreply.github.com>
✅ Deploy Preview for testcontainers-go ready!
Summary by CodeRabbit
Walkthrough
Adds a Usage Metrics system: a Go collector that queries GitHub Code Search, a GitHub Actions workflow to run it monthly (and manually), CSV storage, MkDocs frontend assets (JS/CSS/MD) to visualize metrics, a docs README, build exclusions, and small CI/config tweaks.
Sequence Diagram(s)
sequenceDiagram
participant Scheduler as GitHub Scheduler
participant Workflow as Actions workflow
participant Collector as collect-metrics.go
participant GH as GitHub API (gh)
participant CSV as docs/usage-metrics.csv
participant Git as Git (branch/PR)
Scheduler->>Workflow: Trigger (monthly or manual)
Workflow->>Workflow: Checkout + setup Go
Workflow->>Collector: Run with resolved -version flags
Collector->>GH: Code search queries (retry/backoff)
GH-->>Collector: Return counts
Collector->>CSV: Append rows (date, version, count)
Workflow->>Git: Create branch, commit & push CSV
Workflow->>Git: Open PR to main
sequenceDiagram
participant User as Browser
participant Page as usage-metrics page
participant JS as usage-metrics.js
participant CSV as docs/usage-metrics.csv
participant Chart as Chart.js
User->>Page: Load
Page->>JS: Init
JS->>CSV: Fetch CSV
CSV-->>JS: Rows
JS->>JS: Parse & aggregate (by date/version)
JS->>Chart: Render trend, version, latest charts
JS->>Page: Show stats & update time
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 2
🧹 Nitpick comments (5)
usage-metrics/README.md (1)
7-7: Wrap bare URLs in markdown link syntax.
Lines 7, 98, and 99 contain bare URLs that should be wrapped in markdown link format for better compliance and accessibility. For example:
- Line 7: [GitHub Code Search](https://github.com/search?q=...)
- Line 98: [Production](https://golang.testcontainers.org/usage-metrics/)
- Line 99: [Local](http://localhost:8000/usage-metrics/)
Also applies to: 98-98, 99-99
.github/workflows/usage-metrics.yml (1)
32-60: Add defensive quoting to shell variable expansion.
The shellcheck warning about SC2086 is valid here. While the current code works (word splitting is intentional), adding quotes provides better robustness against unexpected input:
    - go run collect-metrics.go $VERSION_FLAGS -csv "../../docs/usage-metrics.csv"
    + go run collect-metrics.go $VERSION_FLAGS -csv "../../docs/usage-metrics.csv"
Actually, for maximum safety with multiple flags, consider using an array:
    # Earlier in script:
    VERSION_FLAGS=()
    for version in "${VERSION_ARRAY[@]}"; do
      version=$(echo "$version" | xargs)
      if [ -z "$version" ]; then
        continue
      fi
      VERSION_FLAGS+=("-version" "$version")
    done
    # Then invoke:
    go run collect-metrics.go "${VERSION_FLAGS[@]}" -csv "../../docs/usage-metrics.csv"
This eliminates word-splitting concerns entirely.
docs/js/usage-metrics.js (1)
64-103: Handle empty datasets to avoid “Latest Version: undefined” UI
If the CSV is ever empty (or filtered to no rows), `versions` becomes empty, which makes `latestVersion` `undefined` while still rendering the stats grid and latest-usage doughnut (with zero data). Consider an upfront guard that detects `versions.length === 0`, shows a “no data available” message, and skips stats + charts, so the UI degrades more cleanly.
Also applies to: 262-321
usage-metrics/collect-metrics.go (2)
142-164: Avoid shell-constructed `gh api` command; use `exec.Command` arguments instead
`queryGitHubUsage` builds a single shell string and runs `sh -c` with interpolated `endpoint`. Even though version strings are expected to be well-formed tags, this is brittle (quoting/escaping issues) and opens the door to command injection if a malformed version ever sneaks in. Consider switching to something like:
    cmd := exec.Command(
    	"gh", "api",
    	"-H", "Accept: application/vnd.github+json",
    	"-H", "X-GitHub-Api-Version: 2022-11-28",
    	endpoint,
    )
    output, err := cmd.Output()
This keeps arguments properly separated and avoids shell interpretation entirely.
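The suggestion above could be fleshed out roughly as follows. This is a minimal sketch, not the PR's actual `queryGitHubUsage`: the helper names `ghAPIArgs` and `queryCount`, the `searchResult` type, and the example endpoint are assumptions for illustration. It relies only on `gh api` accepting `-H` headers and on the search response carrying a `total_count` field.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// searchResult holds the only field this sketch reads from the search response.
type searchResult struct {
	TotalCount int `json:"total_count"`
}

// ghAPIArgs (hypothetical helper) builds the argument list for `gh api`.
// Each element reaches the process as-is, so the endpoint is never
// interpreted by a shell.
func ghAPIArgs(endpoint string) []string {
	return []string{
		"api",
		"-H", "Accept: application/vnd.github+json",
		"-H", "X-GitHub-Api-Version: 2022-11-28",
		endpoint,
	}
}

// queryCount (hypothetical helper) runs gh and decodes total_count from its JSON output.
func queryCount(endpoint string) (int, error) {
	out, err := exec.Command("gh", ghAPIArgs(endpoint)...).Output()
	if err != nil {
		return 0, fmt.Errorf("gh api: %w", err)
	}
	var res searchResult
	if err := json.Unmarshal(out, &res); err != nil {
		return 0, fmt.Errorf("decode response: %w", err)
	}
	return res.TotalCount, nil
}

func main() {
	// Placeholder endpoint; an authenticated gh CLI is needed for this to succeed.
	n, err := queryCount("/search/code?q=example+filename:go.mod")
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	fmt.Println("count:", n)
}
```

Because the endpoint is a single argv element, characters such as `;` or `$(...)` in a version string stay inside the query instead of being executed.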
54-96: Optional robustness: dedupe versions before querying and consider UTC date
Two small, non-blocking robustness tweaks you might consider in `collectMetrics`:
- If callers accidentally pass the same `-version` multiple times, you’ll query and append duplicate metrics for that version/date. A simple `map[string]struct{}` dedupe step before the loop would avoid redundant GitHub calls and duplicate CSV rows.
- `date := time.Now().Format("2006-01-02")` uses the host’s local timezone; for scheduled runs in CI you likely want a stable notion of “collection day” (e.g., `time.Now().UTC()`) so dates don’t skew if the runner’s locale changes.
Neither is critical, but both make behavior a bit more predictable over time.
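Both tweaks are small enough to sketch. The helper names `dedupeVersions` and `collectionDate` below are hypothetical and do not come from the PR; the sketch only illustrates the `map[string]struct{}` dedupe and the UTC date formatting mentioned above.

```go
package main

import (
	"fmt"
	"time"
)

// dedupeVersions (hypothetical helper) removes duplicates while keeping
// the order in which versions were first seen.
func dedupeVersions(versions []string) []string {
	seen := make(map[string]struct{}, len(versions))
	out := make([]string, 0, len(versions))
	for _, v := range versions {
		if _, ok := seen[v]; ok {
			continue
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	return out
}

// collectionDate (hypothetical helper) formats t as a UTC calendar date,
// independent of the host's local timezone.
func collectionDate(t time.Time) string {
	return t.UTC().Format("2006-01-02")
}

func main() {
	fmt.Println(dedupeVersions([]string{"v0.39.0", "v0.38.0", "v0.39.0"}))
	fmt.Println(collectionDate(time.Now()))
}
```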
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/usage-metrics.csv is excluded by !**/*.csv
📒 Files selected for processing (10)
- .github/workflows/usage-metrics.yml (1 hunks)
- .gitignore (1 hunks)
- docs/css/usage-metrics.css (1 hunks)
- docs/js/usage-metrics.js (1 hunks)
- docs/usage-metrics.md (1 hunks)
- mkdocs.yml (2 hunks)
- scripts/changed-modules.sh (2 hunks)
- usage-metrics/README.md (1 hunks)
- usage-metrics/collect-metrics.go (1 hunks)
- usage-metrics/go.mod (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-18T08:24:27.479Z
Learnt from: mdelapenya
Repo: testcontainers/testcontainers-go PR: 3254
File: .github/dependabot.yml:21-21
Timestamp: 2025-09-18T08:24:27.479Z
Learning: In the testcontainers-go repository, submodules like atlaslocal that are part of a parent module (e.g., mongodb) share the same go.mod file and should not have separate Dependabot entries. They are already monitored through the parent module's Dependabot configuration entry.
Applied to files:
- usage-metrics/go.mod
- usage-metrics/README.md
- mkdocs.yml
🪛 actionlint (1.7.8)
.github/workflows/usage-metrics.yml
32-32: shellcheck reported issue in this script: SC2086:info:28:27: Double quote to prevent globbing and word splitting
(shellcheck)
🪛 LanguageTool
usage-metrics/README.md
[uncategorized] ~31-~31: The official name of this software platform is spelled with a capital “H”.
Context: ... mobile and desktop ### 🤖 Automation (.github/workflows/usage-metrics.yml) - Runs mo...
(GITHUB)
[style] ~180-~180: Consider using a different verb for a more formal wording.
Context: ...le ## Contributing To add features or fix issues: 1. Test changes locally with `...
(FIX_RESOLVE)
🪛 markdownlint-cli2 (0.18.1)
usage-metrics/README.md
7-7: Bare URL used
(MD034, no-bare-urls)
98-98: Bare URL used
(MD034, no-bare-urls)
99-99: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: lint (modules/yugabytedb) / lint: modules/yugabytedb
- GitHub Check: lint (modules/solace) / lint: modules/solace
- GitHub Check: lint (modules/socat) / lint: modules/socat
- GitHub Check: lint (modules/scylladb) / lint: modules/scylladb
- GitHub Check: lint (modules/dockermcpgateway) / lint: modules/dockermcpgateway
- GitHub Check: lint (modules/meilisearch) / lint: modules/meilisearch
- GitHub Check: lint (modules/nats) / lint: modules/nats
- GitHub Check: lint (modules/neo4j) / lint: modules/neo4j
- GitHub Check: lint (modules/k3s) / lint: modules/k3s
- GitHub Check: lint / lint:
- GitHub Check: lint (modules/ollama) / lint: modules/ollama
- GitHub Check: lint (modules/nebulagraph) / lint: modules/nebulagraph
- GitHub Check: lint (modules/inbucket) / lint: modules/inbucket
- GitHub Check: lint (modules/memcached) / lint: modules/memcached
- GitHub Check: lint (modules/compose) / lint: modules/compose
- GitHub Check: lint (modules/milvus) / lint: modules/milvus
- GitHub Check: lint (modules/gcloud) / lint: modules/gcloud
- GitHub Check: lint (modules/localstack) / lint: modules/localstack
- GitHub Check: lint (modules/databend) / lint: modules/databend
- GitHub Check: Analyze (go)
🔇 Additional comments (11)
usage-metrics/go.mod (1)
1-3: Module definition looks correct.
The usage-metrics submodule is properly scoped and Go version is current. Dependencies will be managed by go.mod when the collect-metrics.go program is compiled.
.gitignore (1)
27-28: Verify binary output path.
The .gitignore entry references `usage-metrics/scripts/collect-metrics`, but based on the workflow (which runs from `usage-metrics` directory and builds `collect-metrics.go`), the binary should be at `usage-metrics/collect-metrics` (without the `scripts/` subdirectory). Confirm the correct path to ignore.
scripts/changed-modules.sh (1)
89-89: Exclusions properly aligned with system design.
Both additions (the usage-metrics workflow and the generated CSV file) are appropriately excluded from triggering module builds, consistent with the established pattern for documentation and automation files.
Also applies to: 102-102
mkdocs.yml (1)
22-27: Configuration additions are well-structured.
The externally sourced libraries are pinned to specific versions for stability, and the local assets are properly referenced. Navigation entry is correctly positioned.
Also applies to: 157-157
docs/usage-metrics.md (1)
1-30: Static structure is well-designed for dynamic population.
The HTML markup provides appropriate hooks (element IDs, semantic structure) for the JavaScript runtime to populate with data. Loading and error states are properly provisioned.
.github/workflows/usage-metrics.yml (2)
36-43: Verify git tag extraction logic handles edge cases.
The tag extraction pipeline uses several transformations (grep, sed, sort, awk). While the logic appears sound, test these scenarios:
- When no tags exist matching the pattern
- When v0.13.0 doesn't exist (will awk still work correctly?)
- When there are pre-release tags (e.g., v0.40.0-rc1) in the repo
The current grep pattern `v0\.[0-9]+\.[0-9]+$` will correctly exclude pre-releases, which is good. However, verify that the awk line `/v0.13.0/,0` correctly handles the case where v0.13.0 might not exist.
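The edge cases listed above can be exercised in a small Go model of the pipeline. This is a sketch under assumptions, not the workflow's shell code: `fromFirstMatch` is a hypothetical name, the input is assumed to be already sorted, the pattern is anchored at both ends (the quoted grep pattern is anchored only at the end), and the start tag is compared by exact equality rather than by awk's regex match.

```go
package main

import (
	"fmt"
	"regexp"
)

// stableTag follows the grep pattern quoted above, with an added ^ anchor.
var stableTag = regexp.MustCompile(`^v0\.[0-9]+\.[0-9]+$`)

// fromFirstMatch (hypothetical helper) keeps stable tags starting at the
// first occurrence of start, like an awk range whose end never matches.
// It returns an empty result when start is absent.
func fromFirstMatch(tags []string, start string) []string {
	var out []string
	started := false
	for _, tag := range tags {
		if !stableTag.MatchString(tag) {
			continue // drops pre-releases such as v0.40.0-rc1
		}
		if tag == start {
			started = true
		}
		if started {
			out = append(out, tag)
		}
	}
	return out
}

func main() {
	tags := []string{"v0.12.0", "v0.13.0", "v0.14.0", "v0.40.0-rc1"}
	fmt.Println(fromFirstMatch(tags, "v0.13.0"))
	fmt.Println(len(fromFirstMatch(tags, "v0.99.0")))
}
```

In this model a missing start tag yields no versions at all, which is the case the review asks to verify for the real pipeline.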
62-85: PR creation flow is solid.
The branch naming convention is clear, commit messages are descriptive, and the PR body provides context. The workflow correctly checks for staged changes before committing to avoid empty commits.
docs/css/usage-metrics.css (1)
1-111: CSS is well-structured and responsive.
The stylesheet provides coherent styling across multiple components with appropriate responsive breakpoints. Grid layout for stats, proper shadow and radius use for depth, and mobile considerations are all in place. Color contrast and accessibility appear adequate.
docs/js/usage-metrics.js (2)
10-33: Solid data loading, processing, and init flow
The CSV fetch, Papa.parse usage, sorting/grouping, and single-run `init` guard form a clean, resilient pipeline. Error handling with `showError` plus resetting `isInitialized` on failure is a good touch, and the page-presence check via `trendChart` keeps this script from doing work on unrelated pages.
Also applies to: 45-63, 323-362
105-175: Verify Chart.js time-scale adapter is wired correctly
Both the trend and total-repositories charts rely on `type: 'time'` for the x-axis and `x: d.date` values. This assumes the Chart.js time adapter is loaded/registered in the page and can parse your `YYYY-MM-DD` strings. Please double-check the docs page includes the appropriate adapter bundle/registration for the Chart.js major version you’re using so these axes render correctly.
Also applies to: 177-260
usage-metrics/collect-metrics.go (1)
166-195: CSV append strategy is straightforward and safe for single-writer use
`appendToCSV` correctly resolves an absolute path, creates the file if missing, conditionally writes the header, and appends a single row per metric. For the current usage pattern (single CI job appending once per version per run) this is perfectly adequate and keeps the implementation simple.
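The append-with-conditional-header pattern described above might look like the following. It is an illustrative sketch and not the PR's `appendToCSV`: the signature is assumed, absolute-path resolution is omitted, `runExample` is a hypothetical demo helper, and the counts are placeholder values. Only the `date,version,count` column names come from the PR description.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// appendToCSV appends one row, writing the header first when the file is empty.
// It assumes a single writer, as in the CI job described above.
func appendToCSV(path, date, version string, count int) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return fmt.Errorf("open csv: %w", err)
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return fmt.Errorf("stat csv: %w", err)
	}

	w := csv.NewWriter(f)
	if info.Size() == 0 {
		if err := w.Write([]string{"date", "version", "count"}); err != nil {
			return err
		}
	}
	if err := w.Write([]string{date, version, strconv.Itoa(count)}); err != nil {
		return err
	}
	w.Flush()
	return w.Error()
}

// runExample (hypothetical) appends two rows to a file in a temporary
// directory and returns the resulting file contents.
func runExample() (string, error) {
	dir, err := os.MkdirTemp("", "usage-metrics")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "usage-metrics.csv")
	if err := appendToCSV(path, "2025-01-01", "v0.38.0", 10); err != nil {
		return "", err
	}
	if err := appendToCSV(path, "2025-01-01", "v0.39.0", 12); err != nil {
		return "", err
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	out, err := runExample()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```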
    func main() {
    	var versions arrayFlags
    	csvPath := flag.String("csv", "../../docs/usage-metrics.csv", "Path to CSV file")
    	flag.Var(&versions, "version", "Version to query (can be specified multiple times)")
    	flag.Parse()

    	if len(versions) == 0 {
    		log.Fatal("At least one version is required. Use -version flag (can be repeated)")
    	}

    	if err := collectMetrics(versions, *csvPath); err != nil {
    		log.Fatalf("Failed to collect metrics: %v", err)
    	}
    }
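The `arrayFlags` type used in `main` is not shown in this excerpt. A repeatable flag is commonly built by implementing the standard `flag.Value` interface on a slice type; the sketch below assumes that shape and may differ from the PR's definition. `parseVersions` is a hypothetical wrapper that uses a private `FlagSet` so the parsing can be tried in isolation.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// arrayFlags collects every occurrence of a repeated flag (assumed definition).
type arrayFlags []string

// String implements flag.Value.
func (a *arrayFlags) String() string { return strings.Join(*a, ",") }

// Set implements flag.Value; the flag package calls it once per occurrence.
func (a *arrayFlags) Set(value string) error {
	*a = append(*a, value)
	return nil
}

// parseVersions (hypothetical helper) parses args with its own FlagSet
// instead of the global command line.
func parseVersions(args []string) ([]string, error) {
	var versions arrayFlags
	fs := flag.NewFlagSet("collect-metrics", flag.ContinueOnError)
	fs.Var(&versions, "version", "Version to query (can be specified multiple times)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return versions, nil
}

func main() {
	versions, err := parseVersions([]string{"-version", "v0.38.0", "-version", "v0.39.0"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(versions)
}
```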
🧩 Analysis chain
Confirm default CSV path matches the actual working directory
The default -csv value "../../docs/usage-metrics.csv" is quite sensitive to the process working directory. Depending on how the GitHub Action runs this binary (from repo root vs usage-metrics/), this can easily point outside the repo or to a non‑existent path, forcing callers to always override it. It’d be safer to align the default with the actual workflow working directory (e.g., ../docs/usage-metrics.csv or ./docs/usage-metrics.csv from repo root) or resolve relative to the repo root via an environment variable.
I'll verify the repository structure, file locations, and how the GitHub Actions workflow executes this binary to validate the concern about the default CSV path.
🏁 Script executed:
#!/bin/bash
# Find repository structure around usage-metrics
echo "=== Repository structure ==="
fd -t d -H --max-depth 3 | head -20
echo -e "\n=== Locate usage-metrics directory ==="
fd -t d "usage-metrics"
echo -e "\n=== Check for docs directory and CSV file ==="
find . -name "usage-metrics.csv" -o -name "docs" -type d | head -10
echo -e "\n=== GitHub Actions workflows ==="
find . -path "*/.github/workflows/*.yml" -o -path "*/.github/workflows/*.yaml" | head -10
Length of output: 863
I need to check which workflow executes the collect-metrics binary and what its working directory is.
🏁 Script executed:
#!/bin/bash
# Search workflows for collect-metrics or usage-metrics references
echo "=== Searching workflows for collect-metrics ==="
grep -r "collect-metrics\|usage-metrics" .github/workflows/ --include="*.yml" --include="*.yaml" -n
echo -e "\n=== Check collect-metrics.go location ==="
find . -name "collect-metrics.go"
echo -e "\n=== Check for go.mod or build config in usage-metrics ==="
ls -la usage-metrics/ | grep -E "go.mod|Makefile|build|\.go$"
Length of output: 839
Default CSV path is fragile but not broken in CI context
The review comment correctly identifies that the default path "../../docs/usage-metrics.csv" is sensitive to working directory. However, the concern can be clarified:
- In the GitHub Actions workflow: The path is explicitly overridden via the `-csv` flag (.github/workflows/usage-metrics.yml:60), so the default is not used. The workflow sets `working-directory: usage-metrics` before invoking the binary.
- Locally or in other contexts: If someone runs the binary from the repo root without the `-csv` flag, the default would work. If run from `usage-metrics/`, the path would resolve outside the repository and fail.
The path fragility is real. While CI uses an explicit flag and works around it, the default being relative (../../docs/...) instead of absolute or environment-variable-based remains a valid concern for robustness and clarity. The suggestion to align with actual invocation context or use an environment variable is sound for preventing user confusion.
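One way to make the default independent of the working directory, along the lines of the environment-variable suggestion, is sketched below. `resolveCSVPath` is a hypothetical helper, not code from the PR. `GITHUB_WORKSPACE` is the variable GitHub Actions sets to the checkout directory; using it as the repository root here is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveCSVPath (hypothetical helper) returns an absolute csvPath unchanged
// (cleaned). A relative csvPath is joined onto root when root is set, and
// otherwise resolved against the current working directory.
func resolveCSVPath(root, csvPath string) (string, error) {
	if filepath.IsAbs(csvPath) {
		return filepath.Clean(csvPath), nil
	}
	if root != "" {
		return filepath.Join(root, csvPath), nil
	}
	return filepath.Abs(csvPath)
}

func main() {
	// Empty outside GitHub Actions, in which case the working directory is used.
	root := os.Getenv("GITHUB_WORKSPACE")
	p, err := resolveCSVPath(root, "docs/usage-metrics.csv")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p)
}
```

With a root-relative default such as `docs/usage-metrics.csv`, the same flag value would point at the same file whether the binary starts from the repository root or from a subdirectory, as long as the root is known.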
Actionable comments posted: 0
🧹 Nitpick comments (2)
modulegen/internal/mkdocs/types.go (2)
25-27: `ExtraJavascript` field wiring looks correct; naming nit only
The YAML tag `extra_javascript` matches MkDocs’ config key, so this will correctly round-trip additional JS assets. If you care about Go identifier style, you might consider renaming the field to `ExtraJavaScript` for conventional capitalization, but behavior is already correct as-is.
30-41: Adding `Usage Metrics` nav: confirm `addModule`’s hard-coded indices still point at the right items
`addModule` assumes `c.Nav[3].Modules` and `c.Nav[4].Examples` are the “Modules” and “Examples” entries. With a new `Usage Metrics` nav item added in `mkdocs.yml`, this will break if the new entry is inserted before either of those and shifts their indices.
Two follow-ups to consider:
- Verify that in `mkdocs.yml` the new `Usage Metrics` item is appended after the existing “Modules”/“Examples” entries so indices 3 and 4 are unchanged.
- Longer term, avoid magic indices by locating the correct nav entry by key (e.g., scanning `c.Nav` for the item where `len(Modules) > 0` or `len(Examples) > 0`) before mutating it, so future nav additions don’t silently misplace generated links.
Also applies to: 48-79
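The longer-term suggestion can be illustrated with a small lookup. The `navItem` type and `findNav` function below are hypothetical stand-ins, since the real types in `modulegen/internal/mkdocs/types.go` are not shown in this review; the sketch only demonstrates scanning for an entry by content instead of by position.

```go
package main

import "fmt"

// navItem is a hypothetical stand-in for one entry of the MkDocs nav.
type navItem struct {
	Title    string
	Modules  []string
	Examples []string
}

// findNav (hypothetical helper) returns the index of the first entry
// accepted by match, or -1 when no entry matches.
func findNav(nav []navItem, match func(navItem) bool) int {
	for i, item := range nav {
		if match(item) {
			return i
		}
	}
	return -1
}

func hasModules(n navItem) bool  { return len(n.Modules) > 0 }
func hasExamples(n navItem) bool { return len(n.Examples) > 0 }

func main() {
	// A new entry placed before "Modules" shifts positions, but the lookup still works.
	nav := []navItem{
		{Title: "Home"},
		{Title: "Usage Metrics"},
		{Title: "Modules", Modules: []string{"modules/example.md"}},
		{Title: "Examples", Examples: []string{"examples/example.md"}},
	}
	fmt.Println(findNav(nav, hasModules), findNav(nav, hasExamples))
}
```

A caller would then check for -1 and return an error rather than writing to a fixed index.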
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- modulegen/internal/mkdocs/types.go (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: lint (modules/artemis) / lint: modules/artemis
- GitHub Check: lint (modules/mariadb) / lint: modules/mariadb
- GitHub Check: lint (modules/dockermodelrunner) / lint: modules/dockermodelrunner
- GitHub Check: lint (modules/influxdb) / lint: modules/influxdb
- GitHub Check: lint (modules/vault) / lint: modules/vault
- GitHub Check: lint (modules/pinecone) / lint: modules/pinecone
- GitHub Check: lint (modules/mssql) / lint: modules/mssql
- GitHub Check: lint (modules/elasticsearch) / lint: modules/elasticsearch
- GitHub Check: Analyze (go)
Summary
Implements an automated system to track and visualize testcontainers-go adoption across GitHub repositories. The system queries GitHub's Code Search API for usage in go.mod files, stores historical data, and displays interactive charts on the documentation site.
Features
Components Added
Data Collection (usage-metrics/)
- collect-metrics.go: Go program that queries GitHub Code Search API
  - -version flags for efficient batch processing
- README.md: Complete documentation for the metrics system
Visualization (docs/)
- usage-metrics.md: Dashboard page integrated into documentation
- js/usage-metrics.js: Interactive charts powered by Chart.js
- css/usage-metrics.css: Responsive styling for dashboard
Automation (.github/workflows/)
- usage-metrics.yml: GitHub Actions workflow
Data Storage
- docs/usage-metrics.csv: Historical usage data (758 initial records)
  - date,version,count
CI/CD Integration
- scripts/changed-modules.sh:
  - docs/usage-metrics.csv to excluded files
  - .github/workflows/usage-metrics.yml to excluded files
How It Works
- chore(metrics): update usage metrics (YYYY-MM-DD))
Viewing the Dashboard
- mkdocs serve → http://localhost:8000/usage-metrics/
Test Plan
- workflow_dispatch