prometheus / node_exporter Public

feat: node exporter mixin large update #2665

Closed

v-zhuravlev wants to merge 15 commits into prometheus:master from grafana:master

Contributor

v-zhuravlev commented Apr 21, 2023 (edited)

This update introduces a three-tier view of Linux nodes:

- Fleet view (top level): see a group of your Linux instances at once
- Node overview: see a specific node at a glance
- Drill-down: a set of dashboards for deep analysis with advanced metrics

Links and data links are provided for easier navigation between the views.

Checklist:

- Convert graph panels to timeseries panels with default style (opacity, tooltip, legend position, etc).
- Add info row to overview dashboard
- Add linux network dashboard
  - Add interfaces overview panel
  - Add oper status timeline
  - Add Sockstat/Netstat metrics to network dashboard
- Add Advanced CPU and system dash
- Add Advanced Memory dash
- Add Fleet overview dash
  - Add overview fleet table
  - Add common CPU graph (top25)
  - Add common memory graph (top25)
  - Add common network graph (top25)
  - Add common Disk / FS graph (top25)
- Add annotations
  - Reboot detected
  - Kernel change detected
  - OOM kill detected
- Add job/cluster variable support for additional grouping

Various dashboard improvements:

- Change 'logical core' line style to dotted
- Update Disk I/O time metric to dots
- Move dashboard parameters, such as tags and timezone, to _config
- Convert gauges to stat for memory usage panel
- Add CPU usage stat panel
- Add dashboard links and data links to navigate between dashboards

Member

discordianfish commented Apr 26, 2023

Very nice! This needs a thorough review though. For this, can you first clean up the commit history and add DCO sign-off, then ping us when you're ready to get this reviewed?

SuperQ requested changes

Member

SuperQ left a comment

This needs a DCO sign-off. You can use git commit -s --amend to add it.
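
For reference, a minimal sketch (not part of this PR) of how one could check which commit messages still lack the Signed-off-by trailer that git commit -s adds. The function names and the commit-id keys are hypothetical, and the pattern only approximates what an actual DCO check accepts.

```python
import re
from typing import Dict, List

# A sign-off is a trailer line of the form "Signed-off-by: Name <email>".
SIGN_OFF = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def has_sign_off(commit_message: str) -> bool:
    """Return True if the message contains at least one Signed-off-by line."""
    return SIGN_OFF.search(commit_message) is not None


def missing_sign_off(messages: Dict[str, str]) -> List[str]:
    """Return the ids (hypothetical keys) of commits whose message lacks a sign-off."""
    return [commit_id for commit_id, msg in messages.items() if not has_sign_off(msg)]
```

Feeding it a mapping of commit id to full commit message returns the ids that would still need git commit -s --amend (or an equivalent rebase with sign-off).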

Contributor Author

v-zhuravlev commented May 24, 2023 (edited)

Yes, I will clean this up after alerts #2644 is merged.

rgeyer and others added 7 commits

July 15, 2023 10:51


          Add some lint exclusions.

b190fcd

Add UIDs to all dashboards.
Add units and descriptions to all panels which were missing them.
Modify alerts descriptions and summaries as needed for linting.

Signed-off-by: Ryan J. Geyer <me@ryangeyer.com>


          Add multi-cluster dashboard lint exclusions

a6cc3b8

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>


          Update mixin for linux-integration (#18)

* Add mountpoint to NodeFilesystem alerts
This helps identify the alerting filesystem.

* Decrease NodeFilesystem pending time to 15m
30m is too long: there is a risk of running out of disk space or inodes completely if something (such as a log file) is filling up the disk very fast.

* Add CPU and memory alerts
* Add failed systemd service alert
* Decrease NodeNetwork*Errs pending period
* Set 'at' everywhere as preposition for instance
* Add NodeDiskIOSaturation alert
* Add %(nodeExporterSelector)s to Network and conntrack alerts
* Add diskDevice selector
* Fix NodeMemoryHighUtilization alert
* Add NodeSystemSaturation and NodeMemoryMajorPagesFaults
* Decrease NodeSystemdServiceFailed severity to warning
* Extend alert description
* Add comma after 'mounted on'
* Add thresholds for memory alerts
* Add thresholds for memory, disk and system alerts
* Set severity of NodeCPUHighUsage to info
* Convert graph panels to timeseries panel
...With default style (opacity, tooltip etc).
Also:
Change 'logical core' line style to dotted
Update Disk I/O time metric to dots
* Move dashboard parameters to config
* Add overview row
* Add CPU usage stat panel
* Add network dash
- Add interfaces overview panel
- Add oper status timeline
- Add common lib with reused elements (templates, queries)
- Add common panels with shared style to be used across this mixin
* Remove external panels lib
* Add fleet dashboard
* Update fleet dash
* Add CPU and memory to fleet
* Add common cpu/memory/disk/network panels on fleet
* add network errors panel as points
* Fix alerts column in fleet table
* Add support for multiple group and instance labels
* Add sockstat to network dashboard
* Add netstat to network dashboard
* Change span to gridPos. Make overview row smaller.
- gridPos supports tiny panel heights.
* add reboot annotation
* Add system dashboard
* add filesystem row
* Add disk and fs dashboard
* Add memory dashboard
* Add memory generic counters to memory dashboard
* Update common lib
* Update OOM killer panel
* Add common annotations: kernelChange, OOMkill
* Add mountpoint to NodeFilesystem alerts
- This helps identify the alerting filesystem.
* Add CPU and memory alerts
* Add failed systemd service alert
* Decrease NodeNetwork*Errs pending period
* Set 'at' everywhere as preposition for instance
* Add NodeDiskIOSaturation alert
* Add %(nodeExporterSelector)s to Network and conntrack alerts
* Add diskDevice selector
* Fix NodeMemoryHighUtilization alert
* Add NodeSystemSaturation and NodeMemoryMajorPagesFaults
* Decrease NodeSystemdServiceFailed severity to warning
* Remove unused import
* Add ability to set custom dashboardUID
* Add mountpoint to NodeFilesystem alerts
* Add failed systemd service alert
* Remove systemd panel
- systemd collector is disabled by default
* Add some lint exclusions.
- Add UIDs to all dashboards.
- Add units and descriptions to all panels which were missing them.
- Modify alerts descriptions and summaries as needed for linting.
* Add multi-cluster dashboard lint exclusions
* Extend alert description
* Add thresholds for memory, disk and system alerts
* Set severity of NodeCPUHighUsage to info
* Fix broken diskSpaceUsage link
* Fix cpuIdle panel units
* Change cpuUsage to use $__rate_interval
* Fix cpu usage (replace with nodeQuerySelector)
* Fix units (seconds->s)
* Fix iops units
* Add %(nodeQuerySelector)s to alerts queries
* Add support for multi in job
* Fix Pagesout metric
* Add total and available memory metrics
* Update context switches description
* Add network descriptions
* Change pipe to | from / in AxisLabel
* Update network descriptions
* Add timezone metric

---------

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
Signed-off-by: Ryan J. Geyer <me@ryangeyer.com>


          Remove forced uid regeneration (#20)

fe0d3e0

Instead, one can redefine grafanaDashboardIDs in _config

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>


          Remove + { uid: std.md5(name) } from dashboards.jsonnet as well

dc6db9c

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
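
To illustrate the difference these two commits make, here is a rough Python sketch, not the mixin's jsonnet. It assumes that std.md5 hashes the UTF-8 bytes of the name into a hex string, and that grafanaDashboardIDs maps a dashboard file name to a uid; that shape, the example file names, and both function names are assumptions made for illustration.

```python
import hashlib
from typing import Mapping


def forced_uid(name: str) -> str:
    # Old behaviour (removed here): the uid is always derived from the name,
    # roughly what `uid: std.md5(name)` produced.
    return hashlib.md5(name.encode("utf-8")).hexdigest()


def configured_uid(name: str, own_uid: str, dashboard_ids: Mapping[str, str]) -> str:
    # Assumed new behaviour: keep the dashboard's own uid unless the
    # (hypothetical) name -> uid mapping from _config overrides it.
    return dashboard_ids.get(name, own_uid)
```

With the forced variant, renaming a dashboard file silently changes its uid; with the configured variant, the uid stays stable unless it is redefined explicitly.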


          Split alerts into two groups (#21)

10495ef

to stay under Mimir's default limit of 20 alerts per group.

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
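
The idea behind the split can be sketched in Python as plain chunking. This is only an illustration, not how the mixin does it: the function name is hypothetical, and the limit of 20 is taken from the commit message above.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_groups(rules: Sequence[T], max_per_group: int = 20) -> List[List[T]]:
    """Split rules into consecutive groups of at most max_per_group items."""
    if max_per_group < 1:
        raise ValueError("max_per_group must be at least 1")
    return [
        list(rules[start:start + max_per_group])
        for start in range(0, len(rules), max_per_group)
    ]
```

For instance, 25 alert rules would end up in two groups of 20 and 5, each within the limit.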


          fix: fix typos (#22)

3d79ef1

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>

v-zhuravlev force-pushed the master branch from 7bbe86c to 079978e Compare

July 15, 2023 11:00

Dasomeone added 2 commits

July 15, 2023 11:02


          Add datasource selector to data links

38b6ad6

This fixes an issue where, after selecting a node under a specific datasource, the data link did not use that datasource and therefore showed no data.

Signed-off-by: Emily Ahlstrand Rager <emily.rager@grafana.com>


          Fix description in node usage dashboard

ed637a2

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>

v-zhuravlev force-pushed the master branch from 079978e to ed637a2 Compare

July 15, 2023 11:03

Contributor Author

v-zhuravlev commented Jul 15, 2023 (edited)

@discordianfish, @SuperQ, Hi!
Rebased and DCO signed.

v-zhuravlev added 2 commits

July 15, 2023 11:07


          Format jsonnet

203d5cd

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>


          Revert alerts pending durations as agreed here prometheus#2644

720a114

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>

v-zhuravlev requested a review from SuperQ

July 24, 2023 19:57

v-zhuravlev added 4 commits

November 14, 2023 22:02


          Add node-observ-lib (#25)

836773a

* Add node-observ-lib

* Remove trends support (not in 10.0 schema)

* Make filteringSelector for logs dashboard configurable

* Temp change dependency (until PR is merged for commonlib)

* Refactor config

* Update jsonnetfile.json

* Update README

* Add separate loki example

* Add sep file example


          Add missing metrics (#27)

4a48f6b


          Add Macos observability lib (#28)

94e744e

* Add gitignore to node-observ-lib

* Fix typo in node default filteringSelector

* Prep alert group names for macos

* Add macos-observ-lib

* Change overview dashboard:
show networkErrorsAndDroppedPerSec instead of networkErrorPerSec for Linux/MacOS

* Add more alerts

* Move alerts to sep file

* Breaking: Update layout

To allow locally importing linux from macos

* Bring back NodeFilesystemAlmostOutOfFiles alert

* Show only errors when they occur

* Only show network interfaces that had traffic change at least once during selected dashboard interval


          Fix memory queries typos

3eb9759

Contributor Author

v-zhuravlev commented Nov 29, 2023 (edited)

Closing in favor of #2861
It is much cleaner now thanks to grafonnet. (IMHO)

v-zhuravlev closed this
