CheckMate

CheckMate is a service monitoring tool written in Go that provides real-time health checks and metrics for infrastructure. It supports multiple protocols, customizable rules, and Prometheus integration.

DISCLAIMER: This is a personal project under heavy development. It is not feature complete, secure, or tested, and it is not meant to be used in a production environment.

Features

Core Features

Multi-protocol support (TCP, HTTP, SMTP, DNS)
Hierarchical configuration (Sites → Hosts → Checks)
Configurable check intervals per service
Prometheus metrics integration
Rule-based monitoring with custom conditions
Flexible notification system with rule-specific routing

Metrics & Monitoring

Service availability status
Response time measurements
Rule-based alerting with customizable conditions
Prometheus-compatible metrics endpoint
Downtime tracking
Latency histograms and gauges

Technical Features

YAML-based configuration
Modular architecture for easy extension using interfaces
Site-based infrastructure organization
Tag inheritance (site tags are inherited by hosts)

Quick Start

Clone the repository:

git clone https://github.com/whiskeyjimbo/CheckMate.git
cd CheckMate

Build using Make:

make build

Run using Make:

make run config.yaml ## see configuration below for more details

Configuration

CheckMate is configured using a YAML file (default: ./config.yaml). Here's a complete example:

sites:
  - name: us-east-1
    tags: ["prod", "aws", "use1"]
    hosts:
      - host: api.example.com
        tags: ["public"]
        checks:
          - port: "443"
            protocol: HTTP
            interval: 30s
            tags: ["https", "api"]
          - port: "22"
            protocol: TCP
            interval: 1m
            tags: ["ssh"]

  - name: eu-west-1
    tags: ["prod", "aws", "euw1"]
    hosts:
      - host: eu.example.com
        tags: ["api", "public"]
        checks:
          - port: "443"
            protocol: HTTP
            interval: 30s

rules:
  - name: high_latency_warning
    condition: "responseTime > 2s"
    tags: ["prod"]
    notifications: ["log"]
  
  - name: critical_downtime
    condition: "downtime > 5m"
    tags: ["prod"]
    notifications: ["log", "slack"]

notifications:
  - type: "log"

Site Configuration

name: Unique identifier for the site
tags: List of tags inherited by all hosts in the site
hosts: List of hosts in this site

Host Configuration

host: The hostname or IP to monitor
tags: Additional tags specific to this host (combined with site tags)
checks: List of service checks
- port: Port number to check
- protocol: One of: TCP, HTTP, SMTP, DNS
- interval: Check frequency (e.g., "30s", "1m")
- tags: Additional tags specific to this check (combined with site and host tags)
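
The tag inheritance described above (site tags, then host tags, then check tags) can be sketched as a small merge helper. This is an illustration under assumptions, not CheckMate's implementation: `mergeTags` is a hypothetical name, and dropping duplicates while keeping first-seen order is a design choice made here.

```go
package main

import "fmt"

// mergeTags combines tag lists in the order given (site, host, check),
// dropping duplicates and preserving first-seen order.
// Hypothetical helper for illustration only.
func mergeTags(lists ...[]string) []string {
	seen := make(map[string]struct{})
	merged := []string{}
	for _, list := range lists {
		for _, tag := range list {
			if _, ok := seen[tag]; ok {
				continue
			}
			seen[tag] = struct{}{}
			merged = append(merged, tag)
		}
	}
	return merged
}

func main() {
	site := []string{"prod", "aws", "use1"}
	host := []string{"public"}
	check := []string{"https", "api"}
	fmt.Println(mergeTags(site, host, check))
}
```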

Rule Configuration

name: Unique rule identifier
condition: Expression to evaluate (uses responseTime and downtime variables)
tags: List of tags to match against hosts
notifications: List of notification types to use when rule triggers
- If omitted, all configured notifiers will be used

Notification Configuration

type: Type of notification ("log", with more coming soon)
Each notification type can have its own configuration options
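
The rule-specific routing described under Rule Configuration (use the rule's own notification list, or every configured notifier when the list is omitted) could look roughly like this. `selectNotifiers` is a hypothetical helper, and silently skipping a requested type that is not configured is an assumption of this sketch, not documented behavior.

```go
package main

import "fmt"

// selectNotifiers returns the notifier types a rule should use.
// An empty request means "all configured notifiers".
// Requested types that are not configured are skipped (assumption).
func selectNotifiers(configured, requested []string) []string {
	if len(requested) == 0 {
		return append([]string(nil), configured...)
	}
	available := make(map[string]bool, len(configured))
	for _, name := range configured {
		available[name] = true
	}
	selected := []string{}
	for _, name := range requested {
		if available[name] {
			selected = append(selected, name)
		}
	}
	return selected
}

func main() {
	configured := []string{"log"}
	fmt.Println(selectNotifiers(configured, []string{"log", "slack"}))
	fmt.Println(selectNotifiers(configured, nil))
}
```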

Metrics

CheckMate exposes Prometheus metrics at :9100/metrics including:

checkmate_check_success: Service availability (1 = up, 0 = down)
checkmate_check_latency_milliseconds: Response time gauge
checkmate_check_latency_milliseconds_histogram: Response time distribution

Labels included with metrics:

site: Site name
host: Target hostname
port: Service port
protocol: Check protocol
tags: Comma-separated list of combined site and host tags

Example Prometheus queries:

# Filter checks by site
checkmate_check_success{site="us-east-1"}

# Average response time for production APIs
avg(checkmate_check_latency_milliseconds{tags=~".*prod.*", tags=~".*api.*"})

# 95th percentile latency by site
histogram_quantile(0.95, sum(rate(checkmate_check_latency_milliseconds_histogram_bucket[5m])) by (le, site))

Health Checks

CheckMate provides Kubernetes-compatible health check endpoints:

/health/live - Liveness probe
- Returns 200 OK when the service is running
/health/ready - Readiness probe
- Returns 200 OK when the service is ready to receive traffic
- Returns 503 Service Unavailable during initialization

All health check endpoints are served on port 9100 alongside the metrics endpoint.

Logging

CheckMate uses structured logging with the following fields:

Basic check information:
- site: Site name
- host: Target hostname
- port: Service port
- protocol: Check protocol
- success: Check result (true/false)
- responseTime_us: Response time in microseconds
- tags: Array of host tags
Rule evaluation:
- rule: Rule name
- ruleTags: Tags assigned to the rule
- hostTags: Tags assigned to the host
- condition: Rule condition
- downtime: Current downtime duration
- responseTime: Last check response time

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Roadmap

Technical Details

Implementing a New Checker

To add support for a new protocol, implement the Checker interface:

// 1. Create a new type for your checker
type MyNewChecker struct {
    protocol Protocol
}

// 2. Implement the Checker interface
func (c *MyNewChecker) Check(ctx context.Context, address string) CheckResult {
    // Perform your check logic here
    result := CheckResult{
        Success:      false,
        ResponseTime: 0,
        Error:        nil,
    }
    
    startTime := time.Now()
    
    // Your check implementation
    // For example:
    // - Open a connection
    // - Send/receive data
    // - Validate response
    
    result.ResponseTime = time.Since(startTime)
    result.Success = true // set this from the actual outcome; on failure leave it false and set result.Error
    
    return result
}

func (c *MyNewChecker) Protocol() Protocol {
    return c.protocol
}

// 3. Register your checker in pkg/checkers/checker.go
func NewChecker(protocol string) (Checker, error) {
    switch Protocol(protocol) {
    // ... existing protocols ...
    case ProtocolMyNew:
        return &MyNewChecker{protocol: ProtocolMyNew}, nil
    default:
        return nil, fmt.Errorf("unsupported protocol: %s", protocol)
    }
}

Implementing a New Notifier

To add a new notification system, implement the Notifier interface:

// 1. Create a new notification type constant
const MyNewNotification NotificationType = "mynew"

// 2. Create your notifier type
type MyNewNotifier struct {
    // Add any required fields
    client    *myclient.Client
    apiKey    string
}

// 3. Implement the Notifier interface
func (n *MyNewNotifier) Initialize(ctx context.Context) error {
    // Setup your notification client/connection
    n.client = myclient.New(n.apiKey)
    return nil
}

func (n *MyNewNotifier) SendNotification(ctx context.Context, notification Notification) error {
    // Convert the notification to your system's format
    message := MyNotificationFormat{
        Text:     notification.Message,
        Severity: convertLevel(notification.Level),
        Tags:     notification.Tags,
        Metadata: map[string]string{
            "host":     notification.Host,
            "port":     notification.Port,
            "protocol": notification.Protocol,
        },
    }
    
    // Send the notification
    return n.client.Send(ctx, message)
}

func (n *MyNewNotifier) Type() NotificationType {
    return MyNewNotification
}

func (n *MyNewNotifier) Close() error {
    return n.client.Close()
}

// 4. Register your notifier in pkg/notifications/notifier.go
func NewNotifier(notifierType string, opts ...interface{}) (Notifier, error) {
    switch NotificationType(notifierType) {
    // ... existing notifiers ...
    case MyNewNotification:
        if len(opts) > 0 {
            if apiKey, ok := opts[0].(string); ok {
                return &MyNewNotifier{apiKey: apiKey}, nil
            }
        }
        return nil, fmt.Errorf("mynew notifier requires an API key")
    default:
        return nil, fmt.Errorf("unsupported notification type: %s", notifierType)
    }
}

Available Make Commands

make dev          # Setup development environment
make lint         # Run linter
make test         # Run tests
make coverage     # Generate test coverage report
make docker-build # Build Docker image
make help         # Show all available commands
