feature: Software Quality Assurance and Telemetry Proposal #45

Open
raamsri opened this issue Jan 8, 2025 · 0 comments
Labels
enhancement (New feature or request) · Needs More Info (The issue is acknowledged, but requires more information for an objective understanding)

raamsri (Contributor) commented Jan 8, 2025

Feature Request

Summary

The goal is for StrataSTOR to provide the fastest and most reliable open-source services possible. To achieve this, this proposal suggests collecting metrics on endpoint performance and sending fully anonymized telemetry reports ("anonymous usage statistics") to a centralized server. This data will help analyze how changes impact performance and stability, while also identifying potential issues.

The telemetry system is designed with the following principles:

  • Full transparency regarding the data collected, its purpose, and its processing.
  • Privacy-first approach, ensuring no personally identifiable information (PII) is collected or transmitted.
  • An opt-out mechanism for users who do not wish to share telemetry data.

This proposal aims to outline the data collection and processing pipeline, use cases, and the privacy measures in place.


Data Processing Pipeline

The telemetry system will operate in the following steps:

  1. Telemetry data is collected at each service instance.
  2. Collected data is periodically sent to a dedicated telemetry endpoint (e.g., telemetry.stratastor.io).
  3. The data is stored in a secure, private storage system managed by StrataSTOR.
  4. Data is processed, filtered, and aggregated on secure infrastructure.
  5. The output is analyzed to generate insights in the form of anonymized reports.
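
A minimal Go sketch of steps 1–2 follows, under a few assumptions: the Report and RequestMetric schema, the field names, and the /v1/report path are placeholders; only the telemetry host name is taken from this proposal.

```go
// Minimal sketch of steps 1-2: collect an in-memory report and periodically
// POST it to the telemetry endpoint. The Report/RequestMetric schema and the
// /v1/report path are illustrative assumptions, not a finalized design.
package telemetry

import (
	"bytes"
	"encoding/json"
	"net/http"
	"time"
)

// Report is a hypothetical aggregated telemetry payload.
type Report struct {
	InstanceID string          `json:"instanceId"`
	Version    string          `json:"version"`
	Requests   []RequestMetric `json:"requests"`
}

// RequestMetric mirrors the fields listed under "Request Metrics" below.
type RequestMetric struct {
	Host      string `json:"host"`
	Path      string `json:"path"`
	Method    string `json:"method"`
	LatencyMs int64  `json:"latency"`
	Size      int64  `json:"size"`
	Status    int    `json:"status"`
}

// ship periodically serializes the current report and sends it; errors are
// silently dropped so telemetry can never disrupt the service itself.
func ship(collect func() Report, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		body, err := json.Marshal(collect())
		if err != nil {
			continue
		}
		resp, err := http.Post("https://telemetry.stratastor.io/v1/report",
			"application/json", bytes.NewReader(body))
		if err != nil {
			continue
		}
		resp.Body.Close()
	}
}
```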

Objectives

The telemetry pipeline is intended to achieve the following goals:

  • Determine the number of production deployments.
  • Identify which features are used and how frequently.
  • Monitor throughput to understand how much traffic deployments handle.
  • Detect issues introduced by new features (e.g., buggy releases).
  • Identify problems at scale, such as slow or underperforming endpoints.
  • Track which versions are deployed across environments.

Privacy Measures

To ensure user privacy, the telemetry system incorporates the following safeguards:

  1. Only anonymized, aggregated data is transmitted.
  2. Query parameters, request/response bodies, headers, and path parameters are excluded from transmission.
  3. IP addresses of both host and client are anonymized to 0.0.0.0 before transmission (see the sketch after this list).
  4. No environment-specific data is collected except for:
    • Operating system identifier (e.g., Linux, Windows, macOS).
    • Target architecture (e.g., amd64, arm64).
    • Number of CPUs.
    • Binary metadata (build time, git hash, version).
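
A short sketch of safeguards 2 and 3, assuming a hypothetical path allowlist and helper names; only the 0.0.0.0 placeholder value is taken from the proposal itself:

```go
// Sketch of the scrubbing helpers: addresses are reduced to a fixed
// placeholder and only allowlisted paths are reported verbatim, so dynamic
// path segments, query strings, headers, and bodies never enter a record.
package telemetry

// anonymizeAddr discards the real host or client address entirely and
// returns the placeholder value specified in the proposal.
func anonymizeAddr(string) string { return "0.0.0.0" }

// pathAllowlist holds the only request paths that may be reported as-is;
// the entries here are illustrative.
var pathAllowlist = map[string]bool{
	"/api/v1/resource": true,
}

// allowlistPath collapses anything not on the allowlist into a generic
// bucket so identifiers embedded in URLs are never transmitted.
func allowlistPath(p string) string {
	if pathAllowlist[p] {
		return p
	}
	return "other"
}
```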

Identification

To group clusters or installations, a SHA-256 hash is created from installation-specific metadata (e.g., host and port). Additionally, each running instance generates a cryptographically secure random identifier (UUID v4) during startup.
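
A sketch of both identifiers, assuming host and port are the metadata fed into the hash (the exact inputs are not fixed by this proposal):

```go
// Sketch of the two identifiers: a stable SHA-256 fingerprint derived from
// installation metadata and a random UUID v4 generated once per process.
package telemetry

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// clusterID hashes installation-specific metadata (host and port assumed
// here) so deployments can be grouped without exposing the raw values.
func clusterID(host string, port int) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s:%d", host, port)))
	return hex.EncodeToString(sum[:])
}

// instanceID returns a cryptographically secure random UUID v4, generated
// at startup and kept for the lifetime of the process.
func instanceID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}
```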


Metrics Collected

System Metrics

  • goarch: Target architecture (e.g., amd64).
  • goos: Operating system (e.g., linux, darwin).
  • numCpu: Number of available CPUs.
  • runtimeVersion: Version of the Go runtime.
  • version: Binary version.
  • hash: Git hash of the binary.
  • buildTime: Build timestamp.
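
A sketch of how these fields could be populated from Go's standard runtime and build information; the ldflags-injected variable names are assumptions:

```go
// Sketch of collecting the system metrics listed above from the Go runtime
// and from values injected at build time.
package telemetry

import (
	"runtime"
	"runtime/debug"
)

// SystemMetrics mirrors the fields listed under "System Metrics".
type SystemMetrics struct {
	GoArch         string `json:"goarch"`
	GoOS           string `json:"goos"`
	NumCPU         int    `json:"numCpu"`
	RuntimeVersion string `json:"runtimeVersion"`
	Version        string `json:"version"`
	Hash           string `json:"hash"`
	BuildTime      string `json:"buildTime"`
}

// version, gitHash, and buildTime are assumed to be injected at build time,
// e.g. via -ldflags "-X ...=..."; the variable names are illustrative.
var (
	version   = "dev"
	gitHash   = "unknown"
	buildTime = "unknown"
)

func collectSystemMetrics() SystemMetrics {
	m := SystemMetrics{
		GoArch:         runtime.GOARCH,
		GoOS:           runtime.GOOS,
		NumCPU:         runtime.NumCPU(),
		RuntimeVersion: runtime.Version(),
		Version:        version,
		Hash:           gitHash,
		BuildTime:      buildTime,
	}
	// Fall back to module build info when no version was injected.
	if bi, ok := debug.ReadBuildInfo(); ok && m.Version == "dev" {
		m.Version = bi.Main.Version
	}
	return m
}
```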

Request Metrics

  • host: Request URL host name.
  • path: An allowlisted part of the request path (e.g., /api/v1/resource).
  • method: HTTP method (e.g., GET, POST).
  • latency: Execution time in milliseconds.
  • size: Response size in bytes.
  • status: HTTP status code.
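
A sketch of an HTTP middleware that could record these fields, reusing the RequestMetric type and path allowlist from the earlier sketches; headers, bodies, query parameters, and the client address are never read:

```go
// Sketch of an HTTP middleware that records method, allowlisted path,
// latency, response size, and status for each request. RequestMetric and
// allowlistPath come from the earlier sketches in this proposal.
package telemetry

import (
	"net/http"
	"time"
)

// statusRecorder captures the status code and byte count written by the
// wrapped handler without inspecting the payload itself.
type statusRecorder struct {
	http.ResponseWriter
	status int
	size   int64
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

func (r *statusRecorder) Write(b []byte) (int, error) {
	n, err := r.ResponseWriter.Write(b)
	r.size += int64(n)
	return n, err
}

// Middleware wraps a handler and hands each completed request's metric to
// the supplied record callback. r.RemoteAddr, headers, bodies, and query
// parameters are never read.
func Middleware(next http.Handler, record func(RequestMetric)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)
		record(RequestMetric{
			Host:      r.Host, // URL host name only, no client IP
			Path:      allowlistPath(r.URL.Path),
			Method:    r.Method,
			LatencyMs: time.Since(start).Milliseconds(),
			Size:      rec.size,
			Status:    rec.status,
		})
	})
}
```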

Real-World Use Cases

Implementing this telemetry system could enable outcomes such as:

  1. Detecting underutilized APIs or features and deciding whether to deprecate or refactor them.
  2. Identifying performance bottlenecks, such as high-latency endpoints, and resolving them in subsequent releases.
  3. Tracking version adoption rates and encouraging upgrades from older, potentially insecure versions.
  4. Monitoring feature usage to prioritize development efforts for popular features.

Opt-Out Mechanism

To ensure flexibility, the telemetry system will include an opt-out mechanism:

  • A configuration flag (e.g., --sqa-opt-out) to disable telemetry collection.
  • Disabling telemetry will not affect service functionality, aside from reducing insights available for improving StrataSTOR.
  • It may be necessary to always send a minimal ping with version information once at startup.
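
A minimal sketch of the opt-out wiring using the flag name suggested above; the version-only startup ping remains an open question and is only marked as a comment:

```go
// Minimal sketch of the --sqa-opt-out flag; the startup-ping behaviour is
// left open here, matching the proposal.
package main

import (
	"flag"
	"log"
)

func main() {
	sqaOptOut := flag.Bool("sqa-opt-out", false,
		"disable collection and transmission of anonymous usage statistics")
	flag.Parse()

	if *sqaOptOut {
		log.Println("telemetry disabled via --sqa-opt-out")
		// Open question: optionally still send a single version-only ping here.
		return
	}
	// ... start the collectors and the periodic shipper (see earlier sketch) ...
}
```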

Source Code and Transparency

The source code for the telemetry package will be open-source and made publicly available within the StrataSTOR ecosystem.


This proposal aims to establish a robust telemetry and quality assurance pipeline to ensure continuous improvement, better performance, and reliability for StrataSTOR's open-source services.
