RB-DRUID-INDEXER

A simple distributed Druid indexer task manager for Kafka ingestion.

Overview

rb-druid-indexer is a cluster-compatible service designed to manage the indexing of Kafka data streams into Druid. It handles task announcements, generates configuration specification files, and submits tasks to the Druid Supervisor.

Features

Cluster compatible using ZooKeeper
Automatic task ingestion & specfile configuration
Failover support for long-term ingestion

Configuration

rb-druid-indexer is configured through a YAML file that holds the ZooKeeper connection settings and the list of tasks to run. Below is an example configuration file:

zookeeper_servers:
  - "127.0.0.1:2181"

tasks:
  - task_name: "rb_monitor"
    namespace: ""
    feed: "rb_monitor"
    kafka_host: "kafka.service:9092"
  - task_name: "rb_flow"
    namespace: ""
    feed: "rb_flow_post"
    kafka_host: "kafka.service:9092"

zookeeper_servers

Description: A list of ZooKeeper servers used for leadership checks and coordination.
Type: Array of strings.
Example:
- "127.0.0.1:2181"
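
The README does not say how rb-druid-indexer decides which instance is the leader. As a point of reference only, the sketch below shows the selection step of the commonly documented ZooKeeper election recipe: each candidate creates an ephemeral sequential znode, and the candidate whose node carries the lowest sequence number leads. The function name `Leader` and the `n_` node prefix are assumptions for illustration and are not taken from `zkclient/election.go`.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// sequenceOf extracts the 10-digit counter that ZooKeeper appends
// to the name of a sequential znode.
func sequenceOf(name string) (int, error) {
	if len(name) < 10 {
		return 0, fmt.Errorf("znode %q has no sequence suffix", name)
	}
	return strconv.Atoi(name[len(name)-10:])
}

// Leader returns the child znode with the lowest sequence number.
// The children would normally come from listing the election path.
func Leader(children []string) (string, error) {
	if len(children) == 0 {
		return "", errors.New("no candidates")
	}
	best, bestSeq, found := "", 0, false
	for _, child := range children {
		n, err := sequenceOf(child)
		if err != nil {
			return "", err
		}
		if !found || n < bestSeq {
			best, bestSeq, found = child, n, true
		}
	}
	return best, nil
}

func main() {
	// Hypothetical znode names, as three instances might have created them.
	children := []string{"n_0000000007", "n_0000000003", "n_0000000012"}
	leader, err := Leader(children)
	if err != nil {
		fmt.Println("election failed:", err)
		return
	}
	fmt.Println("leader:", leader)
}
```

Because the nodes are ephemeral, a crashed leader's node disappears when its session ends and the next-lowest candidate takes over, which is one way the failover feature listed above could be achieved.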

tasks

Description: A list of tasks to be managed by the indexer. Each task contains the following attributes:

task_name

Description: The name of the task. This is used to identify the task in the system.
Type: String.
Example:
- "rb_monitor"
- "rb_flow"

namespace (optional)

Description: The namespace associated with the task. This can be left empty if not needed.
Type: String.
Example:
- "" (empty)

feed

Description: The name of the Kafka feed associated with the task. This specifies which feed to listen to.
Type: String.
Example:
- "rb_monitor"
- "rb_flow_post"

kafka_host

Description: The host and port for the Kafka service where the feed is being published.
Type: String.
Example:
- "kafka.service:9092"
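
The fields described above can be mirrored in Go types and checked before any task is started. The following is a minimal sketch, not the project's actual `config/config.go`: the `Task` and `Config` structs, their `yaml` tags, and the `Validate` method are assumptions for illustration. It treats `task_name`, `feed`, and `kafka_host` as required because only `namespace` is marked optional.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// Task mirrors one entry under `tasks` (hypothetical struct).
type Task struct {
	TaskName  string `yaml:"task_name"`
	Namespace string `yaml:"namespace"` // optional, may be empty
	Feed      string `yaml:"feed"`
	KafkaHost string `yaml:"kafka_host"`
}

// Config mirrors the top level of the YAML file (hypothetical struct).
type Config struct {
	ZookeeperServers []string `yaml:"zookeeper_servers"`
	Tasks            []Task   `yaml:"tasks"`
}

// Validate reports the first problem found in the configuration.
func (c Config) Validate() error {
	if len(c.ZookeeperServers) == 0 {
		return errors.New("zookeeper_servers must list at least one server")
	}
	for i, t := range c.Tasks {
		if t.TaskName == "" {
			return fmt.Errorf("tasks[%d]: task_name is required", i)
		}
		if t.Feed == "" {
			return fmt.Errorf("tasks[%d] (%s): feed is required", i, t.TaskName)
		}
		// kafka_host is documented as "host and port".
		if _, _, err := net.SplitHostPort(t.KafkaHost); err != nil {
			return fmt.Errorf("tasks[%d] (%s): kafka_host %q: %w", i, t.TaskName, t.KafkaHost, err)
		}
	}
	return nil
}

func main() {
	// Same values as the example configuration file.
	cfg := Config{
		ZookeeperServers: []string{"127.0.0.1:2181"},
		Tasks: []Task{
			{TaskName: "rb_monitor", Feed: "rb_monitor", KafkaHost: "kafka.service:9092"},
			{TaskName: "rb_flow", Feed: "rb_flow_post", KafkaHost: "kafka.service:9092"},
		},
	}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config ok:", len(cfg.Tasks), "tasks")
}
```

The `yaml` tags are inert here; they only take effect once the structs are passed to a YAML decoder of your choice.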

Each dataSource is defined in its own file, /druid/datasources/${datasource}.go. For example:

package datasources

import druidrouter "rb-druid-indexer/druid"

var FlowMetrics = []druidrouter.Metrics{
	{Type: "count", Name: "events"},
	{Type: "longSum", Name: "sum_bytes", FieldName: "bytes"},
	{Type: "longSum", Name: "sum_pkts", FieldName: "pkts"},
	{Type: "longSum", Name: "sum_rssi", FieldName: "client_rssi_num"},
	{Type: "hyperUnique", Name: "clients", FieldName: "client_mac"},
	{Type: "hyperUnique", Name: "wireless_stations", FieldName: "wireless_station"},
	{Type: "longSum", Name: "sum_dl_score", FieldName: "darklist_score"},
}

var FlowDimensions = []string{
	"application_id_name", "building", "building_uuid", "campus", "campus_uuid",
	"client_accounting_type", "client_auth_type", "client_fullname", "client_gender",
	"client_id", "client_latlong", "client_loyality", "client_mac", "client_mac_vendor",
	"client_rssi", "client_vip", "conversation", "coordinates_map", "darklist_category",
	"darklist_direction", "darklist_score_name", "darklist_score", "deployment",
	"deployment_uuid", "direction", "dot11_protocol", "dot11_status", "dst_map", "duration",
	"engine_id_name", "floor", "floor_uuid", "host", "host_l2_domain", "http_social_media",
	"http_user_agent", "https_common_name", "interface_name", "ip_as_name", "ip_country_code",
	"ip_protocol_version", "l4_proto", "lan_interface_description", "lan_interface_name",
	"lan_ip", "lan_ip_is_malicious", "lan_ip_as_name", "lan_ip_country_code", "lan_ip_name",
	"lan_ip_net_name", "lan_l4_port", "lan_name", "lan_vlan", "market", "market_uuid",
	"namespace", "namespace_uuid", "organization", "organization_uuid", "product_name",
	"public_ip", "public_ip_is_malicious", "public_ip_mac", "referer", "referer_l2",
	"scatterplot", "selector_name", "sensor_ip", "sensor_name", "sensor_uuid", "service_provider",
	"service_provider_uuid", "src_map", "tcp_flags", "tos", "type", "url", "wan_interface_description",
	"wan_interface_name", "wan_ip", "wan_ip_is_malicious", "wan_ip_as_name", "wan_ip_country_code",
	"wan_ip_map", "wan_ip_net_name", "wan_l4_port", "wan_name", "wan_vlan", "wireless_id",
	"wireless_operator", "wireless_station", "zone", "zone_uuid",
}

const FlowDataSource = "rb_flow"

It is then registered in /druid/datasources/config.go:

var Configs = map[string]DataSourceConfig{
	"rb_flow": {
		DataSource: FlowDataSource,
		Metrics:    FlowMetrics,
		Dimensions: FlowDimensions,
	},
	"rb_monitor": {
		DataSource: MonitorDataSource,
		Metrics:    MonitorMetrics,
		Dimensions: MonitorDimensions,
	},
}

To add your own dataSource, copy an existing dataSource file, adapt its metrics and dimensions, and register it in /druid/datasources/config.go. You can then reference it from your `config.yml`.
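
The steps for adding a dataSource can be sketched as follows. Everything named `Example` or `rb_example` is hypothetical, and the local `Metrics` and `DataSourceConfig` types stand in for the project's own so the snippet compiles by itself. The `Lookup` helper is also an assumption. It supposes that a task's `task_name` is the key into `Configs`, which is consistent with the sample keys but is not confirmed by this README.

```go
package main

import "fmt"

// Local stand-ins for the project's types.
type Metrics struct {
	Type, Name, FieldName string
}

type DataSourceConfig struct {
	DataSource string
	Metrics    []Metrics
	Dimensions []string
}

// Step 1: a new dataSource file, e.g. /druid/datasources/example.go (hypothetical).
const ExampleDataSource = "rb_example"

var ExampleMetrics = []Metrics{
	{Type: "count", Name: "events"},
	{Type: "longSum", Name: "sum_bytes", FieldName: "bytes"},
}

var ExampleDimensions = []string{"sensor_name", "sensor_uuid", "namespace_uuid"}

// Step 2: register it next to the existing entries in config.go.
var Configs = map[string]DataSourceConfig{
	"rb_example": {
		DataSource: ExampleDataSource,
		Metrics:    ExampleMetrics,
		Dimensions: ExampleDimensions,
	},
}

// Lookup resolves a name from config.yml to its registered definition.
func Lookup(name string) (DataSourceConfig, error) {
	cfg, ok := Configs[name]
	if !ok {
		return DataSourceConfig{}, fmt.Errorf("no dataSource registered for %q", name)
	}
	return cfg, nil
}

func main() {
	// Step 3: a task in config.yml would then use task_name: "rb_example".
	cfg, err := Lookup("rb_example")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(cfg.DataSource, len(cfg.Metrics), "metrics,", len(cfg.Dimensions), "dimensions")
}
```

Returning an error for an unregistered name makes a typo in `config.yml` visible at startup rather than producing a task with an empty schema.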

Project Structure

└── rb-druid-indexer/
    ├── LICENSE
    ├── config
    │   └── config.go
    ├── druid
    │   ├── datasources
    │   ├── realtime.go
    │   └── router.go
    ├── example_config.yml
    ├── go.mod
    ├── go.sum
    ├── main.go
    └── zkclient
        ├── client.go
        ├── election.go
        └── task_announcer.go

Project Index

RB-DRUID-INDEXER/

__root__: main.go, go.mod, go.sum, example_config.yml
config: config.go
zkclient: election.go, client.go, task_announcer.go
druid: realtime.go, router.go
druid/datasources: location.go, config.go, event.go, wireless.go, monitor.go, state.go, flow.go

Getting Started

Prerequisites

Before getting started with rb-druid-indexer, ensure your runtime environment meets the following requirements:

Programming Language: Go
Package Manager: Go modules

Installation

Build rb-druid-indexer from source:

Clone the rb-druid-indexer repository:

❯ git clone https://github.com/redBorder/rb-druid-indexer

Navigate to the project directory:

❯ cd rb-druid-indexer

Build the binary (Go modules fetches the dependencies listed in go.mod as part of the build):

❯ go build

Usage

Run rb-druid-indexer with the following command, passing the path to your configuration file with -c:

❯ ./rb-druid-indexer -c config.yml
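
For readers curious how a `-c` option like the one above is typically handled in Go, here is a minimal sketch using the standard `flag` package. It is not the project's `main.go`. The `parseArgs` function and the `config.yml` default are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseArgs extracts the configuration path from the command line.
func parseArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("rb-druid-indexer", flag.ContinueOnError)
	// The default value is an assumption for illustration.
	configPath := fs.String("c", "config.yml", "path to the YAML configuration file")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *configPath, nil
}

func main() {
	path, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "usage error:", err)
		return
	}
	fmt.Println("using config:", path)
}
```

Keeping the parsing in a function that takes `args` explicitly makes it easy to exercise without touching the process's real command line.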

Contributing

💬 Join the Discussions: Share your insights, provide feedback, or ask questions.
🐛 Report Issues: Submit bugs found or log feature requests for the rb-druid-indexer project.
💡 Submit Pull Requests: Review open PRs, and submit your own PRs.

Contributing Guidelines

Fork the Repository: Start by forking the project repository to your GitHub account.
Clone Locally: Clone your fork to your local machine using a git client, replacing <your-username> with your GitHub username.
```
git clone https://github.com/<your-username>/rb-druid-indexer
```
Create a New Branch: Always work on a new branch, giving it a descriptive name.
```
git checkout -b new-feature-x
```
Make Your Changes: Develop and test your changes locally.
Commit Your Changes: Commit with a clear message describing your updates.
```
git commit -m 'Implemented new feature x.'
```
Push to GitHub: Push the changes to your forked repository.
```
git push origin new-feature-x
```
Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!

License

This project is licensed under the AGPL-3.0 License. For more details, refer to the LICENSE file.
