Simple distributed Druid indexer task manager for Kafka ingestion
- Overview
- Features
- Project Structure
- Getting Started
- Project Roadmap
- Contributing
- License
- Acknowledgments
rb-druid-indexer is a cluster-compatible service designed to manage the indexing of Kafka data streams into Druid. It handles task announcements, generates configuration specification files, and submits tasks to the Druid Supervisor.
- Cluster compatible using ZooKeeper
- Automatic task ingestion and spec file configuration
- Failover support for long-term ingestion
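As an illustration of what a generated spec file might contain, the following sketch builds a Kafka supervisor spec in the JSON layout that Druid accepts at `POST /druid/indexer/v1/supervisor`. It is not the project's actual generator: the `Metric` type, the `buildSupervisorSpec` helper, the `timestamp` column name, and the JSON input format are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metric is a hypothetical stand-in that mirrors the Type/Name/FieldName
// triple used in the data source definitions.
type Metric struct {
	Type      string `json:"type"`
	Name      string `json:"name"`
	FieldName string `json:"fieldName,omitempty"`
}

// buildSupervisorSpec assembles a Kafka supervisor spec as a generic map.
// Assumptions: events carry a "timestamp" column and are encoded as JSON.
func buildSupervisorSpec(dataSource, topic, kafkaHost string, dims []string, metrics []Metric) map[string]interface{} {
	return map[string]interface{}{
		"type": "kafka",
		"spec": map[string]interface{}{
			"dataSchema": map[string]interface{}{
				"dataSource":     dataSource,
				"timestampSpec":  map[string]interface{}{"column": "timestamp", "format": "auto"},
				"dimensionsSpec": map[string]interface{}{"dimensions": dims},
				"metricsSpec":    metrics,
			},
			"ioConfig": map[string]interface{}{
				"topic":              topic,
				"inputFormat":        map[string]interface{}{"type": "json"},
				"consumerProperties": map[string]interface{}{"bootstrap.servers": kafkaHost},
			},
			"tuningConfig": map[string]interface{}{"type": "kafka"},
		},
	}
}

func main() {
	spec := buildSupervisorSpec(
		"rb_flow", "rb_flow_post", "kafka.service:9092",
		[]string{"client_mac", "sensor_name"},
		[]Metric{
			{Type: "count", Name: "events"},
			{Type: "longSum", Name: "sum_bytes", FieldName: "bytes"},
		},
	)
	out, err := json.MarshalIndent(spec, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Building the spec as a map keeps the sketch short; a real implementation would more likely use typed structs so that missing fields are caught at compile time.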
The configuration for rb-druid-indexer is defined in a YAML file and includes settings for both ZooKeeper and the tasks that should be executed. Below is an example configuration file:
zookeeper_servers:
  - "127.0.0.1:2181"
tasks:
  - task_name: "rb_monitor"
    namespace: ""
    feed: "rb_monitor"
    kafka_host: "kafka.service:9092"
  - task_name: "rb_flow"
    namespace: ""
    feed: "rb_flow_post"
    kafka_host: "kafka.service:9092"
zookeeper_servers
- Description: A list of ZooKeeper servers used for leadership checks and coordination.
- Type: Array of strings.
- Example: "127.0.0.1:2181"
tasks
- Description: A list of tasks to be managed by the indexer. Each task has the following attributes:
task_name
- Description: The name of the task, used to identify it in the system.
- Type: String.
- Example: "rb_monitor", "rb_flow"
namespace
- Description: The namespace associated with the task. This can be left empty if not needed.
- Type: String.
- Example: "" (empty)
feed
- Description: The name of the Kafka feed associated with the task. This specifies which feed to listen to.
- Type: String.
- Example: "rb_monitor", "rb_flow_post"
kafka_host
- Description: The host and port of the Kafka service where the feed is published.
- Type: String.
- Example: "kafka.service:9092"
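The configuration keys described above could be modelled in Go roughly as follows. This is a minimal sketch, not the contents of config/config.go: the `Config` and `Task` type names and the `Validate` method are hypothetical, and the rule that `task_name`, `feed`, and `kafka_host` are required is an assumption. The `yaml` struct tags match the documented keys, but no YAML library is used here so that the example stays within the standard library.

```go
package main

import (
	"errors"
	"fmt"
)

// Task mirrors one entry of the `tasks` list (hypothetical type name).
type Task struct {
	TaskName  string `yaml:"task_name"`
	Namespace string `yaml:"namespace"` // may be empty
	Feed      string `yaml:"feed"`
	KafkaHost string `yaml:"kafka_host"`
}

// Config mirrors the top level of the YAML file (hypothetical type name).
type Config struct {
	ZookeeperServers []string `yaml:"zookeeper_servers"`
	Tasks            []Task   `yaml:"tasks"`
}

// Validate returns the first problem found, or nil.
// Assumption: every field except namespace is required.
func (c Config) Validate() error {
	if len(c.ZookeeperServers) == 0 {
		return errors.New("zookeeper_servers must list at least one host:port")
	}
	for i, t := range c.Tasks {
		switch {
		case t.TaskName == "":
			return fmt.Errorf("tasks[%d]: task_name is required", i)
		case t.Feed == "":
			return fmt.Errorf("tasks[%d]: feed is required", i)
		case t.KafkaHost == "":
			return fmt.Errorf("tasks[%d]: kafka_host is required", i)
		}
	}
	return nil
}

func main() {
	// Same values as the example configuration file.
	cfg := Config{
		ZookeeperServers: []string{"127.0.0.1:2181"},
		Tasks: []Task{
			{TaskName: "rb_monitor", Feed: "rb_monitor", KafkaHost: "kafka.service:9092"},
			{TaskName: "rb_flow", Feed: "rb_flow_post", KafkaHost: "kafka.service:9092"},
		},
	}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config OK")
}
```

Validating before any task is announced means a typo in the YAML file surfaces at startup rather than as a failed ingestion later.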
Each data source is defined in its own file, /druid/datasources/${datasource}.go. For example:
package datasources

import druidrouter "rb-druid-indexer/druid"

// FlowMetrics lists the aggregations computed at ingestion time.
var FlowMetrics = []druidrouter.Metrics{
	{Type: "count", Name: "events"},
	{Type: "longSum", Name: "sum_bytes", FieldName: "bytes"},
	{Type: "longSum", Name: "sum_pkts", FieldName: "pkts"},
	{Type: "longSum", Name: "sum_rssi", FieldName: "client_rssi_num"},
	{Type: "hyperUnique", Name: "clients", FieldName: "client_mac"},
	{Type: "hyperUnique", Name: "wireless_stations", FieldName: "wireless_station"},
	{Type: "longSum", Name: "sum_dl_score", FieldName: "darklist_score"},
}

// FlowDimensions lists the columns stored as dimensions.
var FlowDimensions = []string{
	"application_id_name", "building", "building_uuid", "campus", "campus_uuid",
	"client_accounting_type", "client_auth_type", "client_fullname", "client_gender",
	"client_id", "client_latlong", "client_loyality", "client_mac", "client_mac_vendor",
	"client_rssi", "client_vip", "conversation", "coordinates_map", "darklist_category",
	"darklist_direction", "darklist_score_name", "darklist_score", "deployment",
	"deployment_uuid", "direction", "dot11_protocol", "dot11_status", "dst_map", "duration",
	"engine_id_name", "floor", "floor_uuid", "host", "host_l2_domain", "http_social_media",
	"http_user_agent", "https_common_name", "interface_name", "ip_as_name", "ip_country_code",
	"ip_protocol_version", "l4_proto", "lan_interface_description", "lan_interface_name",
	"lan_ip", "lan_ip_is_malicious", "lan_ip_as_name", "lan_ip_country_code", "lan_ip_name",
	"lan_ip_net_name", "lan_l4_port", "lan_name", "lan_vlan", "market", "market_uuid",
	"namespace", "namespace_uuid", "organization", "organization_uuid", "product_name",
	"public_ip", "public_ip_is_malicious", "public_ip_mac", "referer", "referer_l2",
	"scatterplot", "selector_name", "sensor_ip", "sensor_name", "sensor_uuid", "service_provider",
	"service_provider_uuid", "src_map", "tcp_flags", "tos", "type", "url", "wan_interface_description",
	"wan_interface_name", "wan_ip", "wan_ip_is_malicious", "wan_ip_as_name", "wan_ip_country_code",
	"wan_ip_map", "wan_ip_net_name", "wan_l4_port", "wan_name", "wan_vlan", "wireless_id",
	"wireless_operator", "wireless_station", "zone", "zone_uuid",
}

// FlowDataSource is the name of the Druid data source.
const FlowDataSource = "rb_flow"
Each data source is then registered in /druid/datasources/config.go:
var Configs = map[string]DataSourceConfig{
	"rb_flow": {
		DataSource: FlowDataSource,
		Metrics:    FlowMetrics,
		Dimensions: FlowDimensions,
	},
	"rb_monitor": {
		DataSource: MonitorDataSource,
		Metrics:    MonitorMetrics,
		Dimensions: MonitorDimensions,
	},
}
To add your own data source, copy one of the existing data source files, adapt its metrics and dimensions, and register it in /druid/datasources/config.go so that it can be used from your config.yml.
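The steps for adding a data source can be sketched in a single self-contained file. Everything named `Example*` and `rb_example` is hypothetical, `Metrics` and `DataSourceConfig` are local stand-ins for the project's own types (only the field names come from the snippets in this README), and the `lookup` helper assumes that the `task_name` in config.yml is the key used in `Configs`, which this README does not state explicitly.

```go
package main

import "fmt"

// Metrics is a local stand-in for druidrouter.Metrics.
type Metrics struct {
	Type      string
	Name      string
	FieldName string
}

// DataSourceConfig is a local stand-in for the type used in config.go.
type DataSourceConfig struct {
	DataSource string
	Metrics    []Metrics
	Dimensions []string
}

// Step 1: define the new data source (hypothetical names and columns).
const ExampleDataSource = "rb_example"

var ExampleMetrics = []Metrics{
	{Type: "count", Name: "events"},
	{Type: "longSum", Name: "sum_bytes", FieldName: "bytes"},
}

var ExampleDimensions = []string{"sensor_name", "sensor_uuid", "namespace"}

// Step 2: register it alongside the existing entries.
var Configs = map[string]DataSourceConfig{
	"rb_example": {
		DataSource: ExampleDataSource,
		Metrics:    ExampleMetrics,
		Dimensions: ExampleDimensions,
	},
}

// lookup resolves a task name to its data source definition.
// Assumption: task_name in config.yml matches the map key.
func lookup(taskName string) (DataSourceConfig, error) {
	cfg, ok := Configs[taskName]
	if !ok {
		return DataSourceConfig{}, fmt.Errorf("no data source registered for task %q", taskName)
	}
	return cfg, nil
}

func main() {
	cfg, err := lookup("rb_example")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s: %d metrics, %d dimensions\n", cfg.DataSource, len(cfg.Metrics), len(cfg.Dimensions))
}
```

Under that assumption, a task entry with `task_name: "rb_example"` in config.yml would pick up this definition.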
└── rb-druid-indexer/
    ├── LICENSE
    ├── config
    │   └── config.go
    ├── druid
    │   ├── datasources
    │   ├── realtime.go
    │   └── router.go
    ├── example_config.yml
    ├── go.mod
    ├── go.sum
    ├── main.go
    └── zkclient
        ├── client.go
        ├── election.go
        └── task_announcer.go
RB-DRUID-INDEXER/
- __root__: main.go, go.mod, go.sum, example_config.yml
- config: config.go
- zkclient: election.go, client.go, task_announcer.go
- druid: realtime.go, router.go, datasources
- druid/datasources: location.go, config.go, event.go, wireless.go, monitor.go, state.go, flow.go
Before getting started with rb-druid-indexer, ensure your runtime environment meets the following requirements:
- Programming Language: Go
- Package Manager: Go modules
Install rb-druid-indexer using one of the following methods:
Build from source:
- Clone the rb-druid-indexer repository:
❯ git clone https://github.com/redBorder/rb-druid-indexer
- Navigate to the project directory:
❯ cd rb-druid-indexer
- Build the binary (dependencies are resolved through Go modules):
❯ go build
Run rb-druid-indexer, passing the path to your configuration file with -c:
❯ ./rb-druid-indexer -c config.yml
- 💬 Join the Discussions: Share your insights, provide feedback, or ask questions.
- 🐛 Report Issues: Submit bugs found or log feature requests for the rb-druid-indexer project.
- 💡 Submit Pull Requests: Review open PRs, and submit your own PRs.
Contributing Guidelines
- Fork the Repository: Start by forking the project repository to your GitHub account.
- Clone Locally: Clone the forked repository to your local machine using a git client.
git clone https://github.com/redBorder/rb-druid-indexer
- Create a New Branch: Always work on a new branch, giving it a descriptive name.
git checkout -b new-feature-x
- Make Your Changes: Develop and test your changes locally.
- Commit Your Changes: Commit with a clear message describing your updates.
git commit -m 'Implemented new feature x.'
- Push to GitHub: Push the changes to your forked repository.
git push origin new-feature-x
- Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
- Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!
This project is licensed under the AGPL-3.0 License. For more details, refer to the LICENSE file.