Commit

Merge branch 'main' into trinity/update-json-obj
trinity-1686a committed Aug 20, 2024
2 parents 6a7049b + c59be63 commit c6b488d
Showing 88 changed files with 5,693 additions and 1,310 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/coverage.yml
@@ -37,7 +37,7 @@ jobs:
          - "4571:4571"
          - "8080:8080"
        env:
          SERVICES: kinesis,s3
          SERVICES: kinesis,s3,sqs
        options: >-
          --health-cmd "curl -k https://localhost:4566"
          --health-interval 10s
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -29,7 +29,7 @@ When you submit a pull request to the project, the CI system runs several verifi
You will be notified by email from the CI system if any issues are discovered, but if you want to run these checks locally before submitting a PR or to verify changes, you can use the following commands in the root directory:
1. To verify that all tests are passing, run `make test-all`.
2. To fix code style and formatting, as well as catch common mistakes, run `make fix`. Alternatively, run `make -k test-all docker-compose-down` to tear down the Docker services after running all the tests.
3. To build docs run `make build-docs`.
3. To build docs run `make build-rustdoc`.

# Development

@@ -58,7 +58,7 @@ Run `make test-all` to run all tests.
* `make fmt` - runs the formatter; this command requires the nightly toolchain to be installed by running `rustup toolchain install nightly`.
* `make fix` - runs formatter and clippy checks.
* `make typos` - runs the spellcheck tool over the codebase. (Install by running `cargo install typos-cli`)
* `make docs` - builds docs.
* `make doc` - builds docs.
* `make docker-compose-up` - starts Docker services.
* `make docker-compose-down` - stops Docker services.
* `make docker-compose-logs` - shows Docker logs.
16 changes: 14 additions & 2 deletions distribution/lambda/README.md
@@ -95,6 +95,12 @@ simplify the setup and avoid unstable deployments.
[1]: https://rust-lang-nursery.github.io/rust-cookbook/development_tools/debugging/config_log.html


> [!TIP]
> The Indexer Lambda's logging is quite verbose. To reduce the associated
> CloudWatch costs, you can disable some lower level logs by setting the
> `RUST_LOG` environment variable to `info,quickwit_actors=warn`, or disable
> INFO logs altogether by setting `RUST_LOG=warn`.
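
For example, if you want to change this on an Indexer function that is already deployed, something along these lines should work with the AWS CLI (the function name is a placeholder, and note that `update-function-configuration` replaces the whole environment, so include any other variables your deployment sets):

```bash
# Hypothetical function name; this call overwrites the function's entire
# environment, so merge in any existing variables before running it.
aws lambda update-function-configuration \
  --function-name <indexer-function-name> \
  --environment '{"Variables":{"RUST_LOG":"info,quickwit_actors=warn"}}'
```
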
Indexer only:
| Variable | Description | Default |
|---|---|---|
@@ -151,12 +157,18 @@ You can query and visualize the Quickwit Searcher Lambda from Grafana by using t

#### Configure Grafana data source

You need to provide the following information.
If you don't have a Grafana instance running yet, you can start one with the Quickwit plugin installed using Docker:

```bash
docker run -e GF_INSTALL_PLUGINS="quickwit-quickwit-datasource" -p 3000:3000 grafana/grafana
```

In the `Connections > Data sources` page, add a new Quickwit data source and configure the following settings:

|Variable|Description|Example|
|--|--|--|
|HTTP URL| HTTP search endpoint for Quickwit Searcher Lambda | https://*******.execute-api.us-east-1.amazonaws.com/api/v1 |
|Custom HTTP Headers| If you configure API Gateway to require an API key, set `x-api-key` HTTP Header | Header: `x-api-key` <br> Value: API key value|
|Index ID| Same as `QW_LAMBDA_INDEX_ID` | hdfs-logs |

After entering these values, click "Save & test" and you can now query your Quickwit Lambda from Grafana!
After entering these values, click "Save & test". You can now query your Quickwit Lambda from Grafana!
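
You can also sanity-check the search endpoint outside of Grafana with a plain HTTP request (hypothetical API Gateway URL; the `x-api-key` header is only needed if you configured one, and the index ID should match yours):

```bash
# Placeholder gateway URL and API key; queries the hdfs-logs index for any document.
curl -H "x-api-key: <API_KEY_VALUE>" \
  "https://<api-id>.execute-api.us-east-1.amazonaws.com/api/v1/hdfs-logs/search?query=*&max_hits=10"
```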
6 changes: 4 additions & 2 deletions distribution/lambda/cdk/cli.py
@@ -320,14 +320,16 @@ def _clean_s3_bucket(bucket_name: str, prefix: str = ""):
    print(f"Cleaning up bucket {bucket_name}/{prefix}...")
    s3 = session.resource("s3")
    bucket = s3.Bucket(bucket_name)
    bucket.objects.filter(Prefix=prefix).delete()
    try:
        bucket.objects.filter(Prefix=prefix).delete()
    except s3.meta.client.exceptions.NoSuchBucket:
        print(f"Bucket {bucket_name} not found, skipping cleanup")


def empty_hdfs_bucket():
    bucket_name = _get_cloudformation_output_value(
        app.HDFS_STACK_NAME, hdfs_stack.INDEX_STORE_BUCKET_NAME_EXPORT_NAME
    )

    _clean_s3_bucket(bucket_name)


7 changes: 6 additions & 1 deletion distribution/lambda/cdk/stacks/examples/mock_data_stack.py
@@ -165,7 +165,12 @@ def __init__(
            index_id=index_id,
            index_config_bucket=index_config.s3_bucket_name,
            index_config_key=index_config.s3_object_key,
            indexer_environment=lambda_env,
            indexer_environment={
                # the actor system is very verbose when the source is shutting
                # down (each Lambda invocation)
                "RUST_LOG": "info,quickwit_actors=warn",
                **lambda_env,
            },
            searcher_environment=lambda_env,
            indexer_package_location=indexer_package_location,
            searcher_package_location=searcher_package_location,
4 changes: 2 additions & 2 deletions docker-compose.yml
@@ -27,7 +27,7 @@ networks:

services:
  localstack:
    image: localstack/localstack:${LOCALSTACK_VERSION:-2.3.2}
    image: localstack/localstack:${LOCALSTACK_VERSION:-3.5.0}
    container_name: localstack
    ports:
      - "${MAP_HOST_LOCALSTACK:-127.0.0.1}:4566:4566"
@@ -37,7 +37,7 @@ services:
      - all
      - localstack
    environment:
      SERVICES: kinesis,s3
      SERVICES: kinesis,s3,sqs
      PERSISTENCE: 1
    volumes:
      - .localstack:/etc/localstack/init/ready.d
134 changes: 134 additions & 0 deletions docs/assets/sqs-file-source.tf
@@ -0,0 +1,134 @@
terraform {
required_version = "1.7.5"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.39.1"
}
}
}

provider "aws" {
region = "us-east-1"
default_tags {
tags = {
provisioner = "terraform"
author = "Quickwit"
}
}
}

locals {
sqs_notification_queue_name = "qw-tuto-s3-event-notifications"
source_bucket_name = "qw-tuto-source-bucket"
}

resource "aws_s3_bucket" "file_source" {
bucket_prefix = local.source_bucket_name
force_destroy = true
}

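# Allow S3 event notifications to be delivered to the queue, restricted to our
# source bucket via the aws:SourceArn condition below.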
data "aws_iam_policy_document" "sqs_notification" {
statement {
effect = "Allow"

principals {
type = "*"
identifiers = ["*"]
}

actions = ["sqs:SendMessage"]
resources = ["arn:aws:sqs:*:*:${local.sqs_notification_queue_name}"]

condition {
test = "ArnEquals"
variable = "aws:SourceArn"
values = [aws_s3_bucket.file_source.arn]
}
}
}


resource "aws_sqs_queue" "s3_events" {
name = local.sqs_notification_queue_name
policy = data.aws_iam_policy_document.sqs_notification.json

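# Messages that fail to be processed 5 times are moved to the dead letter queue.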
redrive_policy = jsonencode({
deadLetterTargetArn = aws_sqs_queue.s3_events_deadletter.arn
maxReceiveCount = 5
})
}

resource "aws_sqs_queue" "s3_events_deadletter" {
name = "${local.sqs_notification_queue_name}-deadletter"
}

resource "aws_sqs_queue_redrive_allow_policy" "s3_events_deadletter" {
queue_url = aws_sqs_queue.s3_events_deadletter.id

redrive_allow_policy = jsonencode({
redrivePermission = "byQueue",
sourceQueueArns = [aws_sqs_queue.s3_events.arn]
})
}

resource "aws_s3_bucket_notification" "bucket_notification" {
bucket = aws_s3_bucket.file_source.id

queue {
queue_arn = aws_sqs_queue.s3_events.arn
events = ["s3:ObjectCreated:*"]
}
}

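# Permissions needed by the Quickwit node running the file source: consume the
# notification messages and read the corresponding objects from the bucket.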
data "aws_iam_policy_document" "quickwit_node" {
statement {
effect = "Allow"
actions = [
"sqs:ReceiveMessage",
"sqs:DeleteMessage",
"sqs:ChangeMessageVisibility",
"sqs:GetQueueAttributes",
]
resources = [aws_sqs_queue.s3_events.arn]
}
statement {
effect = "Allow"
actions = ["s3:GetObject"]
resources = ["${aws_s3_bucket.file_source.arn}/*"]
}
}

resource "aws_iam_user" "quickwit_node" {
name = "quickwit-filesource-tutorial"
path = "/system/"
}

resource "aws_iam_user_policy" "quickwit_node" {
name = "quickwit-filesource-tutorial"
user = aws_iam_user.quickwit_node.name
policy = data.aws_iam_policy_document.quickwit_node.json
}

resource "aws_iam_access_key" "quickwit_node" {
user = aws_iam_user.quickwit_node.name
}

output "source_bucket_name" {
value = aws_s3_bucket.file_source.bucket
}

output "notification_queue_url" {
value = aws_sqs_queue.s3_events.id
}

output "quickwit_node_access_key_id" {
value = aws_iam_access_key.quickwit_node.id
sensitive = true
}

output "quickwit_node_secret_access_key" {
value = aws_iam_access_key.quickwit_node.secret
sensitive = true
}
55 changes: 51 additions & 4 deletions docs/configuration/source-config.md
@@ -29,15 +29,62 @@ The source type designates the kind of source being configured. As of version 0.

The source parameters indicate how to connect to a data store and are specific to the source type.

### File source (CLI only)
### File source

A file source reads data from a local file. The file must consist of JSON objects separated by a newline (NDJSON).
As of version 0.5, a file source can only be ingested with the [CLI command](/docs/reference/cli.md#tool-local-ingest). Compressed files (bz2, gzip, ...) and remote files (Amazon S3, HTTP, ...) are not supported.
A file source reads data from files containing JSON objects separated by newlines (NDJSON). Gzip compression is supported provided that the file name ends with the `.gz` suffix.
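
As an illustration, a valid input file is simply one JSON object per line, optionally gzipped (the field names below are arbitrary, not a required schema):

```bash
# Write two NDJSON documents and gzip them into a single input file.
printf '%s\n' \
  '{"timestamp": 1724140800, "body": "first event"}' \
  '{"timestamp": 1724140801, "body": "second event"}' \
  | gzip > events.json.gz
```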

#### Ingest a single file (CLI only)

To ingest a specific file, run the indexing directly in an ad hoc CLI process with:

```bash
./quickwit tool local-ingest --index <index> --input-path <input-path>
```

Both local files and object store URIs are supported, provided that the environment is configured with the appropriate permissions. A tutorial is available [here](/docs/ingest-data/ingest-local-file.md).

#### Notification-based file ingestion (beta)

Quickwit can automatically ingest all new files that are uploaded to an S3 bucket. This requires creating and configuring an [SQS notification queue](https://docs.aws.amazon.com/AmazonS3/latest/userguide/ways-to-add-notification-config-to-bucket.html). A complete example can be found [in this tutorial](/docs/ingest-data/sqs-files.md).


The `notifications` parameter takes an array of notification settings. Currently one notifier can be configured per source and only the SQS notification `type` is supported.

Required fields for the SQS `notifications` parameter items:
- `type`: `sqs`
- `queue_url`: complete URL of the SQS queue (e.g., `https://sqs.us-east-1.amazonaws.com/123456789012/queue-name`)
- `message_type`: format of the message payload, either
  - `s3_notification`: an [S3 event notification](https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventNotifications.html)
  - `raw_uri`: a message containing just the file object URI (e.g., `s3://mybucket/mykey`), as in the example below
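
For example, with the `raw_uri` message type, a notification for a specific object can be pushed to the queue manually (the queue URL and object URI below are placeholders):

```bash
# Enqueue a single file for ingestion; the message body is just the object URI.
aws sqs send-message \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/queue-name \
  --message-body "s3://mybucket/mykey"
```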

*Adding a file source with SQS notifications to an index with the [CLI](../reference/cli.md#source)*

```bash
./quickwit tool local-ingest --input-path <INPUT_PATH>
cat << EOF > source-config.yaml
version: 0.8
source_id: my-sqs-file-source
source_type: file
num_pipelines: 2
params:
  notifications:
    - type: sqs
      queue_url: https://sqs.us-east-1.amazonaws.com/123456789012/queue-name
      message_type: s3_notification
EOF
./quickwit source create --index my-index --source-config source-config.yaml
```

:::note

- Quickwit does not automatically delete the source files after a successful ingestion. You can use [S3 object expiration](https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-expire-general-considerations.html) to configure how long they should be retained in the bucket.
- Configure the notification to only forward events of type `s3:ObjectCreated:*`. Other events are acknowledged by the source without further processing and a warning is logged.
- We strongly recommend using a [dead letter queue](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html) to receive all messages that couldn't be processed by the file source. A `maxReceiveCount` of 5 is a good default value. Here are some common situations where the notification message ends up in the dead letter queue:
  - the notification message could not be parsed (e.g., it is not a valid S3 notification)
  - the file was not found
  - the file is corrupted (e.g., unexpected compression)

:::

### Ingest API source

An ingest API source reads data from the [Ingest API](/docs/reference/rest-api.md#ingest-data-into-an-index). This source is automatically created at the index creation and cannot be deleted nor disabled.
4 changes: 4 additions & 0 deletions docs/deployment/kubernetes/gke.md
@@ -65,6 +65,10 @@ image:
  pullPolicy: Always
  tag: edge

serviceAccount:
  create: false
  name: quickwit-sa

config:
  default_index_root_uri: gs://{BUCKET}/qw-indexes
  metastore_uri: gs://{BUCKET}/qw-indexes
2 changes: 1 addition & 1 deletion docs/get-started/tutorials/tutorial-hdfs-logs.md
@@ -80,7 +80,7 @@ curl -o hdfs_logs_index_config.yaml https://raw.githubusercontent.com/quickwit-o
The index config defines five fields: `timestamp`, `tenant_id`, `severity_text`, `body`, and one JSON field
for the nested values `resource.service`. We could use an object field here and maintain a fixed schema, but for convenience we're going to use a JSON field.
It also sets the `default_search_fields`, the `tag_fields`, and the `timestamp_field`.
The `timestamp_field` and `tag_fields` are used by Quickwit for [splits pruning](../../overview/architecture) at query time to boost search speed.
The `timestamp_field` and `tag_fields` are used by Quickwit for [splits pruning](../../overview/concepts/querying.md#time-sharding) at query time to boost search speed.
Check out the [index config docs](../../configuration/index-config) for more details.

```yaml title="hdfs-logs-index.yaml"
6 changes: 6 additions & 0 deletions docs/ingest-data/ingest-local-file.md
@@ -72,6 +72,12 @@ Clearing local cache directory...
✔ Documents successfully indexed.
```

:::tip

Object store URIs like `s3://mybucket/mykey.json` are also supported as `--input-path`, provided that your environment is configured with the appropriate permissions.
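
For example, assuming your AWS credentials grant read access to the bucket:

```bash
./quickwit tool local-ingest --index <index> --input-path s3://<bucket>/<key>.json
```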

:::

## Tear down resources (optional)

That's it! You can now tear down the resources you created. You can do so by running the following command: