We are building a search bar that lets people do fuzzy search on different Konnect entities (services, routes, nodes). You're in charge of creating the backend ingest that powers that search, built on top of a CDC stream generated by Debezium.
We have provided a JSONL file containing some sample events that can be used to simulate the input stream.
Below are the tasks we want you to complete.
- develop a program that ingests the sample CDC events into a Kafka topic
- develop a program that persists the data from Kafka into OpenSearch
Run
docker-compose up -d
to start the Kafka cluster and its companion services (Kafka-UI and OpenSearch).
The cluster is accessible at localhost:9092 from the host, or at kafka:29092 for services running inside the container network.
You can also access Kafka-UI at localhost:8080 to examine the ingested Kafka messages.
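To make the wiring concrete, here is a minimal sketch of the producer side in Java. It assumes the program runs on the host (hence localhost:9092), reads stream.jsonl from the working directory, and uses a single placeholder topic name (cdc-events); the per-entity topic routing is described further below, and the class name is illustrative.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public class ProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Use kafka:29092 instead when running inside the container network.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each line of stream.jsonl is one CDC event; forward it as the message value.
            Files.lines(Paths.get("stream.jsonl"))
                 .forEach(line -> producer.send(new ProducerRecord<>("cdc-events", line)));
            producer.flush();
        }
    }
}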
OpenSearch is accessible at localhost:9200 from the host, or at opensearch-node:9200 for services running inside the container network.
You can validate that OpenSearch is working by running sample queries:
Insert
curl -X PUT localhost:9200/cdc/_doc/1 -H "Content-Type: application/json" -d '{"foo": "bar"}'
{"_index":"cdc","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}%
Search
curl localhost:9200/cdc/_search | python -m json.tool
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "cdc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "foo": "bar"
        }
      }
    ]
  }
}
Run
docker-compose down
to tear down all the services.
stream.jsonl contains the CDC events that need to be ingested.
docker-compose.yaml contains the skeleton services to help you get started.
Note - depending on your Docker version, the command is either "docker-compose" or "docker compose". Older Docker versions need the separate docker-compose binary, whereas newer versions integrate docker compose into the Docker CLI.
Also, multiple Kafka topics are used for the different Konnect entities, since different entities have different schemas.
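One way to route events to per-entity topics is to derive the entity type from the CDC key, as in the sketch below. The key shape (c/_global/o/&lt;entity&gt;/&lt;id&gt;) is inferred from the sample events, and the topic names are placeholders.

public class TopicRouter {
    // Assumed key shape from the sample stream: "c/_global/o/<entity>/<id>".
    static String topicFor(String cdcKey) {
        String[] parts = cdcKey.split("/");
        String entity = parts.length >= 4 ? parts[3] : "unknown";
        switch (entity) {
            case "service": return "cdc.service";
            case "route":   return "cdc.route";
            case "node":    return "cdc.node";
            default:        return "cdc.other"; // entities we do not index could also be skipped
        }
    }

    public static void main(String[] args) {
        // Prints "cdc.service" for a sample service key.
        System.out.println(topicFor("c/_global/o/service/1234"));
    }
}

To run the exercise end to end: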
docker compose up -d
mvn install
mvn exec:java -Dexec.mainClass="org.konnect.IngestExerciseProducer"
mvn exec:java -Dexec.mainClass="org.konnect.IngestExerciseConsumer"
docker compose down
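Some sample queries against the cdc index are shown below.
Search all documents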
curl --location 'localhost:9200/cdc/_search?pretty=null&size=100' \
--header 'Content-Type: application/json'
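Wildcard search on the host field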
curl --location --request GET 'http://localhost:9200/cdc/_search?pretty=null&size=100' \
--header 'Content-Type: application/json' \
--data '{
"query": {
"wildcard": {
"host": {
"value": "*cypress*"
}
}
}
}'
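Fuzzy search on the konnect_entity field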
curl --location --request GET 'http://localhost:9200/cdc/_search?pretty=null&size=100' \
--header 'Content-Type: application/json' \
--data '{
"query": {
"match": {
"konnect_entity": {
"query": "node",
"fuzziness": "AUTO"
}
}
}
}'
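Delete the index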
curl --location --request DELETE 'localhost:9200/cdc' \
--header 'Content-Type: application/json'
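Possible improvements: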
- We could produce messages to Kafka in batches instead of one by one.
- Update Docker Compose to also run the producer and consumer as services.
- Which fields should be searchable in the OpenSearch schema? Right now we rely on OpenSearch's default mappings to make fields searchable, but we could define explicit mappings for the index instead.
- Add unit tests.
- On the consumer side, we keep a map of (entity id, updated_at of the last processed event) to handle out-of-order updates; a sketch follows this list. Ideally this state would live in a distributed key-value store such as Redis.
- We could use a Spring Kafka consumer, which supports automatic retry with backoff.
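Below is a minimal sketch of that out-of-order guard, assuming a single consumer thread and an in-memory map keyed by entity id; the class and method names are illustrative, and the same comparison would work against Redis.

import java.util.HashMap;
import java.util.Map;

public class OutOfOrderGuard {
    // entity id -> updated_at of the last event applied to OpenSearch
    private final Map<String, Long> lastApplied = new HashMap<>();

    // Returns true if the event is newer than anything already applied for this entity.
    public boolean shouldApply(String entityId, long updatedAt) {
        Long previous = lastApplied.get(entityId);
        if (previous != null && previous >= updatedAt) {
            return false; // stale or duplicate event: skip the OpenSearch write
        }
        lastApplied.put(entityId, updatedAt);
        return true;
    }
}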
Notes on the sample stream (which powers the fuzzy search on Konnect entities: services, routes, nodes): it contains create and update events for the different entity types; there are no delete events in the sample.
- total events - 726
- 8 events for cluster, with keys like c/_global/o/cluster/f24150e5-4781-4d70-9350-aaa7700ee9c3 (the GUID at the end is the cluster id). Clusters have both create and update events; for example, the cluster with GUID 4c75f4f6-ca71-44a9-80ca-e96f6c412b24 has both.
- 15 events for service - NEED TO CONSIDER
- 554 events for node - NEED TO CONSIDER
- 6 events for node-status - DON'T THINK THIS IS NEEDED, BUT DOUBLE CHECK
- 2 events for sni
- 2 events for target
- 14 events for route - NEED TO CONSIDER
- 86 events for store_event
- 1 event for key
- 5 events for vault
- 13 events for hash
- 5 events for upstream
- 2 events for composite-status
- 5 events for consumer_group
- 8 events for consumer
Count of events to be considered (service + node + route) = 15 + 554 + 14 = 583
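A sketch of the consumer-side write path that follows from these counts: only service, node and route events are indexed, and the GUID at the end of the CDC key becomes the OpenSearch document id, so a later update simply overwrites the earlier create. The JDK HTTP client is used here only for illustration; the index name, key parsing, and method names are assumptions.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Set;

public class OpenSearchWriter {
    private static final Set<String> CONSIDERED = Set.of("service", "node", "route");
    private final HttpClient http = HttpClient.newHttpClient();

    // cdcKey looks like "c/_global/o/<entity>/<guid>" in the sample stream.
    public void maybeIndex(String cdcKey, String documentJson) throws Exception {
        String[] parts = cdcKey.split("/");
        if (parts.length < 5 || !CONSIDERED.contains(parts[3])) {
            return; // entity types we do not make searchable
        }
        String entityId = parts[4];
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/cdc/_doc/" + entityId))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(documentJson))
                .build();
        http.send(request, HttpResponse.BodyHandlers.ofString());
    }
}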