amanpwl92/konnect-team-interview-ingest-exercise

Problem

We are building a search bar that lets people do fuzzy search on different Konnect entities (services, routes, nodes). You're in charge of building the backend ingest pipeline that powers that search, on top of a CDC stream generated by Debezium.

We have provided a JSONL file containing sample events that can be used to simulate the input stream.

Below are the tasks we want you to complete.

  • develop a program that ingests the sample CDC events into a Kafka topic (a minimal producer sketch follows this list)
  • develop a program that persists the data from Kafka into OpenSearch
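
For the first task, here is a minimal producer sketch using the plain kafka-clients API. The class name JsonlProducerSketch and the single topic cdc-events are illustrative assumptions, not the repo's actual code (the repo itself routes events to per-entity topics, as noted in the run steps below):

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class JsonlProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Each line of stream.jsonl is one CDC event; send it to Kafka verbatim.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String line : Files.readAllLines(Path.of("stream.jsonl"))) {
                producer.send(new ProducerRecord<>("cdc-events", line));
            }
            producer.flush();
        }
    }
}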

Get started

Run

docker-compose up -d

to start a Kafka cluster.

The cluster is accessible locally at localhost:9092 or kafka:29092 for services running inside the container network.

You can also access Kafka-UI at localhost:8080 to examine the ingested Kafka messages.

OpenSearch is accessible locally at localhost:9200, or at opensearch-node:9200 for services running inside the container network.

You can validate that OpenSearch is working by running some sample queries:

Insert

curl -X PUT localhost:9200/cdc/_doc/1 -H "Content-Type: application/json" -d '{"foo": "bar"}'
{"_index":"cdc","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}%

Search

curl localhost:9200/cdc/_search | python -m json.tool
{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "cdc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "foo": "bar"
                }
            }
        ]
    }
}

Run

docker-compose down

to tear down all the services.

Resources

  • stream.jsonl contains the CDC events that need to be ingested
  • docker-compose.yaml contains the skeleton services to help you get started

Steps to run the producer and consumer

Note: depending on your Docker version, the command is either "docker-compose" or "docker compose". Older Docker versions need docker-compose installed separately, whereas newer versions ship Compose built in as docker compose.

Also, multiple Kafka topics are used for the different Konnect entities, since each entity type has a different schema.
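
One way to implement that routing is to parse the entity type out of the CDC key, which follows the pattern c/_global/o/<entity>/<id> (see the event-pattern notes below). This helper is a sketch, not the repo's actual code:

// Sketch: derive a per-entity topic name from a CDC key such as
// "c/_global/o/cluster/f24150e5-4781-4d70-9350-aaa7700ee9c3".
static String topicFor(String cdcKey) {
    String[] parts = cdcKey.split("/");   // ["c", "_global", "o", "<entity>", "<id>"]
    String entity = parts.length >= 4 ? parts[3] : "unknown";
    return "cdc-" + entity;               // e.g. "cdc-service", "cdc-node", "cdc-route"
}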

docker compose up -d

mvn install

mvn exec:java -Dexec.mainClass="org.konnect.IngestExerciseProducer"

mvn exec:java -Dexec.mainClass="org.konnect.IngestExerciseConsumer"

docker compose down

Some working curl queries for OpenSearch (after the consumer has processed events successfully):

curl --location 'localhost:9200/cdc/_search?pretty=null&size=100' \
--header 'Content-Type: application/json'

curl --location --request GET 'http://localhost:9200/cdc/_search?pretty=null&size=100' \
--header 'Content-Type: application/json' \
--data '{
  "query": {
    "wildcard": {
      "host": {
        "value": "*cypress*"
      }
    }
  }
}'

curl --location --request GET 'http://localhost:9200/cdc/_search?pretty=null&size=100' \
--header 'Content-Type: application/json' \
--data '{
  "query": {
    "match": {
      "konnect_entity": {
        "query": "node",
        "fuzziness": "AUTO"
      }
    }
  }
}'

curl --location --request DELETE 'localhost:9200/cdc' \
--header 'Content-Type: application/json'

Important points that could be addressed as enhancements

  1. We could produce messages to Kafka in batches instead of one by one.
  2. Update the Docker Compose file to also run the producer and consumer.
  3. Which fields should be searchable in the OpenSearch schema? Right now we rely on OpenSearch's default dynamic mappings, but we could define explicit mappings for the index instead.
  4. Add unit tests.
  5. On the consumer side, we keep a map of (entity id, updated_at of the last event processed) to handle out-of-order updates (see the sketch after this list). Ideally this map would live in a distributed key-value store such as Redis.
  6. We could use a Spring Kafka consumer, which supports automatic retries with backoff.
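
To make point 5 concrete, here is a minimal consumer sketch with the (entity id, updated_at) map. The topic names, the assumption that the record key carries the entity id, and the numeric updated_at field are all illustrative, and indexing goes through plain HTTP rather than an OpenSearch client:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class UpsertSketch {
    // entity id -> updated_at of the last event indexed. Kept in memory here,
    // as in point 5; a distributed KV store like Redis would survive restarts.
    private static final Map<String, Long> lastSeen = new HashMap<>();
    private static final Pattern UPDATED_AT = Pattern.compile("\"updated_at\"\\s*:\\s*(\\d+)");

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "cdc-indexer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        HttpClient http = HttpClient.newHttpClient();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("cdc-service", "cdc-node", "cdc-route"));
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                    String id = rec.key();
                    long updatedAt = extractUpdatedAt(rec.value());
                    // Out-of-order handling: drop events older than the one
                    // already indexed for this entity.
                    if (updatedAt < lastSeen.getOrDefault(id, Long.MIN_VALUE)) continue;
                    lastSeen.put(id, updatedAt);
                    HttpRequest put = HttpRequest.newBuilder()
                            .uri(URI.create("http://localhost:9200/cdc/_doc/" + id))
                            .header("Content-Type", "application/json")
                            .PUT(HttpRequest.BodyPublishers.ofString(rec.value()))
                            .build();
                    http.send(put, HttpResponse.BodyHandlers.ofString());
                }
            }
        }
    }

    // Crude extraction for the sketch; a real implementation would use a JSON library.
    static long extractUpdatedAt(String json) {
        Matcher m = UPDATED_AT.matcher(json);
        return m.find() ? Long.parseLong(m.group(1)) : Long.MIN_VALUE;
    }
}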

Understanding the sample events' schema and patterns

The sample contains create/update events for different types of entities (recall the goal: fuzzy search on Konnect services, routes, and nodes). There are no delete events in the sample.

  1. Total events: 726
  2. 8 events for cluster, with keys like c/_global/o/cluster/f24150e5-4781-4d70-9350-aaa7700ee9c3 (the trailing GUID is the cluster id). A cluster can have both create and update events; for example, the cluster with GUID 4c75f4f6-ca71-44a9-80ca-e96f6c412b24 has both.
  3. 15 events for service - NEED TO CONSIDER
  4. 554 events for node - NEED TO CONSIDER
  5. 6 events for node-status - DON'T THINK THIS IS NEEDED, BUT DOUBLE CHECK
  6. 2 events for sni
  7. 2 events for target
  8. 14 events for route - NEED TO CONSIDER
  9. 86 events for store_event
  10. 1 event for key
  11. 5 events for vault
  12. 13 events for hash
  13. 5 events for upstream
  14. 2 events for composite-status
  15. 5 events for consumer_group
  16. 8 events for consumer

Count of events to be considered = 15 + 554 + 14 = 583
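
For reference, a tally like the one above can be reproduced by pulling the entity type out of each event's key (assuming the key pattern from point 2; the class name is illustrative):

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EventTypeCounts {
    public static void main(String[] args) throws Exception {
        // Matches keys like "c/_global/o/cluster/<guid>" and captures "cluster".
        Pattern keyPattern = Pattern.compile("c/_global/o/([^/\"]+)/");
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : Files.readAllLines(Path.of("stream.jsonl"))) {
            Matcher m = keyPattern.matcher(line);
            if (m.find()) counts.merge(m.group(1), 1, Integer::sum);
        }
        counts.forEach((type, n) -> System.out.println(type + ": " + n));
    }
}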
