Welcome to my solution to Unbabel's Backend Engineer Challenge 🖖
At Unbabel we deal with a lot of translation data. One of the metrics we use for our clients' SLAs is the delivery time of a translation.
In the context of this problem, and to keep things simple, our translation flow is going to be modeled as only one event.
Example:
{
"timestamp": "2018-12-26 18:12:19.903159",
"translation_id": "5aa5b2f39f7254a75aa4",
"source_language": "en",
"target_language": "fr",
"client_name": "airliberty",
"event_name": "translation_delivered",
"duration": 20,
"nr_words": 100
}
Your mission is to build a simple command line application that parses a stream of events and produces an aggregated output. In this case, we're interested in calculating, for every minute, a moving average of the translation delivery time for the last X minutes.
If we want to compute, for each minute, the moving average delivery time of all translations over the past 10 minutes, we would call your application like this (feel free to name it anything you like!):
unbabel_cli --input_file events.json --window_size 10
The input file format would be something like:
{"timestamp": "2018-12-26 18:11:08.509654","translation_id": "5aa5b2f39f7254a75aa5","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 20}
{"timestamp": "2018-12-26 18:15:19.903159","translation_id": "5aa5b2f39f7254a75aa4","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 31}
{"timestamp": "2018-12-26 18:23:19.903159","translation_id": "5aa5b2f39f7254a75bb3","source_language": "en","target_language": "fr","client_name": "taxi-eats","event_name": "translation_delivered","nr_words": 100, "duration": 54}
Assume that the lines in the input are ordered by the timestamp key, from lower (oldest) to higher values, just like in the example input above.
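As a rough illustration of how such an input could be read, here is a minimal sketch that decodes one JSON object per line and keeps only the two fields the moving average needs. The names `Event`, `rawEvent`, `ParseEvent` and `ReadEvents` are hypothetical and are not taken from this repository's packages.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
	"time"
)

// timestampLayout has no fractional part on purpose: when parsing, Go accepts
// fractional seconds right after the seconds field even if the layout omits them.
const timestampLayout = "2006-01-02 15:04:05"

// Event is a hypothetical, trimmed-down view of an input line.
type Event struct {
	Timestamp time.Time
	Duration  float64
}

// rawEvent mirrors the JSON keys. The timestamp is kept as a string first,
// because it is not in the RFC 3339 format that time.Time decodes by default.
type rawEvent struct {
	Timestamp string  `json:"timestamp"`
	Duration  float64 `json:"duration"`
}

// ParseEvent decodes a single line. JSON keys that are not listed in
// rawEvent (client_name, nr_words, ...) are ignored by encoding/json.
func ParseEvent(line string) (Event, error) {
	var raw rawEvent
	if err := json.Unmarshal([]byte(line), &raw); err != nil {
		return Event{}, fmt.Errorf("decoding event: %w", err)
	}
	ts, err := time.Parse(timestampLayout, raw.Timestamp)
	if err != nil {
		return Event{}, fmt.Errorf("parsing timestamp: %w", err)
	}
	return Event{Timestamp: ts, Duration: raw.Duration}, nil
}

// ReadEvents reads events line by line and skips blank lines.
func ReadEvents(r io.Reader) ([]Event, error) {
	var events []Event
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		event, err := ParseEvent(line)
		if err != nil {
			return nil, err
		}
		events = append(events, event)
	}
	return events, scanner.Err()
}

func main() {
	input := `{"timestamp": "2018-12-26 18:11:08.509654", "nr_words": 30, "duration": 20}
{"timestamp": "2018-12-26 18:15:19.903159", "nr_words": 30, "duration": 31}`
	events, err := ReadEvents(strings.NewReader(input))
	if err != nil {
		panic(err)
	}
	for _, e := range events {
		fmt.Println(e.Timestamp.Format("15:04:05"), e.Duration)
	}
}
```

Reading line by line means a malformed line can be reported on its own, and because the input is assumed to be sorted, no re-sorting step is needed afterwards.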
The output file would have the following format:
{"date": "2018-12-26 18:11:00", "average_delivery_time": 0}
{"date": "2018-12-26 18:12:00", "average_delivery_time": 20}
{"date": "2018-12-26 18:13:00", "average_delivery_time": 20}
{"date": "2018-12-26 18:14:00", "average_delivery_time": 20}
{"date": "2018-12-26 18:15:00", "average_delivery_time": 20}
{"date": "2018-12-26 18:16:00", "average_delivery_time": 25.5}
{"date": "2018-12-26 18:17:00", "average_delivery_time": 25.5}
{"date": "2018-12-26 18:18:00", "average_delivery_time": 25.5}
{"date": "2018-12-26 18:19:00", "average_delivery_time": 25.5}
{"date": "2018-12-26 18:20:00", "average_delivery_time": 25.5}
{"date": "2018-12-26 18:21:00", "average_delivery_time": 25.5}
{"date": "2018-12-26 18:22:00", "average_delivery_time": 31}
{"date": "2018-12-26 18:23:00", "average_delivery_time": 31}
{"date": "2018-12-26 18:24:00", "average_delivery_time": 42.5}
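The example output above is consistent with the following reading: for every minute t, from the minute of the first event up to the minute after the last event, average the durations of the events whose timestamp falls in the interval (t - window, t]. The sketch below implements that reading with a sliding window over the sorted events. The exact boundary handling is an assumption inferred from the example, and the names `Event`, `Average`, `MovingAverage` and `sampleEvents` are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical, trimmed-down event.
type Event struct {
	Timestamp time.Time
	Duration  float64
}

// Average is one output row.
type Average struct {
	Date  time.Time
	Value float64
}

// MovingAverage assumes events are sorted by timestamp. For every minute t it
// averages the durations of events with t-window < timestamp <= t.
func MovingAverage(events []Event, windowMinutes int) []Average {
	if len(events) == 0 {
		return nil
	}
	window := time.Duration(windowMinutes) * time.Minute
	start := events[0].Timestamp.Truncate(time.Minute)
	end := events[len(events)-1].Timestamp.Truncate(time.Minute).Add(time.Minute)

	var out []Average
	var sum float64
	lo, hi := 0, 0 // events[lo:hi] are the events inside the current window
	for t := start; !t.After(end); t = t.Add(time.Minute) {
		// Admit events delivered at or before t.
		for hi < len(events) && !events[hi].Timestamp.After(t) {
			sum += events[hi].Duration
			hi++
		}
		// Evict events that are no longer inside the window.
		for lo < hi && !events[lo].Timestamp.After(t.Add(-window)) {
			sum -= events[lo].Duration
			lo++
		}
		avg := 0.0
		if hi > lo {
			avg = sum / float64(hi-lo)
		} else {
			sum = 0 // empty window: drop any floating point residue
		}
		out = append(out, Average{Date: t, Value: avg})
	}
	return out
}

// sampleEvents returns the three events from the example input.
func sampleEvents() []Event {
	at := func(min, sec, nsec int) time.Time {
		return time.Date(2018, time.December, 26, 18, min, sec, nsec, time.UTC)
	}
	return []Event{
		{Timestamp: at(11, 8, 509654000), Duration: 20},
		{Timestamp: at(15, 19, 903159000), Duration: 31},
		{Timestamp: at(23, 19, 903159000), Duration: 54},
	}
}

func main() {
	for _, a := range MovingAverage(sampleEvents(), 10) {
		fmt.Printf("{\"date\": %q, \"average_delivery_time\": %v}\n",
			a.Date.Format("2006-01-02 15:04:05"), a.Value)
	}
}
```

Because both indices only move forward, each event is added and removed at most once, so the cost is linear in the number of events plus the number of minutes covered.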
Make sure you have Go (1.16.x or higher) installed on your machine. Then run the following command:
go build -o unbabel_cli
The repository comes with an events.json file populated with the events from the example above. To run the application against it:
unbabel_cli --input_file=events.json --window_size=10 --output_file=aggregated_events.out.json
There are 3 flags available:
- --input_file → Path to events file. Defaults to "events.json".
- --window_size → The window size to calculate the moving average. Defaults to 10.
- --output_file → Path to output file. Defaults to "aggregated_events.out.json".
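For readers curious how flags like these can be declared, here is a minimal sketch using Go's standard `flag` package, which accepts both `--name value` and `--name=value`. The `Config` type and `ParseFlags` function are hypothetical, and rejecting a non-positive window size is an assumption of this sketch, not documented behaviour of the application.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Config holds the parsed command line options (hypothetical type).
type Config struct {
	InputFile  string
	OutputFile string
	WindowSize int
}

// ParseFlags parses args (without the program name) into a Config,
// using the defaults listed above.
func ParseFlags(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("unbabel_cli", flag.ContinueOnError)
	fs.StringVar(&cfg.InputFile, "input_file", "events.json", "path to events file")
	fs.IntVar(&cfg.WindowSize, "window_size", 10, "window size, in minutes, of the moving average")
	fs.StringVar(&cfg.OutputFile, "output_file", "aggregated_events.out.json", "path to output file")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	// Assumption of this sketch: a window must span at least one minute.
	if cfg.WindowSize <= 0 {
		return Config{}, fmt.Errorf("window_size must be positive, got %d", cfg.WindowSize)
	}
	return cfg, nil
}

func main() {
	cfg, err := ParseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Using a dedicated `flag.FlagSet` instead of the package-level flags keeps parsing free of global state, which makes it straightforward to call from a test with an explicit argument slice.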
The application is divided into 3 packages: main, events and statistics.
To test the code, you can test each package individually.
To test the main package:
go test github.com/jmbds/unbabel-backend-engineering-challenge
To test the statistics package:
go test github.com/jmbds/unbabel-backend-engineering-challenge/internal/statistics
To test the events package:
go test github.com/jmbds/unbabel-backend-engineering-challenge/internal/events
Alternatively, you can run tests for the whole application, using the following command:
go test ./...