1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
**en/
104 changes: 34 additions & 70 deletions README.md
@@ -1,86 +1,50 @@
# Backend Engineering Challenge
# Backend Engineering Challenge -- Moving Average Calculator

## Overview

Welcome to our Engineering Challenge repository 🖖
This Python script calculates the moving average delivery time based on events read from a JSON file. The moving average is computed over a specified time window.

If you found this repository it probably means that you are participating in our recruitment process. Thank you for your time and energy. If that's not the case please take a look at our [openings](https://unbabel.com/careers/) and apply!
## Requirements

Please fork this repo before you start working on the challenge, read it carefully, take your time, and think about the solution. We will evaluate the code on the fork.
* Python 3.x
* Dependencies (install via `pip install -r requirements.txt`)

This is an opportunity for us both to work together and get to know each other in a more technical way. If you have any questions please open an issue and we'll reach out to help.
## Installation

Good luck!

## Challenge Scenario

At Unbabel we deal with a lot of translation data. One of the metrics we use for our clients' SLAs is the delivery time of a translation.

In the context of this problem, and to keep things simple, our translation flow is going to be modeled as only one event.

### *translation_delivered*

Example:

```json
{
"timestamp": "2018-12-26 18:12:19.903159",
"translation_id": "5aa5b2f39f7254a75aa4",
"source_language": "en",
"target_language": "fr",
"client_name": "airliberty",
"event_name": "translation_delivered",
"duration": 20,
"nr_words": 100
}
```
1. Clone the repository:
```bash
git clone https://github.com/thisIsMailson/moving-average-calculator.git
cd moving-average-calculator
```

## Challenge Objective

Your mission is to build a simple command line application that parses a stream of events and produces an aggregated output. In this case, we're interested in calculating, for every minute, a moving average of the translation delivery time for the last X minutes.

If we want to compute, for each minute, the moving average delivery time of all translations for the past 10 minutes, we would call your application like this (feel free to name it anything you like!):

unbabel_cli --input_file events.json --window_size 10

The input file format would be something like:

{"timestamp": "2018-12-26 18:11:08.509654","translation_id": "5aa5b2f39f7254a75aa5","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 20}
{"timestamp": "2018-12-26 18:15:19.903159","translation_id": "5aa5b2f39f7254a75aa4","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 31}
{"timestamp": "2018-12-26 18:23:19.903159","translation_id": "5aa5b2f39f7254a75bb3","source_language": "en","target_language": "fr","client_name": "taxi-eats","event_name": "translation_delivered","nr_words": 100, "duration": 54}

Assume that the lines in the input are ordered by the `timestamp` key, from lower (oldest) to higher values, just like in the example input above.

The output file would be something in the following format.

2. Install dependencies
```bash
pip install -r requirements.txt
```
{"date": "2018-12-26 18:11:00", "average_delivery_time": 0}
{"date": "2018-12-26 18:12:00", "average_delivery_time": 20}
{"date": "2018-12-26 18:13:00", "average_delivery_time": 20}
{"date": "2018-12-26 18:14:00", "average_delivery_time": 20}
{"date": "2018-12-26 18:15:00", "average_delivery_time": 20}
{"date": "2018-12-26 18:16:00", "average_delivery_time": 25.5}
{"date": "2018-12-26 18:17:00", "average_delivery_time": 25.5}
{"date": "2018-12-26 18:18:00", "average_delivery_time": 25.5}
{"date": "2018-12-26 18:19:00", "average_delivery_time": 25.5}
{"date": "2018-12-26 18:20:00", "average_delivery_time": 25.5}
{"date": "2018-12-26 18:21:00", "average_delivery_time": 25.5}
{"date": "2018-12-26 18:22:00", "average_delivery_time": 31}
{"date": "2018-12-26 18:23:00", "average_delivery_time": 31}
{"date": "2018-12-26 18:24:00", "average_delivery_time": 42.5}

## Usage
The code to calculate the moving average of an event resides inside the **main.py** file.
To calculate the moving average delivery time, run the script with the input JSON file and window size. Example:
```bash
python3 main.py --input_file=input.json --window_size=10
```
* input_file: Path to the input JSON file.
* window_size: Size of the time window for the moving average, in minutes.
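
As a rough illustration of how these two options might be declared, here is a small `argparse` sketch. It assumes both options are mandatory and that the window must be at least one minute; `positive_int` and `build_parser` are hypothetical helper names, not part of the script.

```python
import argparse


def positive_int(value: str) -> int:
    """Convert a command-line string to an int and reject values below 1."""
    number = int(value)  # a ValueError here is reported by argparse as a usage error
    if number < 1:
        raise argparse.ArgumentTypeError("window_size must be at least 1")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two options described above (assumed required)."""
    parser = argparse.ArgumentParser(description="Calculate moving average delivery time.")
    parser.add_argument("--input_file", required=True,
                        help="Path to the input JSON file")
    parser.add_argument("--window_size", required=True, type=positive_int,
                        help="Size of the time window in minutes")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv for demonstration.
    args = build_parser().parse_args(["--input_file", "input.json", "--window_size", "10"])
    print(args.input_file, args.window_size)
```

Marking the options as required makes a missing `--window_size` fail at parse time with a usage message rather than later inside the calculation.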

#### Notes
The results are saved to `output.json` in the working directory.

Before jumping right into implementation we advise you to think about the solution first. We will evaluate not only whether your solution works but also the following aspects:
## Running Tests
The tests for the moving average calculation reside inside the **events_test.py** file.
```bash
python -m unittest events_test.py
```

+ Simple and easy to read code. Remember that [simple is not easy](https://www.infoq.com/presentations/Simple-Made-Easy)
+ Comment your code. The easier it is to understand the complex parts, the faster and more positive the feedback will be
+ Consider the optimizations you can do, given the order of the input lines
+ Include a README.md that briefly describes how to build and run your code, as well as how to **test it**
+ Be consistent in your code.
## Sample Data

Feel free to include some of your considerations in your solution. We want you to solve this challenge in the language you feel most comfortable with. Our machines run Python (3.7.x or higher) or Go (1.16.x or higher). If you are thinking of using any other programming language please reach out to us first 🙏.
For testing purposes, you can use the provided sample file events.json.

Also, if you have any problem please **open an issue**.
## File Structure

Good luck and may the force be with you
* main.py: Main script for calculating the moving average.
* events_test.py: Test cases for the script.
* events.json: Sample input data for testing.
Binary file added __pycache__/events_test.cpython-37.pyc
Binary file not shown.
Binary file added __pycache__/events_tests.cpython-37.pyc
Binary file not shown.
Binary file added __pycache__/main.cpython-37.pyc
Binary file not shown.
3 changes: 3 additions & 0 deletions events.json
@@ -0,0 +1,3 @@
{"timestamp": "2018-12-26 18:11:08.509654","translation_id": "5aa5b2f39f7254a75aa5","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 20}
{"timestamp": "2018-12-26 18:15:19.903159","translation_id": "5aa5b2f39f7254a75aa4","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 31}
{"timestamp": "2018-12-26 18:23:19.903159","translation_id": "5aa5b2f39f7254a75bb3","source_language": "en","target_language": "fr","client_name": "taxi-eats","event_name": "translation_delivered","nr_words": 100, "duration": 54}
34 changes: 34 additions & 0 deletions events_test.py
@@ -0,0 +1,34 @@
import unittest
from unittest.mock import patch

from main import calculate_moving_average


class EventsTests(unittest.TestCase):
    def test_calculate_moving_average(self):
        window_size = 5

        # Events handed to the function in place of a real input file
        test_events = [
            {"timestamp": "2022-01-01 12:00:00.000", "duration": 10},
            {"timestamp": "2022-01-01 12:05:00.000", "duration": 20},
            {"timestamp": "2022-01-01 12:07:00.000", "duration": 30},
        ]

        # Patch the file I/O helpers in main so the test touches no files
        with patch('main.read_events_from_file', return_value=test_events), \
                patch('main.save_to_file') as mock_save:
            calculate_moving_average('test_input.json', window_size)

        # Results are handed to save_to_file rather than printed
        mock_save.assert_called()
        results = mock_save.call_args[0][0]

        # 12:00 sees only the first event
        self.assertIn({"date": "2022-01-01 12:00:00", "average_delivery_time": 10.0}, results)
        # 12:05 sees the events at 12:00 and 12:05: (10 + 20) / 2
        self.assertIn({"date": "2022-01-01 12:05:00", "average_delivery_time": 15.0}, results)
        # 12:07 sees the events at 12:05 and 12:07: (20 + 30) / 2
        self.assertIn({"date": "2022-01-01 12:07:00", "average_delivery_time": 25.0}, results)


if __name__ == '__main__':
    unittest.main()
112 changes: 112 additions & 0 deletions main.py
@@ -0,0 +1,112 @@
from typing import Dict, Iterator, List, Union
import argparse
import json
from datetime import datetime, timedelta

def read_events_from_file(input_file: str) -> Iterator[Dict]:
    """
    Read events from a file with one JSON object per line and yield them one by one.

    Parameters:
        input_file (str): Path to the input JSON file.

    Yields:
        dict: The next event dictionary.
    """
    if not input_file:
        return
    with open(input_file, 'r') as file:
        for line in file:
            if line.strip():  # skip blank lines
                yield json.loads(line)

def remove_old_events(event_queue: List[tuple], timestamp: datetime, window_size: int) -> None:
    """
    Remove events outside the current time window from the event queue.

    Parameters:
        event_queue (list): List of tuples containing (timestamp, duration).
        timestamp (datetime): Current event timestamp.
        window_size (int): Size of the time window for moving average.
    """
    while event_queue and timestamp - event_queue[0][0] > timedelta(minutes=window_size):
        event_queue.pop(0)

def filter_events_within_window(event_queue: List[tuple], window_start_time: datetime, current_time: datetime) -> List[tuple]:
    """
    Filter events within the current time window.

    Parameters:
        event_queue (list): List of tuples containing (timestamp, duration).
        window_start_time (datetime): Start time of the current window.
        current_time (datetime): Current time.

    Returns:
        list: List of tuples containing (timestamp, duration) within the window.
    """

    return [(time, duration) for time, duration in event_queue if window_start_time <= time <= current_time]

def calculate_moving_average(input_file: str, window_size: int) -> None:
    """
    Calculate moving average delivery time.

    Parameters:
        input_file (str): Path to the input JSON file.
        window_size (int): Size of the time window for moving average.
    """
    event_queue: List[tuple] = []
    average_delivery_times: List[Dict[str, Union[str, float]]] = []

    events = read_events_from_file(input_file)

    for event in events:
        timestamp = datetime.strptime(event['timestamp'], '%Y-%m-%d %H:%M:%S.%f')
        timestamp = timestamp.replace(second=0, microsecond=0)
        duration = event['duration']
        event_queue.append((timestamp, duration))

        remove_old_events(event_queue, timestamp, window_size)

        current_time = timestamp
        window_start_time = current_time - timedelta(minutes=window_size)

        while event_queue and event_queue[0][0] <= current_time:
            # Filter events within the current time window [current_minute - window_size, current_minute]
            events_within_window = filter_events_within_window(event_queue, window_start_time, current_time)

            if events_within_window:
                moving_average = round(sum(duration for _, duration in events_within_window) / len(events_within_window), 2)
            else:
                moving_average = 0

            average_delivery_times.append({"date": current_time.strftime('%Y-%m-%d %H:%M:%S'), "average_delivery_time": moving_average})

            current_time -= timedelta(minutes=1)
            window_start_time = current_time - timedelta(minutes=window_size)

    save_to_file(average_delivery_times)

def save_to_file(average_time: List[Dict[str, Union[str, float]]], output_file: str = 'output.json') -> None:
    """
    Save moving average delivery times to file.

    Parameters:
        average_time (list): List of dictionaries containing date and average delivery time.
        output_file (str): Path to the output JSON file. Default is 'output.json'.
    """
    average_time.sort(key=lambda x: x["date"])

    with open(output_file, 'w') as file:
        json.dump(average_time, file, indent=2)

    print(f"Moving average delivery times saved to {output_file}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Calculate moving average delivery time.')
    # Both options are required: the calculation cannot run without a file or a window
    parser.add_argument('--input_file', type=str, required=True, help='Path to the input JSON file')
    parser.add_argument('--window_size', type=int, required=True, help='Size of the time window for moving average, in minutes')
    args = parser.parse_args()
    input_file, window_size = args.input_file, args.window_size
    calculate_moving_average(input_file=input_file, window_size=window_size)
62 changes: 62 additions & 0 deletions output.json
@@ -0,0 +1,62 @@
[
{
"date": "2018-12-26 18:11:00",
"average_delivery_time": 20.0
},
{
"date": "2018-12-26 18:11:00",
"average_delivery_time": 20.0
},
{
"date": "2018-12-26 18:12:00",
"average_delivery_time": 20.0
},
{
"date": "2018-12-26 18:13:00",
"average_delivery_time": 20.0
},
{
"date": "2018-12-26 18:14:00",
"average_delivery_time": 20.0
},
{
"date": "2018-12-26 18:15:00",
"average_delivery_time": 25.5
},
{
"date": "2018-12-26 18:15:00",
"average_delivery_time": 31.0
},
{
"date": "2018-12-26 18:16:00",
"average_delivery_time": 31.0
},
{
"date": "2018-12-26 18:17:00",
"average_delivery_time": 31.0
},
{
"date": "2018-12-26 18:18:00",
"average_delivery_time": 31.0
},
{
"date": "2018-12-26 18:19:00",
"average_delivery_time": 31.0
},
{
"date": "2018-12-26 18:20:00",
"average_delivery_time": 31.0
},
{
"date": "2018-12-26 18:21:00",
"average_delivery_time": 31.0
},
{
"date": "2018-12-26 18:22:00",
"average_delivery_time": 31.0
},
{
"date": "2018-12-26 18:23:00",
"average_delivery_time": 42.5
}
]
4 changes: 4 additions & 0 deletions requirements.txt
@@ -0,0 +1,4 @@
argparse
json
typing
unittest
3 changes: 3 additions & 0 deletions test_input.json
@@ -0,0 +1,3 @@
{"timestamp": "2018-12-26 18:11:08.509654","translation_id": "5aa5b2f39f7254a75aa5","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 20}
{"timestamp": "2018-12-26 18:15:19.903159","translation_id": "5aa5b2f39f7254a75aa4","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 31}
{"timestamp": "2018-12-26 18:23:19.903159","translation_id": "5aa5b2f39f7254a75bb3","source_language": "en","target_language": "fr","client_name": "taxi-eats","event_name": "translation_delivered","nr_words": 100, "duration": 54}
6 changes: 6 additions & 0 deletions test_output.json
@@ -0,0 +1,6 @@
[
{
"date": "2022-01-01 12:00:00",
"average_delivery_time": 15.0
}
]