Feedback #1

Open: wants to merge 10 commits into base: feedback
15 changes: 15 additions & 0 deletions .env.sample
@@ -0,0 +1,15 @@
# Google Sheet Name
SHEET='Your Google Sheet Name'

# DB Credentials
DB_HOST='localhost'
DB_USER='postgres'
DB_PASS='password'
DB_NAME='superjoin'

# Configurations
# Interval between synchronisation checks, in seconds
UPDATE_INTERVAL=10
# In case of conflict, which source to give priority to
# Options: "Sheet", "DB"
CONFLICT_PRIORITY="Sheet"
14 changes: 14 additions & 0 deletions .gitignore
@@ -0,0 +1,14 @@
# Virtual Environment
venv

# Cache
__pycache__

# ENV files
.env

# Logs
*.log

# Credentials
service_account.json
43 changes: 37 additions & 6 deletions README.md
@@ -1,3 +1,6 @@

https://github.com/user-attachments/assets/2dec6755-1453-4a09-8f06-eac6295f4498
[![Review Assignment Due Date](https://classroom.github.com/assets/deadline-readme-button-22041afd0340ce965d47ae6ef1cefeee28c7c493a6346c4f15d667ab976d596c.svg)](https://classroom.github.com/a/AHFn7Vbn)
# Superjoin Hiring Assignment

### Welcome to Superjoin's hiring assignment! 🚀
@@ -44,11 +47,11 @@ Once you're done, make sure you **record a video** showing your project working.

We have a checklist at the bottom of this README file, which you should update as you progress with your assignment. It will help us evaluate your project.

-- [ ] My code's working just fine! 🥳
-- [ ] I have recorded a video showing it working and embedded it in the README ▶️
-- [ ] I have tested all the normal working cases 😎
-- [ ] I have even solved some edge cases (brownie points) 💪
-- [ ] I added my very planned-out approach to the problem at the end of this README 📜
+- [x] My code's working just fine! 🥳
+- [x] I have recorded a video showing it working and embedded it in the README ▶️
+- [x] I have tested all the normal working cases 😎
+- [x] I have even solved some edge cases (brownie points) 💪
+- [x] I added my very planned-out approach to the problem at the end of this README 📜

## Got Questions❓
Feel free to check the discussions tab, you might get some help there. Check out that tab before reaching out to us. Also, did you know, the internet is a great place to explore? 😛
@@ -58,4 +61,32 @@ We're available at techhiring@superjoin.ai for all queries.
All the best ✨.

## Developer's Section
*Add your video here, and your approach to the problem (optional). Leave some comments for us here if you want, we will be reading this :)*


https://github.com/user-attachments/assets/dbb5feca-342d-4cfa-b13f-9a0e2e2c1b6b

### Approach
1. Environment Setup:
   - Load environment variables from a `.env` file.
2. Logging:
   - Set up logging to track synchronization activities and errors.
3. Google Sheets Connection:
   - Connect to the Google Sheets API using a service account.
   - Open the specified Google Sheet.
4. Database Connection:
   - Connect to the PostgreSQL database using psycopg2.
5. Synchronization Logic:
   - Poll both the Google Sheet and the database for updates.
   - Compare the last-updated timestamps of the Google Sheet and the database against the timestamps recorded at the previous sync.
   - Depending on which data source has been updated, synchronize the other:
     - If only the Google Sheet has been updated, update the database with the new data.
     - If only the database has been updated, update the Google Sheet with the new data.
     - If both have been updated, resolve the conflict based on a configured priority (`CONFLICT_PRIORITY`, e.g., prioritize Google Sheet updates).
6. Error Handling:
   - Handle exceptions and log errors.
   - Close the database connection gracefully on errors or interruptions.
7. Triggers and Metadata:
   - Use database triggers to maintain the `last_updated` timestamp and to handle deletions.
   - Ensure that deleted records are copied to a metadata table for tracking.
8. Periodic Updates:
   - Use a loop with a sleep interval (`UPDATE_INTERVAL`) to periodically check for updates and synchronize data.
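
The decision in step 5 can be isolated into a small pure function, which makes the three cases easy to test without a sheet or a database. The following is a minimal sketch, not the code in `main.py`: the names `SyncAction` and `decide_sync` are hypothetical, and it assumes all four timestamps are timezone-aware UTC datetimes.

```python
from datetime import datetime, timezone
from enum import Enum


class SyncAction(Enum):
    """Hypothetical labels for the direction of a sync pass."""
    NONE = "none"
    SHEET_TO_DB = "sheet_to_db"
    DB_TO_SHEET = "db_to_sheet"


def decide_sync(sheet_updated, db_updated, last_sheet_sync, last_db_sync,
                conflict_priority="Sheet"):
    """Pick a sync direction from the current and previously seen timestamps."""
    sheet_changed = sheet_updated > last_sheet_sync
    db_changed = db_updated > last_db_sync
    if sheet_changed and db_changed:
        # Both sides changed: fall back to the configured priority.
        if conflict_priority == "Sheet":
            return SyncAction.SHEET_TO_DB
        return SyncAction.DB_TO_SHEET
    if sheet_changed:
        return SyncAction.SHEET_TO_DB
    if db_changed:
        return SyncAction.DB_TO_SHEET
    return SyncAction.NONE


epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
later = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Only the sheet changed since the last sync.
print(decide_sync(later, epoch, epoch, epoch))
# Both changed and the database has priority.
print(decide_sync(later, later, epoch, epoch, conflict_priority="DB"))
```

Keeping the comparison separate from the I/O means the conflict rule can be changed (for example, to "latest write wins") without touching the code that reads or writes either data source.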

63 changes: 63 additions & 0 deletions db_commands.md
@@ -0,0 +1,63 @@
# DB Commands

## Create a database
```sql
CREATE DATABASE superjoin;
```

## Create table that stores data
```sql
CREATE TABLE candidates (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    email VARCHAR(100),
    phone VARCHAR(100),
    last_updated TIMESTAMP DEFAULT timezone('utc', NOW())
);
```

## Create a metadata table to store the deleted records
```sql
CREATE TABLE deleted_candidates (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    email VARCHAR(100),
    phone VARCHAR(100),
    last_updated TIMESTAMP DEFAULT timezone('utc', NOW())
);
```

## Create a trigger to move the deleted records to the metadata table
```sql
CREATE OR REPLACE FUNCTION move_deleted_candidates()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO deleted_candidates (first_name, last_name, email, phone)
    VALUES (OLD.first_name, OLD.last_name, OLD.email, OLD.phone);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER move_deleted_candidates
AFTER DELETE ON candidates
FOR EACH ROW
EXECUTE FUNCTION move_deleted_candidates();
```
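
To see what the delete trigger does without a PostgreSQL server, the same idea can be tried out in SQLite from Python. This is only an illustrative stand-in: SQLite's trigger syntax differs from the PL/pgSQL above, the `last_updated` column is omitted, and the sample rows are made up. Note that, as in the trigger above, the archived row gets a fresh `id` rather than the `id` of the deleted candidate.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE candidates (
    id INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, email TEXT, phone TEXT
);
CREATE TABLE deleted_candidates (
    id INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, email TEXT, phone TEXT
);
-- SQLite equivalent of the move_deleted_candidates trigger.
CREATE TRIGGER move_deleted_candidates
AFTER DELETE ON candidates
FOR EACH ROW
BEGIN
    INSERT INTO deleted_candidates (first_name, last_name, email, phone)
    VALUES (OLD.first_name, OLD.last_name, OLD.email, OLD.phone);
END;
""")

# Made-up sample data.
rows = [
    ("Ada", "Lovelace", "ada@example.com", "555-0100"),
    ("Alan", "Turing", "alan@example.com", "555-0101"),
]
conn.executemany(
    "INSERT INTO candidates (first_name, last_name, email, phone) VALUES (?, ?, ?, ?)",
    rows,
)

# Delete the second candidate (id 2); the trigger archives the row.
conn.execute("DELETE FROM candidates WHERE id = 2")

archived = conn.execute(
    "SELECT id, first_name, last_name, email, phone FROM deleted_candidates"
).fetchall()
remaining = conn.execute("SELECT id, first_name FROM candidates").fetchall()
print(archived)   # the archived row has id 1, not the original id 2
print(remaining)
```

Because the archive table assigns its own ids, any code that looks up an archived row by the original candidate id would need the trigger to copy `OLD.id` into a separate column first.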

## Create a trigger to update the last_updated column
```sql
CREATE OR REPLACE FUNCTION update_last_updated()
RETURNS TRIGGER AS $$
BEGIN
    NEW.last_updated = timezone('utc', NOW());
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_last_updated
BEFORE UPDATE ON candidates
FOR EACH ROW
EXECUTE FUNCTION update_last_updated();
```
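
With both tables carrying a `last_updated` column, a client can treat the newest value across `candidates` and `deleted_candidates` as "when the database last changed". A minimal sketch of that combination step follows; the helper name `latest_change` is hypothetical, and it assumes the inputs are the results of `SELECT max(last_updated) ...` on each table, which are naive UTC datetimes or `None` when a table is empty.

```python
from datetime import datetime, timezone

# Fallback used when neither table has any rows yet.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def latest_change(candidates_max, deleted_max):
    """Return the newest of the two timestamps as an aware UTC datetime.

    Either argument may be None (empty table); if both are None the
    Unix epoch is returned so comparisons against it still work.
    """
    stamps = [ts for ts in (candidates_max, deleted_max) if ts is not None]
    if not stamps:
        return EPOCH
    # TIMESTAMP columns hold naive UTC values here, so attach the UTC zone.
    return max(stamps).replace(tzinfo=timezone.utc)


print(latest_change(None, None))
print(latest_change(datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 2, 8, 30)))
```

Handling the empty-table case explicitly avoids calling `.replace()` on `None` the first time the script runs against a fresh database.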
187 changes: 187 additions & 0 deletions main.py
@@ -0,0 +1,187 @@
import logging
import time
import traceback
from datetime import datetime, timezone
from os import getenv

import gspread
import psycopg2
from dotenv import load_dotenv

# Set up logging
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
handler = logging.FileHandler("logs.log", encoding='utf-8', mode='a')
handler.setFormatter(logging.Formatter('%(asctime)s:%(levelname)s: %(message)s'))
logger.addHandler(handler)

# Load environment variables
load_dotenv()
sheet = getenv("SHEET")
db_host = getenv("DB_HOST")
db_user = getenv("DB_USER")
db_pass = getenv("DB_PASS")
db_name = getenv("DB_NAME")
update_interval = int(getenv("UPDATE_INTERVAL"))
conflict_priority = getenv("CONFLICT_PRIORITY")

# Connect to the Google Sheets API
try:
    gc = gspread.service_account(filename="service_account.json")
except FileNotFoundError:
    print("Service account file not found")
    exit(1)
except Exception:
    print("Error loading service account. Please ensure the service account file is valid")
    exit(1)

try:
    sh = gc.open(sheet)
except gspread.SpreadsheetNotFound:
    print(f"Spreadsheet {sheet} not found. Please ensure the spreadsheet exists and the service account has access to it")
    exit(1)

# Connect to the database
try:
    conn = psycopg2.connect(
        host=db_host,
        user=db_user,
        password=db_pass,
        dbname=db_name
    )
except psycopg2.OperationalError:
    print("Error connecting to the database")
    exit(1)

cur = conn.cursor()

# Last synchronised times of the sheet and the database (both start at the Unix epoch)
last_sheet_update = sh_updated = datetime(1970, 1, 1, tzinfo=timezone.utc)
last_db_update = datetime(1970, 1, 1, tzinfo=timezone.utc)

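def check_row(sh_row, db_row):
    """Return True if a sheet row matches a database row (id, first, last, email, phone, ...)."""
    # get_all_records() may convert numeric-looking cells (e.g. phone numbers)
    # to numbers, so compare everything as strings.
    if str(sh_row['First Name']) != db_row[1]:
        return False
    if str(sh_row['Last Name']) != db_row[2]:
        return False
    if str(sh_row['Email']) != db_row[3]:
        return False
    if str(sh_row['Phone Number']) != db_row[4]:
        return False
    return True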

# Start synchronisation
logger.info("Starting synchronisation")
try:
    while True:
        # Get the updated version of the sheet
        sh = gc.open(sheet)
        sh_updated = max(datetime.strptime(sh.get_lastUpdateTime(), "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc), sh_updated)
        print("Sheet updated at: ", sh_updated)

        # Get the latest change time of the database (updates and deletions)
        cur.execute("SELECT max(last_updated) FROM candidates")
        db_updated = cur.fetchone()[0]

        cur.execute("SELECT max(last_updated) FROM deleted_candidates")
        db_deleted = cur.fetchone()[0]

        if db_updated and db_deleted:
            db_updated = max(db_updated, db_deleted)
        elif db_deleted:
            db_updated = db_deleted
        if db_updated is None:
            # Both tables are empty: fall back to the last known update time
            db_updated = last_db_update
        db_updated = db_updated.replace(tzinfo=timezone.utc)
        print("Database updated at: ", db_updated)

        if sh_updated > last_sheet_update and last_db_update >= db_updated:  # Only the sheet has been updated since the last sync
            logger.info("Sheet has been updated")
            # Get the data from the sheet and the database
            sh_data = sh.sheet1.get_all_records()
            cur.execute("SELECT * FROM candidates order by id")
            db_data = cur.fetchall()

            for i, row in enumerate(sh_data):
                if i < len(db_data) and not check_row(row, db_data[i]):
                    # Update the database row at the same position (by its actual id, which may not be i+1)
                    cur.execute("UPDATE candidates SET first_name = %s, last_name = %s, email = %s, phone = %s, last_updated = %s WHERE id = %s",
                                (row["First Name"], row["Last Name"], row["Email"], row["Phone Number"], sh_updated, db_data[i][0]))
                elif i >= len(db_data):
                    cur.execute("INSERT INTO candidates (first_name, last_name, email, phone, last_updated) VALUES (%s, %s, %s, %s, %s)",
                                (row["First Name"], row["Last Name"], row["Email"], row["Phone Number"], sh_updated))
            if len(sh_data) < len(db_data):
                for i in range(len(sh_data), len(db_data)):
                    cur.execute("DELETE FROM candidates WHERE id = %s", (db_data[i][0],))
                    # Delete the row from the deleted_candidates table
                    # NOTE: this assumes deleted_candidates.id matches candidates.id
                    cur.execute("DELETE FROM deleted_candidates WHERE id = %s", (db_data[i][0],))
            conn.commit()
            db_updated = datetime.now(timezone.utc)

        elif sh_updated <= last_sheet_update and last_db_update < db_updated:  # Only the database has been updated since the last sync
            logger.info("Database has been updated")
            # Get the data from the sheet and the database
            sh_data = sh.sheet1.get_all_records()
            cur.execute("SELECT * FROM candidates order by id")
            db_data = cur.fetchall()

            min_len = min(len(sh_data), len(db_data))
            if min_len:  # Skip the overwrite when there are no overlapping rows
                update_data = [[row[1], row[2], row[3], row[4]] for row in db_data[:min_len]]
                sh.sheet1.update(update_data, f"A2:D{min_len+1}")
            if len(sh_data) < len(db_data):  # More rows in the database than in the sheet
                sh.sheet1.insert_rows([[row[1], row[2], row[3], row[4]] for row in db_data[min_len:]], min_len+2)
            elif len(sh_data) > len(db_data):
                # delete_rows takes the first and last row index (not a count); row 1 is the header
                sh.sheet1.delete_rows(len(db_data)+2, len(sh_data)+1)
            db_updated = datetime.now(timezone.utc)

        elif sh_updated > last_sheet_update and last_db_update < db_updated:  # Both the sheet and the database have been updated since the last sync
            logger.info("Sheet and database have been updated")
            if conflict_priority == "Sheet":
                sh_data = sh.sheet1.get_all_records()
                cur.execute("SELECT * FROM candidates order by id")
                db_data = cur.fetchall()

                for i, row in enumerate(sh_data):
                    if i < len(db_data) and not check_row(row, db_data[i]):
                        # Update the database row at the same position (by its actual id, which may not be i+1)
                        cur.execute("UPDATE candidates SET first_name = %s, last_name = %s, email = %s, phone = %s, last_updated = %s WHERE id = %s",
                                    (row["First Name"], row["Last Name"], row["Email"], row["Phone Number"], sh_updated, db_data[i][0]))
                    elif i >= len(db_data):
                        cur.execute("INSERT INTO candidates (first_name, last_name, email, phone, last_updated) VALUES (%s, %s, %s, %s, %s)",
                                    (row["First Name"], row["Last Name"], row["Email"], row["Phone Number"], sh_updated))
                if len(sh_data) < len(db_data):
                    for i in range(len(sh_data), len(db_data)):
                        cur.execute("DELETE FROM candidates WHERE id = %s", (db_data[i][0],))
                        # Delete the row from the deleted_candidates table
                        # NOTE: this assumes deleted_candidates.id matches candidates.id
                        cur.execute("DELETE FROM deleted_candidates WHERE id = %s", (db_data[i][0],))
                conn.commit()

            else:
                sh_data = sh.sheet1.get_all_records()
                cur.execute("SELECT * FROM candidates order by id")
                db_data = cur.fetchall()

                min_len = min(len(sh_data), len(db_data))
                if min_len:  # Skip the overwrite when there are no overlapping rows
                    update_data = [[row[1], row[2], row[3], row[4]] for row in db_data[:min_len]]
                    sh.sheet1.update(update_data, f"A2:D{min_len+1}")
                if len(sh_data) < len(db_data):
                    sh.sheet1.insert_rows([[row[1], row[2], row[3], row[4]] for row in db_data[min_len:]], min_len+2)
                elif len(sh_data) > len(db_data):
                    # delete_rows takes the first and last row index (not a count); row 1 is the header
                    sh.sheet1.delete_rows(len(db_data)+2, len(sh_data)+1)

            db_updated = datetime.now(timezone.utc)
            sh_updated = db_updated

        else:  # No updates
            pass

        # Update the last updated times
        last_sheet_update = sh_updated
        last_db_update = db_updated

        # Sleep for the update interval
        time.sleep(update_interval)
except KeyboardInterrupt:
    logger.info("Stopping synchronisation")
    conn.close()
    exit(0)
except Exception as e:
    logger.error(f"An error occurred: {e}\n{traceback.format_exc()}")
    conn.close()
    exit(1)