Merged
67 commits
9aee81a
docker deployment
Nikhil-Kudupudi Apr 14, 2025
e5571ae
update workflow alignment
Nikhil-Kudupudi Apr 14, 2025
0a676d4
update envs to single ,s
Nikhil-Kudupudi Apr 14, 2025
c3fc67d
divide the steps
Nikhil-Kudupudi Apr 14, 2025
45a2ded
update workflows
Nikhil-Kudupudi Apr 15, 2025
d86bb04
update alignment
Nikhil-Kudupudi Apr 15, 2025
909e2dc
add docker ignore to front end
Nikhil-Kudupudi Apr 15, 2025
99d0ea8
update condition for workflows
Nikhil-Kudupudi Apr 15, 2025
f32fef7
update mlflow
Nikhil-Kudupudi Apr 16, 2025
8c9aa36
revert frontend
Nikhil-Kudupudi Apr 16, 2025
0baa07d
check mlflow issue
Nikhil-Kudupudi Apr 16, 2025
26f4296
update file safe name for scraper files
Nikhil-Kudupudi Apr 16, 2025
e5766cb
add package
Nikhil-Kudupudi Apr 16, 2025
6b774c5
try mlflow fix of id tracking
Nikhil-Kudupudi Apr 16, 2025
0673431
second revision of mlflow
Nikhil-Kudupudi Apr 17, 2025
f0e2fab
fix alignment
Nikhil-Kudupudi Apr 17, 2025
2c9b534
add multiple urls scraping code
Nikhil-Kudupudi Apr 18, 2025
0e95868
Update backend-docker-image-build.yml
Nikhil-Kudupudi Apr 18, 2025
d464083
update prefect deployment flow
Nikhil-Kudupudi Apr 19, 2025
ae83ad5
update path
Nikhil-Kudupudi Apr 19, 2025
459e854
update prefect workflow
Nikhil-Kudupudi Apr 19, 2025
29e359c
add tgcloud run deploy command
Nikhil-Kudupudi Apr 19, 2025
560f113
temporarily disable path tracking
Nikhil-Kudupudi Apr 19, 2025
3274482
update
Nikhil-Kudupudi Apr 19, 2025
cba0fa7
update flow
Nikhil-Kudupudi Apr 19, 2025
330d92e
update credentials flow
Nikhil-Kudupudi Apr 19, 2025
879eec3
update image
Nikhil-Kudupudi Apr 19, 2025
8a8beb8
Update prefect_orchestraiton.yml
Nikhil-Kudupudi Apr 19, 2025
0cfaf6c
update prefect .yaml
Nikhil-Kudupudi Apr 19, 2025
88e18b5
update the latest command
Nikhil-Kudupudi Apr 19, 2025
340e488
update command
Nikhil-Kudupudi Apr 19, 2025
f9ceeac
update command for prefect deploy
Nikhil-Kudupudi Apr 19, 2025
9c048d5
create a new flow
Nikhil-Kudupudi Apr 19, 2025
203f0f5
update test changes
Nikhil-Kudupudi Apr 19, 2025
0b2ce8f
update url
Nikhil-Kudupudi Apr 19, 2025
21860c3
swap keys
Nikhil-Kudupudi Apr 19, 2025
845906f
Update prefect_orchestraiton.yml
Nikhil-Kudupudi Apr 19, 2025
4cea0fb
Update prefect_orchestraiton.yml
Nikhil-Kudupudi Apr 19, 2025
1abb4ed
Update prefect_orchestraiton.yml
Nikhil-Kudupudi Apr 19, 2025
3f707fa
Update prefect_orchestraiton.yml
Nikhil-Kudupudi Apr 19, 2025
5da0cc9
Update prefect_orchestraiton.yml
Nikhil-Kudupudi Apr 20, 2025
6d74ddb
update flow
Nikhil-Kudupudi Apr 20, 2025
197187c
Update prefect_orchestraiton.yml
Nikhil-Kudupudi Apr 20, 2025
bd0f1f2
update
Nikhil-Kudupudi Apr 20, 2025
82d4582
add prefect workspace name
Nikhil-Kudupudi Apr 20, 2025
1d8ac74
update test flow
Nikhil-Kudupudi Apr 20, 2025
771d9a5
test
Nikhil-Kudupudi Apr 20, 2025
5be4909
update test 2
Nikhil-Kudupudi Apr 20, 2025
7302626
cha
Nikhil-Kudupudi Apr 20, 2025
37a98c9
df
Nikhil-Kudupudi Apr 20, 2025
6806074
she
Nikhil-Kudupudi Apr 20, 2025
df2b3fe
up
Nikhil-Kudupudi Apr 20, 2025
3e4a9af
up
Nikhil-Kudupudi Apr 20, 2025
e62a933
rfg
Nikhil-Kudupudi Apr 20, 2025
95b2f5e
upgfvfg
Nikhil-Kudupudi Apr 20, 2025
c2a440b
sdfdf
Nikhil-Kudupudi Apr 20, 2025
763c444
fg
Nikhil-Kudupudi Apr 20, 2025
216ed97
cgh
Nikhil-Kudupudi Apr 20, 2025
8336579
gh
Nikhil-Kudupudi Apr 20, 2025
418a062
kd
Nikhil-Kudupudi Apr 20, 2025
2a770aa
hkkhjk
Nikhil-Kudupudi Apr 20, 2025
d94f95c
hg
Nikhil-Kudupudi Apr 20, 2025
f70154c
ld
Nikhil-Kudupudi Apr 20, 2025
05153d0
Merge remote-tracking branch 'origin/main' into docker-deployment
Nikhil-Kudupudi Apr 21, 2025
71aef46
Merge remote-tracking branch 'origin/main' into docker-deployment
Nikhil-Kudupudi Apr 21, 2025
9846764
update frontend
Nikhil-Kudupudi Apr 21, 2025
27aea12
Merge pull request #48 from Nikhil-Kudupudi/docker-deployment
Nikhil-Kudupudi Apr 21, 2025
14 changes: 13 additions & 1 deletion .github/workflows/backend-docker-image-build.yml
@@ -4,7 +4,8 @@ on:
  push:
    branches:
      - "**"

    paths:
      - 'services/backend/**'
jobs:
  backend_build:
    runs-on: ubuntu-latest
@@ -31,3 +32,14 @@ jobs:
          docker build -t $IMAGE .
          docker push $IMAGE
          cd ../..

      - name: Deploy to Cloud Run
        run: |
          gcloud run deploy backend-service \
            --source services/backend \
            --region ${{ secrets.GCP_REGION }} \
            --platform managed \
            --allow-unauthenticated \
            --memory 4Gi \
            --timeout 3600s \
            --set-env-vars "AIRFLOW_UID=5000,BASE_URL=https://www.khoury.northeastern.edu/,MAX_DEPTH=3,CONCURRENT_REQUESTS=10,DATA_FOLDER=scraped_data,MISTRAL_API_KEY=${{ secrets.MISTRAL_API_KEY }},MLFLOW_TRACKING_URI=${{ secrets.MLFLOW_TRACKING_URI }},BUCKET_NAME=${{ secrets.BUCKET_NAME }},RAW_DATA_FOLDER=raw_data,FAISS_INDEX_FOLDER=faiss_index,URLS_LIST=https://www.khoury.northeastern.edu/"
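The `--set-env-vars` argument above is a single comma-separated `KEY=VALUE` list inside a double-quoted shell string, so a stray quote, a leading space, or a comma inside a value changes what the service receives. A minimal Python sketch of a helper that assembles such a string and rejects risky values follows; `format_env_vars` is a hypothetical name, not part of this PR or of gcloud.

```python
def format_env_vars(env):
    """Join KEY=VALUE pairs with commas for a --set-env-vars style argument.

    Hypothetical helper: it only builds the string, it does not call gcloud.
    """
    parts = []
    for key, value in env.items():
        value = str(value)
        # Reject characters that would split the comma-separated list or
        # terminate the surrounding double-quoted shell string, and reject
        # accidental leading/trailing whitespace.
        if any(ch in value for ch in ',"') or value != value.strip():
            raise ValueError(f"unsafe value for {key}: {value!r}")
        parts.append(f"{key}={value}")
    return ",".join(parts)


example = format_env_vars({
    "MAX_DEPTH": 3,
    "URLS_LIST": "https://www.khoury.northeastern.edu/",
})
print(example)
```

A value written as ` "https://www.khoury.northeastern.edu/"` (leading space plus embedded quotes) would be rejected by this check instead of silently producing a malformed argument.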
13 changes: 13 additions & 0 deletions .github/workflows/frontend-docker-image-build.yml
@@ -4,6 +4,8 @@ on:
  push:
    branches:
      - "**"
    paths:
      - 'services/frontend/**'

jobs:
  frontend_build:
@@ -31,3 +33,14 @@ jobs:
          docker build -t $IMAGE .
          docker push $IMAGE
          cd ../..

      - name: Deploy to Cloud Run
        run: |
          gcloud run deploy frontend-service \
            --source services/frontend \
            --region ${{ secrets.GCP_REGION }} \
            --platform managed \
            --allow-unauthenticated \
            --memory 1Gi \
            --timeout 1800s \
            --set-env-vars "API_URL=https://backend-service-273412-default.run.app/NuBot/"
45 changes: 45 additions & 0 deletions .github/workflows/frontend1-docker-image-build.yml
@@ -0,0 +1,45 @@
name: "build_reactfrontend_image"

on:
  push:
    branches:
      - "**"
    paths:
      - "services/frontend1/**"
jobs:
  backend_build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: GCP Authentication
        uses: google-github-actions/auth@v2
        with:
          credentials_json: "${{ secrets.GCP_KEY }}"

      - name: Setup gcloud SDK
        uses: google-github-actions/setup-gcloud@v2

      - name: Docker login for Artifact Registry
        run: |
          gcloud auth configure-docker ${{ secrets.GCP_ARTIFACT_REGISTRY_REGION }}-docker.pkg.dev

      - name: Build and Push React Frontend Image
        run: |
          cd services/frontend1
          IMAGE=${{ secrets.GCP_ARTIFACT_REGISTRY_REGION }}-docker.pkg.dev/${{ secrets.GCP_PROJECT_ID }}/backend-nubot/react-service:latest
          docker build -t $IMAGE .
          docker push $IMAGE
          cd ../..

      - name: Deploy to Cloud Run
        run: |
          gcloud run deploy react-service \
            --source services/frontend1 \
            --region ${{ secrets.GCP_REGION }} \
            --platform managed \
            --allow-unauthenticated \
            --memory 1Gi \
            --timeout 3600s \
            --set-env-vars "REACT_APP_API_URL=${{secrets.REACT_APP_API_URL}}"
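The `paths:` filters added to the three image workflows restrict each one to pushes that touch its own service directory. The sketch below approximates that behaviour for the `dir/**` patterns used in this PR; it is a simplification under that assumption, not GitHub's actual glob matcher, and `workflow_triggers` is a hypothetical name.

```python
def workflow_triggers(path_filters, changed_files):
    """Approximate a `paths:` filter made only of 'dir/**' patterns.

    Assumption: each pattern ends in '/**', so matching reduces to a
    directory-prefix check. Real GitHub filters support richer globs.
    """
    prefixes = []
    for pattern in path_filters:
        if not pattern.endswith("/**"):
            raise ValueError(f"only 'dir/**' patterns are handled: {pattern}")
        # 'services/frontend/**' -> 'services/frontend/' (trailing slash kept)
        prefixes.append(pattern[:-2])
    return any(f.startswith(p) for f in changed_files for p in prefixes)


changed = ["services/frontend1/src/App.js"]
print(workflow_triggers(["services/frontend/**"], changed))
print(workflow_triggers(["services/frontend1/**"], changed))
```

Because the prefix keeps its trailing slash, a change under `services/frontend1/` does not trigger the `services/frontend/**` workflow, which is the separation the two frontend workflows rely on.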
Empty file.
59 changes: 59 additions & 0 deletions .github/workflows/prefect_orchestraiton.yml
@@ -0,0 +1,59 @@
name: Deploy Prefect Flow to Cloud Run

on:
  push:
    branches: ["**"] # Trigger on push to any branch (adjust as needed)

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      # Authenticate to Google Cloud using the service account JSON key
      - name: GCP Authentication
        uses: google-github-actions/auth@v2
        with:
          credentials_json: "${{ secrets.GCP_KEY }}"

      - name: Setup gcloud SDK
        uses: google-github-actions/setup-gcloud@v2

      - name: Docker login for Artifact Registry
        run: |
          gcloud auth configure-docker ${{ secrets.GCP_ARTIFACT_REGISTRY_REGION }}-docker.pkg.dev

      - name: Build Docker image
        run: |
          cd prefectWorkflows
          IMAGE_URI=${{ secrets.GCP_ARTIFACT_REGISTRY_REGION }}-docker.pkg.dev/${{ secrets.GCP_PROJECT_ID }}/backend-nubot/prefect-scraper:latest
          echo "Building image $IMAGE_URI"
          docker build -t "$IMAGE_URI" .
          # Note: The build context is prefectWorkflows (the current directory after cd); adjust the path if the Dockerfile lives elsewhere.

      - name: Push Docker image to Artifact Registry
        run: |
          IMAGE_URI=${{ secrets.GCP_ARTIFACT_REGISTRY_REGION }}-docker.pkg.dev/${{ secrets.GCP_PROJECT_ID }}/backend-nubot/prefect-scraper:latest
          docker push "$IMAGE_URI"
          # After this step, the image is available in Artifact Registry for Cloud Run to use.

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Install Prefect 3
        run: pip install --no-cache-dir "prefect>=3.2.4"

      - name: Deploy via prefect deploy
        run: |
          cd prefectWorkflows
          prefect deploy -n scraper-cron-deployment # tell pool to use latest image

      - name: Deploy Prefect flow
        run: |
          cd prefectWorkflows # navigate to the folder containing prefect.yaml
          prefect deploy -n scraperflow-deployment
          # The -n flag ensures we deploy the specific deployment by name (optional if only one deployment in YAML).
          # This command reads prefect.yaml and registers/updates the deployment in Prefect Cloud.
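The build and push steps in this workflow each assemble the same Artifact Registry image reference by hand. As an illustration of that naming scheme, here is a small Python sketch; `artifact_registry_image` is a hypothetical helper, and the region and project values in the example are placeholders rather than the values behind the repository secrets.

```python
def artifact_registry_image(region, project_id, repository, image, tag="latest"):
    """Build '<region>-docker.pkg.dev/<project>/<repository>/<image>:<tag>'."""
    for label, value in (("region", region), ("project_id", project_id),
                         ("repository", repository), ("image", image), ("tag", tag)):
        if not value:
            # An empty secret would otherwise yield a malformed reference.
            raise ValueError(f"{label} must not be empty")
    return f"{region}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


# Placeholder region and project; repository and image names come from the workflow.
uri = artifact_registry_image("us-east1", "example-project",
                              "backend-nubot", "prefect-scraper")
print(uri)
```

Computing the reference once and reusing it avoids the build and push steps drifting apart if one of them is edited later.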
3 changes: 2 additions & 1 deletion .gitignore
@@ -177,6 +177,7 @@ mlflow-artifacts/
# PyPI configuration file
.pypirc

*.html


*.json
!package.json
3 changes: 2 additions & 1 deletion airflow/dags/dataflow/chunk_data.py
@@ -11,7 +11,8 @@
from dataflow.store_data import upload_faiss_index_to_bucket
load_dotenv(override=True)
BUCKET_NAME= os.getenv('BUCKET_NAME')
GOOGLE_APPLICATION_CREDENTIALS=os.getenv('GOOGLE_APPLICATION_CREDENTIALS')
from google.auth import default
credentials, project = default()
RAW_DATA_FOLDER= os.getenv('RAW_DATA_FOLDER')
def chunk_data():
    # Load all JSON files from a directory
3 changes: 2 additions & 1 deletion airflow/dags/dataflow/scraper.py
@@ -13,7 +13,8 @@
BASE_URL = os.getenv('BASE_URL')
MAX_DEPTH = int(os.getenv('MAX_DEPTH')) # Maximum recursion depth (base URL is depth 0)
CONCURRENT_REQUESTS = int(os.getenv('CONCURRENT_REQUESTS')) # Maximum number of concurrent requests
GOOGLE_APPLICATION_CREDENTIALS =os.getenv('GOOGLE_APPLICATION_CREDENTIALS ')
from google.auth import default
credentials, project = default()
# Create folder for JSON data
DATA_FOLDER = "scraped_data"
if not os.path.exists(DATA_FOLDER):
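The scraper reads `MAX_DEPTH` and `CONCURRENT_REQUESTS` with `int(os.getenv(...))`, which raises a `TypeError` when the variable is unset. A minimal sketch of a more forgiving reader is shown below; `env_int` is a hypothetical helper, and the defaults of 3 and 10 are taken from the Cloud Run `--set-env-vars` values in this PR rather than from the scraper itself.

```python
import os


def env_int(name, default, env=None):
    """Read an integer setting, falling back to `default` when unset or blank."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError as exc:
        # Name the offending variable instead of a bare int() error.
        raise ValueError(f"{name} must be an integer, got {raw!r}") from exc


# Explicit mapping used here so the example does not depend on the real environment.
settings = {"MAX_DEPTH": "3"}
max_depth = env_int("MAX_DEPTH", default=3, env=settings)
concurrent_requests = env_int("CONCURRENT_REQUESTS", default=10, env=settings)
print(max_depth, concurrent_requests)
```

In the scraper the `env` argument could simply be omitted so that the helper reads from `os.environ`.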
3 changes: 2 additions & 1 deletion airflow/dags/dataflow/store_data.py
@@ -5,7 +5,8 @@
BUCKET_NAME= os.getenv('BUCKET_NAME')
RAW_DATA_FOLDER= os.getenv('RAW_DATA_FOLDER')
FAISS_INDEX_FOLDER= os.getenv('FAISS_INDEX_FOLDER')
GOOGLE_APPLICATION_CREDENTIALS=os.getenv('GOOGLE_APPLICATION_CREDENTIALS')
from google.auth import default
credentials, project = default()

def get_blob_from_bucket():
    storage_client = Client()
3 changes: 3 additions & 0 deletions prefectWorkflows/.dockerignore
@@ -0,0 +1,3 @@
*.env
scraped_data/
faiss_index/
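These three patterns keep local environment files and generated data out of the Docker build context; they do not affect what is committed to the repository. The sketch below approximates which paths such patterns exclude. It is a simplification that matches pattern segments against the leading path segments with Python's `fnmatch`, not Docker's actual matcher, and `is_ignored` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Patterns copied from prefectWorkflows/.dockerignore in this PR.
PATTERNS = ["*.env", "scraped_data/", "faiss_index/"]


def is_ignored(path, patterns=PATTERNS):
    """Return True if the leading segments of `path` match any pattern.

    Assumption: a matched directory excludes everything beneath it, and a
    pattern without a slash only applies at the top of the build context.
    """
    path_parts = path.strip("/").split("/")
    for pattern in patterns:
        pattern_parts = pattern.strip("/").split("/")
        if len(pattern_parts) > len(path_parts):
            continue
        # zip stops at the shorter list, so only leading segments are compared.
        if all(fnmatchcase(seg, pat) for seg, pat in zip(path_parts, pattern_parts)):
            return True
    return False


for candidate in [".env", "scraped_data/page.json", "dataflow/scraper.py", "config/.env"]:
    print(candidate, is_ignored(candidate))
```

Under these assumptions a nested file such as `config/.env` is not excluded by `*.env`; a pattern like `**/*.env` would be needed for that.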
4 changes: 3 additions & 1 deletion prefectWorkflows/.env
@@ -6,4 +6,6 @@ DATA_FOLDER = "scraped_data"
BUCKET_NAME=scraped_raw_data_nubot
RAW_DATA_FOLDER=raw_data
FAISS_INDEX_FOLDER=faiss_index
GOOGLE_APPLICATION_CREDENTIALS="E:/gcpkeys/nubot/nubot-nikhil-6adeee091d55.json"
GOOGLE_APPLICATION_CREDENTIALS="E:/gcpkeys/nubot/nubot-nikhil-6adeee091d55.json"
PREFECT_API_KEY=pnu_mRGcrBkC9qyFbwGfgrVbjbOoL7WIZ411TKYp
PREFECT_API_URL="https://api.prefect.cloud/api/accounts/806f2e07-5063-4fbe-9b46-0545ad5de2d1/workspaces/acdf9e9e-8a55-446a-ac46-80a3f843d8b6"
25 changes: 25 additions & 0 deletions prefectWorkflows/Dockerfile
@@ -0,0 +1,25 @@
# Start from a lightweight Python image (use the appropriate Python version)
FROM python:3.10-slim

# Set working directory in container
WORKDIR /app

# Install Python dependencies.
# If you have a requirements.txt, copy and install it:
COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

# (Alternatively, directly install Prefect and any needed libraries)
# RUN pip install "prefect>=3.2.4"

# Copy the Prefect flow code and the dataflow module into the image
COPY . .


# Ensure Python can find the 'dataflow' module (add /app to PYTHONPATH)
ENV PYTHONPATH="/app:${PYTHONPATH}"

# (Optional) Set a default command (Prefect Cloud will override this when submitting the flow run)
# By default, do nothing or use a generic command. Prefect Cloud's work pool will specify the entrypoint at runtime.
CMD ["python", "-c", "print('Container built for Prefect flow execution')"]
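The `ENV PYTHONPATH` line matters because Python adds every `PYTHONPATH` entry to `sys.path` at startup, which is what lets the flow code import the `dataflow` package from `/app`. The following sketch shows the same mechanism in isolation by adding a temporary directory to `sys.path`; the package name `dataflow_demo` and the helper `import_from_dir` are made up for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_dir(directory, module_name):
    """Import `module_name` with `directory` temporarily on sys.path."""
    directory = str(directory)
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(directory)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for /app/dataflow inside the container.
    package = Path(tmp) / "dataflow_demo"
    package.mkdir()
    (package / "__init__.py").write_text("VALUE = 'found'\n")
    module = import_from_dir(tmp, "dataflow_demo")
    print(module.VALUE)
```

Without the directory on the search path, the same `import_module` call would raise `ModuleNotFoundError`, which is the failure the Dockerfile's `PYTHONPATH` setting is meant to prevent.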
17 changes: 15 additions & 2 deletions prefectWorkflows/dataflow/chunk_data.py
@@ -11,7 +11,20 @@
from dataflow.store_data import upload_faiss_index_to_bucket
load_dotenv(override=True)
BUCKET_NAME= os.getenv('BUCKET_NAME')
GOOGLE_APPLICATION_CREDENTIALS=os.getenv('GOOGLE_APPLICATION_CREDENTIALS')
from google.auth import default
from google.oauth2 import service_account

# Try to get credentials - works in both Docker and Cloud Run
try:
    # First try Application Default Credentials (works in Cloud Run)
    credentials, project = default()
except Exception:
    # Fall back to explicit credentials file (for Docker)
    credentials_path = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
    if credentials_path:
        credentials = service_account.Credentials.from_service_account_file(credentials_path)
    else:
        raise Exception("No credentials available")
RAW_DATA_FOLDER= os.getenv('RAW_DATA_FOLDER')
def chunk_data():
    # Load all JSON files from a directory
@@ -52,4 +65,4 @@ def chunk_data():

if __name__=="__main__":
    chunk_data()
    upload_faiss_index_to_bucket()
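The credential handling added to `prefectWorkflows/dataflow/chunk_data.py` tries Application Default Credentials first and falls back to an explicit key file. Two details are worth keeping in mind: `google.auth.default()` already consults `GOOGLE_APPLICATION_CREDENTIALS` itself, so the fallback is rarely reached, and the fallback branch leaves `project` unassigned. The sketch below restates the same order as a testable function with the loaders passed in, so it runs without `google-auth` installed; `resolve_credentials` and its parameters are hypothetical names.

```python
import os


def resolve_credentials(load_default, load_from_file, env=None):
    """Try the default loader first, then an explicit key file.

    `load_default` stands in for google.auth.default and `load_from_file`
    for service_account.Credentials.from_service_account_file.
    """
    env = os.environ if env is None else env
    try:
        return load_default()
    except Exception as exc:
        path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
        if not path:
            # Chain the original error so the root cause stays visible.
            raise RuntimeError("No credentials available") from exc
        return load_from_file(path)


def unavailable_default():
    raise LookupError("no default credentials in this environment")


result = resolve_credentials(
    unavailable_default,
    lambda path: f"credentials from {path}",
    env={"GOOGLE_APPLICATION_CREDENTIALS": "/secrets/key.json"},
)
print(result)
```

In real code the `except` clause could be narrowed to `google.auth.exceptions.DefaultCredentialsError`, so unrelated failures are not mistaken for missing credentials.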