A dual-engine natural language recommendation system that maps user requirements (e.g., "quickly deliver content globally") to specific AWS services (e.g., Amazon CloudFront) by analyzing technical whitepapers.
This project implements an A/B architecture comparing a statistical baseline (TF-IDF) against a neural semantic model (BERT) to demonstrate the shift from keyword matching to context understanding.
The project is structured into two isolated engines sharing a common data source. Both are containerized using Docker to ensure reproducibility across environments.
Engine A: TF-IDF Baseline

- Core Logic: TF-IDF (Term Frequency-Inverse Document Frequency).
- Features: Unigrams + Bigrams (N-Grams) to capture phrases like "content delivery."
- Tech Stack: PySpark (for distributed text processing), Java/OpenJDK.
- Interface: Command Line Interface (CLI).
- Why used: Establishes a baseline for keyword-based search performance; a pipeline sketch follows this list.
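A minimal sketch of the Spark stage, assuming the whitepapers have already been flattened into a CSV with a `text` column (the file name and column names here are illustrative, not the project's actual ones):

```python
# Minimal TF-IDF sketch; assumes a CSV with a "text" column (names illustrative).
from pyspark.sql import SparkSession, functions as F
from pyspark.ml.feature import Tokenizer, NGram, HashingTF, IDF

spark = SparkSession.builder.appName("tfidf_engine").getOrCreate()
docs = spark.read.csv("aws_services.csv", header=True)  # hypothetical pipeline output

# Tokenize, build bigrams, and merge both token sets so phrases like
# "content delivery" survive as single features.
docs = Tokenizer(inputCol="text", outputCol="unigrams").transform(docs)
docs = NGram(n=2, inputCol="unigrams", outputCol="bigrams").transform(docs)
docs = docs.withColumn("terms", F.concat("unigrams", "bigrams"))

# Hash terms into fixed-size count vectors, then reweight by inverse document frequency.
tf = HashingTF(inputCol="terms", outputCol="raw_tf", numFeatures=1 << 18).transform(docs)
tfidf = IDF(inputCol="raw_tf", outputCol="features").fit(tf).transform(tf)
```

HashingTF avoids building a driver-side vocabulary, which keeps the pipeline scalable in distributed mode; the trade-off is the possibility of rare hash collisions between terms.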
Engine B: BERT Semantic Engine

- Core Logic: Semantic Search using Dense Vector Embeddings.
- Model: `all-MiniLM-L6-v2` (BERT-based Transformer).
- Tech Stack: PyTorch, Sentence-Transformers, Pandas.
- Interface: Interactive Web App (Streamlit).
- Why used: Captures intent and context that keyword matching misses (e.g., knowing that "latency" relates to "speed"); a query sketch follows this list.
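A minimal sketch of the query path, assuming embeddings are computed in memory; the two corpus snippets below are hypothetical stand-ins for the extracted whitepaper text:

```python
# Semantic search sketch; corpus snippets are hypothetical stand-ins.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = [
    "Amazon CloudFront is a content delivery network that speeds up distribution of content.",
    "Amazon S3 is an object storage service offering scalability and durability.",
]
corpus_emb = model.encode(corpus, convert_to_tensor=True)

# Embed the user's requirement and rank documents by similarity.
query_emb = model.encode("quickly deliver content globally", convert_to_tensor=True)
for hit in util.semantic_search(query_emb, corpus_emb, top_k=2)[0]:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```

`util.semantic_search` ranks by cosine similarity by default, so a query about delivering content globally surfaces the CloudFront passage even though no keywords overlap.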
```
AWS_Recommender/
│
├── convert_pdf_to_csv.py     # Master data pipeline (PDF extraction)
├── raw_data/                 # Source of truth (AWS whitepapers)
│
├── tfidf_engine/             # [Spark Engine]
│   ├── train_model.py        # PySpark pipeline (Tokenization -> HashingTF -> IDF)
│   ├── cli_recommend.py      # CLI tool for querying the Spark model
│   └── Dockerfile            # Java + Python environment
│
└── bert_engine/              # [BERT Engine]
    ├── generate_brain.py     # Embedding generation & noise cleaning
    ├── app.py                # Streamlit web application
    └── Dockerfile            # Lightweight python-slim environment
```
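The internals of `convert_pdf_to_csv.py` are not shown above; the following is a hypothetical sketch of that step, assuming `pypdf` as the extractor, with illustrative file and column names:

```python
# Hypothetical sketch of the PDF-to-CSV step; the real convert_pdf_to_csv.py
# may use a different extractor. Assumes pypdf; names are illustrative.
import csv
from pathlib import Path
from pypdf import PdfReader

with open("aws_services.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["service", "text"])
    for pdf_path in sorted(Path("raw_data").glob("*.pdf")):
        # Concatenate the text of every page; extract_text() can return None.
        text = " ".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
        writer.writerow([pdf_path.stem, text])
```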
Planned improvements:

- Improve accuracy via sliding-window chunking techniques
- Implement hybrid search, combining TF-IDF and BERT scores to pair keyword precision with semantic recall (a fusion sketch follows this list)
- Improve the Streamlit UI
- Deploy for testing with interested individuals
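For the hybrid-search item, one plausible design (an assumption, not a committed implementation) is weighted score fusion after per-engine min-max normalization:

```python
# Hypothetical score fusion for hybrid search: min-max normalize each
# engine's scores so they share a [0, 1] scale, then blend with weight alpha.
def hybrid_scores(tfidf_scores, bert_scores, alpha=0.5):
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    return [
        alpha * b + (1 - alpha) * t
        for t, b in zip(minmax(tfidf_scores), minmax(bert_scores))
    ]

# Example: blend scores for three candidate services, weighting BERT at 0.7.
print(hybrid_scores([0.12, 0.40, 0.05], [0.61, 0.58, 0.90], alpha=0.7))
```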