GridGain-Demos/prediction_cache_ga

Prediction cache GA

This project demonstrates the use of GridGain as a high-performance prediction cache for a product recommendation model built on Google Analytics data.

Table of Contents

  1. Features
  2. Architecture & Project Structure
  3. Setup Instructions
  4. Running the Project
  5. API Documentation

Features

  • BigQuery Integration: Create datasets, tables, and recommendation models in BigQuery.
  • Data Export: Export data from BigQuery to Google Cloud Storage (GCS) in Parquet format.
  • AWS S3 Integration: Create S3 buckets and transfer data from GCS to S3.
  • GridGain-based Prediction Caching: Create tables and push data to GridGain for fast, in-memory access.
  • Recommendation Engine:
    • Generate recommendations using BigQuery ML.
    • Retrieve cached recommendations from GridGain.
  • RESTful API: All functionality is exposed through a well-structured RESTful API.
  • Flexible Configuration: The project, dataset, and access credentials are all supplied as parameters to the API.

Architecture & Project Structure

Architecture

  1. FastAPI Application (api.py): Sets up the FastAPI application and defines all the API endpoints for BigQuery, AWS, and GridGain operations.

  2. GCP Helper (gcp_helper.py): Contains functions for interacting with Google Cloud Platform services, including BigQuery and Google Cloud Storage.

  3. GridGain Helper (gg_helper.py): Handles operations related to GridGain, including table creation, data pushing, and cached recommendation retrieval.

Setup Instructions

Prerequisites

  • Python 3.11.7

    • You can use pyenv to manage multiple Python versions (optional):
      1. Install pyenv: brew install pyenv (or your system's package manager)
      2. Create and activate the environment:
        pyenv virtualenv 3.11.7 ga-demo-env
        source $HOME/.pyenv/versions/ga-demo-env/bin/activate 
    • Alternatively, ensure Python 3.11.7 is installed directly.
  • GCP

    • GCP CLI

      1. Install the Google Cloud CLI (gcloud) and configure it with your GCP Application Default Credentials.
    • GCP Project: Create a project in GCP with the following APIs enabled:

      1. Dataform API
      2. Analytics Hub API
      3. BigQuery API
      4. BigQuery Connection API
      5. BigQuery Data Policy API
      6. BigQuery Migration API
      7. BigQuery Reservation API
      8. BigQuery Storage API
      9. Cloud Dataplex API
      10. Google Cloud Data Catalog API
      11. Google Cloud Storage JSON API
      12. Storage Insights API
    • GCP Roles: The following roles must be assigned to the user on the GCP project:

      1. BigQuery Data Editor
      2. BigQuery Job User

BigQuery ML Slot Reservation Setup

Before implementing the retail recommender model, you must configure specific resource allocations in your Google Cloud Project:

  1. Enable BigQuery Reservation API

  2. Create Slot Reservation

    • Access BigQuery Admin Console

    • Navigate to "Capacity Management"

    • Click "Create Reservation"

    • Configure:

      • Location: US (must match your dataset location)
      • Reservation name: (e.g., "ml-training-reservation")

  3. Create Assignment

    • In the Reservations page, click "Create Assignment"

    • Configure:

      • Project: ga-ignite-test
      • Job type: QUERY

Cost Considerations:

  • Flex slots are billed by the second
  • Minimum commitment: 100 slots
  • Can be deleted after model training is complete
  • Consider monthly/annual commitments for production workloads
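
Because billing is per second, the cost of a short-lived training reservation scales with slot count times duration. The following is a rough sketch of that arithmetic only. The function name is illustrative, and the price per slot-hour is deliberately a parameter: no real price is assumed here, so look it up in the current BigQuery pricing documentation for your region.

```python
MIN_SLOTS = 100  # minimum commitment stated in the list above


def estimate_reservation_cost(slots: int, seconds: float, price_per_slot_hour: float) -> float:
    """Estimate the cost of holding `slots` slots for `seconds` seconds.

    `price_per_slot_hour` is a caller-supplied value (hypothetical here);
    this sketch does not know any actual BigQuery price.
    """
    if slots < MIN_SLOTS:
        raise ValueError(f"at least {MIN_SLOTS} slots are required")
    if seconds < 0 or price_per_slot_hour < 0:
        raise ValueError("seconds and price_per_slot_hour must be non-negative")
    # Per-second billing: convert the duration to fractional hours.
    return slots * (seconds / 3600.0) * price_per_slot_hour
```

Deleting the reservation as soon as training finishes keeps `seconds` small, which is the main lever on the total.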

Installation

  1. Install project dependencies using pip:
    pip install pygridgain s3fs pandas==2.2.2 numpy==1.26.4 google-cloud-storage google-cloud-bigquery fastapi==0.111.0 pydantic==2.7.4 uvicorn==0.30.1 pyarrow==16.1.0 requests==2.32.3

Running the Project

Authenticate to GCP: The gcloud CLI credentials expire after some time, so you need to re-authenticate periodically. Run gcloud auth application-default login to do so.

Start FastAPI Server:

cd src
uvicorn api:app --reload
  • Access the Swagger UI at http://localhost:8000/docs to explore and test the API endpoints.
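
Besides the Swagger UI, the endpoints can be called from a script. Below is a minimal sketch using only the Python standard library. It assumes the server is reachable at http://localhost:8000 (the address used for the Swagger UI above) and that it accepts and returns JSON; the helper names `build_request` and `call_api` are illustrative and not part of this project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed: same host and port as the Swagger UI


def build_request(endpoint: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one API endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(endpoint: str, payload: dict, base_url: str = BASE_URL, timeout: float = 300.0) -> dict:
    """Send the request and decode the response, assuming the server replies with JSON."""
    request = build_request(endpoint, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `call_api("/bigquery/create_dataset", {"project_id": "ga-ignite-test", "dataset_id": "ga_dataset"})` would send the body documented for the Create Dataset endpoint. The timeout default is an arbitrary generous value, chosen on the assumption that some steps may run for a while.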

API Documentation

This application provides endpoints for managing BigQuery datasets, creating recommendation models, and interacting with GridGain and AWS S3.

Some important points to note:

You do not necessarily need to transfer data between GCS and AWS up front; the GridGain cache can start empty and be populated by each call to the /get_recommendations API.

API Endpoints

1. Create Dataset

  • Endpoint: /bigquery/create_dataset
  • Method: POST
  • Description: Creates a BigQuery dataset.
  • Parameters:
    {
      "project_id": "ga-ignite-test",
      "dataset_id": "ga_dataset"
    }

2. Create Aggregate Web Stats Table

  • Endpoint: /bigquery/create_aggregate_web_stats_table
  • Method: POST
  • Description: Creates or replaces the aggregate_web_stats table.
  • Parameters:
    {
      "project_id": "ga-ignite-test",
      "dataset_id": "ga_dataset"
    }

3. Create Retail Recommender Model

  • Endpoint: /bigquery/create_retail_recommender_model
  • Method: POST
  • Description: Creates or replaces the retail_recommender matrix factorization model.
  • Note: Requires proper BigQuery ML reservation setup to avoid the following error:
    google.api_core.exceptions.BadRequest: 400 Training Matrix Factorization models is not available for on-demand usage. To train, please set up a reservation (flex or regular) based on instructions in BigQuery public docs.
    
  • Parameters:
    {
      "project_id": "ga-ignite-test",
      "dataset_id": "ga_dataset"
    }

4. Generate Recommendations

  • Endpoint: /bigquery/generate_recommendations
  • Method: POST
  • Description: Generates recommendations and stores them in the recommend_content table.
  • Parameters:
    {
      "project_id": "ga-ignite-test",
      "dataset_id": "ga_dataset"
    }

5. Create All Recommendations Table

  • Endpoint: /bigquery/create_all_recommendations_table
  • Method: POST
  • Description: Creates or replaces the all_recommendations table with unique IDs.
  • Parameters:
    {
      "project_id": "ga-ignite-test",
      "dataset_id": "ga_dataset"
    }
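
The five BigQuery endpoints above all take the same two-field body and appear intended to be run in the order listed. The sketch below captures that sequence; it is an assumption drawn from the numbering in this README, not confirmed server behavior. The function names are illustrative, and `post` is any caller-supplied function that sends one request (for instance a thin wrapper around an HTTP client).

```python
from typing import Any, Callable, List, Tuple

# Endpoint paths exactly as documented above, in documented order.
BIGQUERY_SETUP_STEPS = [
    "/bigquery/create_dataset",
    "/bigquery/create_aggregate_web_stats_table",
    "/bigquery/create_retail_recommender_model",
    "/bigquery/generate_recommendations",
    "/bigquery/create_all_recommendations_table",
]


def plan_bigquery_setup(project_id: str, dataset_id: str) -> List[Tuple[str, dict]]:
    """Return (endpoint, payload) pairs for the BigQuery setup steps."""
    payload = {"project_id": project_id, "dataset_id": dataset_id}
    # Each step gets its own copy so callers can modify one payload safely.
    return [(endpoint, dict(payload)) for endpoint in BIGQUERY_SETUP_STEPS]


def run_bigquery_setup(
    project_id: str,
    dataset_id: str,
    post: Callable[[str, dict], Any],
) -> List[Tuple[str, Any]]:
    """Call `post` for each step in order; an exception from `post` stops the run."""
    results = []
    for endpoint, payload in plan_bigquery_setup(project_id, dataset_id):
        results.append((endpoint, post(endpoint, payload)))
    return results
```

Stopping at the first failure is a deliberate choice here: if model training fails with the reservation error quoted in step 3, there is nothing for the later steps to work with.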

6. Create All Recommendations Table in GridGain

  • Endpoint: /gridgain/create_table
  • Method: POST
  • Description: Creates or replaces the all_recommendations table in GridGain.
  • Parameters:
    {
      "username": "<your gridgain cluster username>",
      "password": "<your gridgain cluster password>",
      "url": "<your gridgain cluster url>",
      "port": 10800
    }

7. Get Recommendations

  • Endpoint: /get_recommendations
  • Method: POST
  • Description: Gets a recommendation from the GridGain cache. If none is found, gets a recommendation from the BigQuery model and stores it in the GridGain cache.
  • Parameters:
    {
      "gg": {
        "username": "<your gridgain cluster username>",
        "password": "<your gridgain cluster password>",
        "url": "<your gridgain cluster url>",
        "port": 10800
      },
      "gcp": {
        "project_id": "ga-ignite-test",
        "visitor_id": "8016003971239765913-2"
      }
    }
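
The lookup described for this endpoint is a cache-aside pattern: read the cache first, fall back to the model on a miss, then write the result back. The sketch below illustrates that flow only. It uses a plain dict as a stand-in for the GridGain table and a caller-supplied function as a stand-in for the BigQuery ML lookup; none of these names come from the project's code.

```python
from typing import Any, Callable, MutableMapping, Optional, Tuple


def get_recommendation(
    visitor_id: str,
    cache: MutableMapping[str, Any],
    fetch_from_model: Callable[[str], Optional[Any]],
) -> Tuple[Optional[Any], bool]:
    """Return (recommendation, cache_hit) for a visitor.

    `cache` stands in for the GridGain table and `fetch_from_model`
    for the BigQuery model query; both are hypothetical placeholders.
    """
    cached = cache.get(visitor_id)
    if cached is not None:
        return cached, True  # cache hit: no model call needed

    fresh = fetch_from_model(visitor_id)
    if fresh is not None:
        # Store the result so the next request for this visitor is a hit.
        cache[visitor_id] = fresh
    return fresh, False
```

With this flow, only the first request for a given visitor pays the cost of the model query; later requests are served from memory. Results of `None` are not cached in this sketch, so a visitor unknown to the model triggers a model call every time.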

Optional Endpoints

1. Get Predicted Recommendations

  • Endpoint: /bigquery/get_predicted_recommendations
  • Method: POST
  • Description: Gets a recommendation from the BigQuery model. This is the older method; it does not cache the recommendation in GridGain.
  • Parameters:
    {
      "project_id": "ga-ignite-test",
      "visitor_id": "8016003971239765913-2"
    }

2. Get Cached Recommendations

  • Endpoint: /gridgain/get_cached_recommendations
  • Method: POST
  • Description: Gets a recommendation from the cache.
  • Parameters:
    {
      "username": "<your gridgain cluster username>",
      "password": "<your gridgain cluster password>",
      "url": "<your gridgain cluster url>",
      "port": 10800,
      "visitor_id": "8016003971239765913-2"
    }
