SensCritique WeekReal Database 🎬

Overview

The SensCritique WeekReal Database project is an ETL (Extract, Transform, Load) application written in Python. It collects data on each week's cinema releases from SensCritique. The transformation phase uses a Large Language Model (LLM) and Hugging Face's Text Embeddings Inference (TEI) project to vectorize the reviews. The project's primary aim is to extract movie data, transform it with these tools, and store it in a PGVector database, that is, PostgreSQL with the pgvector extension, which adds a vector column type and similarity search. This choice follows from the need to embed movie reviews and categorize them as positive or negative, which is central to the subsequent data analysis and visualization.
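The flow described above can be sketched as three pluggable stages. This is a minimal illustration under stated assumptions, not the project's code: `Review` and `run_pipeline` are hypothetical names, and the `extract`, `classify`, `embed`, and `load` callables stand in for the scraping, sentiment, TEI, and pgvector steps.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Review:
    """Hypothetical record for one scraped review."""
    movie_title: str
    text: str
    sentiment: Optional[str] = None            # "positive" / "negative", set during Transform
    embedding: Optional[List[float]] = None    # vector, set during Transform


def run_pipeline(
    extract: Callable[[], Iterable[Review]],
    classify: Callable[[str], str],
    embed: Callable[[List[str]], List[List[float]]],
    load: Callable[[List[Review]], None],
) -> List[Review]:
    """Hypothetical driver: run Extract -> Transform -> Load once and return the reviews."""
    # Extract: e.g. scrape this week's releases and their reviews.
    reviews = list(extract())

    # Transform, step 1: label each review's sentiment.
    for review in reviews:
        review.sentiment = classify(review.text)

    # Transform, step 2: embed all review texts in a single batch.
    vectors = embed([r.text for r in reviews])
    for review, vector in zip(reviews, vectors):
        review.embedding = vector

    # Load: persist the enriched reviews (e.g. into a pgvector table).
    load(reviews)
    return reviews
```

Keeping each stage as a plain callable makes it possible to test the pipeline with stubs, without a browser, an embedding server, or a database.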

Key Features

Automated ETL Pipeline: Extracts data from SensCritique, transforms it, and loads it into a PGVector database.
Review Analysis: Captures and categorizes movie reviews, enabling detailed sentiment analysis.
Vector Database Utilization: Leverages PGVector for efficient storage and querying of vector data.
Dashboard Compatibility: Designed to support data visualization and dashboard creation in Power BI.
Scheduled and On-Demand Execution: The process can be run at any time, with checks that prevent reprocessing the current week's data.
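One way to implement the "don't reprocess the current week" check is to key each run by ISO year and week. This is a hedged sketch: `week_key` and `should_process` are hypothetical helpers, and in practice the set of processed weeks would be read from the database rather than passed in.

```python
from datetime import date
from typing import Set


def week_key(day: date) -> str:
    """Return the ISO year-week label for a date, e.g. '2024-W01'."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def should_process(today: date, processed_weeks: Set[str]) -> bool:
    """True only if this week's releases have not been loaded yet (hypothetical check)."""
    return week_key(today) not in processed_weeks
```

ISO weeks are used here because they avoid ambiguity at year boundaries: 1 January 2023 belongs to week 52 of 2022.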

Important Note

🚨🚨🚨

Code Maintenance: The code may not always be up to date, because SensCritique's website structure can change. Adapting the code is straightforward, but regular updates may not be feasible.

Technology Stack

Text Embeddings Inference (TEI): For processing and embedding review texts.
PGVector: A PostgreSQL extension for efficient vector storage and similarity search.
Docker: For containerizing the ETL process.
Selenium: For web scraping and data extraction.
Power BI: For reporting.
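For reference, a TEI server exposes an `/embed` route that takes a JSON body with an `inputs` field and returns one vector per input text. The sketch below uses only the standard library; the base URL (for example `http://localhost:8080`) is an assumption that depends on how the container's port is mapped in `docker-compose.yml`, and `build_embed_request` / `embed_texts` are hypothetical helper names.

```python
import json
import urllib.request
from typing import List


def build_embed_request(base_url: str, texts: List[str]) -> urllib.request.Request:
    """Build a POST request for TEI's /embed route with a batch of texts."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_texts(base_url: str, texts: List[str], timeout: float = 30.0) -> List[List[float]]:
    """Send the batch to a running TEI server and decode the list of vectors it returns."""
    request = build_embed_request(base_url, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from the network call lets the payload be checked without a running server.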

Repository Structure

Directory/File	Description
`etl/`	Package containing Extract, Transform, Load modules.
`docker-compose.yml`	Docker Compose file to link VDB, the app, and TEI.
`Dockerfile`	Dockerfile for creating the application's image.
`main.py`	Script to execute the ETL process.
`setup_vcb.py`	Script for initial database setup (if running without volumes).
`bddr-sc-env.yml`	Conda environment file for setting up the environment.
`requirements.txt`	Dependencies to install with pip.
`reporting/`	Folder containing the reporting material.
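To illustrate what an initial database setup such as `setup_vcb.py` might involve, here is a hypothetical schema for a pgvector-backed review table. The table name, columns, and the 384-dimension size are assumptions (the dimension must match the embedding model served by TEI), and the statements would be executed through a PostgreSQL driver such as psycopg, which is not shown.

```python
from typing import Sequence

EMBEDDING_DIM = 384  # hypothetical: must match the output size of the TEI model in use

# Hypothetical schema; the project's actual setup script may differ.
SETUP_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS reviews (
    id          bigserial PRIMARY KEY,
    week        text NOT NULL,
    movie_title text NOT NULL,
    review      text NOT NULL,
    sentiment   text CHECK (sentiment IN ('positive', 'negative')),
    embedding   vector({EMBEDDING_DIM})
);
"""

# Parameterized insert (psycopg-style %s placeholders); the vector is passed as text and cast.
INSERT_SQL = (
    "INSERT INTO reviews (week, movie_title, review, sentiment, embedding) "
    "VALUES (%s, %s, %s, %s, %s::vector)"
)

# Five nearest reviews by cosine distance, using pgvector's <=> operator.
NEAREST_SQL = (
    "SELECT movie_title, review FROM reviews "
    "ORDER BY embedding <=> %s::vector LIMIT 5"
)


def to_pgvector_literal(vector: Sequence[float]) -> str:
    """Format a Python vector as pgvector's text input, e.g. '[0.5,-1.0]'."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"
```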

Usage

Docker Setup: Fetch docker-compose.yml, the required volumes, and the project image, then run main.py inside the container. If you are running without volumes, execute setup_vcb.py first.
Conda Environment: Set up a Conda environment from bddr-sc-env.yml and run main.py, or use the classes within a notebook. In this case, set the required environment variables.
Note: Don't forget to start the pgvector and TEI containers.
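When running outside Docker, the connection settings have to come from environment variables. The variable names below (`TEI_URL`, `PG_DSN`) are hypothetical and should be aligned with the project's `.env` file; the point of the sketch is to fail fast with a clear message when one is missing.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    tei_url: str   # base URL of the TEI server
    pg_dsn: str    # PostgreSQL connection string for the pgvector database


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read connection settings from the environment, failing fast if one is missing."""
    env = os.environ if env is None else env
    # Hypothetical variable names: adapt them to the project's .env file.
    required = ("TEI_URL", "PG_DSN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(tei_url=env["TEI_URL"], pg_dsn=env["PG_DSN"])
```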

HANDBOOK available in `handbook/`.

Reporting: For reporting purposes, retrieve only the volume, launch a PGVector instance, and connect to the database from Power BI. See reporting/.

ilanaliouchouche/WeeklyMovies-VDB

Folders and files

Latest commit

History

Repository files navigation

SensCritique WeekReal Database 🎬

Table of Contents

Overview

Key Features

Important Note

Technology Stack

Repository Structure

Usage

🎬

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages