This repository has been archived by the owner on Feb 1, 2025. It is now read-only.

🌐 A semantic web project using deep learning and knowledge graphs to create an enriched data model of the Paris 2024 Olympic Games.

marcpinet/websem-og24

Paris 2024 Olympics - Semantic Web Project

A comprehensive knowledge graph project for the Paris 2024 Olympic Games, combining semantic web technologies with advanced NLP for data enrichment.

📋 Overview

This project builds an enriched semantic model of Olympic Games data through:

  • Custom RDF/OWL ontology for Olympic domain modeling
  • Deep learning-based information extraction using spaCy's fr_core_news_lg
  • SKOS-based sport classification system
  • SPARQL inference rules and SHACL constraints
  • External knowledge integration (DBpedia/Wikidata)
  • Real-time weather data integration

🧠 Deep Learning & NLP

The project leverages spaCy's fr_core_news_lg model (1.7GB) for advanced text analysis:

  • Named Entity Recognition optimized for athlete identification
  • Sport-specific term classification
  • Performance metric extraction
  • Contextual relationship mapping
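As a minimal sketch of the athlete-identification step, the function below collects person entities from a spaCy document. It assumes the pipeline object follows spaCy's usual interface (`nlp(text)` returns a document whose `.ents` carry `.text` and `.label_`) and that person entities use the `PER` label found in spaCy's French models. The name `extract_person_mentions` is hypothetical and is not taken from the project's scripts.

```python
from typing import Callable, List


def extract_person_mentions(nlp: Callable, text: str) -> List[str]:
    """Return the distinct person mentions in `text`, in order of appearance.

    `nlp` is any callable that behaves like a loaded spaCy pipeline.
    """
    doc = nlp(text)
    seen = set()
    mentions = []
    for ent in doc.ents:
        # "PER" is assumed to be the person label (as in spaCy's French
        # pipelines); adjust it if your model uses a different scheme.
        if ent.label_ == "PER" and ent.text not in seen:
            seen.add(ent.text)
            mentions.append(ent.text)
    return mentions
```

With spaCy and the model installed, this could be called as `extract_person_mentions(spacy.load("fr_core_news_lg"), some_french_text)`. Passing the pipeline in as an argument keeps the function easy to test with a stub.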

Model Performance:

  • 89% accuracy on athlete name recognition
  • 97% accuracy on sports terminology classification
  • 2,500+ athlete mentions processed
  • 1,800+ performance records extracted

🏗️ Architecture

Semantic Layer

  • Modular ontology with Person, SportingEvent, and Location hierarchies
  • SKOS taxonomy for sports classification
  • SHACL constraints for data validation
  • Custom SPARQL rules for knowledge inference
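To illustrate how a SKOS taxonomy supports inference, the sketch below walks `skos:broader` links upward from a sport concept, which is roughly what a SPARQL property path such as `skos:broader+` would return. It is written in plain Python rather than SPARQL so it runs without a triple store. The concept names under the `ex:` prefix are hypothetical, and the dictionary assumes one broader concept per entry, which SKOS itself does not require.

```python
from typing import Dict, List

# Hypothetical skos:broader links (narrower concept -> broader concept).
BROADER: Dict[str, str] = {
    "ex:Freestyle100m": "ex:Swimming",
    "ex:Swimming": "ex:AquaticSports",
    "ex:AquaticSports": "ex:Sport",
}


def broader_transitive(concept: str, broader: Dict[str, str]) -> List[str]:
    """Follow skos:broader links upward, nearest ancestor first."""
    ancestors = []
    seen = {concept}
    current = concept
    while current in broader:
        current = broader[current]
        if current in seen:  # guard against accidental cycles in the data
            break
        seen.add(current)
        ancestors.append(current)
    return ancestors


print(broader_transitive("ex:Freestyle100m", BROADER))
# ['ex:Swimming', 'ex:AquaticSports', 'ex:Sport']
```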

Data Integration Layer

  • DBpedia and Wikidata entity linking
  • Real-time weather data integration
  • Unstructured text processing pipeline
  • CSV data transformation system
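Entity linking against Wikidata could look roughly like the sketch below: build a search request for a label, then record the chosen match as a link triple. It uses the public `wbsearchentities` action of the Wikidata API. The helper names, the example IRI, and the use of `owl:sameAs` as the linking property are assumptions for illustration; the project's linking script may work differently.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def wikidata_search_url(label: str, language: str = "fr", limit: int = 1) -> str:
    """Build a wbsearchentities request URL for a label."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def same_as_triple(local_iri: str, wikidata_id: str) -> str:
    """Serialize an owl:sameAs link (assumed linking property) as N-Triples."""
    target = f"http://www.wikidata.org/entity/{wikidata_id}"
    return f"<{local_iri}> <{OWL_SAME_AS}> <{target}> ."


# Hypothetical local IRI and a placeholder Wikidata ID.
print(wikidata_search_url("Teddy Riner"))
print(same_as_triple("http://example.org/athlete/1", "Q42"))
```

Keeping URL construction separate from the HTTP call makes the request easy to inspect and lets the fetch step add rate limiting or caching independently.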

Visualization Layer

  • Interactive knowledge graph exploration
  • Medal distribution analytics
  • Event timeline visualization
  • Weather condition monitoring

📈 Visualizations

Knowledge Graph (KG)

graph.mp4

Medals queries

medals.mp4

Simple charts

img2

🛠️ Prerequisites

  • Python 3.8+
  • Docker 20.10+
  • 4GB RAM minimum (8GB recommended for full graph processing)

🚀 Quick Start

  1. Clone and install dependencies:
     git clone https://github.com/[username]/jo2024-semantic.git
     cd jo2024-semantic
     pip install -r requirements.txt
  2. Launch weather service:
     docker compose up -d
  3. Run data pipeline:
     python 01_data_enricher.py  # NLP-based enrichment
     python 02_data_linking.py   # Knowledge base linking
     python 03_generate_graph.py # Visualization generation

📊 Key Results

  • Initial dataset: 355 triples
  • After enrichment: 15,152 triples
  • External links created: 2,843
  • NLP-extracted relationships: 4,500+
  • Real-time weather monitoring for 5 venues

🔍 Available Analysis

  1. Athlete Performance Analysis

    • Medal distribution by country
    • Performance trends over time
    • Cross-discipline achievements
  2. Event Analysis

    • Temporal distribution
    • Venue utilization
    • Weather impact assessment
  3. Knowledge Graph Exploration

    • Entity relationship visualization
    • Path finding between entities
    • Cluster analysis
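The "medal distribution by country" analysis listed above amounts to a grouped count. The sketch below shows one way to compute it from (country, medal) pairs, such as rows returned by a query over the graph. The function name is hypothetical and the sample rows are made up; they are not actual Paris 2024 results.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Made-up sample rows (country code, medal); not actual results.
SAMPLE_MEDALS = [
    ("AAA", "gold"),
    ("AAA", "silver"),
    ("BBB", "gold"),
    ("AAA", "gold"),
    ("CCC", "bronze"),
]


def medals_by_country(rows: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count medals of each type per country."""
    table: Dict[str, Counter] = {}
    for country, medal in rows:
        table.setdefault(country, Counter())[medal] += 1
    return table


for country, counts in sorted(medals_by_country(SAMPLE_MEDALS).items()):
    print(country, dict(counts))
```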

⚠️ Known Limitations

  • Weather API coordinate precision limited by infrastructure constraints
  • Graph visualization limited to 100 triples for performance
  • French language model occasionally struggles with rare athlete names
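The 100-triple cap on graph visualization can be sketched as a lazy truncation of whatever iterable yields the triples. The helper name and the `Triple` alias below are hypothetical, and how the project actually selects which triples to draw is not stated here.

```python
from itertools import islice
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]
MAX_TRIPLES = 100  # cap named in the limitations list


def cap_triples(triples: Iterable[Triple], limit: int = MAX_TRIPLES) -> List[Triple]:
    """Return at most `limit` triples, consuming the input lazily."""
    return list(islice(triples, limit))
```

Because `islice` stops after `limit` items, the remainder of a large graph is never iterated for rendering.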

🗺️ Future Development

  • Expansion of the sports ontology
  • Integration of live event streaming data
  • Enhanced predictive analytics
  • Multi-language support

✍️ Authors

📄 License

MIT License - see LICENSE file for details.


For detailed documentation on SPARQL queries and ontology structure, see the docs/ directory.
