Paris 2024 Olympics - Semantic Web Project
A comprehensive knowledge graph project for the Paris 2024 Olympic Games, combining semantic web technologies with advanced NLP for data enrichment.
Overview
This project builds an enriched semantic model of Olympic Games data through:
- Custom RDF/OWL ontology for Olympic domain modeling
- Deep learning-based information extraction using spaCy's fr_core_news_lg
- SKOS-based sport classification system
- SPARQL inference rules and SHACL constraints
- External knowledge integration (DBpedia/Wikidata)
- Real-time weather data integration
Deep Learning & NLP
The project leverages spaCy's fr_core_news_lg model (1.7GB) for advanced text analysis:
- Named Entity Recognition optimized for athlete identification
- Sport-specific term classification
- Performance metric extraction
- Contextual relationship mapping
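As a rough illustration of the athlete-identification step, the sketch below filters person entities out of a parsed document. It is not the project's actual pipeline: the function names `person_mentions` and `count_mentions` are hypothetical, and the sample document is a hand-built stand-in so the snippet runs without downloading the model. It relies only on the `ents`, `text`, and `label_` attributes that spaCy documents expose, and on the `PER` label used by spaCy's French models.

```python
from collections import Counter
from types import SimpleNamespace


def person_mentions(doc):
    """Return the text of every entity labelled PER in a spaCy-like document."""
    return [ent.text for ent in doc.ents if ent.label_ == "PER"]


def count_mentions(docs):
    """Count how often each person name appears across several documents."""
    counts = Counter()
    for doc in docs:
        counts.update(person_mentions(doc))
    return counts


# Hand-built stand-in for a parsed document (illustrative sample data only).
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="Jeanne Dupont", label_="PER"),
    SimpleNamespace(text="Paris", label_="LOC"),
])

mentions = person_mentions(fake_doc)
totals = count_mentions([fake_doc, fake_doc])
print(mentions, dict(totals))
```

With spaCy and the model installed, the same functions should accept real documents, for example `doc = spacy.load("fr_core_news_lg")(text)`.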
Model Performance:
- 89% accuracy on athlete name recognition
- 97% accuracy on sports terminology classification
- 2,500+ athlete mentions processed
- 1,800+ performance records extracted
Architecture
Semantic Layer
- Modular ontology with Person, SportingEvent, and Location hierarchies
- SKOS taxonomy for sports classification
- SHACL constraints for data validation
- Custom SPARQL rules for knowledge inference
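To make the interplay between the SKOS taxonomy and an inference rule concrete, here is a minimal standard-library sketch. It assumes triples are plain `(subject, predicate, object)` tuples; the prefixes `ex:`, the properties `ex:competesIn` and `ex:practises`, and the rule itself are hypothetical, chosen only to show the pattern. The project's real rules are written in SPARQL and may look quite different.

```python
BROADER = "skos:broader"
COMPETES_IN = "ex:competesIn"   # hypothetical property
PRACTISES = "ex:practises"      # hypothetical property


def broader_closure(triples, concept):
    """Return every ancestor of `concept` reachable through skos:broader."""
    parents = {}
    for s, p, o in triples:
        if p == BROADER:
            parents.setdefault(s, set()).add(o)
    seen, stack = set(), [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:      # the seen-set also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_practises(triples):
    """Rule: competesIn(a, s) and s broader* t  =>  practises(a, t)."""
    inferred = set()
    for s, p, o in triples:
        if p == COMPETES_IN:
            for sport in {o} | broader_closure(triples, o):
                inferred.add((s, PRACTISES, sport))
    return inferred


graph = {
    ("ex:Freestyle100m", BROADER, "ex:Swimming"),
    ("ex:Swimming", BROADER, "ex:Aquatics"),
    ("ex:athlete1", COMPETES_IN, "ex:Freestyle100m"),
}
new_triples = infer_practises(graph)
print(sorted(new_triples))
```

In a SPARQL setting the same closure would typically be expressed with a property path such as `skos:broader*`.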
Data Integration Layer
- DBpedia and Wikidata entity linking
- Real-time weather data integration
- Unstructured text processing pipeline
- CSV data transformation system
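The CSV transformation step can be pictured as mapping each row to a handful of triples. The sketch below uses only the standard library and emits N-Triples lines; the namespace `http://example.org/olympics/`, the column names `id`, `name`, and `country`, and the function names are all assumptions made for illustration, not the project's actual schema. It also assumes that `id` and `country` values are safe to embed in an IRI.

```python
import csv
import io

BASE = "http://example.org/olympics/"  # hypothetical namespace


def escape_literal(value):
    """Escape backslashes, quotes, and newlines for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_ntriples(csv_text):
    """Turn CSV rows with id, name, country columns into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}athlete/{row['id']}>"
        name = escape_literal(row["name"])
        country = f"<{BASE}country/{row['country']}>"
        lines.append(f'{subject} <{BASE}name> "{name}" .')
        lines.append(f"{subject} <{BASE}country> {country} .")
    return lines


sample = "id,name,country\n1,Jane Doe,FRA\n"  # illustrative row, not project data
ntriples = rows_to_ntriples(sample)
print("\n".join(ntriples))
```

A library such as rdflib would normally handle IRI construction and serialization; the manual version is shown only to keep the example dependency-free.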
Visualization Layer
- Interactive knowledge graph exploration
- Medal distribution analytics
- Event timeline visualization
- Weather condition monitoring
Visualizations
Knowledge Graph (KG)
Medal queries
Simple charts
Prerequisites
- Python 3.8+
- Docker 20.10+
- 4GB RAM minimum (8GB recommended for full graph processing)
Quick Start
- Clone and install dependencies:
- Launch weather service:
- Run data pipeline:
Key Results
- Initial dataset: 355 triples
- After enrichment: 15,152 triples
- External links created: 2,843
- NLP-extracted relationships: 4,500+
- Real-time weather monitoring for 5 venues
Available Analysis
Athlete Performance Analysis
- Medal distribution by country
- Performance trends over time
- Cross-discipline achievements
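Medal distribution by country boils down to a group-and-count over (country, medal) pairs. In the project this is presumably done with a SPARQL aggregate; the sketch below shows the same grouping in plain Python so the logic is easy to follow. The function names, the medal labels, and the placeholder country codes are illustrative assumptions, not project data.

```python
from collections import Counter, defaultdict


def medal_table(medals):
    """Group (country, medal_type) pairs into per-country medal counts."""
    table = defaultdict(Counter)
    for country, medal in medals:
        table[country][medal] += 1
    return table


def ranked(table):
    """Order countries by gold, then silver, then bronze (ties broken by name)."""
    return sorted(
        table,
        key=lambda c: (-table[c]["gold"], -table[c]["silver"], -table[c]["bronze"], c),
    )


# Placeholder records for illustration only.
records = [
    ("AAA", "gold"), ("BBB", "gold"), ("BBB", "silver"),
    ("AAA", "bronze"), ("CCC", "silver"),
]
table = medal_table(records)
order = ranked(table)
print(order)
```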
Event Analysis
- Temporal distribution
- Venue utilization
- Weather impact assessment
Knowledge Graph Exploration
- Entity relationship visualization
- Path finding between entities
- Cluster analysis
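Path finding between entities can be sketched as a breadth-first search over the graph's triples. The version below is a minimal standard-library illustration, assuming triples are `(subject, predicate, object)` tuples and that edges may be followed in either direction; the function name `shortest_path` and the sample identifiers are hypothetical, and the project's own exploration tooling may work differently.

```python
from collections import deque


def shortest_path(triples, start, goal):
    """Return one shortest chain of entities from start to goal, or None."""
    neighbours = {}
    for s, _, o in triples:
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)  # treat edges as undirected
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:     # walk back through the predecessors
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


kg = [
    ("ex:athlete1", "ex:competesIn", "ex:event1"),
    ("ex:athlete2", "ex:competesIn", "ex:event1"),
    ("ex:event1", "ex:heldAt", "ex:venue1"),
]
path = shortest_path(kg, "ex:athlete1", "ex:athlete2")
print(path)
```

Breadth-first search returns a path with the fewest hops, which suits an unweighted knowledge graph; weighted edges would call for a different algorithm.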
Known Limitations
- Weather API coordinate precision limited by infrastructure constraints
- Graph visualization limited to 100 triples for performance
- French language model occasionally struggles with rare athlete names
Future Development
- Expansion of the sports ontology
- Integration of live event streaming data
- Enhanced predictive analytics
- Multi-language support
Authors
- Marc Pinet - marcpinet
- Arthur Rodriguez - rodriguezarthur
- Amine Haddou - H4znow
License
MIT License - see LICENSE file for details.
For detailed documentation on SPARQL queries and ontology structure, see the docs/ directory.