A Java-based application that demonstrates Neo4j database connectivity and data processing for employment/occupation data analysis.
This project (RDBL1, Relational Database Lab 1) works with the Neo4j graph database to process and analyze employment data. The application connects to a Neo4j database, reads data from files, and creates data objects for further analysis.
Created: October 2024
- Neo4j Database Integration: Connects to local Neo4j database instance
- Batch Data Processing: Efficiently processes large datasets (38+ million records) in batches
- Occupation Data Mapping: Maps occupation IDs to occupation descriptions
- File-based Data Import: Reads and processes data from external files
- Optimized Performance: Configured for high-performance data insertion with connection pooling
- Java 18 or higher
- Maven 3.6+ for dependency management
- Neo4j Database (local instance running on port 7687)
- Neo4j database credentials (default: neo4j/neo12345)
- Neo4j Java Driver (v5.22.0) - Database connectivity
- SLF4J API (v2.1.0-alpha1) - Logging framework
- SLF4J Simple (v2.1.0-alpha1) - Logging implementation
- Clone the repository:
    git clone https://github.com/snxethan/DBT230-FINAL.git
    cd DBT230-FINAL
- Install Neo4j:
  - Download and install Neo4j Desktop or Neo4j Community Edition
  - Start the Neo4j database service on localhost:7687
  - Set up authentication: username neo4j, password neo12345
- Build the project:
    mvn clean compile
- Run the application:
    mvn exec:java -Dexec.mainClass="Main"
src/main/java/
├── Main.java # Application entry point
├── Neo4jController.java # Database connection and data processing
└── DataObject.java # Data model for occupation/employment data
The application is configured with the following default settings:
- Database URI: neo4j://localhost:7687
- Username: neo4j
- Password: neo12345
- Connection Pool Size: 10,000
- Batch Size: 800,000 records
To modify these settings, update the constants in Neo4jController.java.
The DataObject class represents employment data with the following fields:
- seriesID - Unique identifier for the data series
- year - Year of the data point
- month - Month of the data point
- value - Numerical value (employment figures)
- occupationID - Occupation category identifier
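As an illustration, the fields above could be modelled as a Java record. This is a minimal sketch, not the project's actual DataObject.java: the name DataObjectSketch, the field types, the tab-separated column order in fromLine, and the sample occupation entry are all assumptions made for the example.

```java
import java.util.Map;

// Hypothetical stand-in for DataObject.java; the field types are assumed.
record DataObjectSketch(String seriesID, int year, String month, double value, String occupationID) {

    // Parses one tab-separated line in the order:
    // seriesID, year, month, value, occupationID (an assumed layout).
    static DataObjectSketch fromLine(String line) {
        String[] parts = line.split("\t", -1);
        if (parts.length != 5) {
            throw new IllegalArgumentException(
                    "Expected 5 tab-separated fields, got " + parts.length);
        }
        return new DataObjectSketch(
                parts[0].trim(),
                Integer.parseInt(parts[1].trim()),
                parts[2].trim(),
                Double.parseDouble(parts[3].trim()),
                parts[4].trim());
    }

    // Resolves the occupation ID to a description, falling back to a
    // placeholder when the ID is not present in the lookup table.
    String occupationDescription(Map<String, String> occupations) {
        return occupations.getOrDefault(occupationID, "Unknown occupation");
    }
}

class DataObjectDemo {
    public static void main(String[] args) {
        // Made-up sample entry; real IDs and descriptions come from the data files.
        Map<String, String> occupations = Map.of("000000", "All occupations");
        DataObjectSketch row =
                DataObjectSketch.fromLine("SERIES001\t2020\tM01\t1234.5\t000000");
        System.out.println(row.seriesID() + " -> " + row.occupationDescription(occupations));
    }
}
```

Keeping the lookup table separate from the record means each row stores only the occupation ID, which matters when tens of millions of rows are held in memory.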
The application is optimized for large-scale data processing:
- Processes 38,861,474 records in approximately 8 minutes
- Uses batch processing with configurable batch sizes
- Implements connection pooling for optimal database performance
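The batching idea can be sketched in plain Java. The class and method names below (BatchingSketch, batchCount, partition) are hypothetical and are not taken from Neo4jController.java; the sketch only shows how a record list might be split into fixed-size batches and how many batches a given batch size implies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the project's actual batching code.
class BatchingSketch {

    // Number of batches needed for totalRecords (ceiling division).
    static long batchCount(long totalRecords, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (totalRecords + batchSize - 1) / batchSize;
    }

    // Splits the list into consecutive sublist views of at most batchSize
    // elements each. The views share storage with the original list.
    static <T> List<List<T>> partition(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            batches.add(records.subList(start, end));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Record count and batch size quoted in this README.
        System.out.println(batchCount(38_861_474L, 800_000));

        // Small demonstration list split into batches of 3.
        List<Integer> sample = List.of(1, 2, 3, 4, 5, 6, 7);
        System.out.println(partition(sample, 3));
    }
}
```

With the figures quoted here, 38,861,474 records at 800,000 per batch come to 49 batches, the last one holding 461,474 records.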
- Ensure Neo4j database is running and accessible
- Place your data files in the appropriate directory
- Run the main application class
- The application will:
  - Connect to the Neo4j database
  - Create data objects from files
  - Process and insert data in batches
  - Close the database connection
- Connection Issues: Verify Neo4j is running on port 7687
- Authentication Errors: Check username/password credentials
- Performance Issues: Adjust batch size and connection pool settings
- Memory Issues: Increase the JVM heap size for large datasets (for example, set -Xmx in MAVEN_OPTS, since mvn exec:java runs inside the Maven JVM)
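For the connection check, a quick way to confirm that something is listening on port 7687 is a plain TCP probe. The sketch below uses only the Java standard library; PortCheckSketch and isPortOpen are hypothetical names, and the probe only shows that the port accepts connections, not that Neo4j is healthy or that the credentials are valid.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper; not part of the project.
class PortCheckSketch {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host.
            return false;
        }
    }

    public static void main(String[] args) {
        boolean open = isPortOpen("localhost", 7687, 2000);
        System.out.println(open
                ? "Port 7687 is accepting connections"
                : "Nothing reachable on port 7687; check that Neo4j is started");
    }
}
```

If the port is open but the application still fails to connect, the cause is more likely the credentials or the URI scheme than the service itself.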
- Ethan Townsend (snxethan)
- Ethan Smith
- Victor Keeler
- Jacob Brincefield