⚠️ This project is archived and no longer actively maintained.
A Java-based project demonstrating Amazon Machine Learning (Amazon ML) capabilities, including data generation for machine learning experiments and real-time prediction using AWS ML services.
- Overview
- Features
- Prerequisites
- Installation
- Usage
- Project Structure
- Components
- Data Generation
- Machine Learning Predictions
- Configuration
- Contributing
- License
This project provides tools and examples for working with Amazon Machine Learning services. It includes:
- Data Generator: Creates synthetic web log data for ML training and testing
- ML Sample: Demonstrates real-time prediction using AWS ML endpoints
- Spark Examples: Code samples for Apache Spark integration
Originally created to support the blog post: Test Drive: AWS Machine Learning + Redshift
- 🔄 Synthetic Data Generation: Generate realistic web log data with demographic variations
- 🤖 Real-time ML Predictions: Connect to AWS ML endpoints for live predictions
- 📊 Data Processing: Handle CSV data with customizable delimiters
- 🌍 Geographic Variations: Include state-based pricing multipliers
- 📅 Temporal Patterns: Seasonal and time-based data variations
- 🔧 Maven Integration: Easy dependency management and build process
- Java 8 or higher
- Maven 3.3 or higher
- AWS Account with Machine Learning service access
- AWS Credentials configured (via AWS CLI or environment variables)
- Clone the repository
git clone https://github.com/julien/amazon-machine-learning.git
cd amazon-machine-learning
- Build the project
mvn clean compile
- Configure AWS credentials
aws configure
# Or set environment variables:
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=eu-west-1
Generate synthetic web log data:
# Compile and run the generator
javac -cp ".:lib/*" Generator.java
java -cp ".:lib/*" org.julien.datastuff.Generator
This creates data-batch-prediction.txt with 1000 synthetic records.
Make predictions using an existing ML model:
# Compile the ML sample
javac -cp ".:lib/*" src/org/julien/datastuff/MLSample.java
# Run predictions (replace MODEL_ID with your actual model ID)
java -cp ".:lib/*:src" org.julien.datastuff.MLSample MODEL_ID

amazon-machine-learning/
├── README.md                    # This file
├── pom.xml                      # Maven configuration
├── Generator.java               # Data generation utility
├── data-batch-prediction.txt    # Generated sample data
├── dist.all.last                # Last names dataset
├── dist.female.first            # Female first names dataset
├── dist.male.first              # Male first names dataset
├── US_States.txt                # US states dataset
└── src/
    └── org/julien/datastuff/
        └── MLSample.java        # ML prediction example
Generates synthetic web log data with the following features:
- Demographic Data: Names, gender, age, location
- Temporal Data: Day of year, hour, minutes
- Purchase Data: Number of items, basket price
- Geographic Multipliers: State-based pricing variations
- Seasonal Patterns: Holiday season multipliers
Generated Data Format:
LastName,FirstName,Gender,State,Age,DayOfYear,Hour,Minute,Items,PurchaseAmount
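To show how a consumer might read this format, here is a minimal parsing sketch. It is not part of the repository: the class name `WebLogRecord`, the numeric types chosen for each column, and the sample line in `main` are all assumptions made for illustration. The delimiter is a parameter because the generated data can use a custom delimiter.

```java
import java.util.regex.Pattern;

/** One generated record; field order follows the header line above (hypothetical helper). */
final class WebLogRecord {
    final String lastName;
    final String firstName;
    final String gender;
    final String state;
    final int age;
    final int dayOfYear;
    final int hour;
    final int minute;
    final int items;
    final double purchaseAmount;

    private WebLogRecord(String[] f) {
        lastName = f[0].trim();
        firstName = f[1].trim();
        gender = f[2].trim();
        state = f[3].trim();
        // Assumption: the numeric columns are integers except the purchase amount.
        age = Integer.parseInt(f[4].trim());
        dayOfYear = Integer.parseInt(f[5].trim());
        hour = Integer.parseInt(f[6].trim());
        minute = Integer.parseInt(f[7].trim());
        items = Integer.parseInt(f[8].trim());
        purchaseAmount = Double.parseDouble(f[9].trim());
    }

    /** Parses one line using the given literal delimiter. */
    static WebLogRecord parse(String line, String delimiter) {
        // Quote the delimiter so characters such as "|" are not read as regex syntax.
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        if (fields.length != 10) {
            throw new IllegalArgumentException(
                "Expected 10 fields but found " + fields.length);
        }
        return new WebLogRecord(fields);
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not copied from data-batch-prediction.txt.
        WebLogRecord r = parse("Smith,Mary,F,California,34,340,14,5,3,212.50", ",");
        System.out.println(r.firstName + " " + r.lastName
            + ", age " + r.age + ", spent " + r.purchaseAmount);
    }
}
```

A line with the wrong number of fields or a non-numeric value in a numeric column is rejected with an `IllegalArgumentException`, which makes malformed rows easy to spot before they reach a model.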
Demonstrates real-time prediction using AWS Machine Learning:
- Model Discovery: Lists available ML models
- Endpoint Connection: Connects to real-time prediction endpoints
- Prediction Requests: Sends data for prediction
- Response Processing: Handles prediction results
The data generator creates realistic e-commerce data with the following characteristics:
- Gender-based pricing: Female customers have 25% higher average purchase values
- Age-based patterns: Customers aged 25-45 have 20% higher purchase values
- Geographic variations: Different states have different pricing multipliers
  - California, Florida: 1.5x multiplier
  - New York: 1.75x multiplier
  - District of Columbia: 2.0x multiplier
- Seasonal patterns:
  - Holiday season (Nov-Dec): 1.5x to 2.5x multipliers
  - Peak shopping periods: Enhanced purchase values
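The characteristics above can be combined into a single purchase multiplier. The sketch below is illustrative only: it assumes the factors multiply together, that the 25-45 age range is inclusive, that states are identified by full name, and that the caller supplies the seasonal factor (the text gives only a 1.5x to 2.5x range, not how a value is chosen). `PurchaseMultiplier` is a hypothetical class and may not match how Generator.java computes values.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper combining the multipliers listed above. */
final class PurchaseMultiplier {
    private static final Map<String, Double> STATE_MULTIPLIERS = new HashMap<>();
    static {
        STATE_MULTIPLIERS.put("California", 1.5);
        STATE_MULTIPLIERS.put("Florida", 1.5);
        STATE_MULTIPLIERS.put("New York", 1.75);
        STATE_MULTIPLIERS.put("District of Columbia", 2.0);
    }

    /**
     * @param seasonalFactor 1.0 outside the holiday season; inside it, a value
     *                       the caller picks from the 1.5 to 2.5 range
     */
    static double of(boolean female, int age, String state, double seasonalFactor) {
        double multiplier = 1.0;
        if (female) {
            multiplier *= 1.25;            // 25% higher average purchase value
        }
        if (age >= 25 && age <= 45) {
            multiplier *= 1.20;            // 20% higher for ages 25-45 (inclusive, assumed)
        }
        // States without a listed multiplier are assumed to use 1.0.
        multiplier *= STATE_MULTIPLIERS.getOrDefault(state, 1.0);
        // Assumption: all factors combine multiplicatively.
        multiplier *= seasonalFactor;
        return multiplier;
    }

    public static void main(String[] args) {
        // 1.25 * 1.20 * 1.75 * 1.0, approximately 2.625
        System.out.println(of(true, 30, "New York", 1.0));
    }
}
```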
The ML sample demonstrates:
- Model Discovery: Find and list available ML models
- Endpoint Connection: Connect to real-time prediction endpoints
- Data Preparation: Format input data for prediction
- Prediction Execution: Send requests and receive results
- Performance Monitoring: Track request response times
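The data preparation and response-time tracking steps can be sketched without the AWS SDK or a live endpoint. In the sketch below, the predictor is a stand-in `Function` rather than a real Amazon ML call, and the class and method names (`TimedPrediction`, `toRecord`, `time`) are hypothetical. The only Amazon ML detail relied on is that a prediction record is a map from feature name to string value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: builds a record and times one prediction call. */
final class TimedPrediction {

    /** A prediction result paired with the time the call took. */
    static final class Timed<T> {
        final T value;
        final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Pairs feature names with their string values, preserving column order. */
    static Map<String, String> toRecord(String[] names, String[] values) {
        if (names.length != values.length) {
            throw new IllegalArgumentException("names and values differ in length");
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            record.put(names[i], values[i]);
        }
        return record;
    }

    /** Runs the predictor once and measures wall-clock time around the call. */
    static <T> Timed<T> time(Function<Map<String, String>, T> predictor,
                             Map<String, String> record) {
        long start = System.nanoTime();
        T value = predictor.apply(record);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) {
        Map<String, String> record = toRecord(
            new String[] {"age", "job", "marital"},
            new String[] {"32", "management", "married"});
        // Stand-in predictor; a real one would send the record to the endpoint.
        Timed<String> result = time(r -> "stub-prediction", record);
        System.out.println(result.value + " in " + result.elapsedMillis + " ms");
    }
}
```

Keeping the predictor behind a `Function` means the same timing code can wrap a real endpoint call later without changes.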
request.addRecordEntry("age", "32")
.addRecordEntry("job", "management")
.addRecordEntry("marital", "married")
// ... additional features

- Region: Defaults to eu-west-1
- Credentials: Use AWS CLI or environment variables
- Permissions: Requires ML model access and prediction permissions
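One way to apply the region default is to fall back to `eu-west-1` when no region is set in the environment. This is a sketch under assumptions: whether MLSample.java reads `AWS_DEFAULT_REGION` at all is not stated here, and `RegionConfig` is a hypothetical helper. The environment is passed in as a map so the logic can be exercised without changing real environment variables.

```java
import java.util.Map;

/** Hypothetical helper that resolves the AWS region with a fallback. */
final class RegionConfig {
    static final String DEFAULT_REGION = "eu-west-1";

    /** Returns the configured region, or the default when it is missing or blank. */
    static String resolve(Map<String, String> env) {
        // Assumption: the region comes from AWS_DEFAULT_REGION, as in the setup step.
        String region = env.get("AWS_DEFAULT_REGION");
        if (region == null || region.trim().isEmpty()) {
            return DEFAULT_REGION;
        }
        return region.trim();
    }

    public static void main(String[] args) {
        System.out.println("Using region: " + resolve(System.getenv()));
    }
}
```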
The generator uses the following data files:
- dist.all.last: Last names database
- dist.male.first: Male first names database
- dist.female.first: Female first names database
- US_States.txt: US states list
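A minimal sketch of loading one of the name files follows. It assumes each line starts with the name followed by whitespace-separated numeric columns, so only the first token is kept; that layout is an assumption and should be checked against the actual files. `NameList` is a hypothetical helper, not a class from this repository.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that extracts names from a dist.* style file. */
final class NameList {

    /** Keeps the first whitespace-separated token of every non-blank line. */
    static List<String> namesFrom(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            names.add(trimmed.split("\\s+")[0]);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the last names file; pass another path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "dist.all.last");
        if (!Files.exists(path)) {
            System.out.println("File not found: " + path);
            return;
        }
        List<String> names = namesFrom(Files.readAllLines(path));
        System.out.println(names.size() + " names loaded from " + path);
    }
}
```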
Since this project is archived, contributions are not being accepted. However, you can:
- Fork the repository for your own use
- Create issues for documentation improvements
- Use the code as a reference for your own ML projects
This project is licensed under the MIT License - see the LICENSE file for details.
Note: This project was created for educational and demonstration purposes. The AWS Machine Learning service has been deprecated in favor of Amazon SageMaker. Consider migrating to SageMaker for production ML workloads.