juliensimon/amazon-machine-learning

Amazon Machine Learning Project

⚠️ This project is archived and no longer actively maintained.

A Java-based project demonstrating Amazon Machine Learning (Amazon ML) capabilities, including data generation for machine learning experiments and real-time prediction using AWS ML services.

🎯 Overview

This project provides tools and examples for working with Amazon Machine Learning services. It includes:

  • Data Generator: Creates synthetic web log data for ML training and testing
  • ML Sample: Demonstrates real-time prediction using AWS ML endpoints
  • Spark Examples: Code samples for Apache Spark integration

Originally created to support the blog post: Test Drive: AWS Machine Learning + Redshift

✨ Features

  • 🔄 Synthetic Data Generation: Generate realistic web log data with demographic variations
  • 🤖 Real-time ML Predictions: Connect to AWS ML endpoints for live predictions
  • 📊 Data Processing: Handle CSV data with customizable delimiters
  • 🌍 Geographic Variations: Include state-based pricing multipliers
  • 📅 Temporal Patterns: Seasonal and time-based data variations
  • 🔧 Maven Integration: Easy dependency management and build process

🛠 Prerequisites

  • Java 8 or higher
  • Maven 3.3 or higher
  • AWS Account with Machine Learning service access
  • AWS Credentials configured (via AWS CLI or environment variables)

📦 Installation

  1. Clone the repository

    git clone https://github.com/juliensimon/amazon-machine-learning.git
    cd amazon-machine-learning

  2. Build the project

    mvn clean compile

  3. Configure AWS credentials

    aws configure
    # Or set environment variables:
    export AWS_ACCESS_KEY_ID=your_access_key
    export AWS_SECRET_ACCESS_KEY=your_secret_key
    export AWS_DEFAULT_REGION=eu-west-1

🚀 Usage

Data Generation

Generate synthetic web log data:

# Compile and run the generator
javac -cp ".:lib/*" Generator.java
java -cp ".:lib/*" org.julien.datastuff.Generator

This will create data-batch-prediction.txt with 1000 synthetic records.

Machine Learning Predictions

Make predictions using an existing ML model:

# Compile the ML sample
javac -cp ".:lib/*" src/org/julien/datastuff/MLSample.java

# Run predictions (replace MODEL_ID with your actual model ID)
java -cp ".:lib/*:src" org.julien.datastuff.MLSample MODEL_ID

📁 Project Structure

amazon-machine-learning/
├── README.md                 # This file
├── pom.xml                   # Maven configuration
├── Generator.java            # Data generation utility
├── data-batch-prediction.txt # Generated sample data
├── dist.all.last             # Last names dataset
├── dist.female.first         # Female first names dataset
├── dist.male.first           # Male first names dataset
├── US_States.txt             # US states dataset
└── src/
    └── org/julien/datastuff/
        └── MLSample.java     # ML prediction example

🔧 Components

Data Generator (Generator.java)

Generates synthetic web log data with the following features:

  • Demographic Data: Names, gender, age, location
  • Temporal Data: Day of year, hour, minutes
  • Purchase Data: Number of items, basket price
  • Geographic Multipliers: State-based pricing variations
  • Seasonal Patterns: Holiday season multipliers

Generated Data Format:

LastName,FirstName,Gender,State,Age,DayOfYear,Hour,Minute,Items,PurchaseAmount
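
As a sketch of how a consumer might read one line of this format, the class below splits a record on a configurable delimiter and converts the numeric columns. The class name `WebLogRecord`, its field names, and the sample line are hypothetical and not taken from Generator.java; the gender encoding in particular is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical holder for one generated record; names are illustrative,
// not taken from Generator.java.
final class WebLogRecord {
    final String lastName, firstName, gender, state;
    final int age, dayOfYear, hour, minute, items;
    final double purchaseAmount;

    private WebLogRecord(String[] f) {
        lastName = f[0];
        firstName = f[1];
        gender = f[2];
        state = f[3];
        age = Integer.parseInt(f[4]);
        dayOfYear = Integer.parseInt(f[5]);
        hour = Integer.parseInt(f[6]);
        minute = Integer.parseInt(f[7]);
        items = Integer.parseInt(f[8]);
        purchaseAmount = Double.parseDouble(f[9]);
    }

    // Splits on a literal delimiter; Pattern.quote keeps characters
    // such as '|' from being treated as regex syntax.
    static WebLogRecord parse(String line, String delimiter) {
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        if (fields.length != 10) {
            throw new IllegalArgumentException("Expected 10 fields, got " + fields.length);
        }
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return new WebLogRecord(fields);
    }

    public static void main(String[] args) {
        // Made-up sample line in the documented column order.
        WebLogRecord r = parse("Smith,Mary,F,California,34,330,14,5,3,129.50", ",");
        System.out.println(r.firstName + " " + r.lastName + " spent " + r.purchaseAmount);
    }
}
```

Passing the delimiter as a parameter keeps the same parser usable if the file is regenerated with a different separator.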

ML Sample (MLSample.java)

Demonstrates real-time prediction using AWS Machine Learning:

  • Model Discovery: Lists available ML models
  • Endpoint Connection: Connects to real-time prediction endpoints
  • Prediction Requests: Sends data for prediction
  • Response Processing: Handles prediction results

📊 Data Generation

The data generator creates realistic e-commerce data with the following characteristics:

Demographic Variations

  • Gender-based pricing: Female customers have 25% higher average purchase values
  • Age-based patterns: Customers aged 25-45 have 20% higher purchase values
  • Geographic variations: Different states have different pricing multipliers

Geographic Multipliers

  • California, Florida: 1.5x multiplier
  • New York: 1.75x multiplier
  • District of Columbia: 2.0x multiplier

Seasonal Patterns

  • Holiday season (Nov-Dec): 1.5x to 2.5x multipliers
  • Peak shopping periods: Enhanced purchase values
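
To make the multipliers concrete, here is a minimal sketch of how they might combine into a final basket price. It assumes the factors are simply multiplied together, that the 25-45 age range is inclusive, that unlisted states use a 1.0x multiplier, and that the caller picks a holiday factor within the documented 1.5x to 2.5x range. Generator.java may combine them differently; `PriceModel` and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the documented multipliers; not the code in Generator.java.
final class PriceModel {
    private static final Map<String, Double> STATE_MULTIPLIERS = new HashMap<>();
    static {
        STATE_MULTIPLIERS.put("California", 1.5);
        STATE_MULTIPLIERS.put("Florida", 1.5);
        STATE_MULTIPLIERS.put("New York", 1.75);
        STATE_MULTIPLIERS.put("District of Columbia", 2.0);
    }

    // Assumption: states not listed in the README get no adjustment.
    static double stateMultiplier(String state) {
        return STATE_MULTIPLIERS.getOrDefault(state, 1.0);
    }

    // Assumption: all factors are multiplied together.
    // Pass holidayMultiplier = 1.0 outside the holiday season.
    static double adjustedPrice(double base, boolean female, int age,
                                String state, double holidayMultiplier) {
        double price = base;
        if (female) {
            price *= 1.25;   // 25% higher for female customers
        }
        if (age >= 25 && age <= 45) {
            price *= 1.20;   // 20% higher for ages 25-45
        }
        price *= stateMultiplier(state);
        price *= holidayMultiplier;
        return price;
    }

    public static void main(String[] args) {
        System.out.println(adjustedPrice(100.0, true, 30, "New York", 1.0));
    }
}
```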

🤖 Machine Learning Predictions

The ML sample demonstrates:

  1. Model Discovery: Find and list available ML models
  2. Endpoint Connection: Connect to real-time prediction endpoints
  3. Data Preparation: Format input data for prediction
  4. Prediction Execution: Send requests and receive results
  5. Performance Monitoring: Track request response times
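
Step 5 can be approximated with plain JDK timing. The sketch below wraps any call in a timer and reports the elapsed time; the `Supplier` stands in for the real prediction call, and `TimedCall` is a hypothetical helper, not part of MLSample.java or the AWS SDK.

```java
import java.util.function.Supplier;

// Hypothetical helper that measures how long a single call takes.
final class TimedCall<T> {
    final T result;
    final long elapsedNanos;

    private TimedCall(T result, long elapsedNanos) {
        this.result = result;
        this.elapsedNanos = elapsedNanos;
    }

    // Runs the call once and records wall-clock time with System.nanoTime(),
    // which is monotonic and suited to measuring intervals.
    static <T> TimedCall<T> time(Supplier<T> call) {
        long start = System.nanoTime();
        T result = call.get();
        return new TimedCall<>(result, System.nanoTime() - start);
    }

    double elapsedMillis() {
        return elapsedNanos / 1_000_000.0;
    }

    public static void main(String[] args) {
        // The lambda is a stand-in for a real-time prediction request.
        TimedCall<String> t = time(() -> "stub-prediction");
        System.out.println(t.result + " took " + t.elapsedMillis() + " ms");
    }
}
```

Keeping the timer separate from the client code means the same wrapper can measure model discovery, endpoint lookup, and prediction calls alike.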

Example Prediction Request

request.addRecordEntry("age", "32")
       .addRecordEntry("job", "management")
       .addRecordEntry("marital", "married")
       // ... additional features
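
Each `addRecordEntry` call contributes one name/value pair, with every value passed as a string. As a sketch of the data-preparation step that runs without the SDK on the classpath, the helper below builds the equivalent name-to-value map from two arrays. `PredictionRecord` is a hypothetical name; with the AWS SDK for Java 1.x, a map like this should be usable as the request's record, but that wiring is left to the caller.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of MLSample.java or the AWS SDK.
final class PredictionRecord {
    // Pairs each feature name with its value, preserving insertion order.
    static Map<String, String> of(String[] names, String[] values) {
        if (names.length != values.length) {
            throw new IllegalArgumentException(
                "Got " + names.length + " names but " + values.length + " values");
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            record.put(names[i], values[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        String[] names = {"age", "job", "marital"};
        String[] values = {"32", "management", "married"};
        System.out.println(of(names, values));
    }
}
```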

⚙️ Configuration

AWS Configuration

  • Region: Defaults to eu-west-1
  • Credentials: Use AWS CLI or environment variables
  • Permissions: Requires ML model access and prediction permissions
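
One way to honor the eu-west-1 default while still allowing an override is to read the region from the environment and fall back when it is unset. This is a sketch under that assumption; MLSample.java may set the region differently, and `RegionResolver` is a hypothetical name.

```java
import java.util.Map;

// Hypothetical helper; not part of MLSample.java.
final class RegionResolver {
    static final String DEFAULT_REGION = "eu-west-1";

    // Takes the environment as a parameter so the logic can be tested
    // without changing real environment variables.
    static String resolve(Map<String, String> env) {
        String region = env.get("AWS_DEFAULT_REGION");
        if (region == null || region.trim().isEmpty()) {
            return DEFAULT_REGION;
        }
        return region.trim();
    }

    public static void main(String[] args) {
        System.out.println(resolve(System.getenv()));
    }
}
```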

Data Files

The generator uses the following data files:

  • dist.all.last: Last names database
  • dist.male.first: Male first names database
  • dist.female.first: Female first names database
  • US_States.txt: US states list
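
The three dist.* files appear to follow the US Census name-frequency layout, in which each line starts with a name followed by whitespace-separated statistics. Assuming that layout (this README does not confirm it), a loader only needs the first token of each line. `NameList` is a hypothetical helper, and the sample lines are illustrative rather than copied from the files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical loader; assumes "NAME <stats...>" per line.
final class NameList {
    // Returns the first whitespace-delimited token of every non-blank line.
    static List<String> firstTokens(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            names.add(trimmed.split("\\s+")[0]);
        }
        return names;
    }

    public static void main(String[] args) {
        // Illustrative lines in the assumed layout.
        List<String> sample = Arrays.asList(
            "SMITH          1.006  1.006      1",
            "JOHNSON        0.810  1.816      2");
        System.out.println(firstTokens(sample));
    }
}
```

To load a real file, the lines can be read with `java.nio.file.Files.readAllLines(Paths.get("dist.all.last"))` and passed to `firstTokens`.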

🤝 Contributing

Since this project is archived, contributions are not being accepted. However, you can:

  1. Fork the repository for your own use
  2. Create issues for documentation improvements
  3. Use the code as a reference for your own ML projects

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Note: This project was created for educational and demonstration purposes. The Amazon Machine Learning service has been deprecated in favor of Amazon SageMaker. Consider migrating to SageMaker for production ML workloads.
