
Session 3: Open-Source Model Discovery and Management

Overview

This session focuses on practical model discovery and management with Foundry Local. You'll learn how to list available models, test different options, and understand basic performance characteristics. The approach emphasizes hands-on exploration with the foundry CLI to help you select the right models for your use cases.

Learning Objectives

  • Master foundry CLI commands for model discovery and management
  • Understand model cache and local storage patterns
  • Learn to quickly test and compare different models
  • Establish practical workflows for model selection and benchmarking
  • Explore the growing ecosystem of models available through Foundry Local

Prerequisites

  • Completed Session 1: Getting Started with Foundry Local
  • Foundry Local CLI installed and accessible
  • Sufficient storage space for model downloads (models can range from 1GB to 20GB+)
  • Basic understanding of model types and use cases

Bring Open-Source Models with Foundry Local

Overview

This session explores how to bring open-source models to Foundry Local: selecting community models, integrating Hugging Face content, and adopting "bring your own model" (BYOM) strategies. You'll also discover the Model Mondays series for continuous learning and model discovery.

Part 6: Hands-On Exercise

Exercise: Model Discovery and Comparison

Create your own model evaluation script based on Sample 03:

REM create_model_test.cmd
@echo off
echo Model Discovery and Testing Script
echo =====================================

echo.
echo Step 1: List available models
foundry model list

echo.
echo Step 2: Check what's cached
foundry cache list

echo.
echo Step 3: Start phi-4-mini for testing
foundry model run phi-4-mini --verbose

echo.
echo Step 4: Test with a simple prompt
curl -X POST http://localhost:8000/v1/chat/completions ^
  -H "Content-Type: application/json" ^
  -d "{\"model\":\"phi-4-mini\",\"messages\":[{\"role\":\"user\",\"content\":\"Hello, please introduce yourself.\"}],\"max_tokens\":100}"

echo.
echo Model test complete!

Your Task

  1. Run the Sample 03 script: samples\03\list_and_bench.cmd
  2. Try different models: Test at least 3 different models
  3. Compare performance: Note differences in speed and response quality
  4. Document findings: Create a simple comparison chart

Example Comparison Format

Model Comparison Results:
========================
phi-4-mini:    Fast (~2s), good for general chat
qwen2.5-7b:    Slower (~5s), better reasoning
deepseek-r1:   Medium (~3s), excellent for code

Recommendation: Start with phi-4-mini for development, 
switch to qwen2.5-7b for production reasoning tasks.
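If you prefer to generate the comparison chart rather than type it by hand, here is a minimal Python sketch. The model names, timings, and notes below are placeholders for your own measurements, not benchmark results:

```python
# compare_models.py - print an aligned comparison chart from your own notes.
# Replace these placeholder rows with the measurements you collected.
results = [
    ("phi-4-mini", 2.0, "good for general chat"),
    ("qwen2.5-7b", 5.0, "better reasoning"),
    ("deepseek-r1", 3.0, "excellent for code"),
]

def format_chart(rows):
    """Return the comparison chart as a single aligned string."""
    width = max(len(name) for name, _, _ in rows) + 1
    lines = ["Model Comparison Results:", "=" * 24]
    for name, latency, note in rows:
        lines.append(f"{name + ':':<{width + 1}} ~{latency:.0f}s, {note}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(format_chart(results))
```

Keeping the raw numbers in a list like this also makes it easy to re-sort by latency later.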

Part 7: Troubleshooting and Best Practices

Common Issues and Solutions

Model Won't Start:

REM Check service status
foundry service status

REM Restart service if needed
foundry service stop
foundry service start

REM Try with verbose output
foundry model run phi-4-mini --verbose

Insufficient Memory:

  • Start with smaller models (phi-4-mini)
  • Close other applications
  • Upgrade RAM if frequently hitting limits

Slow Performance:

  • Ensure model is fully loaded (check verbose output)
  • Close unnecessary background applications
  • Consider faster storage (SSD)

Best Practices

  1. Start Small: Begin with phi-4-mini to validate setup
  2. One Model at a Time: Stop previous models before starting new ones
  3. Monitor Resources: Keep an eye on memory usage
  4. Test Consistently: Use the same prompts for fair comparisons
  5. Document Results: Keep notes on model performance for your use cases

Part 8: Next Steps and References

Preparing for Session 4

  • Session 4 Focus: Optimization tools and techniques
  • Prerequisites: Comfortable with model switching and basic performance testing
  • Recommended: Have 2-3 favorite models identified from this session

Key Takeaways

Model Discovery: Use foundry model list to explore available models
Quick Testing: The list_and_bench.cmd pattern for rapid evaluation
Performance Monitoring: Basic resource usage and response time measurement
Model Selection: Practical guidelines for choosing models by use case
Cache Management: Understanding storage and cleanup procedures

You now have the practical skills to discover, test, and select appropriate models for your AI applications using Foundry Local's straightforward CLI approach.

Learning Objectives

  • Discover and evaluate open-source models for local inference
  • Compile and run select Hugging Face models within Foundry Local
  • Apply model selection strategies for accuracy, latency, and resource needs
  • Manage models locally with cache and versioning

Part 1: Model Discovery with Foundry CLI

Basic Model Management Commands

The foundry CLI provides straightforward commands for model discovery and management:

REM List all available models in the catalog
foundry model list

REM List cached (downloaded) models
foundry cache list

REM Check cache directory and contents
foundry cache ls

Running Your First Models

Start with popular, well-tested models to understand performance characteristics:

REM Run Phi-4-Mini (lightweight, fast)
foundry model run phi-4-mini --verbose

REM Run Qwen 2.5 7B (larger, more capable)
foundry model run qwen2.5-7b --verbose

REM Run DeepSeek (specialized for coding)
foundry model run deepseek-r1-7b --verbose

Note: The --verbose flag provides detailed startup information, including:

  • Model download progress (on first run)
  • Memory allocation details
  • Service binding information
  • Performance initialization metrics

Understanding Model Categories

Small Language Models (SLMs):

  • phi-4-mini: Fast, efficient, great for general chat
  • phi-4: More capable version with better reasoning

Medium Models:

  • qwen2.5-7b: Excellent reasoning and longer context
  • deepseek-r1-7b: Optimized for code generation

Larger Models:

  • llama-3.2: Meta's latest open-source model
  • qwen2.5-14b: Enterprise-grade reasoning

Part 2: Quick Model Testing and Comparison

Sample 03 Approach: Simple List and Bench

Based on our Sample 03 pattern, here's the minimal workflow:

@echo off
REM Sample 03 - List and bench pattern
echo Listing available models...
foundry model list

echo.
echo Checking cached models...
foundry cache list

echo.
echo Starting phi-4-mini with verbose output...
foundry model run phi-4-mini --verbose

Testing Model Performance

Once a model is running, test it with consistent prompts:

REM Test via curl (Windows Command Prompt)
curl -X POST http://localhost:8000/v1/chat/completions ^
  -H "Content-Type: application/json" ^
  -d "{\"model\":\"phi-4-mini\",\"messages\":[{\"role\":\"user\",\"content\":\"Explain edge AI in one sentence.\"}],\"max_tokens\":50}"

PowerShell Testing Alternative

# PowerShell approach for testing
$body = @{
    model = "phi-4-mini"
    messages = @(
        @{
            role = "user"
            content = "Explain edge AI in one sentence."
        }
    )
    max_tokens = 50
} | ConvertTo-Json -Depth 3

Invoke-RestMethod -Uri "http://localhost:8000/v1/chat/completions" -Method Post -Body $body -ContentType "application/json"

Part 3: Model Cache and Storage Management

Understanding the Model Cache

Foundry Local automatically manages model downloads and caching:

REM Check cache directory and contents
foundry cache ls

REM View cache location
foundry cache cd

REM Clean up unused models (if needed)
foundry cache clean

Model Storage Considerations

Typical Model Sizes:

  • phi-4-mini: ~2.5 GB
  • qwen2.5-7b: ~4.1 GB
  • deepseek-r1-7b: ~4.3 GB
  • llama-3.2: ~4.9 GB
  • qwen2.5-14b: ~8.2 GB

Storage Best Practices:

  • Keep 2-3 models cached for quick switching
  • Remove unused models to free space: foundry cache clean
  • Monitor disk usage, especially on smaller SSDs
  • Consider model size vs. capability trade-offs
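Before pulling a larger model, it can help to confirm the volume has room. The sketch below uses only the standard library; the size figures are the approximate numbers quoted above, and the 2 GB headroom is an assumption, not a Foundry Local requirement:

```python
# storage_check.py - sketch: verify there is room before downloading a model.
# Sizes are the approximate figures from the table above; pass the path of the
# volume that holds your model cache (as reported by `foundry cache ls`).
import shutil

MODEL_SIZES_GB = {
    "phi-4-mini": 2.5,
    "qwen2.5-7b": 4.1,
    "deepseek-r1-7b": 4.3,
    "llama-3.2": 4.9,
    "qwen2.5-14b": 8.2,
}

def has_room(model: str, path: str = ".", headroom_gb: float = 2.0) -> bool:
    """True if the volume holding `path` can fit the model plus headroom."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= MODEL_SIZES_GB[model] + headroom_gb
```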

Model Performance Monitoring

While models are running, monitor system resources:

Windows Task Manager:

  • Watch memory usage (models stay loaded in RAM)
  • Monitor CPU utilization during inference
  • Check disk I/O during initial model loading

Command Line Monitoring:

REM Check memory usage (PowerShell)
Get-Process | Where-Object {$_.ProcessName -like "*foundry*"} | Select-Object ProcessName, WorkingSet64

REM Monitor running models
foundry service ps

Part 4: Practical Model Selection Guidelines

Choosing Models by Use Case

For General Chat and Q&A:

  • Start with: phi-4-mini (fast, efficient)
  • Upgrade to: phi-4 (better reasoning)
  • Advanced: qwen2.5-7b (longer context)

For Code Generation:

  • Recommended: deepseek-r1-7b
  • Alternative: qwen2.5-7b (also good for code)

For Complex Reasoning:

  • Best: qwen2.5-7b or qwen2.5-14b
  • Budget option: phi-4

Hardware Requirements Guide

Minimum System Requirements:

phi-4-mini:     8GB RAM,  entry-level CPU
phi-4:         12GB RAM,  mid-range CPU
qwen2.5-7b:    16GB RAM,  mid-range CPU
deepseek-r1:   16GB RAM,  mid-range CPU
qwen2.5-14b:   24GB RAM,  high-end CPU
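The minimum-requirement table above can be turned into a quick lookup. This sketch simply encodes those figures as data; it is illustrative guidance, not an official compatibility check:

```python
# model_picker.py - sketch: suggest models that fit a given amount of RAM,
# using the minimum-requirement table above as the only data source.
MIN_RAM_GB = {
    "phi-4-mini": 8,
    "phi-4": 12,
    "qwen2.5-7b": 16,
    "deepseek-r1": 16,
    "qwen2.5-14b": 24,
}

def models_for(ram_gb: int) -> list[str]:
    """Models whose minimum RAM requirement fits within ram_gb, largest first."""
    fits = [m for m, need in MIN_RAM_GB.items() if need <= ram_gb]
    return sorted(fits, key=lambda m: MIN_RAM_GB[m], reverse=True)
```

For example, `models_for(16)` includes qwen2.5-7b and deepseek-r1 but excludes qwen2.5-14b.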

Recommended for Best Performance:

  • 32GB+ RAM for comfortable multi-model switching
  • SSD storage for faster model loading
  • Modern CPU with good single-thread performance
  • NPU support (Windows 11 Copilot+ PCs) for acceleration

Model Switching Workflow

REM Stop current model (if needed)
foundry service stop

REM Start different model
foundry model run qwen2.5-7b

REM Verify model is running
foundry service status

Part 5: Simple Model Benchmarking

Basic Performance Testing

Here's a straightforward approach to compare model performance:

# simple_bench.py - Based on Sample 03 patterns
import time
import requests
import json

def test_model_response(model_name, prompt="Explain edge AI in one sentence."):
    """Test a single model with a prompt and measure response time."""
    start_time = time.time()
    
    try:
        response = requests.post(
            "http://localhost:8000/v1/chat/completions",
            headers={"Content-Type": "application/json"},
            json={
                "model": model_name,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 64
            },
            timeout=30
        )
        
        elapsed = time.time() - start_time
        
        if response.status_code == 200:
            result = response.json()
            return {
                "model": model_name,
                "latency_sec": round(elapsed, 3),
                "response": result["choices"][0]["message"]["content"],
                "status": "success"
            }
        else:
            return {
                "model": model_name,
                "status": "error",
                "error": f"HTTP {response.status_code}"
            }
            
    except Exception as e:
        return {
            "model": model_name,
            "status": "error", 
            "error": str(e)
        }

# Test the currently running model
if __name__ == "__main__":
    # Test with different models (start each model first)
    test_models = ["phi-4-mini", "qwen2.5-7b", "deepseek-r1-7b"]
    
    print("Model Performance Test")
    print("=" * 50)
    
    for model in test_models:
        print(f"\nTesting {model}...")
        print("Note: Make sure this model is running first with 'foundry model run {model}'")
        
        result = test_model_response(model)
        
        if result["status"] == "success":
            print(f"✅ {model}: {result['latency_sec']}s")
            print(f"   Response: {result['response'][:100]}...")
        else:
            print(f"❌ {model}: {result['error']}")

Manual Quality Assessment

For each model, test with consistent prompts and manually evaluate:

Test Prompts:

  1. "Explain quantum computing in simple terms."
  2. "Write a Python function to sort a list."
  3. "What are the pros and cons of remote work?"
  4. "Summarize the benefits of edge AI."

Evaluation Criteria:

  • Accuracy: Is the information correct?
  • Clarity: Is the explanation easy to understand?
  • Completeness: Does it address the full question?
  • Speed: How quickly does it respond?
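To keep manual assessments comparable across models, you can record a 1-5 score per criterion and average them. A minimal sketch (the 1-5 scale is a suggested convention, not part of Foundry Local):

```python
# rubric.py - sketch: record 1-5 scores per criterion and average them.
CRITERIA = ("accuracy", "clarity", "completeness", "speed")

def score(ratings: dict[str, int]) -> float:
    """Average a complete set of 1-5 ratings, one per criterion."""
    missing = set(CRITERIA) - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    if any(not 1 <= v <= 5 for v in ratings.values()):
        raise ValueError("ratings must be between 1 and 5")
    return sum(ratings[c] for c in CRITERIA) / len(CRITERIA)
```

Scoring each test prompt this way turns "it felt better" into numbers you can put straight into your comparison chart.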

Resource Usage Monitoring

REM Monitor while testing different models
REM Start model
foundry model run phi-4-mini

REM In another terminal, monitor resources
foundry service status
foundry service ps

REM Check system resources (PowerShell)
Get-Process | Where-Object ProcessName -Like "*foundry*" | Format-Table ProcessName, WorkingSet64, CPU

Part 6: Next Steps

  • Subscribe to Model Mondays for new models and tips: https://aka.ms/model-mondays
  • Contribute findings to your team’s models.json
  • Prepare for Session 4: comparing LLMs vs SLMs, local vs cloud inference, and hands-on demos
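As a starting point for that models.json, here is a short Python sketch. The schema (model, latency_sec, use_case) is an assumption for illustration; adapt it to whatever fields your team actually tracks:

```python
# save_findings.py - sketch: persist session findings as models.json.
# The schema and the example entries below are assumptions - replace them
# with your own measurements and your team's agreed fields.
import json

findings = [
    {"model": "phi-4-mini", "latency_sec": 2.0, "use_case": "general chat"},
    {"model": "qwen2.5-7b", "latency_sec": 5.0, "use_case": "reasoning"},
]

with open("models.json", "w", encoding="utf-8") as f:
    json.dump({"models": findings}, f, indent=2)
```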