A comprehensive machine learning classification demonstration using multiple algorithms on the Breast Cancer Wisconsin dataset.
- Python 3.9+
- Poetry
```bash
# Install Poetry (if not already installed)
pip install poetry

# Clone the repository
git clone https://github.com/yourusername/classification.git
cd classification

# Install dependencies
poetry install

# Activate the virtual environment
poetry shell

# Run the main script
poetry run classification
```
- `classification/`: Source code
- `tests/`: Unit tests
- `notebooks/`: Jupyter notebooks for interactive exploration
- lightgbm
- Understanding classification techniques
- Model evaluation
- Performance visualization
- Feature importance analysis
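As a minimal sketch of that workflow (the model choices, hyperparameters, and train/test split below are illustrative, not the project's exact configuration), several classifiers can be trained and compared on the Breast Cancer Wisconsin dataset bundled with scikit-learn:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from lightgbm import LGBMClassifier

# Load the Breast Cancer Wisconsin dataset and hold out a stratified test set.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Illustrative set of algorithms; the project may use different models or settings.
models = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "lightgbm": LGBMClassifier(n_estimators=200, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    print(f"{name}: accuracy={accuracy_score(y_test, pred):.3f} "
          f"ROC-AUC={roc_auc_score(y_test, proba):.3f}")
```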
The sections below describe the major components in your Azure ML project structure.
The foundation of Azure resource organization. Resource groups act as logical containers that hold related resources for an Azure solution. They enable you to manage and organize your Azure ML resources efficiently, apply permissions across multiple resources, and handle lifecycle management.
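As a hedged illustration (this uses the `azure-mgmt-resource` package, which is not among this project's stated dependencies, and the subscription ID, group name, and region are placeholders), a resource group can be created programmatically:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

# Placeholder subscription; replace with your own.
subscription_id = "<subscription-id>"
resource_client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

# Create (or update) the resource group that will hold the Azure ML resources.
rg = resource_client.resource_groups.create_or_update(
    "rg-ml-classification", {"location": "eastus"}
)
print(rg.name, rg.location)
```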
The Azure ML workspace is your main working environment in the cloud. It's the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create. The workspace keeps track of all training runs, deployments, metrics, and more.
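A minimal sketch of creating a workspace with the Azure ML Python SDK v2 (`azure-ai-ml`); the subscription, resource group, workspace name, and region are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Workspace

# Placeholder identifiers; substitute your own values.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="rg-ml-classification",
)

# Creating the workspace also provisions linked resources such as the
# default storage account, key vault, and Application Insights.
ws = Workspace(name="ml-classification-ws", location="eastus")
ml_client.workspaces.begin_create(ws).result()
```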
Azure Storage accounts are essential for Azure ML operations. They provide secure, scalable cloud storage for various data types including datasets, model files, and logs. The storage account is automatically linked to your workspace and serves as the default storage for training data and model artifacts.
A compute cluster is managed compute infrastructure that lets you easily create single- or multi-node compute targets. The cluster scales up and down automatically when you submit a job, making it ideal for batch training and parallel processing of large datasets.
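For example (a sketch using `azure-ai-ml`; the cluster name, VM size, and workspace identifiers are assumptions), an autoscaling CPU cluster could be defined like this:

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute

ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "rg-ml-classification", "ml-classification-ws"
)

# Scales between 0 and 4 nodes; idle nodes are released after 2 minutes.
cluster = AmlCompute(
    name="cpu-cluster",
    size="Standard_DS3_v2",
    min_instances=0,
    max_instances=4,
    idle_time_before_scale_down=120,
)
ml_client.compute.begin_create_or_update(cluster).result()
```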
A compute instance is a fully managed cloud workstation optimized for your machine learning development environment. It provides a Jupyter notebook server, VS Code integration, and other development tools pre-configured for ML work.
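A corresponding sketch for a personal compute instance (the instance name, size, and workspace identifiers are placeholders):

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ComputeInstance

ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "rg-ml-classification", "ml-classification-ws"
)

# A single-user development workstation with Jupyter and VS Code access.
instance = ComputeInstance(name="dev-instance", size="Standard_DS3_v2")
ml_client.compute.begin_create_or_update(instance).result()
```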
Containers in Azure ML provide isolated, consistent environments for training and deployment. They package code and dependencies together, ensuring reproducibility across different environments and making it easier to deploy models as web services.
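In the Python SDK these containerized environments are modeled as `Environment` assets. A sketch (the base image tag and conda file path are assumptions about this project's layout):

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Environment

ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "rg-ml-classification", "ml-classification-ws"
)

# A curated base image plus a project-specific conda file (hypothetical path).
env = Environment(
    name="sklearn-env",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
    conda_file="environment/conda.yaml",
    description="Training and inference environment for the classification project",
)
ml_client.environments.create_or_update(env)
```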
Data assets provide a way to store and version your data in Azure ML. They abstract the underlying storage details and provide a consistent interface to access data, whether it's files, tables, or other formats, making data management and sharing easier.
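A sketch of registering a versioned data asset (the file path, asset name, and version are placeholders):

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "rg-ml-classification", "ml-classification-ws"
)

# Uploads the local CSV and registers it as version 1 of a named data asset.
data_asset = Data(
    name="breast-cancer-data",
    version="1",
    type=AssetTypes.URI_FILE,
    path="./data/breast_cancer.csv",
    description="Breast Cancer Wisconsin dataset as a single CSV file",
)
ml_client.data.create_or_update(data_asset)
```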
Datastores are abstractions for cloud storage. They encapsulate the information required to connect to data sources like Azure Storage, Azure Data Lake, or Azure SQL Database. Datastores provide credential-free access to your data in training scripts.
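A sketch of registering a blob-container datastore and then referencing it by URI (the storage account, container name, and key are placeholders):

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AzureBlobDatastore, AccountKeyConfiguration

ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "rg-ml-classification", "ml-classification-ws"
)

# Register a blob container as a named datastore in the workspace.
store = AzureBlobDatastore(
    name="training_blob",
    account_name="<storage-account-name>",
    container_name="training-data",
    credentials=AccountKeyConfiguration(account_key="<storage-account-key>"),
)
ml_client.create_or_update(store)

# Training scripts can then reference data without embedding credentials, e.g.:
#   azureml://datastores/training_blob/paths/raw/breast_cancer.csv
```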
Job training components handle the execution of ML training workflows. They manage the logistics of running training scripts on compute targets, handling data input/output, and tracking metrics and artifacts.
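A sketch of a command job that ties these pieces together (the script name, input wiring, and asset references are assumptions, not this project's actual training entry point):

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient, command, Input
from azure.ai.ml.constants import AssetTypes

ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "rg-ml-classification", "ml-classification-ws"
)

# Runs a hypothetical train.py on the cluster, wiring in the registered data asset.
job = command(
    code="./src",
    command="python train.py --data ${{inputs.training_data}}",
    inputs={
        "training_data": Input(type=AssetTypes.URI_FILE, path="azureml:breast-cancer-data:1"),
    },
    environment="sklearn-env@latest",
    compute="cpu-cluster",
    experiment_name="breast-cancer-classification",
)
returned_job = ml_client.jobs.create_or_update(job)
print(returned_job.studio_url)  # Link to metrics and artifacts in the studio UI
```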
The deployment components (see the sketch after this list) include:
- Endpoints: Entry points for real-time inference
- Deployment configurations: Settings for model serving
- Scoring scripts: Code that defines how your model processes requests
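A sketch of these pieces using a managed online endpoint (the endpoint, deployment, model reference, scoring script, and instance size are all placeholders):

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    CodeConfiguration,
)

ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "rg-ml-classification", "ml-classification-ws"
)

# Endpoint: the stable entry point clients call for real-time inference.
endpoint = ManagedOnlineEndpoint(name="classification-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deployment configuration: which model, environment, scoring script, and
# compute size actually serve requests behind the endpoint.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="classification-endpoint",
    model="azureml:breast-cancer-model:1",  # hypothetical registered model
    environment="sklearn-env@latest",
    code_configuration=CodeConfiguration(code="./src", scoring_script="score.py"),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```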
Integration components help connect Azure ML with your development workflow:
- Conda environments for dependency management
- Testing frameworks for CI/CD pipelines (see the smoke-test sketch after this list)
- Requirements specification for reproducible environments
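As an example of the testing piece, a pytest-style smoke test (the accuracy threshold and file layout are assumptions, not the project's actual tests) might look like:

```python
# tests/test_smoke.py -- hypothetical CI smoke test
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def test_baseline_model_beats_threshold():
    # Quick end-to-end check that training produces a usable model.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )
    model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
    assert model.score(X_test, y_test) > 0.9  # illustrative threshold
```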