This repository has been archived by the owner on Feb 23, 2023. It is now read-only.
Daniela Bauer edited this page May 24, 2019 · 81 revisions

VMDIRAC is an extension of DIRAC

The DIRAC (Distributed Infrastructure with Remote Agent Control) project is a complete Grid, Cloud, Host and Volunteer solution for a community of users such as the LHCb Collaboration, Belle Collaboration or NGI multi-VO portals (DIRAC4EGI, FranceGrilles...). DIRAC forms a layer between a particular community and various compute resources to allow optimized, transparent and reliable usage.

DIRAC Documentation

DIRAC Overview

A more detailed description of the DIRAC system can be found on the DIRAC system page.

The DIRAC Workload Management System (WMS) implements the task scheduling paradigm with Generic Pilot Jobs. This scheduling method solves many of the problems of using the unstable resources found in distributed computing infrastructures. In particular, it helps manage user activities in large Virtual Organizations such as the LHC experiments. The DIRAC WMS with Pilot Jobs is described in more detail in the DIRAC pilots model.

Questions from new adopters (providers and users) are collected in the VMDIRAC FAQs.

VMDIRAC Extension

The up-to-date documentation for VMDIRAC is now part of the standard DIRAC documentation.

VMDIRAC is the DIRAC extension that integrates federated clouds in a way that is transparent to the user. You can install both the DIRAC core and the VMDIRAC extension with:

wget --no-check-certificate -O dirac-install 'https://github.com/DIRACGrid/DIRAC/raw/integration/Core/scripts/dirac-install.py'
su dirac -c'python dirac-install -V "VMDIRAC"'

See the detailed DIRAC server installation procedure. Once you have configured a DIRAC Configuration Server (CS) instance, you can configure the VMDIRAC extension.

VMDIRAC installation instructions

Notes that will lead you through the installation steps of VMDIRAC.

DIRAC Requirements

VMDIRAC is based upon the two following packages:

Both need to be installed before attempting the VMDIRAC installation.

Externals

Amazon EC2

Amazon EC2 boto API installation

sudo yum install python-boto

OpenNebula

OCCI 0.8 (OpenNebula) client installation

OpenNebula install

SL6 platform notes: There is no SL6 package. Install all the OpenNebula 3.4.0 RPMs for CentOS 6.0 on SL6 with:

sudo yum localinstall --nogpgcheck CentOS-6.0-opennebula-3.4.0-1.x86_64.rpm

Alternatively, install only the client from sources:

See OpenNebula install for opennebula-3.4.0.tar.gz

Currently we maintain OpenNebula end-points > 3.4.0.

Installing only the client on the VMDIRAC server:

./install.sh -c

rOCCI 1.1 (OpenNebula) installation

NEW feature in v0r8, including X509 authentication and generic ssh contextualization. The supported rOCCI client is > 4.1.0.

gem install rake
gem install occi-cli 

SL5 platform notes: package incompatibility, manual installation instructions

Standard SL5 dependencies:

# rpm -qa|grep ruby
ruby-libs-1.8.5-29.el5_9.i386
ruby-1.8.5-29.el5_9.x86_64
ruby-libs-1.8.5-29.el5_9.x86_64
rubygems-1.3.1-1.el5.noarch
ruby-shadow-1.4.1-7.el5.x86_64
ruby-rdoc-1.8.5-29.el5_9.x86_64
ruby-devel-1.8.5-29.el5_9.i386
ruby-irb-1.8.5-29.el5_9.x86_64
ruby-augeas-0.3.0-1.el5.x86_64
libselinux-ruby-1.33.4-5.7.el5.x86_64
ruby-devel-1.8.5-29.el5_9.x86_64

occi needs ruby-1.9.3. On SL5 one can install the Ruby enVironment Manager (rvm) to set up a configurable Ruby environment; additional info is available at the rvm homepage.

Once rvm is installed, go to the dirac bashrc and add:

# RVM
source ~/.rvm/scripts/rvm
rvm use 1.9.3

OpenStack

OpenStack simple user/password auth

Nova 1.1 (OpenStack) python libraries:

pip install apache-libcloud

OpenStack with X509 VOMS auth

NEW feature in v0r9 (testing). It currently needs a libcloud folder that is not supported by apache-libcloud (it worked in previous versions but has not been maintained since apache-libcloud-0.10.1). If you have a previous libcloud folder, move it to libcloud.bak and install from trunk:

Nova 1.1 (OpenStack) libcloud with X509:

cd /opt
git clone https://github.com/alvarolopez/libcloud.git
cd libcloud
python setup.py clean
python setup.py build
python setup.py install

ssh contextualization dependencies

If you want generic VM contextualization for any image with sshd available, then you need to install Paramiko

Paramiko:

pip install paramiko
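
As a rough illustration of what ssh contextualization involves, the sketch below uploads a script to a VM over SFTP and runs it with Paramiko. It is not the VMDIRAC implementation: the function names, the remote path `/tmp/context.sh` and the use of `AutoAddPolicy` (which accepts unknown host keys, convenient for freshly booted VMs but weaker than verifying them) are all assumptions made for the example.

```python
import shlex


def build_context_command(remote_script, arguments):
    """Build a safely quoted shell command that runs the uploaded script."""
    parts = ["bash", remote_script] + list(arguments)
    return " ".join(shlex.quote(part) for part in parts)


def contextualize_over_ssh(host, user, key_file, local_script, arguments=()):
    """Upload a contextualization script to a VM, run it, return its exit status."""
    import paramiko  # imported lazily so the helper above works without Paramiko

    remote_script = "/tmp/context.sh"  # hypothetical remote location
    client = paramiko.SSHClient()
    # Assumption: the VM was just created, so its host key is not yet known.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, key_filename=key_file)
    try:
        sftp = client.open_sftp()
        sftp.put(local_script, remote_script)
        sftp.close()
        command = build_context_command(remote_script, arguments)
        _, stdout, _ = client.exec_command(command)
        # Block until the remote script finishes and report its exit code.
        return stdout.channel.recv_exit_status()
    finally:
        client.close()
```

Any image that runs sshd and accepts the given key could be contextualized this way, which is why no cloud-manager-specific tooling appears in the sketch.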

VMDIRAC install step by step

Install the components either from the sysadmin tool or directly from the machine.

DB : VirtualMachineDB

install DB WorkloadManagement/VirtualMachineDB

This will create a new DB on the MySQL server (without tables!).

Service : VirtualMachineManagerHandler

install service WorkloadManagement/VirtualMachineManager

Be careful with the port: if it is already taken by another service, update it to avoid collisions. Check the VMDIRAC.WorkloadManagement.ConfigTemplate for further information. Running the service will also generate the necessary tables in the database. This service is contacted by the Web server and by ALL the virtual machines, so expect some load here, depending on the number of VMs running.

Agent : VirtualMachineScheduler

install agent WorkloadManagement/VirtualMachineScheduler

This agent takes care of booting virtual machines according to needs.

Agent : VirtualMachineContextualization

install agent WorkloadManagement/VirtualMachineContextualization

Optional agent, needed only when using the ssh contextualization method.

Web : VMDIRAC.Web

Nothing to do if the extension is properly declared in dirac.cfg.

VMDIRAC setup overview

The main VMDIRAC setup concerns Image and Contextualization management. There are three major sections to set up in the DIRAC Configuration Server:

Running Pods:

VMDIRAC defines the Running Pod as a logical abstraction of a particular set of running conditions. A Running Pod matches an Image with the list of cloud end-points on which VMs of that Image can run.

Images:

The VMDIRAC concept of an Image includes a boot image and, optionally, the contextualization of that image.

End-points:

A cloud manager has at least one end-point for some API (e.g. OCCI, EC2 or native APIs). An End-point section holds all the specific values needed to use a cloud manager end-point.
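
To make the relationship between the three sections concrete, the sketch below models them as plain Python dictionaries and resolves a Running Pod into its (Image, End-point) pairs. It only illustrates the idea: the pod, image and end-point names, the URLs, and every key except `bootImageName` are hypothetical placeholders, not the actual CS schema.

```python
# Illustrative model only: names and keys are placeholders, not real CS options.
RUNNING_PODS = {
    "example-pod": {"Image": "example-image", "CloudEndpoints": ["site-a", "site-b"]},
}
IMAGES = {
    "example-image": {"bootImageName": "golden-sl6", "contextMethod": "cloudinit"},
}
ENDPOINTS = {
    "site-a": {"api": "OCCI", "url": "https://site-a.example.org"},
    "site-b": {"api": "EC2", "url": "https://site-b.example.org"},
}


def resolve_running_pod(pod_name):
    """Return (image settings, end-point settings) pairs for one Running Pod."""
    pod = RUNNING_PODS[pod_name]
    image = IMAGES[pod["Image"]]
    # One pair per end-point on which VMs of this Image may be started.
    return [(image, ENDPOINTS[name]) for name in pod["CloudEndpoints"]]
```

Calling `resolve_running_pod("example-pod")` yields one pair per listed end-point, which is the information a scheduler would need before booting a VM.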

Images setup to run DIRAC Virtual Machines

NEW VMDIRAC v1r1

VMDIRAC v1r1 provides an enterprise contextualization method, based on a golden image with cloudinit installed, plus some automatic scripts for each cloud endpoint and DIRAC image to run a VM. Supported cloud managers for cloudinit with DIRAC:

  • OpenNebula 4.6.1 (client installation in VMDIRAC server: rocci client 4.2.5)
  • Openstack Grizzly (API installation in VMDIRAC server: apache-libcloud-0.14.0-beta3)
  • Amazon EC2 (API boto) NEW in v1r3

VMDIRAC uses a script builder which creates, on the fly, a minimal cloudinit script from CS parameters (RunningPods/Requirements, Images/cloudinit, CloudEndpoints) and a common static script template. The template parses the common parameters, then downloads and launches the contextualization scripts for the needs of specific communities; see for example the EGI Fedcloud script. In this way, most contextualization updates remain at whatever URL hosts the scripts, without the need to modify the contextualization logic at the VMDIRAC level. Another asset for administrators is that a single context script can be used for all cloud managers, golden images and endpoints, while DIRAC automates the specifics from the CS.
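
The script-builder idea can be sketched in a few lines of Python: a static template is filled with per-VM parameters, and the resulting script only downloads and launches a community script from a URL. This is an assumption-laden illustration, not the VMDIRAC builder; the template text, the `VM_*` variable names, the function name and the example URL are all hypothetical.

```python
from string import Template

# Hypothetical static template shared by all end-points.
CLOUDINIT_TEMPLATE = Template("""#!/bin/bash
# Parameters filled in from the CS (variable names are illustrative)
export VM_RUNNING_POD="$running_pod"
export VM_ENDPOINT="$endpoint"
export VM_REQUIREMENTS="$requirements"
# Download and launch the community contextualization script
curl -fsSL "$context_url" -o /root/context.sh
bash /root/context.sh
""")


def build_cloudinit_script(running_pod, endpoint, requirements, context_url):
    """Fill the static template; raises KeyError if a placeholder is left unset."""
    return CLOUDINIT_TEMPLATE.substitute(
        running_pod=running_pod,
        endpoint=endpoint,
        requirements=requirements,
        context_url=context_url,
    )
```

Because the generated script only fetches and runs whatever is published at `context_url`, a community can update its contextualization without touching the builder or the template.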

Note about the contextualization templates and scripts available at WorkloadManagement/private/bootstrap: every community should copy the most convenient scripts and create their own. Modifications of existing scripts will not be accepted in pull requests.

Note about golden images: the available templates and scripts are designed to work with regular golden images that include the cloudinit utilities (tested on SL 6, CentOS 6 and Ubuntu 14).

CernVM 3: CernVM 3 has a major impediment for elastic clouds: it has a two-phase boot sequence:

  1. A minimal image boot loader prepares the root filesystem with cernvmfs.
  2. The root filesystem is booted; cernvmfs downloads only the files needed for booting (about 200 MB), and the rest of the files are fetched only when required by the user.

Drawback: These 200 MB take about 20-30 min to download in a standard cloud manager, which is incompatible with an elastic cloud.

Possible tested solution: Install a squid proxy and iptables redirections on each hypervisor to speed up booting after the first download of the image files, and connect these local squids to the site squid.

Conclusion: As long as this problem is not resolved at your IaaS provider, CernVM 3 is not recommended.

Additionally, VMDIRAC supports generic ssh contextualization (for images on which cloudinit cannot be installed), old contextualization methods (prolog/epilog, raw amiconfig) and ad-hoc images for testing and out-of-the-box VM creation; see other VM contextualization.

VM horizontal auto-scaling setup

VMDIRAC can be configured with different policies for the creation and stoppage of VMs. Each end-point has an associated VM allocation policy (vmPolicy) and VM stoppage policy (vmStopPolicy).

VM allocation policy

The VM allocation policy can be elastic or static.

vmPolicy = static

Static VM allocation is used when an IaaS provider defines a constant number of VM slots that can be accessed.

vmPolicy = elastic

The elastic allocation is used to create new VMs when there are jobs queued in DIRAC. For this purpose the Running Pod configuration section has the CPUPerInstance option, which defines the minimum overall CPU of the DIRAC jobs waiting in the task queue required to submit a new VM. The parameter is used to tune the elasticity of VM delivery. CPUPerInstance can be set to a longer time to use the available resources more efficiently, saving creation overheads, or to a shorter time to use the available resources exhaustively, aiming to finish the production in a shorter total wall time but with higher resource costs due to the additional overhead.

As a rule of thumb, CPUPerInstance reference values, from shorter to longer, could be:

  a) Zero, to submit a new VM with no minimum CPU in the jobs of the task queue.
  b) A longer value, such as the average required CPU of the jobs, as a compromise between VM efficiency and total wall time.
  c) A very large value, to maximize efficiency in terms of VM creation overhead, for cases where the total wall time of the production is not a constraint.
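
The two allocation policies can be summarized as a single decision function. The sketch below is one possible reading of the text, not the scheduler's actual code: the function name and arguments are hypothetical, the slot limit for the static policy is taken to be the endpoint parameter maxEndpointInstances mentioned in the v0r7 release notes, and "minimal overall CPU" is interpreted as the summed CPU time of the queued jobs.

```python
def should_submit_vm(vm_policy, queued_job_cpu, cpu_per_instance,
                     running_instances, max_endpoint_instances):
    """Decide whether a new VM should be submitted to an end-point.

    queued_job_cpu is a list with the required CPU of each waiting job.
    """
    if vm_policy == "static":
        # Slots driven: fill the fixed number of slots offered by the provider.
        return running_instances < max_endpoint_instances
    if vm_policy == "elastic":
        # Jobs driven: at least one job must be waiting, and the queued CPU
        # must reach CPUPerInstance (zero means any waiting job is enough).
        return bool(queued_job_cpu) and sum(queued_job_cpu) >= cpu_per_instance
    raise ValueError("unknown vmPolicy: %r" % (vm_policy,))
```

With `cpu_per_instance` set to zero a single queued job triggers a VM, while a very large value delays submission until enough work has accumulated to amortize the creation overhead.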

VM stoppage policy

The VM stoppage policy can be set to elastic or never. In either case, VMs can be stopped by the VM operator or, for CernVM images, by the HEPiX stoppage mechanism under the responsibility of each IaaS provider. When a running VM is required to stop, it stops in an orderly manner and then halts.

vmStopPolicy = never

The VM stoppage does not depend on running jobs, only on external stoppage requests (VM operators at the VMDIRAC portal, IaaS provider).

vmStopPolicy = elastic

The elastic policy stops the VM if no jobs have been running during the last VM halting margin time, which is a configurable option.
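
The stoppage rules can likewise be written as a small predicate. This is a sketch under stated assumptions rather than the VMDIRAC logic: the function and argument names are hypothetical, and the halting margin is assumed to be measured in seconds since the VM last had a running job.

```python
def should_halt_vm(vm_stop_policy, running_jobs, idle_seconds,
                   halting_margin, stop_requested=False):
    """Decide whether a VM should halt.

    idle_seconds is the time since the VM last had a running job.
    """
    if stop_requested:
        # External requests (VM operator, IaaS provider) apply to both policies.
        return True
    if vm_stop_policy == "never":
        return False
    if vm_stop_policy == "elastic":
        # Halt only when idle for at least the configured margin.
        return running_jobs == 0 and idle_seconds >= halting_margin
    raise ValueError("unknown vmStopPolicy: %r" % (vm_stop_policy,))
```

A longer margin keeps idle VMs around in case new jobs arrive, at the cost of paying for unused resources.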

VMDIRAC Release information:

[rel-v1r4]
NEW: An image can have a different bootImageName and Flavor depending on the cloud endpoint
NEW: Try to halt a VM when it is declared Stalled; on success it goes to Halt status
Several bugfixes, of particular interest:
    FIX: UniqueID up to 255 VARCHAR
    requires a VirtualMachine_DB update to the field size
Bug fixes:
FIX: UniqueID up to 255 VARCHAR  [Víctor Méndez]
FIX: nova public ip could be none in delete VM  [Víctor Méndez]
FIX: No hurry to declare a VM in Stalled  [Víctor Méndez]
FIX: latest breakseq sw stack in opt  [Víctor Méndez]
FIX: get public key in eth1 only for ssh context method  [Víctor Méndez]
FIX: enable both user proxy and cert for VM cred  [Víctor Méndez]
UPGRADE with DIRAC v6r14: user proxy  in a pilot way  [Víctor Méndez]
FIX: if user proxy then copy to tmp and dirac-configure with proxy  [Víctor Méndez]
FIX: If proxy credentials dirac-configuration with user  proxy  [Víctor Méndez]
FIX: Avoiding exceed user data size, when long proxy as cert and key  [Víctor Méndez]
UPGRADE with DIRAC v6r14 uses DataManager instead of ReplicaManager  [Víctor Méndez]


[v1r3]
NEW: Cloudinit contextualization for Amazon EC2 using boto API.
CHANGE: AmazonImage and AmazonInstance have changed to the same design as OpenStack and OpenNebula, using Utilities/configuration.py and contextualization.py

  
[v1r2] VMDIRACWebApp with the new DIRAC Web technology 

[v1r1] Stable with cloudinit as the preferred contextualization method, tested for OpenNebula and OpenStack

[v1r0]
STABLE: After some testing this PR is collecting different BUGFIXes from v1r0pre
NEW: a big new feature, cloudinit
cloudinit is a common contextualization method for different cloud managers and most images. It is recommended in terms of image management and federated cloud complexity; with cloudinit, VMDIRAC is able to implement a completely automated common contextualization for different cloud managers
Supported cloud managers for cloudinit with DIRAC should be at least:
OpenNebula 4.6.1 (client installation in VMDIRAC server: rocci client 4.2.5)
Openstack Grizzly (client installation in VMDIRAC server: apache-libcloud-0.14.0-beta3)

[v0r9]
TESTING: VOMS support for OpenStack (with non official apache-libcloud)

[v0r8]
NEW: rOCCI 1.1 DIRAC driver.
     rOCCI authentication by X509 proxies with third-party VOMS (rOCCI does the matching work in a transparent manner)
     Generic SSH contextualization DIRAC client to all Operating System Images and Cloud Managers (software convergence).
     In current release, SSH contextualization has been tested for OpenNebula and OpenStack.
NEW: VM local dirac.cfg updater agent for pilot/dirac release is updated
CHANGE: OcciImage and Occi09 migrated to new WMS/Utilities style

[v0r7]
NEW: endpoint vmPolicy "static" -> slots driven, indicated by maxEndpointInstances, endpoint CS parameter
     endpoint vmPolicy "elastic" -> jobs driven, one by one

NEW: endpoint vmStopPolicy "never"
     endpoint vmStopPolicy "elastic" -> no more jobs + VM halting margin time
CHANGE: Both cases: VMs can be stopped from the "Instances Overview" screen: VirtualMachineDB.py in function instanceIDHeartBeat:
Send empty dir just in case we want to send flags (such as stop vm)
TODO: The particular HEPiX method to ask VMs to shut down from the IaaS provider site (requirement to be specified by HEPiX)

[v0r6]

NEW: nova-1.1 driver and ssh contextualization method, ready to extend to amiconfig contextualization.

[v0r5]

Multi-endpoint OpenNebula and CloudStack in a federated way. Running-pods, DIRAC-Images and Endpoints scheme.

[v0r4]

FIX: Added missing occi and cloud director files

[v0r3]

NEW: Redesign of VMDirector to allow for more Cloud provider drivers without modifications to the VM core components.
NEW: An image is composed of a bootstrap image and context files. Optionally a context image can also be included.
CHANGE: A CS Cloud Endpoint has all the contextualization and configuration parameters for a specific endpoint
CHANGE: A running Pod contains a DIRAC image, a list of endpoints and the necessary parameters and requirements.

[v0r2]

Initial version of VMDIRAC. It includes support for Occi and Amazon clouds.

RFC:

The VMDIRAC RFCs are descriptions of proposals for new functionalities which are exposed by the authors to the VMDIRAC development team for comments. The RFC description is maintained and updated in the corresponding VMDIRAC Wiki page. Each new RFC must have a distinct number by which it can be referred to, the author and the date of the first submission.

The current RFCs:

RFC #1: Renewal proxy for the VMs instead of a cert (pub,key)