This Dockerfile may be used to bootstrap a Ceph cluster with all the Ceph daemons running. To run a certain type of daemon, simply use the name of the daemon as $1. Valid values are:
- mon: deploys a Ceph monitor
- osd: deploys an OSD using the method specified by OSD_TYPE
- osd_directory: deploys one or multiple OSDs in a single container using a prepared directory (used in scenarios where the operator doesn't want to use --privileged=true)
- osd_directory_single: deploys a single OSD per container using a prepared directory (used in scenarios where the operator doesn't want to use --privileged=true)
- osd_ceph_disk: deploys an OSD using ceph-disk, so you have to provide a whole device (e.g. /dev/sdb)
- mds: deploys an MDS
- rgw: deploys a Rados Gateway
You can use this container to bootstrap any Ceph daemon.
- CLUSTER is the name of the cluster (DEFAULT: ceph)
If SELinux is enabled, run the following commands:
sudo chcon -Rt svirt_sandbox_file_t /etc/ceph
sudo chcon -Rt svirt_sandbox_file_t /var/lib/ceph
We currently support one KV backend to store our configuration flags, keys and maps: etcd.
There is a ceph.defaults config file in the image that is used for defaults to bootstrap daemons. It will add the keys if they are not already present. You can either pre-populate the KV store with your own settings, or provide a ceph.defaults config file. To supply your own defaults, make sure to mount the /etc/ceph/ volume and place your ceph.defaults file there.
Important variables in ceph.defaults to add/change when you bootstrap an OSD:
/osd/osd_journal_size
/osd/cluster_network
/osd/public_network
Note: cluster_network and public_network are currently not populated in the defaults, but can be passed as environment variables with -e CEPH_PUBLIC_NETWORK=... for more flexibility.
docker run -d --net=host \
-e KV_TYPE=etcd \
-e KV_IP=127.0.0.1 \
-e KV_PORT=2379 \
ceph/daemon populate_kvstore
Sometimes you might want to destroy the partition tables on a disk. For this you can use the zap_device scenario, which works as follows:
docker run -d --privileged=true \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/sdd \
ceph/daemon zap_device
A monitor requires some persistent storage for the docker container. If a KV store is used, /etc/ceph will be auto-generated from data kept in the KV store. /var/lib/ceph, however, must be provided by a docker volume. The ceph mon will periodically store data into /var/lib/ceph, including the latest copy of the CRUSH map. If a mon restarts, it will attempt to download the latest monmap and CRUSH map from other peer monitors. However, if all mon daemons have gone down, monitors must be able to recover their previous maps. The docker volume used for /var/lib/ceph should be backed by some durable storage, and must be able to survive container and node restarts.
Without KV store, run:
docker run -d --net=host \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph/ \
-e MON_IP=192.168.0.20 \
-e CEPH_PUBLIC_NETWORK=192.168.0.0/24 \
ceph/daemon mon
With KV store, run:
docker run -d --net=host \
-v /var/lib/ceph:/var/lib/ceph \
-e MON_IP=192.168.0.20 \
-e CEPH_PUBLIC_NETWORK=192.168.0.0/24 \
-e KV_TYPE=etcd \
-e KV_IP=192.168.0.20 \
ceph/daemon mon
List of available options:
- MON_NAME: name of the monitor (defaults to hostname)
- CEPH_PUBLIC_NETWORK: CIDR of the host running Docker; it should be in the same network as the MON_IP
- CEPH_CLUSTER_NETWORK: CIDR of a secondary interface of the host running Docker. Used for the OSD replication traffic
- MON_IP: IP address of the host running Docker
- NETWORK_AUTO_DETECT: Whether and how to attempt IP and network autodetection. Meant to be used without --net=host.
  - 0 = Do not detect (default)
  - 1 = Detect IPv6, fallback to IPv4 (if no globally-routable IPv6 address detected)
  - 4 = Detect IPv4 only
  - 6 = Detect IPv6 only
Since Luminous, a manager daemon is mandatory; see the docs.
Without KV store, run:
docker run -d --net=host \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph/ \
ceph/daemon mgr
With KV store, run:
docker run -d --net=host \
-v /var/lib/ceph:/var/lib/ceph \
-e KV_TYPE=etcd \
-e KV_IP=192.168.0.20 \
ceph/daemon mgr
The following OSD_TYPE values are available:
- <none>: if no OSD_TYPE is set, one of disk, activate or directory will be used based on autodetection of the current OSD bootstrap state
- activate: the daemon expects to be passed a block device of a ceph-disk-prepared disk (via the OSD_DEVICE environment variable); no bootstrapping will be performed
- directory: the daemon expects to find the OSD filesystem(s) already mounted in /var/lib/ceph/osd/
- disk: the daemon expects to be passed a block device via the OSD_DEVICE environment variable
- prepare: the daemon expects to be passed a block device (via the OSD_DEVICE environment variable) and runs ceph-disk prepare to bootstrap the disk
Options for OSDs (TODO: consolidate these options between the types):
- JOURNAL_DIR: if provided, new OSDs will be bootstrapped to use the specified directory as a common journal area. This is usually used to store the journals for more than one OSD on a common, separate disk. This currently only applies to the directory OSD type.
- JOURNAL: if provided, the new OSD will be bootstrapped to use the specified journal file (if you do not wish to use the default). This is currently only supported by the directory OSD type.
- OSD_DEVICE: mandatory for the activate and disk OSD types; this specifies which block device to use as the OSD.
- OSD_JOURNAL: optional override of the OSD journal file. This only applies to the activate and disk OSD types.
If the operator does not specify an OSD_TYPE, autodetection happens:
- disk is used if no bootstrapped OSD is found.
- activate is used if a bootstrapped OSD is found and OSD_DEVICE is also provided.
- directory is used if a bootstrapped OSD is found and no OSD_DEVICE is provided.
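The autodetection rules can be read as a small decision table. The following is only a sketch of that table, not the image's actual entrypoint code: the function name detect_osd_type and its first argument are hypothetical and exist purely for illustration.

```shell
#!/bin/sh
# Sketch of the OSD_TYPE autodetection rules described in this section.
# detect_osd_type is a hypothetical helper, not part of ceph/daemon.
#   $1: "yes" if a bootstrapped OSD was found, anything else otherwise
#   $2: value of OSD_DEVICE (may be empty)
detect_osd_type() {
  has_bootstrapped_osd=$1
  osd_device=$2
  if [ "$has_bootstrapped_osd" != "yes" ]; then
    echo disk        # nothing bootstrapped yet
  elif [ -n "$osd_device" ]; then
    echo activate    # bootstrapped OSD and a device was given
  else
    echo directory   # bootstrapped OSD, no device given
  fi
}

detect_osd_type no  /dev/vdd   # prints "disk"
detect_osd_type yes /dev/vdd   # prints "activate"
detect_osd_type yes ""         # prints "directory"
```

Setting OSD_TYPE explicitly, as in the examples that follow, bypasses this decision entirely.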
Without KV backend:
docker run -d --net=host \
--pid=host \
--privileged=true \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph/ \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/vdd \
ceph/daemon osd
With KV backend:
docker run -d --net=host \
--privileged=true \
--pid=host \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/vdd \
-e KV_TYPE=etcd \
-e KV_IP=192.168.0.20 \
ceph/daemon osd
Without KV backend:
docker run -d --net=host \
--privileged=true \
--pid=host \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph/ \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/vdd \
-e OSD_TYPE=disk \
ceph/daemon osd
Using bluestore:
docker run -d --net=host \
--privileged=true \
--pid=host \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph/ \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/vdd \
-e OSD_TYPE=disk \
-e OSD_BLUESTORE=1 \
ceph/daemon osd
Using dmcrypt:
docker run -d --net=host \
--privileged=true \
--pid=host \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph/ \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/vdd \
-e OSD_TYPE=disk \
-e OSD_DMCRYPT=1 \
ceph/daemon osd
With KV backend:
docker run -d --net=host \
--privileged=true \
--pid=host \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/vdd \
-e OSD_TYPE=disk \
-e KV_TYPE=etcd \
-e KV_IP=192.168.0.20 \
ceph/daemon osd
Using bluestore with KV backend:
docker run -d --net=host \
--privileged=true \
--pid=host \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/vdd \
-e OSD_TYPE=disk \
-e KV_TYPE=etcd \
-e KV_IP=192.168.0.20 \
-e OSD_BLUESTORE=1 \
ceph/daemon osd
Using dmcrypt with KV backend:
docker run -d --net=host \
--privileged=true \
--pid=host \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/vdd \
-e OSD_TYPE=disk \
-e KV_TYPE=etcd \
-e KV_IP=192.168.0.20 \
-e OSD_DMCRYPT=1 \
ceph/daemon osd
List of available options:
- OSD_DEVICE is the OSD device
- OSD_JOURNAL is the journal for a given OSD
- HOSTNAME is used to place the OSD in the CRUSH map
If you do not want to use --privileged=true, please fall back on the second example.
This function is a balance between ceph-disk and OSD directory: the operator can use ceph-disk outside of the container (directly on the host) to prepare the devices. Devices are prepared with ceph-disk prepare, then activated inside the container. A privileged container is still required, as ceph-disk needs to access /dev/. This offers little advantage over the ceph-disk scenario, but might fit use cases where operators want to prepare their devices outside of a container.
docker run -d --net=host \
--privileged=true \
--pid=host \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph/ \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/vdd \
-e OSD_TYPE=activate \
ceph/daemon osd
There are a number of environment variables which are used to configure the execution of the OSD:
- CLUSTER is the name of the ceph cluster (defaults to ceph)
If the OSD is not already created (key, configuration, OSD data), the following environment variables will control its creation:
- WEIGHT is the weight of the OSD when it is added to the CRUSH map (default is 1.0)
- JOURNAL is the location of the journal (default is the journal file inside the OSD data directory)
- HOSTNAME is the name of the host; it is used as a flag when adding the OSD to the CRUSH map
The old option OSD_ID is now unused. Instead, the script will scan for each directory in /var/lib/ceph/osd of the form <cluster>-<osd_id>.
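To see which OSD IDs such a scan would pick up on a host, a loop like the one below can help. It is a sketch under assumptions, not the image's real scanning code: list_osd_ids is a hypothetical helper, and it only relies on the <cluster>-<osd_id> naming stated above.

```shell
#!/bin/sh
# Sketch: list OSD IDs from directories named <cluster>-<osd_id>.
# list_osd_ids is a hypothetical helper, not the image's real code.
#   $1: OSD base directory (e.g. /var/lib/ceph/osd)
#   $2: cluster name (defaults to "ceph")
list_osd_ids() {
  osd_root=$1
  cluster=${2:-ceph}
  for dir in "$osd_root/$cluster"-*; do
    [ -d "$dir" ] || continue      # skip plain files and an unmatched glob
    name=${dir##*/}                # strip the leading path
    id=${name#"$cluster"-}         # strip the "<cluster>-" prefix
    echo "$id"
  done
}

list_osd_ids /var/lib/ceph/osd "${CLUSTER:-ceph}"
```

The function prints one ID per line and prints nothing when the directory is missing or holds no matching entries.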
To create your OSDs, simply run the following command:
docker exec <mon-container-id> ceph osd create
Note that we now default to dropping root privileges, so it is important to set the proper ownership for your OSD directories. The Ceph OSD runs as UID:64045, GID:64045, so:
chown -R 64045:64045 /var/lib/ceph/osd/
There is a problem when attempting to run multiple OSD containers on a single docker host. See issue #19.
There are two workarounds, at present:
- Run each OSD with the --pid=host option
- Run multiple OSDs within the same container
To run multiple OSDs within the same container, simply bind-mount each OSD datastore directory:
docker run -v /osds/1:/var/lib/ceph/osd/ceph-1 -v /osds/2:/var/lib/ceph/osd/ceph-2
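With more than a couple of OSDs, the list of bind mounts gets repetitive. The sketch below generates the -v arguments for a list of OSD IDs and prints them rather than starting a container. The helper name osd_mount_args is hypothetical, and the /osds/<id> host layout and the ceph cluster name are taken from the example above as assumptions.

```shell
#!/bin/sh
# Sketch: build the -v arguments for several OSD datastores.
# osd_mount_args is a hypothetical helper; it assumes host directories
# /osds/<id> and the default cluster name "ceph", as in the example above.
osd_mount_args() {
  args=""
  for id in "$@"; do
    args="$args -v /osds/$id:/var/lib/ceph/osd/ceph-$id"
  done
  echo "${args# }"   # drop the leading space
}

osd_mount_args 1 2
# prints: -v /osds/1:/var/lib/ceph/osd/ceph-1 -v /osds/2:/var/lib/ceph/osd/ceph-2
```

The printed arguments can then be pasted into the docker run command line for the container.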
Ceph OSD directory single has a similar design to Ceph OSD directory, since they both aim to run OSD processes from an already bootstrapped directory. So we assume the OSD directory has been populated already. The major difference is that Ceph OSD directory single has a much simpler implementation, since it only runs a single OSD process per container. It doesn't do anything with the journal, as it assumes the journal's symlink was provided during the initialization sequence of the OSD.
This scenario goes through the OSD directory (/var/lib/ceph/osd) and looks for OSDs that don't have a lock held by any other OSD. If no lock is found, the OSD process starts. If all the OSDs are already running, we gently exit 0 and explain that all the OSDs are already running.
Important note: if you are aiming at running multiple OSD containers on the same machine (which you will likely do with Ceph anyway), you must enable --pid=host. However, if you are running Docker 1.12 (based on moby/moby#22481), you can just share the same PID namespace among the OSD containers only, using: --pid=container:<id>.
If your OSD is BTRFS and you want to use PARALLEL journal mode, you will need to run this container with --privileged set to true. Otherwise, ceph-osd will have insufficient permissions and it will revert to the slower WRITEAHEAD mode.
Re: [Ulexus/docker-ceph#5]
A user has reported a concerning (and difficult to diagnose) problem wherein the OSD crashes frequently due to Docker running out of open file handles. This is understandable, as the OSDs use a great many ports during periods of high traffic. It is, therefore, recommended that you increase the number of open file handles available to Docker.
On CoreOS (and probably other systemd-based systems), you can do this by creating a file named /etc/systemd/system/docker.service.d/limits.conf with content something like:
[Service]
LimitNOFILE=4096
By default, the MDS does NOT create a ceph filesystem. If you wish to have this MDS create a ceph filesystem (it will only do this if the specified CEPHFS_NAME does not already exist), you must set, at a minimum, CEPHFS_CREATE=1. It is strongly recommended that you read the rest of this section, as well.
For most people, the defaults for the following optional environment variables are fine, but if you wish to customize the data and metadata pools in which your CephFS is stored, you may override the following as you wish:
- CEPHFS_CREATE: Whether to create the ceph filesystem (0 = no / 1 = yes), if it doesn't exist. Defaults to 0 (no)
- CEPHFS_NAME: The name of the new ceph filesystem and the basis on which the later variables are created. Defaults to cephfs
- CEPHFS_DATA_POOL: The name of the data pool for the ceph filesystem. If it does not exist, it will be created. Defaults to ${CEPHFS_NAME}_data
- CEPHFS_DATA_POOL_PG: The number of placement groups for the data pool. Defaults to 8
- CEPHFS_METADATA_POOL: The name of the metadata pool for the ceph filesystem. If it does not exist, it will be created. Defaults to ${CEPHFS_NAME}_metadata
- CEPHFS_METADATA_POOL_PG: The number of placement groups for the metadata pool. Defaults to 8
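Because the pool names derive from CEPHFS_NAME, overriding only the name changes both pools. The sketch below reproduces the documented defaults with plain shell parameter expansion so the effective values can be checked before starting the container. The cephfs_settings function is a hypothetical helper, and the image's real script may compute these values differently.

```shell
#!/bin/sh
# Sketch of how the documented defaults derive from CEPHFS_NAME.
# cephfs_settings is a hypothetical helper; it runs in a subshell so the
# defaults it fills in do not leak into the calling shell.
cephfs_settings() (
  : "${CEPHFS_NAME:=cephfs}"
  : "${CEPHFS_DATA_POOL:=${CEPHFS_NAME}_data}"
  : "${CEPHFS_DATA_POOL_PG:=8}"
  : "${CEPHFS_METADATA_POOL:=${CEPHFS_NAME}_metadata}"
  : "${CEPHFS_METADATA_POOL_PG:=8}"
  echo "$CEPHFS_NAME $CEPHFS_DATA_POOL $CEPHFS_DATA_POOL_PG $CEPHFS_METADATA_POOL $CEPHFS_METADATA_POOL_PG"
)

# Print name, data pool, data PGs, metadata pool, metadata PGs.
cephfs_settings
```

For example, with CEPHFS_NAME=myfs and nothing else set, the data and metadata pools resolve to myfs_data and myfs_metadata.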
Without KV backend, run:
docker run -d --net=host \
-v /var/lib/ceph/:/var/lib/ceph/ \
-v /etc/ceph:/etc/ceph \
-e CEPHFS_CREATE=1 \
ceph/daemon mds
With KV backend, run:
docker run -d --net=host \
-e CEPHFS_CREATE=1 \
-e KV_TYPE=etcd \
-e KV_IP=192.168.0.20 \
ceph/daemon mds
List of available options:
- MDS_NAME is the name of the MDS server (DEFAULT: mds-$(hostname)). One thing to note is that metadata servers are not machine-restricted. They are not bound by their data directories and can move around the cluster. As a result, you can run more than one MDS on a single machine. If you plan to do so, you should set this variable to something like mds-$(hostname)-a, mds-$(hostname)-b, etc.
We deploy the Rados Gateway with civetweb enabled by default. However, it is possible to use a different CGI frontend by simply giving a remote address and port.
Without kv backend, run:
docker run -d --net=host \
-v /var/lib/ceph/:/var/lib/ceph/ \
-v /etc/ceph:/etc/ceph \
ceph/daemon rgw
With kv backend, run:
docker run -d --net=host \
-e KV_TYPE=etcd \
-e KV_IP=192.168.0.20 \
ceph/daemon rgw
List of available options:
- RGW_CIVETWEB_PORT is the port on which civetweb listens (DEFAULT: 8080)
- RGW_NAME: defaults to hostname
- RGW_ZONEGROUP: zonegroup to use (DEFAULT: empty)
- RGW_ZONE: zone to use (DEFAULT: empty)
Administration via radosgw-admin from the Docker host if the RGW_NAME variable hasn't been supplied:
docker exec <containerId> radosgw-admin -n client.rgw.$(hostname) -k /var/lib/ceph/radosgw/$(hostname)/keyring <commands>
Otherwise, $(hostname) has to be replaced by the value of RGW_NAME.
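Both cases can be covered by one small wrapper that falls back to the hostname when RGW_NAME is unset. This is a sketch only: rgw_admin_cmd is a hypothetical helper, my-rgw-container stands in for the real container ID, and the function prints the command instead of executing it so it can be inspected first.

```shell
#!/bin/sh
# Sketch: print (not run) the radosgw-admin invocation for a container.
# rgw_admin_cmd is a hypothetical helper; RGW_NAME falls back to the hostname.
#   $1: container ID or name; remaining arguments: radosgw-admin subcommand
rgw_admin_cmd() {
  container_id=$1
  shift
  name=${RGW_NAME:-$(hostname)}
  echo "docker exec $container_id radosgw-admin -n client.rgw.$name -k /var/lib/ceph/radosgw/$name/keyring $*"
}

# "my-rgw-container" is a placeholder for the actual container ID.
RGW_NAME=myrgw rgw_admin_cmd my-rgw-container user list
```

With RGW_NAME=myrgw, the printed command uses client.rgw.myrgw and /var/lib/ceph/radosgw/myrgw/keyring.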
To enable an external CGI interface instead of civetweb set:
RGW_REMOTE_CGI=1
RGW_REMOTE_CGI_HOST=192.168.0.1
RGW_REMOTE_CGI_PORT=9000
And run the container like this:
docker run -d \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph \
-e CEPH_DAEMON=RGW \
-e RGW_NAME=myrgw \
-p 9000:9000 \
-e RGW_REMOTE_CGI=1 \
-e RGW_REMOTE_CGI_HOST=192.168.0.1 \
-e RGW_REMOTE_CGI_PORT=9000 \
ceph/daemon
This is pretty straightforward. The --net=host option is not mandatory; if you don't use it, do not forget to expose the RESTAPI_PORT.
docker run -d --net=host \
-e KV_TYPE=etcd \
-e KV_IP=192.168.0.20 \
ceph/daemon restapi
This is pretty straightforward. The --net=host option is not mandatory. With KV we do:
docker run -d --net=host \
-e KV_TYPE=etcd \
-e KV_IP=192.168.0.20 \
ceph/daemon rbd_mirror
Without KV we do:
docker run -d --net=host \
ceph/daemon rbd_mirror
List of available options:
- RESTAPI_IP is the IP address to listen on (DEFAULT: 0.0.0.0)
- RESTAPI_PORT is the listening port of the REST API (DEFAULT: 5000)
- RESTAPI_BASE_URL is the base URL of the API (DEFAULT: /api/v0.1)
- RESTAPI_LOG_LEVEL is the log level of the API (DEFAULT: warning)
- RESTAPI_LOG_FILE is the location of the log file (DEFAULT: /var/log/ceph/ceph-restapi.log)