feat: Support graceful shutdown (#407)
* feat: Support graceful shutdown

* update docs

* docs

* changelog

* link code in docs

* increase default of datanodes to 30 min

* move into constants

* use new operator-rs

* docs: Format 15 minutes

* Use new operator-rs

* improve docs

* fix link

* use operator-rs 0.55.0

* fixup

* improve docs

* set error context

* Added a high level description of graceful shutdown

* Revert "Added a high level description of graceful shutdown"

This reverts commit 7733ec1.

Moved to stackabletech/documentation#473

---------

Co-authored-by: Jim Halfpenny <jim@source321.com>
sbernauer and Jimvin authored Oct 19, 2023
1 parent c120c0a commit feb1da3
Showing 9 changed files with 134 additions and 10 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ All notable changes to this project will be documented in this file.
- Default stackableVersion to operator version ([#381]).
- Configuration overrides for the JVM security properties, such as DNS caching ([#384]).
- Support PodDisruptionBudgets ([#394]).
- Support graceful shutdown ([#407]).
- Added support for 3.2.4, 3.3.6 ([#409]).

### Changed
@@ -33,6 +34,7 @@ All notable changes to this project will be documented in this file.
[#402]: https://github.com/stackabletech/hdfs-operator/pull/402
[#404]: https://github.com/stackabletech/hdfs-operator/pull/404
[#405]: https://github.com/stackabletech/hdfs-operator/pull/405
[#407]: https://github.com/stackabletech/hdfs-operator/pull/407
[#409]: https://github.com/stackabletech/hdfs-operator/pull/409

## [23.7.0] - 2023-07-14
24 changes: 24 additions & 0 deletions deploy/helm/hdfs-operator/crds/crds.yaml
@@ -576,6 +576,10 @@ spec:
type: array
type: object
type: object
gracefulShutdownTimeout:
description: Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
nullable: true
type: string
logging:
default:
enableVectorAgent: null
@@ -4069,6 +4073,10 @@ spec:
type: array
type: object
type: object
gracefulShutdownTimeout:
description: Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
nullable: true
type: string
logging:
default:
enableVectorAgent: null
@@ -7621,6 +7629,10 @@ spec:
type: array
type: object
type: object
gracefulShutdownTimeout:
description: Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
nullable: true
type: string
logging:
default:
enableVectorAgent: null
@@ -11105,6 +11117,10 @@ spec:
type: array
type: object
type: object
gracefulShutdownTimeout:
description: Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
nullable: true
type: string
logging:
default:
enableVectorAgent: null
@@ -14606,6 +14622,10 @@ spec:
type: array
type: object
type: object
gracefulShutdownTimeout:
description: Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
nullable: true
type: string
logging:
default:
enableVectorAgent: null
@@ -18090,6 +18110,10 @@ spec:
type: array
type: object
type: object
gracefulShutdownTimeout:
description: Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
nullable: true
type: string
logging:
default:
enableVectorAgent: null
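The CRD description gives `30m`, `1h` and `2d` as example values for `gracefulShutdownTimeout`. As a rough illustration of how such strings map to seconds, here is a minimal sketch. The function `timeout_to_seconds` is hypothetical and handles only a single `<number><unit>` pair with the units `s`, `m`, `h` and `d`; it is not the parser the operator uses, which may accept other units or combined values.

```rust
/// Hypothetical helper: converts a timeout such as `30m`, `1h` or `2d` to seconds.
/// Only a single `<number><unit>` pair is handled (an assumption for this sketch).
fn timeout_to_seconds(input: &str) -> Option<u64> {
    let input = input.trim();
    // Split at the first non-digit character: digits on the left, unit on the right.
    let split_at = input.find(|c: char| !c.is_ascii_digit())?;
    let (number, unit) = input.split_at(split_at);
    let value: u64 = number.parse().ok()?;
    let factor = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 60 * 60,
        "d" => 24 * 60 * 60,
        // Unknown units and combined values such as `1h30m` are rejected here.
        _ => return None,
    };
    // Return None instead of overflowing on very large values.
    value.checked_mul(factor)
}
```

Under these assumptions, `30m` corresponds to 1800 seconds and `1h` to 3600 seconds; a value without a unit is rejected.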
@@ -1,6 +1,36 @@
= Graceful shutdown

-Graceful shutdown of HDFS nodes is either not supported by the product itself
-or we have not implemented it yet.
+You can configure the graceful shutdown as described in xref:concepts:operations/graceful_shutdown.adoc[].

-Outstanding implementation work for the graceful shutdowns of all products where this functionality is relevant is tracked in https://github.com/stackabletech/issues/issues/357
== JournalNodes

As a default, JournalNodes have `15 minutes` to terminate gracefully.

The JournalNode process will always run as PID `1` and will get a `SIGTERM` once Kubernetes wants to terminate the Pod.
It will log the received signal as shown in the log below and initiate a graceful shutdown.
If the process has still not exited once the graceful shutdown timeout has passed, Kubernetes will issue a `SIGKILL` to force-kill the process.

https://github.com/apache/hadoop/blob/a585a73c3e02ac62350c136643a5e7f6095a3dbb/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java#L272[This] is the relevant code that gets executed in the JournalNodes as of HDFS version `3.3.4`.

[source,text]
----
2023-10-10 13:37:41,525 ERROR server.JournalNode (LogAdapter.java:error(75)) - RECEIVED SIGNAL 15: SIGTERM
2023-10-10 13:37:41,526 INFO server.JournalNode (LogAdapter.java:info(51)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down JournalNode at hdfs-journalnode-default-0/10.244.0.38
************************************************************/
----

== NameNodes

As a default, NameNodes have `15 minutes` to terminate gracefully.
They go through the same mechanism as documented for the <<_journalnodes>> above.

https://github.com/apache/hadoop/blob/a585a73c3e02ac62350c136643a5e7f6095a3dbb/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java#L1080[This] is the relevant code that gets executed in the NameNodes as of HDFS version `3.3.4`.

== DataNodes

As a default, DataNodes have `30 minutes` to terminate gracefully.
They go through the same mechanism as documented for the <<_journalnodes>> above.

https://github.com/apache/hadoop/blob/a585a73c3e02ac62350c136643a5e7f6095a3dbb/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L2004[This] is the relevant code that gets executed in the DataNodes as of HDFS version `3.3.4`.
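
The termination behaviour documented above can be modelled in a few lines. This is only a sketch: the `Role` and `Outcome` enums and both functions are hypothetical names introduced for illustration, and the model reduces the Kubernetes behaviour to a single comparison between how long the shutdown takes and the default grace period of the role.

```rust
use std::time::Duration;

/// The three HDFS roles covered by the documentation (names chosen for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    JournalNode,
    NameNode,
    DataNode,
}

/// What happens to the process after Kubernetes has sent `SIGTERM`.
#[derive(Debug, PartialEq)]
enum Outcome {
    ExitedGracefully,
    ForceKilled,
}

/// Default grace periods as stated in the documentation:
/// 15 minutes for JournalNodes and NameNodes, 30 minutes for DataNodes.
fn default_grace_period(role: Role) -> Duration {
    match role {
        Role::JournalNode | Role::NameNode => Duration::from_secs(15 * 60),
        Role::DataNode => Duration::from_secs(30 * 60),
    }
}

/// Simplified model: a process that needs longer than the grace period
/// receives `SIGKILL`; otherwise it exits on its own.
fn termination_outcome(role: Role, shutdown_takes: Duration) -> Outcome {
    if shutdown_takes <= default_grace_period(role) {
        Outcome::ExitedGracefully
    } else {
        Outcome::ForceKilled
    }
}
```

In this model, a shutdown that takes 20 minutes fits within the DataNode default but exceeds the NameNode default.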
9 changes: 9 additions & 0 deletions rust/crd/src/constants.rs
@@ -1,3 +1,5 @@
use stackable_operator::time::Duration;

pub const DEFAULT_DFS_REPLICATION_FACTOR: u8 = 3;

pub const CONTROLLER_NAME: &str = "hdfsclusters.hdfs.stackable.tech";
@@ -41,6 +43,13 @@ pub const DEFAULT_JOURNAL_NODE_HTTP_PORT: u16 = 8480;
pub const DEFAULT_JOURNAL_NODE_HTTPS_PORT: u16 = 8481;
pub const DEFAULT_JOURNAL_NODE_RPC_PORT: u16 = 8485;

pub const DEFAULT_JOURNAL_NODE_GRACEFUL_SHUTDOWN_TIMEOUT: Duration =
Duration::from_minutes_unchecked(15);
pub const DEFAULT_NAME_NODE_GRACEFUL_SHUTDOWN_TIMEOUT: Duration =
Duration::from_minutes_unchecked(15);
pub const DEFAULT_DATA_NODE_GRACEFUL_SHUTDOWN_TIMEOUT: Duration =
Duration::from_minutes_unchecked(30);

// hdfs-site.xml
pub const DFS_NAMENODE_NAME_DIR: &str = "dfs.namenode.name.dir";
pub const DFS_NAMENODE_SHARED_EDITS_DIR: &str = "dfs.namenode.shared.edits.dir";
26 changes: 26 additions & 0 deletions rust/crd/src/lib.rs
@@ -36,6 +36,7 @@ use stackable_operator::{
role_utils::{GenericRoleConfig, Role, RoleGroup, RoleGroupRef},
schemars::{self, JsonSchema},
status::condition::{ClusterCondition, HasStatusCondition},
time::Duration,
};
use std::collections::{BTreeMap, HashMap};
use storage::{
@@ -156,6 +157,7 @@ pub trait MergedConfig {
None
}
fn affinity(&self) -> &StackableAffinity;
fn graceful_shutdown_timeout(&self) -> Option<&Duration>;
/// Main container shared by all roles
fn hdfs_logging(&self) -> ContainerLogConfig;
/// Vector container shared by all roles
@@ -841,6 +843,9 @@ pub struct NameNodeConfig {
pub logging: Logging<NameNodeContainer>,
#[fragment_attrs(serde(default))]
pub affinity: StackableAffinity,
#[fragment_attrs(serde(default))]
/// Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
pub graceful_shutdown_timeout: Option<Duration>,
}

impl MergedConfig for NameNodeConfig {
@@ -852,6 +857,10 @@ impl MergedConfig for NameNodeConfig {
&self.affinity
}

fn graceful_shutdown_timeout(&self) -> Option<&Duration> {
self.graceful_shutdown_timeout.as_ref()
}

fn hdfs_logging(&self) -> ContainerLogConfig {
self.logging
.containers
@@ -916,6 +925,7 @@ impl NameNodeConfigFragment {
},
logging: product_logging::spec::default_logging(),
affinity: get_affinity(cluster_name, role),
graceful_shutdown_timeout: Some(DEFAULT_NAME_NODE_GRACEFUL_SHUTDOWN_TIMEOUT),
}
}
}
@@ -1001,6 +1011,9 @@ pub struct DataNodeConfig {
pub logging: Logging<DataNodeContainer>,
#[fragment_attrs(serde(default))]
pub affinity: StackableAffinity,
#[fragment_attrs(serde(default))]
/// Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
pub graceful_shutdown_timeout: Option<Duration>,
}

impl MergedConfig for DataNodeConfig {
@@ -1014,6 +1027,10 @@ impl MergedConfig for DataNodeConfig {
&self.affinity
}

fn graceful_shutdown_timeout(&self) -> Option<&Duration> {
self.graceful_shutdown_timeout.as_ref()
}

fn hdfs_logging(&self) -> ContainerLogConfig {
self.logging
.containers
@@ -1069,6 +1086,7 @@ impl DataNodeConfigFragment {
},
logging: product_logging::spec::default_logging(),
affinity: get_affinity(cluster_name, role),
graceful_shutdown_timeout: Some(DEFAULT_DATA_NODE_GRACEFUL_SHUTDOWN_TIMEOUT),
}
}
}
@@ -1152,6 +1170,9 @@ pub struct JournalNodeConfig {
pub logging: Logging<JournalNodeContainer>,
#[fragment_attrs(serde(default))]
pub affinity: StackableAffinity,
#[fragment_attrs(serde(default))]
/// Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
pub graceful_shutdown_timeout: Option<Duration>,
}

impl MergedConfig for JournalNodeConfig {
@@ -1163,6 +1184,10 @@ impl MergedConfig for JournalNodeConfig {
&self.affinity
}

fn graceful_shutdown_timeout(&self) -> Option<&Duration> {
self.graceful_shutdown_timeout.as_ref()
}

fn hdfs_logging(&self) -> ContainerLogConfig {
self.logging
.containers
@@ -1206,6 +1231,7 @@ impl JournalNodeConfigFragment {
},
logging: product_logging::spec::default_logging(),
affinity: get_affinity(cluster_name, role),
graceful_shutdown_timeout: Some(DEFAULT_JOURNAL_NODE_GRACEFUL_SHUTDOWN_TIMEOUT),
}
}
}
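Each role's default fragment sets `graceful_shutdown_timeout` to `Some(...)`, so the merged config should carry a value even when the user specifies nothing. The following sketch shows that idea with plain `Option` values. `ConfigFragment` and `merge` are hypothetical, simplified stand-ins; the real merging is done by the fragment machinery in `stackable_operator`, whose details are not shown in this diff.

```rust
use std::time::Duration;

/// Hypothetical, simplified stand-in for a role config fragment.
#[derive(Debug, Clone, PartialEq, Default)]
struct ConfigFragment {
    graceful_shutdown_timeout: Option<Duration>,
}

/// Values set by the user win; unset values fall back to the defaults.
fn merge(user: &ConfigFragment, defaults: &ConfigFragment) -> ConfigFragment {
    ConfigFragment {
        graceful_shutdown_timeout: user
            .graceful_shutdown_timeout
            .or(defaults.graceful_shutdown_timeout),
    }
}
```

With a default of 15 minutes, an empty user fragment yields 15 minutes, while a user value of one hour replaces the default.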
17 changes: 10 additions & 7 deletions rust/operator/src/hdfs_controller.rs
@@ -6,7 +6,7 @@ use crate::{
discovery::build_discovery_configmap,
event::{build_invalid_replica_message, publish_event},
kerberos,
-operations::pdb::add_pdbs,
+operations::{self, graceful_shutdown::add_graceful_shutdown_config, pdb::add_pdbs},
product_logging::{extend_role_group_config_map, resolve_vector_aggregator_address},
OPERATOR_NAME,
};
@@ -166,14 +166,15 @@ pub enum Error {
"kerberos not supported for HDFS versions < 3.3.x. Please use at least version 3.3.x"
))]
KerberosNotSupported {},
-#[snafu(display(
-"failed to serialize [{JVM_SECURITY_PROPERTIES_FILE}] for {}",
-rolegroup
-))]
-JvmSecurityPoperties {
+#[snafu(display("failed to serialize [{JVM_SECURITY_PROPERTIES_FILE}] for {rolegroup}",))]
+JvmSecurityProperties {
source: stackable_operator::product_config::writer::PropertiesWriterError,
rolegroup: String,
},
#[snafu(display("failed to configure graceful shutdown"), context(false))]
GracefulShutdown {
source: operations::graceful_shutdown::Error,
},
}

impl ReconcilerError for Error {
@@ -599,7 +600,7 @@ fn rolegroup_config_map(
.add_data(
JVM_SECURITY_PROPERTIES_FILE,
to_java_properties_string(jvm_sec_props.iter()).with_context(|_| {
-JvmSecurityPopertiesSnafu {
+JvmSecurityPropertiesSnafu {
rolegroup: rolegroup_ref.role_group.clone(),
}
})?,
@@ -667,6 +668,8 @@ fn rolegroup_statefulset(
)
.context(FailedToCreateContainerAndVolumeConfigurationSnafu)?;

add_graceful_shutdown_config(merged_config, &mut pb)?;

let mut pod_template = pb.build_template();
if let Some(pod_overrides) = hdfs.pod_overrides_for_role(role) {
pod_template.merge_from(pod_overrides.clone());
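The `GracefulShutdown` variant is declared with `context(false)`, and the call site uses a bare `?` without `.context(...)`. In snafu, `context(false)` makes the derive generate a `From` implementation for the source error instead of a context selector. The std-only sketch below spells out roughly what that amounts to. All names (`GracefulShutdownError`, `ControllerError`, `configure`, `build_statefulset`) are hypothetical stand-ins for this illustration, not the operator's actual types.

```rust
/// Stand-in for `operations::graceful_shutdown::Error` (hypothetical).
#[derive(Debug, PartialEq)]
struct GracefulShutdownError;

/// Stand-in for the controller's error enum (hypothetical).
#[derive(Debug, PartialEq)]
enum ControllerError {
    GracefulShutdown { source: GracefulShutdownError },
}

// Roughly what `context(false)` provides: a direct conversion from the
// source error, so `?` can wrap it without an explicit context selector.
impl From<GracefulShutdownError> for ControllerError {
    fn from(source: GracefulShutdownError) -> Self {
        ControllerError::GracefulShutdown { source }
    }
}

/// Hypothetical fallible step, standing in for `add_graceful_shutdown_config`.
fn configure(fail: bool) -> Result<(), GracefulShutdownError> {
    if fail {
        Err(GracefulShutdownError)
    } else {
        Ok(())
    }
}

/// The `?` operator converts the inner error through the `From` impl above.
fn build_statefulset(fail: bool) -> Result<(), ControllerError> {
    configure(fail)?;
    Ok(())
}
```

The trade-off is that no additional context can be attached at the call site, which is acceptable here because the variant's display message already describes the failure.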
26 changes: 26 additions & 0 deletions rust/operator/src/operations/graceful_shutdown.rs
@@ -0,0 +1,26 @@
use snafu::{ResultExt, Snafu};
use stackable_hdfs_crd::MergedConfig;
use stackable_operator::builder::PodBuilder;

#[derive(Debug, Snafu)]
pub enum Error {
#[snafu(display("Failed to set terminationGracePeriod"))]
SetTerminationGracePeriod {
source: stackable_operator::builder::pod::Error,
},
}

pub fn add_graceful_shutdown_config(
merged_config: &(dyn MergedConfig + Send + 'static),
pod_builder: &mut PodBuilder,
) -> Result<(), Error> {
// This must always be set by the merge mechanism, as we provide a default value.
// Users cannot disable graceful shutdown.
if let Some(graceful_shutdown_timeout) = merged_config.graceful_shutdown_timeout() {
pod_builder
.termination_grace_period(graceful_shutdown_timeout)
.context(SetTerminationGracePeriodSnafu)?;
}

Ok(())
}
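
To make the effect of `add_graceful_shutdown_config` easier to follow without the operator's dependencies, here is a std-only sketch with a mock builder. `MockPodBuilder`, `BuilderError` and `add_graceful_shutdown` are hypothetical; the conversion to whole seconds stored as `i64` is an assumption based on the `terminationGracePeriodSeconds` field of a Pod spec, not a copy of the real `PodBuilder` implementation.

```rust
use std::time::Duration;

/// Hypothetical error for a timeout that does not fit into the Pod field.
#[derive(Debug, PartialEq)]
enum BuilderError {
    GracePeriodTooLong,
}

/// Mock builder that only records the termination grace period in seconds.
#[derive(Debug, Default)]
struct MockPodBuilder {
    termination_grace_period_seconds: Option<i64>,
}

impl MockPodBuilder {
    /// Stores the timeout as whole seconds (assumed behaviour for this sketch).
    fn termination_grace_period(
        &mut self,
        timeout: &Duration,
    ) -> Result<&mut Self, BuilderError> {
        let seconds = timeout.as_secs();
        if seconds > i64::MAX as u64 {
            return Err(BuilderError::GracePeriodTooLong);
        }
        self.termination_grace_period_seconds = Some(seconds as i64);
        Ok(self)
    }
}

/// Mirrors the shape of the function in the diff: set the value only if present.
fn add_graceful_shutdown(
    timeout: Option<&Duration>,
    builder: &mut MockPodBuilder,
) -> Result<(), BuilderError> {
    if let Some(timeout) = timeout {
        builder.termination_grace_period(timeout)?;
    }
    Ok(())
}
```

Under these assumptions, the 15 minute default becomes 900 seconds and the 30 minute default becomes 1800 seconds, which matches the `terminationGracePeriodSeconds` values asserted in the smoke test.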
1 change: 1 addition & 0 deletions rust/operator/src/operations/mod.rs
@@ -1 +1,2 @@
pub mod graceful_shutdown;
pub mod pdb;
3 changes: 3 additions & 0 deletions tests/templates/kuttl/smoke/30-assert.yaml.j2
@@ -23,6 +23,7 @@ spec:
- name: vector
{% endif %}
- name: zkfc
terminationGracePeriodSeconds: 900
status:
readyReplicas: 2
replicas: 2
@@ -46,6 +47,7 @@ spec:
{% if lookup('env', 'VECTOR_AGGREGATOR') %}
- name: vector
{% endif %}
terminationGracePeriodSeconds: 900
status:
readyReplicas: 1
replicas: 1
@@ -69,6 +71,7 @@ spec:
{% if lookup('env', 'VECTOR_AGGREGATOR') %}
- name: vector
{% endif %}
terminationGracePeriodSeconds: 1800
status:
readyReplicas: {{ test_scenario['values']['number-of-datanodes'] }}
replicas: {{ test_scenario['values']['number-of-datanodes'] }}