[ADH-5052] Support kerberos in docker test stand #116

Merged: 2 commits, Sep 23, 2024
2 changes: 2 additions & 0 deletions build-images.sh
@@ -78,6 +78,8 @@ case $CLUSTER_TYPE in
--build-arg="HADOOP_VERSION=${HADOOP_VERSION}" \
--build-arg="SSM_APP_VERSION=${SSM_APP_VERSION}" .

docker build -f ./supports/tools/docker/multihost/kerberos/Dockerfile-kdc -t cloud-hub.adsw.io/library/ssm-kdc-server:${HADOOP_VERSION} .

docker build -f ./supports/tools/docker/multihost/datanode/Dockerfile-hadoop-datanode -t cloud-hub.adsw.io/library/hadoop-datanode:${HADOOP_VERSION} .

docker build -f ./supports/tools/docker/multihost/namenode/Dockerfile-hadoop-namenode -t cloud-hub.adsw.io/library/hadoop-namenode:${HADOOP_VERSION} .
@@ -24,11 +24,16 @@
import org.smartdata.action.ActionException;
import org.smartdata.hdfs.MiniClusterHarness;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;
@@ -60,11 +65,10 @@ public void testCheckSumAction() throws IOException {
action.init(args);
action.run();

String expectedLog = "/testPath/file1\tMD5-of-1MD5-of-50CRC32C\t"
+ "000000320000000000000001cd5359474b0be93eb57b7f1aaf9f3f55\n";
List<String> expectedChecksumFiles = Collections.singletonList("/testPath/file1");

assertTrue(action.getExpectedAfterRun());
assertEquals(expectedLog, action.getActionStatus().getLog());
assertEquals(expectedChecksumFiles, getChecksumFiles());
}

@Test
@@ -82,16 +86,14 @@ public void testCheckSumActionDirectoryArg() throws IOException {
action.init(args);
action.run();

String expectedLog =
"/testPath/0\tMD5-of-0MD5-of-50CRC32C\t"
+ "00000032000000000000000067ec113c30452f3ebfda70343c1363cf\n"
+ "/testPath/1\tMD5-of-0MD5-of-50CRC32C\t"
+ "000000320000000000000000ecaf7978b63f94cc35068ff56ae97ecb\n"
+ "/testPath/2\tMD5-of-0MD5-of-50CRC32C\t"
+ "000000320000000000000000e90604bcd8b102008713620df0a3e56f\n";
List<String> expectedChecksumFiles = Arrays.asList(
"/testPath/0",
"/testPath/1",
"/testPath/2"
);

assertTrue(action.getExpectedAfterRun());
assertEquals(expectedLog, action.getActionStatus().getLog());
assertEquals(expectedChecksumFiles, getChecksumFiles());
}

@Test
@@ -121,4 +123,16 @@ public void testThrowIfDirectoryNotFound() throws IOException {
assertTrue(error instanceof ActionException);
assertEquals("Provided directory doesn't exist: /unknownDir/", error.getMessage());
}

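// Extracts file paths from the checksum action log: each log line has the
// form "<path>\t<checksum-type>\t<checksum>" (see the expected logs in the
// tests above), so the path is the first tab-separated token.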
private List<String> getChecksumFiles() throws UnsupportedEncodingException {
String[] logLines = Optional.ofNullable(action.getActionStatus().getLog())
.map(log -> log.split("\n"))
.orElse(new String[0]);

return Arrays.stream(logLines)
.map(line -> line.split("\t"))
.filter(tokens -> tokens.length != 0)
.map(tokens -> tokens[0])
.collect(Collectors.toList());
}
}
77 changes: 64 additions & 13 deletions supports/tools/docker/README.md
@@ -1,6 +1,7 @@
# Run Hadoop cluster with SSM in docker containers

There are two cluster types:

* singlehost
* multihost

@@ -19,7 +20,7 @@ Command to build docker images in singlehost cluster mode (from project root dir
./build-images.sh --cluster=singlehost --hadoop=3.3
```

Command to start docker containers

```shell
cd ./supports/tools/docker
@@ -32,6 +33,7 @@ cd ./supports/tools/docker
* Hadoop namenode, node manager, resource manager in container
* SSM Server container
* SSM metastore as postgres container
* Kerberos KDC container

Command to build docker images in multihost cluster mode (from project root dir)

@@ -46,17 +48,53 @@ cd ./supports/tools/docker
./start-demo.sh --cluster=multihost --hadoop=3.3
```

Use one of the following credentials to log in to the Web UI

| Login | Password | Type |
|----------------|-----------|----------|
| john | 1234 | static |
| krb_user1@DEMO | krb_pass1 | kerberos |
| krb_user2@DEMO | krb_pass2 | kerberos |

### Testing SPNEGO auth

In order to test the SPNEGO authentication provider, you need to:

1. Move the `supports/tools/docker/multihost/kerberos/krb5.conf` Kerberos configuration file to the `/etc` directory
(after backing up your old config file)
2. Log in to the KDC server with one of the Kerberos principals

```shell
kinit krb_user1
```

3. Add the following lines to the `/etc/hosts` file

```
127.0.0.1 ssm-server.demo
127.0.0.1 kdc-server.demo
```

4. Try to access any SSM resource. The following query should respond with code 200 and a JSON body:

```shell
curl --negotiate http://ssm-server.demo:8081/api/v2/audit/events
```
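
If the request fails, first confirm that `kinit` actually obtained a ticket (standard Kerberos tooling, assuming the default credential cache):

```shell
klist
```

Note that some curl builds also require an explicit empty user for SPNEGO, i.e. `curl --negotiate -u : http://ssm-server.demo:8081/api/v2/audit/events`.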

# Run/Test SSM with Docker

Docker can greatly reduce the tedious work of installing and maintaining software on servers and developer machines. This document presents the basic workflow of running/testing SSM with Docker. [Docker Quick Start](https://docs.docker.com/get-started/)

## Necessary Components

### MetaStore (PostgreSQL) on Docker

#### Launch a postgresql container

Pull the latest official PostgreSQL image from the Docker store. You can use `postgres:tag` to specify the PostgreSQL version (`tag`) you want.

```
docker pull postgres
@@ -67,19 +105,22 @@ Launch a postgres container with a given {password} on 5432, and create a test d
```bash
docker run -p 5432:5432 --name {container_name} -e POSTGRES_PASSWORD={password} -e POSTGRES_DB={database_name} -d postgres:latest
```

**Parameters:**

- `container_name` name of the container
- `password` password of the database superuser, used for login and access
- `database_name` creates a new database/schema with the given name
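
For example, to launch a throwaway test instance (the container name, password, and database name below are placeholders, pick your own):

```bash
docker run -p 5432:5432 --name ssm-postgres -e POSTGRES_PASSWORD=secret -e POSTGRES_DB=ssm_test -d postgres:latest
```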

### HDFS on Docker

**Note that this part is not recommended on OSX (mac), because container networking is limited on OSX.**

Pull a well-known third-party Hadoop image from the Docker store. You can use `hadoop-docker:tag` to specify the Hadoop version (`tag`) you want.

#### Set up an HDFS Container

```bash
docker pull sequenceiq/hadoop-docker
```
@@ -89,15 +130,19 @@ Launch a Hadoop container with an exposed namenode.rpcserver.
```bash
docker run -it --add-host=moby:127.0.0.1 --ulimit memlock=2024000000:2024000000 -p 9000:9000 --name=hadoop sequenceiq/hadoop-docker /etc/bootstrap.sh -bash
```

Note that we launch an interactive Docker container. Use the following command to check HDFS status. We also set `memlock=2024000000` for the cache size.

```
cd $HADOOP_PREFIX
bin/hdfs dfs -ls /
```

#### Configure HDFS with multiple storage types and cache

Edit `$HADOOP_PREFIX/etc/hadoop/hdfs-site.xml` and add the property below. This turns off the permission check to avoid `Access denied for user ***. Superuser privilege is required`.

```
<property>
@@ -106,7 +151,8 @@ Edit `$HADOOP_PREFIX/etc/hadoop/hdfs-site.xml` and add the property below. This
</property>
```

Create `/tmp/hadoop-root/dfs/data1~3` for different storage types. Delete all content in `/tmp/hadoop-root/dfs/data` and `/tmp/hadoop-root/dfs/name`, then use `bin/hdfs namenode -format` to format HDFS.
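
A sketch of those steps inside the container (paths as given above):

```bash
mkdir -p /tmp/hadoop-root/dfs/data1 /tmp/hadoop-root/dfs/data2 /tmp/hadoop-root/dfs/data3
rm -rf /tmp/hadoop-root/dfs/data/* /tmp/hadoop-root/dfs/name/*
cd $HADOOP_PREFIX && bin/hdfs namenode -format
```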

Add the following properties to `$HADOOP_PREFIX/etc/hadoop/hdfs-site.xml`.

@@ -129,8 +175,8 @@ Add the following properties to `$HADOOP_PREFIX/etc/hadoop/hdfs-site.xml`.
</property>
```


Restart HDFS.

```
$HADOOP_PREFIX/sbin/stop-dfs.sh
$HADOOP_PREFIX/sbin/start-dfs.sh
@@ -151,14 +197,18 @@ Assuming you are in SSM root directory, modify `conf/druid.xml` to enable SSM to
<entry key="username">root</entry>
<entry key="password">{root_password}</entry>
```
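
The matching `url` entry takes the standard PostgreSQL JDBC form; for a container launched as described earlier it might look like this (host and database name are placeholders):

```
<entry key="url">jdbc:postgresql://127.0.0.1:5432/{database_name}</entry>
```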
Wait for at least 10 seconds. Then use `bin/start-smart.sh -format` to format (re-initialize) the database. You can also use this command to clear all data in the database during tests.

#### Stop/Remove Postgres container

You can use `docker stop {container_name}` to stop the postgres container. The postgres service then cannot be accessed until you start it again with `docker start {container_name}`. Note that `stop`/`start` will not remove any data from your postgres container.

Use `docker rm {container_name}` to remove the postgres container if it is no longer needed. If you don't remember the container's name, you can use `docker ps -a` to look it up.
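
A quick reference for the container lifecycle (using the placeholder name `ssm-postgres` from the earlier example):

```bash
docker stop ssm-postgres    # stop the container; its data is preserved
docker start ssm-postgres   # start it again with all data intact
docker rm ssm-postgres      # remove the container once it is no longer needed
```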

### HDFS

@@ -167,6 +217,7 @@ Use `docker rm {container_name}` to remove postgres container, if this container
Configure `namenode.rpcserver` in `smart-site.xml`.

```xml

<configuration>
<property>
<name>smart.dfs.namenode.rpcserver</name>
2 changes: 1 addition & 1 deletion supports/tools/docker/multihost/Dockerfile-hadoop-base
@@ -13,7 +13,7 @@ ENV SSM_HOME=/opt/ssm
ENV HADOOP_URL https://archive.apache.org/dist/hadoop/core/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
net-tools curl wget netcat procps gnupg libsnappy-dev && rm -rf /var/lib/apt/lists/*
net-tools curl wget netcat procps gnupg libsnappy-dev krb5-user && rm -rf /var/lib/apt/lists/*

# Install SSH server
RUN apt-get update \
2 changes: 1 addition & 1 deletion supports/tools/docker/multihost/conf/agents
@@ -1 +1 @@
hadoop-datanode
hadoop-datanode.demo
15 changes: 14 additions & 1 deletion supports/tools/docker/multihost/conf/core-site.xml
@@ -2,11 +2,24 @@
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-namenode:8020</value>
<value>hdfs://hadoop-namenode.demo:8020</value>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.smartdata.hadoop.filesystem.SmartFileSystem</value>
<description>The FileSystem for hdfs URL</description>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>

<property>
<name>smart.server.kerberos.principal</name>
<value>ssm/ssm-server.demo@DEMO</value>
</property>
</configuration>
2 changes: 1 addition & 1 deletion supports/tools/docker/multihost/conf/druid.xml
@@ -1,7 +1,7 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<entry key="url">jdbc:postgresql://ssm-metastore-db:5432/metastore</entry>
<entry key="url">jdbc:postgresql://ssm-metastore-db.demo:5432/metastore</entry>
<entry key="username">ssm</entry>
<entry key="password">ssm</entry>

54 changes: 46 additions & 8 deletions supports/tools/docker/multihost/conf/hdfs-site.xml
@@ -20,30 +20,68 @@
<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- Turn security off for tests by default -->
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
</property>
<!-- Disable min block size since most tests use tiny blocks -->
<property>
<name>dfs.namenode.fs-limits.min-block-size</name>
<value>0</value>
</property>
<property>
<name>smart.server.rpc.address</name>
<value>ssm-server:7042</value>
<value>ssm-server.demo:7042</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>[RAM_DISK]file://hadoop/dfs/ram-data,[SSD]file://hadoop/dfs/ssd-data,[DISK]file://hadoop/dfs/data,[ARCHIVE]file://hadoop/dfs/archive-data</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
<name>hadoop.user.group.static.mapping.overrides</name>
<value>ssm=supergroup;agent=supergroup</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>1048576</value>
</property>
<property>
<name>dfs.namenode.keytab.file</name>
<value>/etc/secrets/namenode.keytab</value>
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>namenode/_HOST@DEMO</value>
</property>
<property>
<name>dfs.namenode.delegation.token.max-lifetime</name>
<value>604800000</value>
<description>The maximum lifetime in milliseconds for which a delegation token is valid.</description>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>/etc/secrets/datanode.keytab</value>
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>datanode/_HOST@DEMO</value>
</property>
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>

<!-- Set privileged ports -->
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:1004</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:1006</value>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:1007</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:1005</value>
</property>
</configuration>
2 changes: 1 addition & 1 deletion supports/tools/docker/multihost/conf/servers
@@ -1 +1 @@
ssm-server
ssm-server.demo