from aztk.spark.models import ClusterConfiguration
from aztk.spark.models.plugins import PluginConfiguration, PluginFile, PluginPort, PluginTarget, PluginTargetRole

cluster_config = ClusterConfiguration(
    # ... other config goes here
    plugins=[
        PluginConfiguration(
            name="my-custom-plugin",
            files=[
                PluginFile("file.sh", "/my/local/path/to/file.sh"),
                PluginFile("data/one.json", "/my/local/path/to/data/one.json"),
                PluginFile("data/two.json", "/my/local/path/to/data/two.json"),
            ],
            execute="file.sh",  # Must be one of the files listed above and match its target path
            env=dict(
                SOME_ENV_VAR="foo",
            ),
            args=["arg1"],  # These arguments are passed to your execute script
            ports=[
                PluginPort(internal="1234"),  # Internal only (for example, for communication between nodes)
                PluginPort(internal="2345", public=True),  # Also open publicly (reachable when you SSH in); used for a UI, for example
            ],
            # Pick where you want the plugin to run
            target=PluginTarget.Host,  # Run the script on the host; the default is to run in the Spark container
            target_role=PluginTargetRole.All,  # Run on the master only, the workers only, or all nodes; environment variables (see below) allow different master/worker config
        )
    ]
)
PluginConfiguration parameters:
- name: Name of your plugin. It is used to create a folder, so a simple name made of letters, dashes, and underscores only is recommended.
- files: List of files to upload.
- execute: Script to execute. It must be one of the files listed in files, and the value must match that file's remote path.
- args: List of arguments passed to your execute script.
- env: Environment variables made available to the script. These can be used instead of args to pass values to your script.
- ports: List of ports to open if the script runs in a container. A port can also be marked public, in which case it is accessible when you SSH into the master node.
- target: Where the execute script should run. Possible values are PluginTarget.SparkContainer (default) and PluginTarget.Host.
- target_role: Whether the plugin should run only on the master, only on the workers, or on all nodes. You can use environment variables (see below) to have different master/worker configuration.
PluginFile arguments, in order:
- Target path: where the file should be dropped, relative to the plugin working directory.
- Local path: path to the local file you want to upload (this could come from the plugin's parameters).
PluginPort parameters:
- internal: Internal port to open on the Docker container.
- public: Whether the port should be open publicly (default: False).
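Two of the constraints above lend themselves to a quick check before a cluster is submitted: the recommended shape of the plugin name, and the requirement that execute match the target path of one of the uploaded files. The following is a minimal sketch in plain Python. `check_plugin` is a hypothetical helper, not part of AZTK, and it works on plain strings rather than on the AZTK model classes.

```python
import re

# Hypothetical helper, not part of AZTK: checks the two constraints
# described above using plain strings instead of the AZTK model classes.
_NAME_PATTERN = re.compile(r"[A-Za-z_-]+")


def check_plugin(name, file_targets, execute):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    # The name is used to create a folder, so letters, dashes, and
    # underscores only are recommended.
    if not _NAME_PATTERN.fullmatch(name):
        problems.append("name should contain only letters, dashes, and underscores")
    # The execute script must be one of the uploaded files, referenced by
    # its target path.
    if execute not in file_targets:
        problems.append("execute must match the target path of one of the files")
    return problems


targets = ["file.sh", "data/one.json", "data/two.json"]
print(check_plugin("my-custom-plugin", targets, "file.sh"))
```

Returning a list of problems rather than raising keeps the name check advisory, which matches the fact that the naming rule is only a recommendation.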
AZTK provides a few environment variables that can be used in your plugin script:
- AZTK_IS_MASTER: Whether the plugin is running on the master node. Can be either true or false.
- AZTK_IS_WORKER: Whether a worker is set up on the current node (this may also be the master if you have worker_on_master set to true). Can be either true or false.
- AZTK_MASTER_IP: Internal IP of the master.
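As an illustration of how a script might branch on these variables, here is a minimal sketch. It assumes the execute script is written in Python and that a Python interpreter is available where the plugin runs, neither of which is stated above (the earlier example uses a shell script). `node_role` is a hypothetical helper name.

```python
import os


def node_role(env=None):
    """Summarize the AZTK_* variables described above for the current node."""
    env = os.environ if env is None else env
    return {
        # Both flags are documented as the strings "true" or "false".
        "is_master": env.get("AZTK_IS_MASTER") == "true",
        "is_worker": env.get("AZTK_IS_WORKER") == "true",
        # Internal IP of the master; None if the variable is not set.
        "master_ip": env.get("AZTK_MASTER_IP"),
    }


if __name__ == "__main__":
    role = node_role()
    if role["is_master"]:
        print("applying master-specific setup")
    if role["is_worker"]:
        print("applying worker setup, master is at", role["master_ip"])
```

Both branches can fire on the same node, since a master may also host a worker when worker_on_master is set to true.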
When your plugin is not working as expected, there are a few things you can do to investigate.
Check the logs, using either the debug tool or BatchLabs.
Navigate to startup/wd/logs/plugins.
- If you see a file named <your-plugin-name>.txt under that folder, your plugin started correctly, and you can check this file to see what your execute script logged.
- If this file doesn't exist, the script was not run on this node. There could be multiple reasons for this:
  - If you want your plugin to run on the Spark container, check the startup/wd/logs/docker.log file for information about this.
  - If you want your plugin to run on the host, check startup/stdout.txt and startup/stderr.txt.
  The log could mention that you picked the wrong target or target role for that plugin, which is why the plugin is not running on this node.
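The decision steps above can be sketched as a small helper that tells you which log files to read. This is only an illustration: `plugin_log_hints` is a hypothetical function, and it assumes you already have a local copy of the node's startup folder under `logs_root` (how that copy is obtained is not covered here). The `target` strings "spark_container" and "host" are illustrative labels, not AZTK values.

```python
from pathlib import Path


def plugin_log_hints(logs_root, plugin_name, target="spark_container"):
    """Suggest which log files to read for a plugin, following the steps above.

    logs_root is assumed to be a local directory that contains a copy of
    the node's "startup" folder.
    """
    startup = Path(logs_root) / "startup"
    plugin_log = startup / "wd" / "logs" / "plugins" / f"{plugin_name}.txt"
    if plugin_log.is_file():
        # The plugin started on this node; its output is in this file.
        return [plugin_log]
    # The plugin did not run here: look at the logs for the intended target.
    if target == "host":
        return [startup / "stdout.txt", startup / "stderr.txt"]
    return [startup / "wd" / "logs" / "docker.log"]
```

For example, `plugin_log_hints("./node-logs", "my-custom-plugin", target="host")` returns the two host log paths when no plugin log file is present.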