
Docker Swarm operator is broken: json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64 #11617

@jnunezgts

Description

Apache Airflow version:
1.10.12, using SQLite as the backend

Kubernetes version (if you are using kubernetes) (use kubectl version):
N/A. Using Docker Swarm 19.03.8

Environment:

  • Cloud provider or hardware configuration:
    No cloud, bare-metal server:
HP ProLiant DL560 Gen8, BIOS P77 12/20/2013, 64 cpus
  • OS (e.g. from /etc/os-release):
Fedora release 29 (Twenty Nine)
  • Kernel (e.g. uname -a):
Linux server.company.com 4.19.82-1300.fc29.x86_64 #1 SMP Fri Nov 8 10:49:58 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
pip
  • Others:
Python 3.7.2 (default, Jan 16 2019, 19:49:22)
[GCC 8.2.1 20181215 (Red Hat 8.2.1-6)] on linux

Docker info:

Client:
 Debug Mode: false

Server:
 Containers: 21
  Running: 0
  Paused: 0
  Stopped: 21
 Images: 12
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: j0pl320hoxuqcaaa14z2znvgo
  Is Manager: true
  ClusterID: kpgz783mpw8aapdxchtwdu2ff
  Managers: 1
  Nodes: 4
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 172.29.248.55
  Manager Addresses:
   172.29.248.55:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.82-1300.fc29.x86_64
 Operating System: Fedora 29 (Twenty Nine)
 OSType: linux
 Architecture: x86_64
 CPUs: 64
 Total Memory: 125.9GiB
 Name: server.company.com
 ID: 7ESU:O253:JGNS:YJIY:XXX:CYTI:WFQC:6L5C:XXXX:62IO:VH23:XXXX
 Docker Root Dir: /opt/docker
 Debug Mode: false
 HTTP Proxy: http://proxy.company.com:8080/
 HTTPS Proxy: http://proxy.company:8080/
 No Proxy: localhost,127.0.0.1,server.company.com,.company.com
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  privatereg.company.com:5000
  localhost:5000
  server.company.com:5000
  127.0.0.0/8
 Live Restore Enabled: false

What happened:

Created the following DAG to schedule a one-shot job:

from datetime import datetime
from datetime import timedelta

from airflow import DAG
from airflow.contrib.operators.docker_swarm_operator import DockerSwarmOperator

DEFAULT_ARGS = {
    'retry_delay': timedelta(minutes=5),
    'retries': 1,
    'email_on_failure': True,
    'email_on_retry': False,
    'email': ['myemail@company.com']
}

with DAG('24_7_box', description='24 x 7. With retries', default_args=DEFAULT_ARGS, schedule_interval='0 * * * Mon-Sun', start_date=datetime(2019, 7, 23), max_active_runs=1, catchup=False) as twenty_four_by_seven_dag:
    # See:
    # https://airflow.apache.org/docs/stable/_api/airflow/contrib/operators/docker_swarm_operator/index.html
    # https://airflow.apache.org/docs/stable/_modules/airflow/contrib/operators/docker_swarm_operator.html
    SLEEP_TASK = DockerSwarmOperator(
        task_id="SLEEP_TASK",
        image="fedora:29",
        api_version="auto",
        command="/bin/sleep 60",
        docker_url="unix://var/run/docker-sysavtbuild.sock",
        force_pull=False,
        mem_limit="500m",
        auto_remove=True,
    )

    SLEEP_TASK
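A workaround that may avoid the error, assuming the operator forwards `mem_limit` unchanged into the Swarm service spec (where `MemoryBytes` is an int64): pass the limit as an integer byte count rather than the string `"500m"`. Sketched below as plain keyword arguments; `SLEEP_TASK_KWARGS` is illustrative, not an Airflow API:

```python
# Hypothetical workaround sketch: express the memory limit as an
# integer number of bytes so the JSON sent to /services/create
# carries an int64 for Resources.MemoryBytes, not a string.
MEM_LIMIT_BYTES = 500 * 1024 ** 2  # 500 MiB = 524288000 bytes

SLEEP_TASK_KWARGS = {
    "task_id": "SLEEP_TASK",
    "image": "fedora:29",
    "command": "/bin/sleep 60",
    "mem_limit": MEM_LIMIT_BYTES,  # int, instead of "500m"
}
```

These kwargs would then be splatted into `DockerSwarmOperator(**SLEEP_TASK_KWARGS, ...)` in place of the string-valued `mem_limit` above.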

What you expected to happen:

I expected the container to be created, stay alive for 60 seconds, and then exit with code 0. No output.

Others have reported success in the past using the Docker Swarm operator.

I'm not sure what went wrong. The Airflow log shows the following:

[2020-10-17 09:46:58,475] {taskinstance.py:1150} ERROR - 400 Client Error: Bad Request ("json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64")
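The message points at the service-create payload: Swarm declares `Resources.MemoryBytes` as an int64, so a string like `"500m"` cannot be unmarshalled, whereas the docker CLI parses such suffixes itself before calling the API. A minimal sketch of the kind of conversion needed; `parse_mem_limit` is a hypothetical helper, not part of Airflow or docker-py:

```python
# Hypothetical sketch: convert a Docker-CLI-style memory string
# (e.g. "500m") to the integer byte count Swarm expects for
# Resources.MemoryBytes. Suffixes use binary multiples, as the
# docker CLI does.
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

def parse_mem_limit(value):
    """Return the memory limit as an int number of bytes."""
    if isinstance(value, int):
        return value  # already a byte count
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)  # bare number: treat as bytes
```

With this, `parse_mem_limit("500m")` yields `524288000`, the int64 the API would accept.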

I can run this command from docker CLI as follows:

[user@server dags]$ docker run --rm --detach fedora:29 /bin/sleep 45
29912c34f43e2dfa20d417cb80113059a183518b99215609c0aa7b37874c27db
[user@server dags]$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
29912c34f43e        fedora:29           "/bin/sleep 45"     7 seconds ago       Up 6 seconds                            gifted_pare

How to reproduce it:

  1. Copy the DAG provided into ~/airflow/dags
  2. Turn ON the DAG
  3. Trigger the DAG or let the scheduler run it; the error will show up eventually

Anything else we need to know:

Airflow.log

*** Reading local file: /home/user/airflow/logs/avt_24_7_box/SLEEP_TASK/2020-10-17T13:24:26.101897+00:00/2.log
[2020-10-17 09:46:58,312] {taskinstance.py:670} INFO - Dependencies all met for
[2020-10-17 09:46:58,321] {taskinstance.py:670} INFO - Dependencies all met for
[2020-10-17 09:46:58,321] {taskinstance.py:880} INFO - --------------------------------------------------------------------------------
[2020-10-17 09:46:58,321] {taskinstance.py:881} INFO - Starting attempt 2 of 2
[2020-10-17 09:46:58,321] {taskinstance.py:882} INFO - --------------------------------------------------------------------------------
[2020-10-17 09:46:58,328] {taskinstance.py:901} INFO - Executing on 2020-10-17T13:24:26.101897+00:00
[2020-10-17 09:46:58,335] {standard_task_runner.py:54} INFO - Started process 35637 to run task
[2020-10-17 09:46:58,371] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', '24_7_box', 'SLEEP_TASK', '2020-10-17T13:24:26.101897+00:00', '--job_id', '55', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/avt_test3.py', '--cfg_path', '/tmp/tmpaivrdhuu']
[2020-10-17 09:46:58,372] {standard_task_runner.py:78} INFO - Job 55: Subtask SLEEP_TASK
[2020-10-17 09:46:58,398] {logging_mixin.py:112} INFO - Running %s on host %s server.company.com
[2020-10-17 09:46:58,467] {docker_swarm_operator.py:105} INFO - Starting docker service from image fedora:29
[2020-10-17 09:46:58,475] {taskinstance.py:1150} ERROR - 400 Client Error: Bad Request ("json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64")
Traceback (most recent call last):
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/api/client.py", line 259, in _raise_for_status
    response.raise_for_status()
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http+docker://localhost/v1.40/services/create

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/airflow/models/taskinstance.py", line 984, in run_raw_task
    result = task_copy.execute(context=context)
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/airflow/operators/docker_operator.py", line 277, in execute
    return self.run_image()
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/airflow/contrib/operators/docker_swarm_operator.py", line 119, in run_image
    labels={'name': 'airflow__%s__%s' % (self.dag_id, self.task_id)}
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/utils/decorators.py", line 34, in wrapper
    return f(self, *args, **kwargs)
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/api/service.py", line 190, in create_service
    self._post_json(url, data=data, headers=headers), True
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/api/client.py", line 265, in _result
    self._raise_for_status(response)
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/api/client.py", line 261, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 400 Client Error: Bad Request ("json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64")
[2020-10-17 09:46:58,481] {taskinstance.py:1194} INFO - Marking task as FAILED. dag_id=24_7_box, task_id=SLEEP_TASK, execution_date=20201017T132426, start_date=20201017T134658, end_date=20201017T134658
[2020-10-17 09:46:58,509] {configuration.py:373} WARNING - section/key [smtp/smtp_user] not found in config
[2020-10-17 09:46:58,583] {email.py:132} INFO - Sent an alert email to ['user@company.com']
[2020-10-17 09:47:03,312] {local_task_job.py:102} INFO - Task exited with return code 1
