Logging config for bakeries #92

Open
TomAugspurger opened this issue Apr 2, 2021 · 5 comments

@TomAugspurger
Contributor

This is a follow-up to #84 (comment).

The pangeo-forge library and bakeries need to coordinate on producing and collecting rich logging information for debugging. As recommended in https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library, libraries like pangeo-forge shouldn't do any configuration. That's up to the applications (bakeries in our case).

I imagine that each bakery will have some logging configuration: https://docs.python.org/3/library/logging.config.html (either a file or an in-memory dictionary, whichever is easiest). The tricky part is that we need to propagate that configuration to all of the Dask workers. https://docs.dask.org/en/latest/debugging.html#logs has documentation on configuring Dask worker logs (what you get with logging.getLogger("distributed.worker")). We need to make sure that the worker processes are also configured to handle the logging.getLogger("pangeo_forge") logs. I believe this is doable by including the configuration for the pangeo-forge logger under the logging key in the dask config, but that needs to be confirmed (I have to run now so I can't check it, but I wanted to write this up before forgetting).
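
For reference, the library-side half of that HOWTO recommendation is roughly the following (a minimal sketch; store_chunk is just an illustrative name, not an actual pangeo-forge function):

# Library code only creates loggers and emits records; it never adds real
# handlers or calls basicConfig. Configuration is left to the application.
import logging

logger = logging.getLogger("pangeo_forge")
logger.addHandler(logging.NullHandler())  # silence "no handler found" warnings

def store_chunk(chunk_key):
    # Illustrative function for the sketch only.
    logger.debug("Storing chunk %s", chunk_key)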

cc @sharkinsspatial and @CiaranEvans for the bakery side.

@rabernat
Contributor

rabernat commented Apr 2, 2021

I believe this is doable by including the configuration for the pangeo-forge logger under the logging key in the dask config, but that needs to be confirmed (I have to run now so I can't check it, but I wanted to write this up before forgetting)

It would be awesome if we could make this work via a config file.

During testing, I also needed to see pangeo_forge logs from the dask workers. I came up with this slightly hacky solution to add a run-time option to pytest:

https://github.com/pangeo-forge/pangeo-forge/blob/e5f7e7cc463ea31513088e14047b962a90cb69f9/tests/conftest.py#L221-L234

That code uses client.run to run a function on the workers to redirect the log output. A more elegant way to do this would be with a Dask worker plugin.
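
For the record, a minimal sketch of that worker-plugin approach might look like the following (names are illustrative, and the registration call should be double-checked against the distributed docs):

# Hedged sketch: a WorkerPlugin whose setup() attaches a StreamHandler to the
# pangeo_forge logger on each worker, so its records show up in the worker logs.
import logging
from distributed.diagnostics.plugin import WorkerPlugin

class PangeoForgeLoggingPlugin(WorkerPlugin):
    def __init__(self, level=logging.DEBUG):
        self.level = level

    def setup(self, worker):
        # Called once per worker when the plugin is registered.
        logger = logging.getLogger("pangeo_forge")
        handler = logging.StreamHandler()
        handler.setLevel(self.level)
        logger.addHandler(handler)
        logger.setLevel(self.level)

# client.register_worker_plugin(PangeoForgeLoggingPlugin())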

@TomAugspurger
Contributor Author

That code uses client.run to run a function on the workers to redirect the log output. A more elegant way to do this would be with a Dask worker plugin.

My hope is that by setting this in the Dask config, that won't be necessary. As the Dask workers load the Dask config, the logging config will automatically be applied (to all of the loggers, not just the Dask ones).

@sharkinsspatial
Contributor

@TomAugspurger Thus far on the bakery side we are taking a two-pronged logging approach. Individual worker logging is redirected to cloud provider log capture; in the case of AWS this means there are separate logs for each worker used by a flow in the dask-ecs log group. In addition, we wrap the flow's steps with individual module logging prior to registration. I currently have this hardcoded to only the pangeo_forge.recipe module, but we can make this configurable by the recipe author via an entry in the recipe's meta.yaml.
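
Purely as an illustration of the configurable version, something along these lines could read a logging entry from meta.yaml and apply it before registration (the logging key is a hypothetical schema addition, not something pangeo-forge defines today):

# Hedged sketch; the meta.yaml "logging" entry and its shape are hypothetical.
import logging
import yaml

def configure_recipe_logging(meta_path="meta.yaml"):
    with open(meta_path) as f:
        meta = yaml.safe_load(f)
    # e.g. {"modules": ["pangeo_forge.recipe"], "level": "DEBUG"}
    cfg = meta.get("logging", {})
    level = getattr(logging, cfg.get("level", "INFO").upper())
    for name in cfg.get("modules", ["pangeo_forge.recipe"]):
        logging.getLogger(name).setLevel(level)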

There are a few issues with this approach: all of the worker logs are consolidated and displayed as a single log stream in Prefect Cloud. This can be noisy and difficult to track with very verbose logging. In addition, there is an outstanding issue that prevents log redirection for dask.distributed, which I need to address with the Prefect team.

@TomAugspurger
Contributor Author

TomAugspurger commented May 3, 2021

@sharkinsspatial does the second code block at https://docs.dask.org/en/latest/debugging.html?highlight=logging#logs help? I imagine that we could have something like:

# file: /etc/dask/logging.yaml
logging:
  version: 1
  handlers:
    console:
      class: logging.StreamHandler
      level: INFO
  loggers:
    pangeo_forge:
      level: INFO
      handlers:
        - console
    distributed.worker:
      level: INFO
      handlers:
        - console
    distributed.scheduler:
      level: INFO
      handlers:
        - console

In addition, we would need to work out:

  1. A custom formatter that includes things like the hostname, time, etc. (a rough sketch below).
  2. Whether Prefect provides any special handlers we should use, or whether piping to stdout is the best option.
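
On point 1, a hedged sketch of what such a formatter could look like (the field names are just examples):

# A Filter that stamps each record with the worker's hostname, plus a
# Formatter that includes it along with the timestamp and logger name.
import logging
import socket

class HostnameFilter(logging.Filter):
    def filter(self, record):
        record.hostname = socket.gethostname()
        return True

handler = logging.StreamHandler()
handler.addFilter(HostnameFilter())
handler.setFormatter(
    logging.Formatter("%(asctime)s %(hostname)s %(name)s %(levelname)s %(message)s")
)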

LMK if you have other questions about this approach.

This can be noisy and difficult to track with very verbose logging

I've never used it before, but there is the Filter object: https://docs.python.org/3/library/logging.html#filter-objects for filtering based on specific criteria, if level and module aren't enough.
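
For example (a hedged sketch; the criterion is made up), a filter attached to a handler could drop everything except records from a particular recipe module:

# Keep only records emitted under pangeo_forge.recipe on this handler.
import logging

class RecipeOnlyFilter(logging.Filter):
    def filter(self, record):
        return record.name.startswith("pangeo_forge.recipe")

handler = logging.StreamHandler()
handler.addFilter(RecipeOnlyFilter())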

@sharkinsspatial
Contributor

@TomAugspurger I'll continue tracking this here. Recipe module logging level is configurable at flow registration time with this wrapper https://github.com/pangeo-forge/pangeo-forge-prefect/blob/master/pangeo_forge_prefect/flow_manager.py#L58. This works correctly within each cloud provider's log solution (CloudWatch in the ECS case), but I need to coordinate further with the Prefect team to get a clearer picture of the limitations on log shipping to Prefect Cloud. Currently I am using their extra loggers functionality, but it appears that there is an issue with the timing of log packaging and flushing that prevents entire worker log streams from being written, so only a segment of each stream is shipped to Prefect Cloud.
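
For anyone following along, the "extra loggers" functionality mentioned above is, if I'm reading the Prefect (0.x) docs correctly, enabled through Prefect's configuration, for example:

# Hedged sketch: list extra loggers via configuration so their records are
# forwarded alongside Prefect's own logs; the exact key name should be
# double-checked against the Prefect docs.
import os

os.environ["PREFECT__LOGGING__EXTRA_LOGGERS"] = '["pangeo_forge"]'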
