
how to turn on logging? #84

Open
rabernat opened this issue Mar 20, 2021 · 8 comments
Comments

@rabernat
Contributor

I put a lot of logging statements into the recipe class to help with debugging, but it is never easy to turn them on. How can we make pangeo-forge work like Prefect, where logs are on by default? In particular, I would really like to see the logs from the dask workers.

@TomAugspurger
Contributor

TomAugspurger commented Mar 20, 2021

In general, libraries like pangeo-forge should only provide the logger objects and the logging statements. It's typically up to applications using the library to configure logging for their needs.

So I imagine that anyone executing the recipes would have something like

import logging
import sys

logger = logging.getLogger("pangeo_forge.recipe")  # or whatever logger to target; maybe just "pangeo_forge"

# configure the handler and formatter
handler = logging.StreamHandler(sys.stdout)  # StreamHandler defaults to stderr, so pass stdout explicitly
handler.setLevel(logging.INFO)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

That configures things to print messages at INFO level or higher (so not DEBUG) to stdout. I think pangeo-forge might provide a standard handler and formatter that executors could use.
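
For illustration, a minimal sketch of what such a helper could look like (setup_logging is a hypothetical name, not an existing pangeo-forge function):

import logging
import sys

def setup_logging(level=logging.INFO):
    # Hypothetical convenience helper that pangeo-forge could ship for executors
    logger = logging.getLogger("pangeo_forge")
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)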

Side-note, libraries should typically use logger.debug("thing %s", ds) rather than logger.debug(f"thing {ds}") in case calling str(ds) is "expensive". With the %s style, str(ds) isn't called unless there's a handler that's actually listening at DEBUG level.
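
To see the difference, here is a small self-contained demo (the Expensive class is just a stand-in for something like an xarray dataset with a costly repr):

import logging

logger = logging.getLogger("demo")

class Expensive:
    def __str__(self):
        print("str() was called!")  # stands in for an expensive repr
        return "dataset"

ds = Expensive()

# Lazy %s style: str(ds) only runs if a handler actually formats the record
logger.debug("thing %s", ds)

# Eager f-string: str(ds) runs unconditionally, even with DEBUG disabled
logger.debug(f"thing {ds}")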

Configuring the Dask workers is a bit more involved. I'll have to look up some examples. But it follows a similar pattern of adding a handler. The extra complication is that the handlers are on a different machine. We can pull the logs with client.get_worker_logs (I think) or have them pushed somewhere central with a more complicated handler.
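
For the pull approach, something like this should work (client.get_worker_logs returns a dict mapping each worker address to its recent (level, message) log records; the exact format may vary across distributed versions):

from distributed import Client

client = Client()  # or connect to an existing cluster

# ... execute the recipe ...

# Pull recent log records back from each worker
for worker, records in client.get_worker_logs().items():
    print(worker)
    for level, message in records:
        print(f"  {level}: {message}")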

@rabernat
Contributor Author

Thanks Tom! I found I also needed to follow this advice to get the logs to show in JupyterLab.

@martindurant
Contributor

I have no idea!

@rabernat
Contributor Author

If we execute using prefect, it should be straightforward to get the logs: https://docs.prefect.io/core/concepts/logging.html#extra-loggers
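
Per those docs, with Prefect Core that amounts to something like the following before the flow runs (the exact spelling may differ across Prefect versions, so treat this as a sketch):

import os

# Ask Prefect to attach its log handler to the pangeo_forge logger.
# Must be set before Prefect configures its logging.
os.environ["PREFECT__LOGGING__EXTRA_LOGGERS"] = '["pangeo_forge"]'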

@rabernat
Contributor Author

rabernat commented Apr 1, 2021

I am really stuck on something and I would love some help from a Dask guru (@martindurant or @TomAugspurger).

I am debugging a problem with distributed locking (see #86) and I really need to see the pangeo-forge logging from dask workers during pytest execution. I can turn on logging from pytest by running e.g.

py.test --log-cli-level=DEBUG  -s tests/test_recipes.py::test_chunks[manual-netCDFtoZarr_sequential_recipe-target_chunks0-True-chunk_expectation0-2-D]

However, when I am executing with dask, e.g.

py.test --log-cli-level=DEBUG  -s tests/test_recipes.py::test_chunks[dask-netCDFtoZarr_sequential_recipe-target_chunks0-True-chunk_expectation0-2-D]

there is no output because the logging happens on the workers. I have tried all sorts of hacks to try to get the output to show, but I am stuck.

I CAN see STDOUT from the workers.

Any hints would be appreciated.

@TomAugspurger
Contributor

Can you add a handler to redirect the dask worker logs to stdout, with something like the following somewhere (at the top of the test_recipes.py file, maybe)?

import logging
import sys

logger = logging.getLogger("distributed.worker")
handler = logging.StreamHandler(sys.stdout)  # default stream is stderr, so pass stdout explicitly
handler.setLevel(logging.DEBUG)
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

@rabernat
Contributor Author

rabernat commented Apr 1, 2021

To be clear, I am not looking for the distributed.worker logs. Those already print. I am looking for the pangeo_forge.recipe logs.

That was one of the things I tried (with "pangeo_forge" instead of "distributed.worker"), and it did not work.

However, it gave me an idea. I added this to the fixture that creates the dask cluster

def redirect_logs():
    import logging

    logger = logging.getLogger("pangeo_forge")
    handler = logging.StreamHandler()
    handler.setLevel(logging.DEBUG)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)

client.run(redirect_logs)

and it worked! 🎉
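
(client.run executes the function once on each connected worker, which is why this works where configuring the same logger in the test process did not.)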

@TomAugspurger
Contributor

Gotcha, sorry I missed that point.

I think that the bakeries will want some kind of logging config file, and then we'd set the Dask config to load it (so that it's picked up everywhere). It's a bit finicky to get right, but hopefully we just have to do it once (per bakery).
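
A rough sketch of the Dask-config approach, assuming the simple name-to-level form of the logging key (distributed also accepts a full logging.config.dictConfig dictionary when a version key is present). distributed applies this config when it is first imported, so in practice it would live in a Dask config YAML file that the bakery loads before starting any workers:

import dask

# Must be in effect before distributed is imported
dask.config.set({"logging": {"pangeo_forge": "debug", "distributed.worker": "info"}})

import distributed  # logging config is applied on first import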
