Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 473 Add optional debug/info logging to extract_xri script #514

Merged
merged 2 commits into from
Sep 19, 2024

Conversation

rminsil
Copy link
Collaborator

@rminsil rminsil commented Sep 12, 2024

This PR adds optional logging to the script.

Currently the script makes certain assumptions about the data which can make it a bit brittle causing it to crash mid-processing.

Some example assumptions:

  • all columns exist with the expected schema
  • no embedded newlines (e.g. each row is on one line)
  • id's can all be parsed to int

Logging will help to quickly identify the row of data breaking it.

Currently the logging is off by default. Enabling with -log [PATH] sends the logs to the file specified.

Note this PR is built on top of another unmerged PR #473. I'll rebase this and target master when #473 is merged.


This change is Reviewable

@rminsil rminsil linked an issue Sep 12, 2024 that may be closed by this pull request
Copy link
Collaborator

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @rminsil)


silnlp/common/extract_xri.py line 52 at r1 (raw file):

logger = logging.getLogger(__name__)

You don't want to use __name__ here, because it is set to __main__. You will want to use the following:

LOGGER = logging.getLogger(__package__ + ".extract_xri")

silnlp/common/extract_xri.py line 197 at r1 (raw file):

        "-output", help="Optional path to the output directory where extract files are generated", type=str
    )
    parser.add_argument("-log", help="Optional path to enable logging. Logs are sent to the file passed", type=str)

In silnlp, we always have logging enabled and we output directly to stdout. I don't think there is a need for logging to a file.

@rminsil
Copy link
Collaborator Author

rminsil commented Sep 13, 2024

In silnlp, we always have logging enabled and we output directly to stdout. I don't think there is a need for logging to a file.

Each line of the file produces some logs and the files are around 4K entires, so you're typically getting about 8K-12K lines of logging, that's why I thought it better to have it off by default and send it to a file if it's enabled. Otherwise it's like a snowstorm in the terminal and it will usually overrun the terminal's buffer. We can pipe it to a file I guess but that's an extra step.

I was going to reserve stdout or stderr for the more meaningful output like statistics and summaries of data quality issues noticed.

Let me know what you think, I'm happy to do whatever.

UPDATE: Sorry I think I misunderstood some of what you're saying and was missing some context.

@rminsil
Copy link
Collaborator Author

rminsil commented Sep 13, 2024

@ddaspit I found the logging config in the parent __init.py__ and understand what you mean now.

I've removed the option to log to a file.

I've changed the logger name to be specific to the module. I'll make an analogous change to the repair logger used in #515.

I've added an optional flag to allow setting the debug level from the command line.

Copy link
Collaborator

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @rminsil)

@rminsil rminsil force-pushed the issue-473-add-detailed-logging branch from f4601bb to 24b02d4 Compare September 19, 2024 05:04
@rminsil rminsil changed the base branch from issue-473-change-output-location to master September 19, 2024 05:05
@rminsil rminsil merged commit f2e7e96 into master Sep 19, 2024
1 check was pending
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create initial xri_etl script
2 participants