
Commit aeacc89

Authored by Joseph Areeda (areeda)
Fix Max lookback issue (#178)
* Do not cross metric day boundaries.
* Merge day boundary (#146): address issue #126 so pyomicron can run from a frame cache without accessing dqsegdb, add documentation for this, and do not merge files if they overlap "metric days".
* Check point merge (#147): add a log-file argument, delete empty directories when done (with follow-up tweaks to the empty-directory removal), rebase against the last approved PR, and fix flake8 issues.
* Minor documentation changes.
* Fix a double gzip of ligolw files (#151): an xml.gz file could get compressed again in merge-with-gaps.
* Implement a periodic vacate to address jobs stuck in permanent D-state (uninterruptible wait) and failing to complete; raise the periodic-vacate time to 3 hours, then disable it to demonstrate the problem.
* Always create a log file; if none is specified, put one in the output directory.
* Add time limits to post-processing as well; fix a job-killing typo.
* Do not save the segments.txt file when no segments are found, because we cannot tell whether the lookup failed or the state is validly not analyzable; only update segments.txt if omicron actually runs.
* Fix the reported version in some utilities.
* Clarify relative imports and add detail to a few log messages.
* Update the log format to use human-readable date/times instead of GPS; tweak logging to better understand guardian channel usage.
* Remove the old setup.cfg.
* Work on pytest failures; the remaining errors come from omicron segfaulting when an environment variable is not set.
* Fix a problem with max-online-lookback not working properly in all paths; add human-readable date/times to GPS log messages; fix logging problems from different GPS-time objects; improve logging of why the online effort did not run.
* Raise the default lookback window to 40 minutes and the maximum lookback window to 60 minutes; later settle on a default max lookback of 30 minutes.
* Trap and print errors from main(); fix the DAG submission command.
* Add a smart post script that allows retries before ignoring errors; test versions of scitokens and the smart post script; add an argument to specify the auth type (x509, vault, or apissuer).
* Fix memory units being in the wrong place; sort console scripts; get periodic_release and periodic_remove correct (including a typo in periodic_remove); better error message when programs are not available.
* Implement conda run for all jobs in the DAG, work through conda-run complications with CVMFS, and use conda run in all scripts.
* archive.py now handles renamed trigger files; keep "temporary" files to help debug archive issues.
* Add omicron_utils to install requirements.
* Work on build-and-test workflow errors; remove Python 3.9 from the workflows.
* Set the log level for OmicronConfig to CRITICAL so the --version command output is clean.
* Deal with the GitHub error "This request has been automatically failed because it uses a deprecated version of `actions/upload-artifact: v2`" (see https://github.blog/changelog/2024-02-13-deprecation-notice-v1-and-v2-of-the-artifact-actions/).
* Resolve flake8 issues.

Co-authored-by: Joseph Areeda <joseph.areeda@ligo.org>
1 parent af50aad commit aeacc89

File tree

8 files changed: +436 −116 lines changed


.github/workflows/build.yml

Lines changed: 5 additions & 6 deletions
@@ -37,7 +37,6 @@ jobs:
         - macOS
         - Ubuntu
       python-version:
-        - "3.9"
         - "3.10"
         - "3.11"
     runs-on: ${{ matrix.os }}-latest
@@ -49,12 +48,12 @@ jobs:
 
     steps:
       - name: Get source code
-        uses: actions/checkout@v2
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0
 
       - name: Cache conda packages
-        uses: actions/cache@v2
+        uses: actions/cache@v4
        env:
          # increment to reset cache
          CACHE_NUMBER: 0
@@ -64,7 +63,7 @@ jobs:
          restore-keys: ${{ runner.os }}-conda-${{ matrix.python-version }}-
 
      - name: Configure conda
-        uses: conda-incubator/setup-miniconda@v2
+        uses: conda-incubator/setup-miniconda@v3
        with:
          activate-environment: test
          miniforge-variant: Mambaforge
@@ -111,14 +110,14 @@ jobs:
        run: python -m coverage xml
 
      - name: Publish coverage to Codecov
-        uses: codecov/codecov-action@v3
+        uses: codecov/codecov-action@v4
        with:
          files: coverage.xml
          flags: Conda,${{ runner.os }},python${{ matrix.python-version }}
 
      - name: Upload test results
        if: always()
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
        with:
          name: pytest-conda-${{ matrix.os }}-${{ matrix.python-version }}
          path: pytest.xml

omicron/cli/archive.py

Lines changed: 17 additions & 7 deletions
@@ -90,7 +90,7 @@ def scandir(otrigdir):
 def process_dir(dir_path, outdir, logger, keep_files):
     """
     Copy all trigget files to appropriate directory
-    @param logger: program'sclogger
+    @param logger: program's logger
     @param Path dir_path: input directory
     @param Path outdir: top level output directory eg ${HOME}/triggers
     @param boolean keep_files: Do not delete files after copying to archive
@@ -116,14 +116,17 @@ def process_dir(dir_path, outdir, logger, keep_files):
         tspan = Segment(strt, strt + dur)
 
         otrigdir = outdir / ifo / chan / str(int(strt / 1e5))
+
+        logger.debug(f'Trigger file:\n'
+                     f'    {tfile_path.name}\n'
+                     f'    ifo: [{ifo}], chan: [{chan}], strt: {strt}, duration: {dur} ext: [{ext}]\n'
+                     f'    outdir: {str(otrigdir.absolute())}')
+
         if str(otrigdir.absolute()) not in dest_segs.keys():
             dest_segs[str(otrigdir.absolute())] = scandir(otrigdir)
 
-        logger.debug(
-            f'ifo: [{ifo}], chan: [{chan}], strt: {strt}, ext: [{ext}] -> {str(otrigdir.absolute())}')
-
         if dest_segs[str(otrigdir.absolute())].intersects_segment(tspan):
-            logger.warn(f'{tfile_path.name} ignored because it would overlap')
+            logger.warning(f'{tfile_path.name} ignored because it would overlap')
         else:
             otrigdir.mkdir(mode=0o755, parents=True, exist_ok=True)
             shutil.copy(tfile, str(otrigdir.absolute()))
@@ -134,11 +137,14 @@ def process_dir(dir_path, outdir, logger, keep_files):
 
 
 def main():
-    logging.basicConfig()
+    # global logger
+    log_file_format = "%(asctime)s - %(levelname)s - %(funcName)s %(lineno)d: %(message)s"
+    log_file_date_format = '%m-%d %H:%M:%S'
+    logging.basicConfig(format=log_file_format, datefmt=log_file_date_format)
     logger = logging.getLogger(__process_name__)
     logger.setLevel(logging.DEBUG)
 
-    home = os.getenv('HOME')
+    home = Path.home()
     outdir_default = os.getenv('OMICRON_HOME', f'{home}/triggers')
     parser = argparse.ArgumentParser(description=textwrap.dedent(__doc__),
                                      formatter_class=argparse.RawDescriptionHelpFormatter,
@@ -169,6 +175,10 @@ def main():
     else:
         logger.setLevel(logging.DEBUG)
 
+    logger.debug("Command line args:")
+    for arg in vars(args):
+        logger.debug(f'    {arg} = {str(getattr(args, arg))}')
+
     indir = Path(args.indir)
     outdir = Path(args.outdir)
     if not outdir.exists():
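The logging change in archive.py's main() can be checked in isolation. This sketch (an illustration, not part of the commit) formats a hand-built record with the same format strings to show the shape of the resulting log lines:

```python
import logging

# Same format strings that the updated main() passes to logging.basicConfig()
log_file_format = "%(asctime)s - %(levelname)s - %(funcName)s %(lineno)d: %(message)s"
log_file_date_format = '%m-%d %H:%M:%S'

formatter = logging.Formatter(log_file_format, datefmt=log_file_date_format)

# Build a record by hand so the output is deterministic apart from the timestamp
record = logging.LogRecord(name='archive-demo', level=logging.DEBUG,
                           pathname='archive.py', lineno=42,
                           msg='hello', args=None, exc_info=None, func='main')
print(formatter.format(record))  # e.g. "06-01 12:00:00 - DEBUG - main 42: hello"
```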

omicron/cli/omicron_post_script.py

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# vim: nu:ai:ts=4:sw=4
+
+#
+# Copyright (C) 2024 Joseph Areeda <joseph.areeda@ligo.org>
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+"""
+The situation is that we run DAGs with many omicron jobs, some of which fail for data dependent reasons that
+are valid and permanent but others are transient like network issues that could be resolved with a retry.
+
+This program isun as a post script to allow us to retry the job but return a success code even if it fails
+repeatedly so that the DAG completes.
+"""
+import textwrap
+import time
+
+start_time = time.time()
+
+import argparse
+import logging
+from pathlib import Path
+import sys
+import traceback
+
+try:
+    from ._version import __version__
+except ImportError:
+    __version__ = '0.0.0'
+
+__author__ = 'joseph areeda'
+__email__ = 'joseph.areeda@ligo.org'
+__process_name__ = Path(__file__).name
+
+logger = None
+
+
+def parser_add_args(parser):
+    """
+    Set up command parser
+    :param argparse.ArgumentParser parser:
+    :return: None but parser object is updated
+    """
+    parser.add_argument('-v', '--verbose', action='count', default=1,
+                        help='increase verbose output')
+    parser.add_argument('-V', '--version', action='version',
+                        version=__version__)
+    parser.add_argument('-q', '--quiet', default=False, action='store_true',
+                        help='show only fatal errors')
+    parser.add_argument('--return-code', help='Program return code')
+    parser.add_argument('--max-retry', help='condor max retry value')
+    parser.add_argument('--retry', help='current try starting at 0')
+    parser.add_argument('--log', help='Path for a copy of our logger output')
+
+
+def main():
+    global logger
+
+    log_file_format = "%(asctime)s - %(levelname)s, %(pathname)s:%(lineno)d: %(message)s"
+    log_file_date_format = '%m-%d %H:%M:%S'
+    logging.basicConfig(format=log_file_format, datefmt=log_file_date_format)
+    logger = logging.getLogger(__process_name__)
+    logger.setLevel(logging.DEBUG)
+
+    epilog = textwrap.dedent("""
+    This progam is designed to be run as a post script in a Condor DAG. For available arguments see:
+    https://htcondor.readthedocs.io/en/latest/automated-workflows/dagman-scripts.html#special-script-argument-macros
+    A typical lne in the DAG might look like:
+    python omicron_post_script.py -vvv --return $(RETURN) --retry $(RETRY) --max-retry $(MAX_RETRIES) --log
+    <path_to_log>
+    """)
+
+    parser = argparse.ArgumentParser(description=__doc__, prog=__process_name__, epilog=epilog,
+                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    parser_add_args(parser)
+    args = parser.parse_args()
+    verbosity = 0 if args.quiet else args.verbose
+
+    if verbosity < 1:
+        logger.setLevel(logging.CRITICAL)
+    elif verbosity < 2:
+        logger.setLevel(logging.INFO)
+    else:
+        logger.setLevel(logging.DEBUG)
+
+    if args.log:
+        log = Path(args.log)
+        log.parent.mkdir(0o775, exist_ok=True, parents=True)
+        file_handler = logging.FileHandler(log, mode='a')
+        log_formatter = logging.Formatter(log_file_format, datefmt=log_file_date_format)
+        file_handler.setFormatter(log_formatter)
+        logger.addHandler(file_handler)
+
+    me = Path(__file__).name
+    logger.info(f'--------- Running {str(me)}')
+    # debugging?
+    logger.debug(f'{__process_name__} version: {__version__} called with arguments:')
+    for k, v in args.__dict__.items():
+        logger.debug('    {} = {}'.format(k, v))
+
+    ret = int(args.return_code)
+    retry = int(args.retry)
+    max_retry = int(args.max_retry)
+    ret = ret if retry < max_retry or ret == 0 else 0
+    logger.info(f'returning {ret}')
+    return ret
+
+
+if __name__ == "__main__":
+    try:
+        ret = main()
+    except (ValueError, TypeError, OSError, NameError, ArithmeticError, RuntimeError) as ex:
+        print(ex, file=sys.stderr)
+        traceback.print_exc(file=sys.stderr)
+        ret = 21
+
+    if logger is None:
+        logging.basicConfig()
+        logger = logging.getLogger(__process_name__)
+        logger.setLevel(logging.DEBUG)
+    # report our run time
+    logger.info(f'Elapsed time: {time.time() - start_time:.1f}s')
+    sys.exit(ret)
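The decision at the heart of the new post script can be restated as a tiny pure function (an illustration, not part of the commit): while retries remain, the job's real exit status is passed through so DAGMan retries the node; once the final allowed retry has also failed, success is reported so the DAG can complete.

```python
def post_script_return(return_code: int, retry: int, max_retry: int) -> int:
    """Mirror the return-code logic in omicron_post_script.main()."""
    # Propagate failures while HTCondor will still retry the node;
    # swallow the failure on the final attempt so the DAG completes.
    return return_code if retry < max_retry or return_code == 0 else 0

print(post_script_return(1, retry=0, max_retry=3))  # -> 1 (DAGMan retries the node)
print(post_script_return(1, retry=3, max_retry=3))  # -> 0 (give up, report success)
print(post_script_return(0, retry=0, max_retry=3))  # -> 0 (job succeeded)
```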
