Merge pull request #44 from DataONEorg/develop
Release v0.1.1
iannesbitt authored Mar 7, 2024
2 parents 75840e5 + 2698389 commit 718bd23
Showing 24 changed files with 4,737 additions and 3,136 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -2,6 +2,7 @@
instance/
dbs/
.vscode
nohup.out

docs/diagrams/C4-PlantUML

8 changes: 4 additions & 4 deletions docs/operation.md
@@ -9,10 +9,10 @@ Harvesting is implemented as a scrapy crawler[^scrapy]. Given a sitemap, crawls
## DataONE production and testing hosts

 - Test server: so.test.dataone.org
-  - Environment: ~vieglais
-  - Virtual env: mnlite
+  - Environment: `~mnlite`
+  - Virtual env: `mnlite`
 - Production server: sonode.dataone.org
-  - Environment: ``~mnlite`
+  - Environment: `~mnlite`
   - Virtual env: `mnlite`

## Testing
@@ -25,7 +25,7 @@ Harvesting is implemented as a scrapy crawler[^scrapy]. Given a sitemap, crawls

1. Log in to sonode.dataone.org (or so.test.dataone.org for testing)
2. `sudo su - mnlite`
-3. `workon mnlite`
+3. `workon mnlite` (`conda activate mnlite` on the test node)
4. `cd WORK/mnlite`
5. Initialize a new repository: `opersist -f instance/nodes/HAKAI_IYS init`
6. Create a contact subject: `opersist -f instance/nodes/HAKAI_IYS sub -o create -n "Brett Johnson" -s "http://orcid.org/0000-0001-9317-0364"`
5 changes: 5 additions & 0 deletions mnlite/mnode.py
@@ -45,6 +45,11 @@
]
}
}
"""
Default node configuration dictionary. Defines the node document upon loading
into the mnlite system service (see e.g. the
`OpenTopography node document <https://sonode.dataone.org/OPENTOPO/v2/node>`_).
"""
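
The fields that `mnonboard` later reads from this structure (see `mnonboard/cli.py` in this changeset) suggest its rough shape. A sketch with placeholder values follows; only the key names come from the code, and every value is an assumption:

```python
# Rough shape of the node configuration dict, inferred from the fields that
# mnonboard/cli.py accesses; all VALUES below are placeholders, not real defaults.
node_config = {
    'node': {
        'node_id': 'urn:node:EXAMPLE',   # placeholder node identifier
        'contact_subject': 'http://orcid.org/0000-0000-0000-0000',
        'schedule': None,                # later filled by utils.set_schedule()
        'state': 'up',                   # set when onboarding completes
    },
    'default_owner': 'http://orcid.org/0000-0000-0000-0000',
    'default_submitter': 'http://orcid.org/0000-0000-0000-0000',
}

# cli.py derives the node directory name from the node_id suffix:
end_node_subj = node_config['node']['node_id'].split(':')[-1]
print(end_node_subj)  # → EXAMPLE
```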


def getMNodeNameFromRequest():
44 changes: 44 additions & 0 deletions mnlite/xmnlite.ini
@@ -0,0 +1,44 @@
#uWSGI configuration for mnlite
[uwsgi]
strict = true
master = true
processes = 5
enable-threads = true
vacuum = true ; Delete sockets during shutdown
single-interpreter = true
die-on-term = true ; Shutdown when receiving SIGTERM (default is respawn)
need-app = true

#disable-logging = true ; Disable built-in logging
#log-4xx = true ; but log 4xx's anyway
#log-5xx = true ; and 5xx's

##harakiri = 60 ; forcefully kill workers after 60 seconds
#py-callos-afterfork = true ; allow workers to trap signals

##max-requests = 1000 ; Restart workers after this many requests
##max-worker-lifetime = 3600 ; Restart workers after this many seconds
##reload-on-rss = 2048 ; Restart workers after this much resident memory
##worker-reload-mercy = 60 ; How long to wait before forcefully killing workers

#cheaper-algo = busyness
#processes = 128 ; Maximum number of workers allowed
#cheaper = 8 ; Minimum number of workers allowed
#cheaper-initial = 16 ; Workers created at startup
#cheaper-overload = 1 ; Length of a cycle in seconds
#cheaper-step = 16 ; How many workers to spawn at a time
#cheaper-busyness-multiplier = 30 ; How many cycles to wait before killing workers
#cheaper-busyness-min = 20 ; Below this threshold, kill workers (if stable for multiplier cycles)
#cheaper-busyness-max = 70 ; Above this threshold, spawn new workers
##cheaper-busyness-backlog-alert = 16 ; Spawn emergency workers if more than this many requests are waiting in the queue
##cheaper-busyness-backlog-step = 2 ; How many emergency workers to create if there are too many requests in the queue

##plugins = python
##virtualenv = /home/mnlite/miniconda3/envs/mnlite
module = mnlite:create_app()
socket = /home/mnlite/WORK/mnlite/mnlite/tmp/mnlite.sock
chmod-socket = 664

#stats = /tmp/stats.socket
##stats = 127.0.0.1:9191
##stats-http = true
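
The ini above binds a Unix socket rather than an HTTP port, so a reverse proxy would typically forward requests to it. A minimal nginx sketch (the server name and everything outside the socket path are assumptions, not part of this changeset):

```nginx
server {
    listen 80;
    server_name sonode.dataone.org;  # assumed; substitute the real host

    location / {
        include uwsgi_params;
        # socket path taken from the ini above
        uwsgi_pass unix:/home/mnlite/WORK/mnlite/mnlite/tmp/mnlite.sock;
    }
}
```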
66 changes: 66 additions & 0 deletions mnonboard/README.md
@@ -0,0 +1,66 @@
# mnonboard

This module provides a wrapper around `opersist` and `mnlite` to streamline the [DataONE member node onboarding process](https://github.com/DataONEorg/mnlite/blob/feature/onboarding/docs/operation.md).
It takes as input either a JSON document manually edited from a template, or direct user input, which it converts to a JSON document.

## Usage

This script requires working installations of both [sonormal](https://github.com/datadavev/sonormal) and [mnlite](https://github.com/DataONEorg/mnlite) to function properly.

### CLI options

```
Usage: cli [ OPTIONS ]
where OPTIONS := {
-c | --check=[ NUMBER ]
number of random metadata files to check for schema.org compliance
-d | --dump=[ FILE ]
dump default member node json file to configure manually
-h | --help
display this help message
-i | --init
initialize a new member node from scratch
-l | --load=[ FILE ]
initialize a new member node from a json file
-P | --production
run this script in production mode (uses the D1 cn API in searches)
-L | --local
run this script in local mode (will not scrape the remote site for new metadata)
}
```
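
The options above are parsed with Python's `getopt` module. A minimal, standalone sketch of how these flags parse (the defaults here are illustrative, not `mnonboard`'s actual defaults):

```python
import getopt

def parse(argv):
    # Option strings mirror mnonboard's CLI; this snippet is self-contained.
    opts, _ = getopt.getopt(
        argv, 'hiPvLd:l:c:',
        ['help', 'init', 'production', 'verbose', 'local', 'dump=', 'load=', 'check='],
    )
    cfg = {'info': 'json', 'local': False, 'check_files': 5}  # illustrative defaults
    for o, a in opts:
        if o in ('-i', '--init'):
            cfg['info'] = 'user'   # gather node info interactively
        if o in ('-L', '--local'):
            cfg['local'] = True    # skip the remote harvest
        if o in ('-c', '--check'):
            cfg['check_files'] = int(a)
    return cfg

print(parse(['-i', '-L', '-c', '3']))
# → {'info': 'user', 'local': True, 'check_files': 3}
```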

### Onboarding process

Let's say you are in the `mnlite` base directory.
Start by activating the `mnlite` virtual environment and changing the working directory to `./mnonboard`:

```
workon mnlite
cd mnonboard
```

**Note:** Node data is stored in `instance/nodes/<NODENAME>`

#### Using an existing `node.json`

To onboard a member node with an existing `node.json` file:

```
python cli.py -l ../instance/nodes/BONARES/node.json
```

The script will guide you through the steps to set up the node and harvest its metadata.

#### No existing `node.json`

The script can also ask the user questions to set up the `node.json` file in an assisted manner. To do so, use the `-i` (initialize) flag:

```
python cli.py -i
```

Keep in mind that you should always review the generated `node.json` file to ensure its values are correct.
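
A quick way to sanity-check the file before loading it. This is a sketch using only the standard library; the required keys are assumptions based on what `mnonboard/cli.py` reads, not a documented schema:

```python
import json

# Assumed minimal key set, inferred from mnonboard/cli.py's field accesses.
REQUIRED = ('node_id', 'contact_subject')

def check_node_json(text):
    """Return the list of REQUIRED keys missing from the 'node' section."""
    doc = json.loads(text)
    node = doc.get('node', {})
    return [k for k in REQUIRED if k not in node]

sample = '{"node": {"node_id": "urn:node:X", "contact_subject": "http://orcid.org/..."}}'
print(check_node_json(sample))  # → []
```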

## Other functionality

Coming soon (see [#21](https://github.com/DataONEorg/mnlite/issues/21))
63 changes: 63 additions & 0 deletions mnonboard/__init__.py
@@ -0,0 +1,63 @@
import os
import logging
from datetime import datetime

from opersist.cli import LOG_DATE_FORMAT, LOG_FORMAT
from mnlite.mnode import DEFAULT_NODE_CONFIG

DEFAULT_JSON = DEFAULT_NODE_CONFIG

__version__ = 'v0.0.1'

LOG_FORMAT = "%(asctime)s %(funcName)s:%(levelname)s: %(message)s" # overrides import

FN_DATE = datetime.now().strftime('%Y-%m-%d')
HM_DATE = datetime.now().strftime('%Y-%m-%d-%H%M')
YM_DATE = datetime.now().strftime('%Y-%m')
LOG_DIR = '/var/log/mnlite/'
LOG_NAME = 'mnonboard-%s.log' % (FN_DATE)
LOG_LOC = os.path.join(LOG_DIR, LOG_NAME)

HARVEST_LOG_NAME = '-crawl-%s.log' % YM_DATE

def start_logging():
    """
    Initialize logger.

    :returns: The logger to use
    :rtype: logging.Logger
    """
    logger = logging.getLogger('mnonboard')
    logger.setLevel(logging.DEBUG)
    formatter = logging.Formatter(fmt=LOG_FORMAT, datefmt=LOG_DATE_FORMAT)
    s = logging.StreamHandler()
    s.setLevel(logging.INFO)
    s.setFormatter(formatter)
    # this initializes logging to file
    f = logging.FileHandler(LOG_LOC)
    f.setLevel(logging.DEBUG)
    f.setFormatter(formatter)
    # warnings also go to file
    # initialize logging
    logger.addHandler(s)  # stream
    logger.addHandler(f)  # file
    logger.info('----- mnonboard %s start -----' % __version__)
    return logger

L = start_logging()

# absolute path of current file
CUR_PATH_ABS = os.path.dirname(os.path.abspath(__file__))

# relative path from root of mnlite dir to nodes directory
NODE_PATH_REL = 'instance/nodes/'

def default_json(fx='Unspecified'):
    """
    Return the default node configuration dict to be used in onboarding.

    :returns: A dict of values to be used in member node creation
    :rtype: dict
    """
    L.info('%s function loading default json template.' % (fx))
    return DEFAULT_JSON
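
The handler setup in `start_logging()` above follows the standard two-handler pattern: a stream handler at INFO for the console and a file handler at DEBUG for the full record. A minimal self-contained illustration of the same pattern, writing to a temp file rather than `/var/log/mnlite/`:

```python
import logging
import os
import tempfile

logger = logging.getLogger('demo')
logger.setLevel(logging.DEBUG)
fmt = logging.Formatter('%(asctime)s %(funcName)s:%(levelname)s: %(message)s')

stream = logging.StreamHandler()
stream.setLevel(logging.INFO)        # console: INFO and above only
stream.setFormatter(fmt)

logpath = os.path.join(tempfile.gettempdir(), 'demo.log')
filehandler = logging.FileHandler(logpath)
filehandler.setLevel(logging.DEBUG)  # file: everything, including DEBUG
filehandler.setFormatter(fmt)

logger.addHandler(stream)
logger.addHandler(filehandler)

logger.debug('goes to file only')
logger.info('goes to both')
filehandler.flush()
```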
155 changes: 155 additions & 0 deletions mnonboard/cli.py
@@ -0,0 +1,155 @@
import os, sys
import getopt
import time

from mnonboard import utils
from mnonboard import info_chx
from mnonboard import data_chx
from mnonboard import cn
from mnonboard.defs import CFG, HELP_TEXT, SO_SRVR, CN_SRVR, CN_SRVR_BASEURL, CN_CERT_LOC, APPROVE_SCRIPT_LOC
from mnonboard import default_json, L

def run(cfg):
    """
    Wrapper around opersist that simplifies the process of onboarding a new
    member node to DataONE.

    :param dict cfg: Dict containing config variables
    """
    # auth
    if not cfg['token']:
        cfg['token'] = os.environ.get('D1_AUTH_TOKEN')
    if not cfg['token']:
        print('Your DataONE auth token is missing. Please enter it here and/or store it in the env variable "D1_AUTH_TOKEN".')
        cfg['token'] = info_chx.req_input('Please enter your DataONE authentication token: ')
        os.environ['D1_AUTH_TOKEN'] = cfg['token']
    cfg['cert_loc'] = CN_CERT_LOC[cfg['mode']]
    DC = cn.init_client(cn_url=cfg['cn_url'], auth_token=cfg['token'])
    if cfg['info'] == 'user':
        # do the full user-driven info gathering process
        ufields = info_chx.user_input()
        fields = info_chx.transfer_info(ufields)
    else:
        # grab the info from a json
        fields = utils.load_json(cfg['json_file'])
        info_chx.input_test(fields)
        # still need to ask the user for some names
    # now we're cooking
    # get the node path using the end of the path in the 'node_id' field
    end_node_subj = fields['node']['node_id'].split(':')[-1]
    loc = utils.node_path(nodedir=end_node_subj)
    # initialize a repository there (step 5)
    utils.init_repo(loc)
    names = {}
    for f in ('default_owner', 'default_submitter', 'contact_subject'):
        # add a subject for owner and submitter (may not be necessary if they exist already)
        # add subject for technical contact (step 6)
        val = fields[f] if f != 'contact_subject' else fields['node'][f]
        name = utils.get_or_create_subj(loc=loc, value=val, cn_url=cfg['cn_url'], title=f)
        # store this for a few steps later
        names[val] = name
    # set the update schedule and set the state to up
    fields['node']['schedule'] = utils.set_schedule()
    fields['node']['state'] = 'up'
    # okay, now overwrite the default node.json with our new one (step 8)
    utils.save_json(loc=os.path.join(loc, 'node.json'), jf=fields)
    # add node as a subject (step 7)
    utils.get_or_create_subj(loc=loc, value=fields['node']['node_id'],
                             cn_url=cfg['cn_url'],
                             name=end_node_subj)
    # restart the mnlite process to pick up the new node.json (step 9)
    utils.restart_mnlite()
    # run scrapy to harvest metadata (step 10)
    if not cfg['local']:
        utils.harvest_data(loc, end_node_subj)
    # now run tests
    data_chx.test_mdata(loc, num_tests=cfg['check_files'])
    # create xml to upload for validation (step 15)
    files = utils.create_names_xml(loc, node_id=fields['node']['node_id'], names=names)
    # uploading xml (proceed to step 14 and ssh to find xml in ~/d1_xml)
    ssh, work_dir, node_id = utils.start_ssh(server=cfg['cn_url'],
                                             node_id=fields['node']['node_id'],
                                             loc=loc,
                                             ssh=cfg['ssh'])
    time.sleep(0.5)
    utils.upload_xml(ssh=ssh, server=CN_SRVR[cfg['mode']], files=files, node_id=node_id, loc=loc)
    # create and validate the subject in the accounts service (step 16)
    utils.create_subj_in_acct_svc(ssh=ssh, cert=cfg['cert_loc'], files=files, cn=cfg['cn_url'], loc=loc)
    utils.validate_subj_in_acct_svc(ssh=ssh, cert=cfg['cert_loc'], names=names, cn=cfg['cn_url'], loc=loc)
    # download the node capabilities and register the node
    node_filename = utils.dl_node_capabilities(ssh=ssh, baseurl=SO_SRVR[cfg['mode']], node_id=node_id, loc=loc)
    utils.register_node(ssh=ssh, cert=cfg['cert_loc'], node_filename=node_filename, cn=cfg['cn_url'], loc=loc)
    utils.approve_node(ssh=ssh, script_loc=APPROVE_SCRIPT_LOC, loc=loc)
    # close connection
    if ssh:
        ssh.close()

def main():
    """
    Uses getopt to set config values and then calls
    :py:func:`mnlite.mnonboard.cli.run` with the resulting config dict.
    """
    # get arguments
    try:
        opts = getopt.getopt(sys.argv[1:], 'hiPvLd:l:c:',
            ['help', 'init', 'production', 'verbose', 'local', 'dump=', 'load=', 'check=']
        )[0]
    except Exception as e:
        L.error('Error: %s' % e)
        print(HELP_TEXT)
        exit(1)
    # default to the testing environment; -P (--production) overrides this below
    CFG['cn_url'] = CN_SRVR_BASEURL % CN_SRVR['testing']
    CFG['mode'] = 'testing'
    for o, a in opts:
        if o in ('-h', '--help'):
            # help
            print(HELP_TEXT)
            exit(0)
        if o in ('-i', '--init'):
            # do data gathering
            CFG['info'] = 'user'
        if o in ('-P', '--production'):
            # production case
            CFG['cn_url'] = CN_SRVR_BASEURL % CN_SRVR['production']
            CFG['mode'] = 'production'
        if o in ('-d', '--dump'):
            # dump default json to file
            utils.save_json(a, default_json())
            exit(0)
        if o in ('-l', '--load'):
            # load from json file
            CFG['info'] = 'json'
            CFG['json_file'] = a
        if o in ('-c', '--check'):
            try:
                CFG['check_files'] = int(a)
            except ValueError:
                if a == 'all':  # this should probably not be used unless necessary!
                    CFG['check_files'] = a
                else:
                    L.error('Option -c (--check) requires an integer number of files to check.')
                    print(HELP_TEXT)
                    exit(1)
        if o in ('-L', '--local'):
            CFG['local'] = True
            L.info('Local mode (-L) will not scrape the remote site and will only test local files.')
    L.info('running mnonboard in %s mode.\n\
data gathering from: %s\n\
cn_url: %s\n\
metadata files to check: %s' % (CFG['mode'],
                                CFG['info'],
                                CFG['cn_url'],
                                CFG['check_files']))
    try:
        run(CFG)
    except KeyboardInterrupt:
        print()
        L.error('Caught KeyboardInterrupt, quitting...')
        exit(1)

if __name__ == '__main__':
    main()
