Change Log

3.4.3 (2019-06-13)

Full Changelog

  • DataONE Generic Member Node (GMN)

    • Fixed local filenames for BagIt ZIP archives and individually downloaded objects

      • MNRead.getPackage() now includes a Content-Disposition header which suggests a local filename to the browser.
      • The local filename is generated from the object's fileName, formatId, and identifier SysMeta values.
      • As the fileName and formatId fields are user-supplied, with no uniqueness requirements, filenames are modified as necessary to prevent the same filename from being used for multiple objects.
    • Fixed filenames of the objects contained in BagIt ZIP archives

      • Filenames for the contained objects are generated using the same algorithm as for the package Content-Disposition.
    • Fixed CORS authentication issue that disabled the Download All button on the DataONE Search site.

    • Optimized MNRead.getLogRecords() by significantly reducing the number of DB joins

    • Fixed MNRead.describe() exception when called with non-existing PID

    • Management commands

      • New command that exports user-selected Science Metadata values to JSON

      • New commands that list objects and logs on a remote MN or CN

        • The commands can authenticate to remote MNs using the local DataONE-issued client side certificate, which may provide elevated permissions on the remote MN
      • New command that checks MN -> CN sync and MN object availability

        • Compares the CN's ObjectLists with locally managed objects and reports discrepancies
      • New command that issues CN sync requests for missing objects

        • For various reasons, it is possible for objects on an MN to remain undiscovered by the CN. This command checks for local objects in this state and issues object-specific sync requests to the CN as needed.
      • GMN bulk importer

        • Reimplemented using the new async DataONE client

        • Performance is improved compared to the earlier multiprocessing version while also using much less CPU and RAM, especially when using larger numbers of concurrent connections to the remote MN.

        • New features:

          • Option to restrict the import by supplying a file containing a list of PIDs and/or SIDs to import

          • Option to recursively import aggregated objects

            • Resource Maps can be nested to an arbitrary depth. The recursive import ensures that, as long as the objects are available on the source MN, imported packages will retain all their direct and indirect members
          • Support for importing from a CN

      • More standardized command line arguments across commands

      • Improved help text

    • Removed option to trust client's dateSysMetadataModified in System Metadata

      • Since dateSysMetadataModified is used as the optimistic locking value that must remain constant in a System Metadata read/update/write operation, trusting the client version could cause updates to be lost during concurrent updates from multiple clients.
  • DataONE Client Library

    • New API documentation for developers

    • New async client based on aiohttp

      • Currently supports a limited subset of DataONE CN and MN APIs
    • New async ObjectList and EventLog iterators

    • Added support for passing vendor specific headers to DataONE API wrappers

    • Added support for overriding autogenerated MMP boundary string in client

    • Converted the multiprocessed SysMeta, ObjectList and EventLog iterators to classes

  • DataONE Common Library

    • New API documentation for developers

    • Automatically rename files in BagIt to avoid duplicates

    • Move from InvalidSystemMetadata to the less specific InvalidRequest when automatically translating 400 status code to DataONE Exception

    • Improve support for namespaces in the simple_xml wrapper

    • New stream iterator

    • Added examples for how to use the JSON Web Tokens (JWT) module

    • New CA and CSR utilities

      • These utilities simplify generating CA certificates, certificate requests, and signing them
    • New objectFormatList cache module

      • Local cache of the DataONE ObjectFormatList for a given DataONE environment
      • Provides fast formatId based lookups
      • Supports manual and automatic synchronization from the CN
      • Includes an ObjectFormatList current as of when the package is built in order to support standalone installs and offline usage
    • New module that simplifies adding progress messages to scripts

    • Filesystem utilities are now in a separate module

    • New modules for manipulating and visualizing SubjectInfo trees

      • The DataONE SubjectInfo type is an arbitrarily nested tree of subjects, groups, equivalence and group membership assertions.

      • These modules abstract the SubjectInfo and allow higher level queries such as, "given this SubjectInfo, is this subject allowed to update this object?"

      • The SubjectInfo rendering is based on the ETE Toolkit for analysis and visualization of trees

    • Command line wrapper for the SubjectInfo renderer

      • In progress
    • Fixed and simplified the filesystem iterator, made possible by the move to Py3.

  • Tests and test framework

    • New API documentation for developers

    • Fixed rare random crash in the test sample library

    • Added tests covering more corner cases

    • Improved DB fixture and template system

    • New raw WSGI client for low level testing

      • The client allows specifying and submitting requests directly at the WSGI level
    • Added support for running pytest with --capture=sys

      • Avoids the console getting flooded with debug logs during test runs
    • Added support for updating samples in parallel test runs under xdist

    • Added support for continuing tests without having to close sample diff windows

    • Fixed issue that caused GMN DB fixture refresh to occasionally fail silently, causing test failures

    • New SlenderNode test client

      • The test client acts as counterpart for the adapter, simulating an MN by accepting requests such as MNStorage.create() and providing minimal responses without performing the requested actions.
      • This helps speed up development of SlenderNode adapters by allowing the adapter to perform the same actions repeatedly without having to clear out the objects on the MN between the runs.
    • New utility module for normalizing general XML docs

      • This module can normalize XML docs for which we haven't built PyXB bindings, such as ISO metadata. The normalized documents can then be compared with a regular diff utility.
    • Moved automatically generated and manually created test objects to separate directory trees based on object type and usage

      • Factored file handling out to separate modules
      • Previously, all test files were stored in a single directory
    • Misc smaller fixes and improvements

  • Misc

    • Updated the DataONE Command Line Client (CLI) to v2 APIs and current DataONE Python stack

    • Logging in all modules in d1_python has been updated to log to a module specific logger instead of the root logger

      • It is now possible to control logging individually for each module in the stack
    • Updated all dependencies to current as of 2019-06-11

    • Source code cleanup

      • Removed unused imports

      • Moved to Black + isort + docformatter formatting

      • Code cleanup and refactoring

      • Added more automated source code formatting

        • Based on RedBaron
        • The scripts are in d1_dev and have names starting with src-
    • Expanded README.md with more instructions on how to set up and use a development environment for d1_python.
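
The move to module specific loggers in this release follows the standard Python pattern of one named logger per module. The sketch below uses only the standard library and shows how that makes it possible to adjust verbosity for one part of the stack without affecting the rest. The logger names `d1_client.session` and `d1_common.util` are illustrative assumptions, not necessarily names used in d1_python.

```python
import logging

# Each module obtains its own logger, typically via logging.getLogger(__name__).
# The two names below are hypothetical stand-ins for modules in the stack.
client_log = logging.getLogger("d1_client.session")
common_log = logging.getLogger("d1_common.util")

# Loggers form a dotted hierarchy, so levels can be set per package or per module.
logging.getLogger("d1_client").setLevel(logging.DEBUG)
logging.getLogger("d1_common").setLevel(logging.WARNING)

# Child loggers inherit the effective level of their nearest configured parent.
client_debug_enabled = client_log.isEnabledFor(logging.DEBUG)
common_debug_enabled = common_log.isEnabledFor(logging.DEBUG)
```

With a root logger, the only available choice is one level for everything; with named loggers, debug output can be enabled just for the module being investigated.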

3.2.0 (2018-10-15)

Full Changelog

  • DataONE Generic Member Node (GMN)

    • Added web interface for GMN

      • MN organization highlighted with logo and description
      • Opening any link at or below the BaseURL of GMN provides a way to reach all publicly available information on the MN. This includes links that do not go to valid endpoints, or are in other ways invalid
      • Functionality includes
        • Links to profile pages on Search for subjects and other MNs
        • Resolve objects on CN, download object from MN, view System Metadata
        • Browse object list and log records, paging forwards and backwards
        • Static links to areas of interest for DataONE
    • Added Apache, Postgres and Python version numbers to GMN status page

    • Reading from the GMN status page without a browser now returns an XML doc. No need to scrape the page for version numbers and other status information

    • Added multiprocessing to GMN bulk importer

      • 10x speedup seems likely in latency bound systems
    • Added more validation of GMN settings

    • Added migrations to generate Postgres indexes for default sorting

      • Surprisingly, Django does not generate indexes for default sort ordering specified when using the ORM
    • Reconfigured logging to take advantage of Django's support for rotating logs, much like logrotate does. Maximum space to use for logging is now a config setting.

    • Other usability improvements (see log)

  • GMN deployment

    • Improved and streamlined GMN install procedures, including

      • Refactored install so that it can be accomplished in two stages, where only the first stage needs to be performed by an account with sudo access
    • Improved and cleaned up Apache conf file

      • Move from Rewrite to Redirect / Alias

      • All redirects are in the same configuration file

        • http -> https, / -> GMN "home", /mn -> v2/node
      • Factored repeated paths out

  • Documentation

    • Fixed documentation build warnings and formatting

    • Added SSL/TLS troubleshooting doc, misc other background information

    • Restructured doc layout

    • Refactored shell commands into larger blocks to reduce copy/paste

    • Added more checking and related logic to the blocks to automatically handle more filesystem / OS variations

    • Added Copy buttons for all shell blocks

    • Added docs for

      • How to set up GMN using APT instead of PyPI dependencies
      • How to set up multiple MNs on a single GMN instance
      • How to perform automatic upgrades within GMN 3.x
      • How to cluster tables by index in Postgres to improve GMN perf
    • Added missing glossary entries

    • Upgraded to Sphinx Better theme 0.1.5

  • DataONE Client and Common Libraries

    • Refactored to improve functionality and take advantage of new functionality in the underlying dependencies

    • Standardized logging in d1_python

    • Consistently use timezone aware datetime objects

    • Removed automatic caching (cachecontrol) in d1_client

    • Disabled retries by default in d1_client

      • Retries are still available, but they're opt-in instead of opt-out.
    • Changed default mimetype for XML from application/xml to text/xml

    • Changed default slice size from 1000 to 100

    • Upgrades of all dependencies

  • Tests and test framework

    • Improved support for debugging in PyCharm and factored it out to the test framework
    • Moved to using the PyCharm Diff & Merge tool for viewing sample mismatches
    • Fixed inconsistent normalization of sysmeta replica section
  • Developer, working on the DataONE code base

    • Improved documentation on how to work on our code base, such as

      • Building of GMN DB fixture
      • Using the Responses based DataONE client for debugging GMN
      • Types of test failures
      • Working with the pre-commit Git hooks
      • Authenticating on Postgres from PyCharm
    • Fixture generator updates, new fixture for Django 2.x

    • Better consistency in logging formats, more information logged

  • Developer, using the DataONE libraries

    • Add timezone aware current time functions based on the timezone support that was added in Python 3
    • Cleaner and more consistent interfaces in classes and methods
    • Documentation describing more corner cases, e.g., for date-time objects
  • Ongoing:

    • Fixing and cleaning up the Solr client

    • Reworking the examples / utilities

      • Standardize code layout
      • Add standard set of command line arguments
      • Update to use currently available features in Python 3 and the D1 libs
    • Adding tests for GMN multithreaded bulk importer

    • Adding tests for GMN's XSLT based web UI
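
As a rough illustration of what consistently using timezone aware datetime objects means in practice, the sketch below uses only the standard library. The function names `utc_now`, `is_tz_aware` and `to_utc` are hypothetical and are not taken from the d1_common API.

```python
from datetime import datetime, timedelta, timezone


def utc_now():
    """Return the current time as a timezone aware datetime in UTC."""
    return datetime.now(timezone.utc)


def is_tz_aware(dt):
    """Return True if dt carries a usable UTC offset."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


def to_utc(dt):
    """Convert an aware datetime to UTC. Naive datetimes are rejected
    instead of being silently interpreted as local time."""
    if not is_tz_aware(dt):
        raise ValueError("naive datetime has no defined timezone")
    return dt.astimezone(timezone.utc)


# 12:00 at UTC-7 is 19:00 UTC.
local_noon = datetime(2018, 10, 15, 12, 0, tzinfo=timezone(timedelta(hours=-7)))
utc_equivalent = to_utc(local_noon)
```

Rejecting naive values at the boundary is one possible design choice; it keeps ambiguous timestamps from entering comparisons with aware ones, which would otherwise raise a TypeError later.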

2.4.2 (2018-02-15)

Full Changelog

  • DataONE Generic Member Node (GMN)

    • Add SysMeta XML doc to each object returned in BagIt from getPackage()
    • Fix configuration checks performed at startup and improve messages
    • Add option to limit number of objects to import in bulk importer
    • Add tests for settings check performed at GMN startup
  • DataONE Client Library

    • Move updateSystemMetadata() to baseclient to make it available for CN calls
  • DataONE Common Library

    • Improve StringIterator to allow for more general usage
  • Tests and test framework

    • Add memory_limit context manager and associated test
      • Provides ability to fail unit tests that exceed a given memory usage target
      • Based on psutil, which is added as a new dependency
    • Add workaround for MultipartEncoder bug in Django test client
    • Improve performance in instance generator
      • Skip generating PyXB type bindings that will not be used in the final SysMeta object
    • Ensure unique media types in test objects
    • Check that dateSysMetadataModified of object obsoleted by update() is set to the GMN server's current datetime
  • Misc

    • Update dependencies to current as of 2018-02-15
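
The idea behind a `memory_limit` context manager for tests can be sketched as follows. This is not the implementation in the test framework: the changelog states that the real one is based on psutil, while this sketch swaps in `tracemalloc` from the standard library, which only tracks allocations made through Python's memory allocators. The exception name `MemoryLimitExceeded` is an assumption, and `tracemalloc.reset_peak()` requires Python 3.9 or later.

```python
import tracemalloc
from contextlib import contextmanager


class MemoryLimitExceeded(AssertionError):
    """Raised when the wrapped block allocates more than the allowed amount."""


@contextmanager
def memory_limit(max_bytes):
    # Hypothetical stand-in: tracks Python-level allocations with tracemalloc,
    # not the process memory that a psutil based implementation would see.
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    tracemalloc.reset_peak()  # requires Python 3.9+
    baseline, _ = tracemalloc.get_traced_memory()
    try:
        yield
        _, peak = tracemalloc.get_traced_memory()
    finally:
        if not was_tracing:
            tracemalloc.stop()
    used = peak - baseline
    if used > max_bytes:
        raise MemoryLimitExceeded(f"block used {used} bytes, limit is {max_bytes}")


# A block that stays well within a 64 MiB budget passes silently.
with memory_limit(64 * 1024 * 1024):
    small = bytearray(1024)
```

Deriving the exception from `AssertionError` means that a test exceeding its budget is reported as a failure rather than an error by most test runners.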

2.4.1 (2017-12-14)

Full Changelog

  • DataONE Generic Member Node (GMN)

    • Add XML Schema (XSD) validation of incoming Science Metadata documents
      • Applies to calls to MNStorage.create() and MNStorage.update()
      • Controlled by new SCIMETA_VALIDATION_* settings
    • Update jQuery UI to latest and switch from static to hotlinked
    • Fix mistake in cron setup instructions
    • Fix bug that prevented process_refresh_queue from running
    • Improve error handling and logging in bulk importer
  • DataONE Common Library

    • Add new package for validating Science Metadata
      • The required schemas are included in the package
    • Improve error handling in multiprocessed iterators
    • Misc refactoring
  • Tests and test framework

    • Misc smaller test framework improvements
    • Add basic test for GMN home/status page
  • Misc

    • Update dependencies to current as of 2017-12-06
    • Remove accidentally added duplicates

2.4.0 (2017-11-16)

Full Changelog

  • DataONE Generic Member Node (GMN)

    • Fix ServiceFailure caused by some listObjects() and getLogRecords() slice requests
    • Fix datetimes in DataONE types sometimes returned as naive instead of in the UTC timezone
    • Add support for returning BagIt zip archives from getPackage()
      • BagIt objects are generated directly as streams, so arbitrarily sized BagIt archives can be requested and latency is the same as for regular get()
    • Reverse the ordering of ObjectList (listObjects()) and Log (getLogRecords())
      • Now ordered by modified timestamp, ascending, then identifier, ascending
    • Add limit to the maximum number of items that can be returned in a single slice
      • Adjustable via new MAX_SLICE_ITEMS setting, 5000 by default
    • Add fallback to default settings
      • Allows new settings to be added without requiring admins to add the new settings into their local settings.py after GMN upgrade
    • Add support for API calls without HTTP User Agent header in GMN
      • Such calls may be generated by custom, "one-off" clients
    • Fixes related to create() and update() of resource maps and their aggregated objects
      • "block" and "open" modes implemented as described in RESOURCE_MAP_CREATE in settings_template.py
      • Allow multiple Resource Maps to aggregate the same PID and each other
      • Allow Resource Maps to aggregate non-local and possibly non-existing objects
    • Rework internal representation and handling of revision chains
    • Update bulk importer for handling of out-of-order revisions
    • Refactoring
      • Split up input validation related functionality
      • identifier related functionality
      • Misc
  • DataONE Common Library

    • Add AccessPolicyWrapper, a wrapper for the AccessPolicy DataONE type that adds type specific and intuitive methods directly on the object
      • Add SimpleXMLWrapper, similar functionality for XML
      • These abstract away the details of the types and provide for concise and intuitive code
      • We plan on implementing such wrappers for all relevant types
    • Add new subpackage for generators
      • string generator
      • file contents generator
    • Add module for generating and validating BagIt streams
    • Add utilities for basic XML parsing
    • Improve utilities for comparing and normalizing XML
    • New date-time functions to better handle timezones
  • DataONE Client Library

    • Improve error handling in multiprocessed iterators
  • Tests and test framework

    • Add around 300 tests since 2.3.8.
    • Add support for running tests in parallel with pytest.xdist
      • Run of test set, currently around 1100 tests, reduced from 7 to 2 min on dev machine
      • Each worker runs against a separate copy of the GMN database, instantiated from shared template
    • Add context managers for setting the CRUD whitelist
    • Add module that calls DataONE APIs in GMN without using PyXB
      • Gives the ability to generate and check response to invalid requests, such as requests with incorrectly formatted URLs, multipart documents, and DataONE types
    • Add ability to skip recently passed tests
      • Controlled with --skip and related test args
      • Default is to not skip
    • Speed up slice tests
    • Better support for timezones in date_time instance generator
    • Add misc functions for working with random lists in instance generator
    • Fix "flaky" test
      • Failed on some combinations of values in the underlying randomly generated type
    • Misc other smaller test framework improvements
  • Misc

    • Update dependencies to current as of 2017-11-16
    • Better timezone support throughout d1_python
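
The ObjectList ordering and slice limit introduced in this release can be illustrated with a small sketch. It assumes that each record is a dict with `modified` and `pid` keys and that an oversized `count` is clamped to the limit rather than rejected; both are assumptions made for illustration, and `get_slice` is a hypothetical name, not a GMN function. Only the setting name `MAX_SLICE_ITEMS` and its default of 5000 come from the changelog.

```python
from datetime import datetime, timezone

MAX_SLICE_ITEMS = 5000  # default value named in the changelog


def get_slice(records, start, count):
    """Return one slice, ordered by modified timestamp and then identifier,
    both ascending. Clamping count is an assumption of this sketch."""
    if start < 0 or count < 0:
        raise ValueError("start and count must be non-negative")
    count = min(count, MAX_SLICE_ITEMS)
    ordered = sorted(records, key=lambda r: (r["modified"], r["pid"]))
    return ordered[start:start + count]


t1 = datetime(2017, 11, 1, tzinfo=timezone.utc)
t2 = datetime(2017, 11, 2, tzinfo=timezone.utc)
records = [
    {"pid": "b", "modified": t2},
    {"pid": "z", "modified": t1},
    {"pid": "a", "modified": t2},
]
first_page = [r["pid"] for r in get_slice(records, 0, 2)]
second_page = [r["pid"] for r in get_slice(records, 2, 2)]
```

Using the identifier as a secondary sort key gives a total order even when several objects share a modified timestamp, which keeps consecutive slices from overlapping or skipping records.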

2.3.8 (2017-10-20)

Full Changelog

  • DataONE Generic Member Node (GMN)

    • Add support for storing partial and out-of-order revision chains
      • Automatically combine chain fragments that are found to be part of the same chain
      • Tests for various revision and SID related corner cases
    • Add migrations to latest db
    • Improve progress information in management commands
    • Add diagnostics management command to migrate and repair revision chains
    • Add support for general migrations to bulk importer
    • Remove old migrate_v1_to_v2 command
    • Update database test fixtures and sample docs
  • DataONE Client Library

    • Add multiprocessed log record iterator
    • Refactor multiprocessed iterators to improve reliability
    • Add API v1.2 MN method wrappers (view and package methods)
  • DataONE Common Library

    • Add misc methods, docstrings and tests to access_policy module
    • Update PyXB bindings to PyXB 1.2.6 and update generator script
  • Misc

    • Update dependencies to current as of 2017-10-20
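
Combining revision chain fragments that arrive out of order can be sketched as below. This is a simplified illustration, not GMN's database representation: it assumes each known object is described by an `(obsoletes, obsoleted_by)` pair of PIDs, either of which may be `None`, and that the links are free of cycles. The function name `build_chains` is hypothetical.

```python
def build_chains(links_by_pid):
    """Combine fragments into ordered chains, oldest revision first.

    links_by_pid maps a PID to an (obsoletes, obsoleted_by) pair.
    Assumes the links do not form cycles.
    """
    # Record the successor of each PID, using links in both directions so
    # that a chain can be joined even if only one side of a link is known.
    next_pid = {}
    for pid, (obsoletes, obsoleted_by) in links_by_pid.items():
        if obsoleted_by is not None:
            next_pid[pid] = obsoleted_by
        if obsoletes is not None:
            next_pid[obsoletes] = pid

    all_pids = set(links_by_pid) | set(next_pid) | set(next_pid.values())
    # A head is a PID that no other PID points to as its successor.
    heads = sorted(all_pids - set(next_pid.values()))

    chains = []
    for head in heads:
        chain = [head]
        while chain[-1] in next_pid:
            chain.append(next_pid[chain[-1]])
        chains.append(chain)
    return chains


# "c" arrives first and refers to "b", which has not been seen yet.
# "a" arrives later and also refers to "b", joining the two fragments.
chains = build_chains({
    "c": ("b", None),
    "a": (None, "b"),
    "x": (None, None),
})
```

Note that "b" appears in the combined chain even though no record for it was supplied, which mirrors the partial chain case described above.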

2.3.6 (2017-08-24)

Full Changelog

  • DataONE Generic Member Node (GMN)

    • Add extended listObjects() API
      • Fast method for retrieving a large number of selected sysmeta values
      • Returns JSON
      • Quickly generated and parsed
      • No schema
      • Minimal document size
      • Part of a new API class, "ext", which will hold GMN specific APIs
    • Optimize slicing / paging of multi-page result sets
    • Add support for proxy objects in bulk importer
    • Add support for rejecting replication requests for non-public objects
    • Add Proxy, Obsoletes, ObsoletedBy and SeriesId to the custom headers returned by most D1 API methods
      • Proxy header allows clients to determine if an object is a proxy and, if so, where the original object resides
      • Obsoletes, ObsoletedBy headers allow clients to determine if object is part of a revision chain
    • Keep track of ownership and versioning of object filesystem store
    • Check every minute instead of every hour for new replication and sysmeta refresh tasks
    • Ongoing refactoring of diagnostics
  • DataONE Client Library

    • Add support for disabling timeout by passing timeout=None, 0 or 0.0
  • Tests and test framework

    • Add various small unit test improvements
    • Add automatic migration of test database
    • Update samples
    • Ensure that files deleted after previous build are not included in later releases
  • Misc

    • Update dependencies to current as of 2017-08-24
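
The timeout handling added to the client in this release treats `None`, `0` and `0.0` the same way. A minimal sketch of that normalization is shown below; the helper name `normalize_timeout` is hypothetical and the rejection of negative values is an assumption, not documented d1_client behavior.

```python
def normalize_timeout(timeout):
    """Return None (meaning no timeout) for None, 0 or 0.0.
    Otherwise return the timeout as a positive float, in seconds."""
    if timeout is None or float(timeout) == 0.0:
        return None
    timeout = float(timeout)
    if timeout < 0:
        # Assumption of this sketch: negative timeouts are treated as errors.
        raise ValueError("timeout must not be negative")
    return timeout


disabled = [normalize_timeout(v) for v in (None, 0, 0.0)]
enabled = normalize_timeout(30)
```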

2.3.5 (2017-08-08)

Full Changelog

  • DataONE Generic Member Node (GMN)

    • Add general bulk importer management command
      • Allows upgrading from any earlier version of GMN or other MN stack
    • Add cleardb diag management command
      • Remove cleardb from the diags page
      • Start code for other "diag" management commands
    • Improve the way that chains are represented in the db
      • Less code and faster SID related queries
    • Update revision chain related model names
      • After earlier modifications in how the chains are represented, the old names were misleading
    • Add SID filtering support
      • Add support for passing SID as the getLogRecords() idFilter and listObjects() identifier args
      • Note: We don't resolve SIDs for v1, so the v1 pidFilter argument cannot take a SID.
      • Add tests for SID filtering
    • Fix bug: Unable to run management commands
    • Update node registration doc to reflect updated manage.py commands
  • DataONE Client Library

    • Add support for v2 CNRead.synchronize()
    • Expose a ".total" attribute in iterators
      • Clients can read the value from .total to keep track of progress. Earlier, clients had to perform a separate query using filter parameters matching those used by the iterator. There was also a potential race, in that the total could change between query by the iterator and by the client.
    • Change iterator arguments:
      • ObjectListIterator: listObjects_args_dict -> list_objects_args
      • LogRecordIterator: getLogRecords_dict -> get_log_records_arg_dict
    • Add get_and_save() wrapper for MNRead.get()
      • This is a convenience method added because correctly saving the result from get() to a file is a bit tricky, while it is also the most common use of get().
    • Add option to create missing directories for MNRead.get_and_save()
    • Add description on how to use stream=True with MNRead.get()
  • DataONE Common Library

    • Add module for handling obsolescence chains / revisions
      • d1_common/revision.py
    • Rename methods for creating missing directory names
      • d1_common/util.py
    • Add default page size of 100 records for getLogRecords()
      • d1_common/const.py
    • Update default User-Agent to DataONE_Python/x.y.z +http://dataone.org/
    • Add misc type related utilities to d1_common
      • d1_common/type_conversions.py, etc.
  • Tests and test framework

    • Add section in README.md about db fixtures for GMN, how they're used, how to generate them
      • Improve procedure for regenerating db fixture
    • Add mock API handlers
      • MNCore.getCapabilities()
      • CNCore.listNodes()
    • Fix bug: SID did not resolve correctly
      • Add tests for SID resolve
    • Add handling of db where migrations are out of sync in fixture generator
    • Add script that checks scimeta indexing
  • Misc

    • Update dependencies to current as of 2017-08-07
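
The benefit of exposing a `.total` attribute on the iterators can be shown with a small sketch. It is not the d1_client implementation: `PagedIterator` and the `fetch_page(start, count)` callable, which returns a page of items together with the total reported by the server, are hypothetical. The page size of 100 matches the default mentioned for getLogRecords() above.

```python
class PagedIterator:
    """Iterate over a paged result set while exposing the reported total."""

    def __init__(self, fetch_page, page_size=100):
        self._fetch_page = fetch_page
        self._page_size = page_size
        self.total = None  # unknown until the first page has been fetched

    def __iter__(self):
        start = 0
        while True:
            # The total comes from the same response as the items, so the
            # client does not need a separate query that could disagree.
            items, self.total = self._fetch_page(start, self._page_size)
            if not items:
                return
            yield from items
            start += len(items)
            if start >= self.total:
                return


# Simulated server holding 250 records.
data = list(range(250))
page_requests = []


def fake_fetch(start, count):
    page_requests.append((start, count))
    return data[start:start + count], len(data)


iterator = PagedIterator(fake_fetch)
seen = 0
for item in iterator:
    seen += 1
    progress = seen / iterator.total  # progress without an extra query
```

Because the total is refreshed from each page response, the value a client reads is always the one the iterator itself is working from, which avoids the race described above.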