This repository has been archived by the owner on May 28, 2024. It is now read-only.

November 2016

Darren Hardy edited this page Nov 28, 2016 · 11 revisions

During November 2016 we did substantial performance and stability work on dor_indexing_app and its associated systems.

Summary

  • Reduced latency by ~70%, from ~11s to ~3s per request
  • Met throughput target of 3 days for a full reindex (1.2M objects)
  • Implemented performance instrumentation via New Relic
  • Performed detailed analysis and exposed bottlenecks in code and systems during realistic workloads
  • Improved stability and documentation of pipeline and systems
  • Implemented monitoring of all systems
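
As a rough cross-check of the latency and throughput figures above, the following sketch computes the sustained rate a 3-day reindex of 1.2M objects needs and, via Little's law, roughly how many requests must be in flight at ~3s each. It assumes one request indexes one object and that the load is spread evenly over the three days; neither assumption is stated on this page, and the method names are illustrative only.

```ruby
# Back-of-the-envelope check using only the figures quoted in the summary.
# Assumption (not from the page): one request indexes exactly one object.
OBJECTS         = 1_200_000 # size of a full reindex
TARGET_DAYS     = 3         # throughput target
LATENCY_SECONDS = 3.0       # approximate per-request latency after tuning
SECONDS_PER_DAY = 24 * 60 * 60

# Sustained rate (objects per second) needed to finish within the window.
def required_rate(objects, days)
  objects / (days * SECONDS_PER_DAY).to_f
end

# Little's law: requests in flight = arrival rate * time per request.
def required_concurrency(rate, latency_seconds)
  (rate * latency_seconds).ceil
end

rate = required_rate(OBJECTS, TARGET_DAYS)
puts format("required rate: %.2f objects/s", rate)                         # 4.63
puts "requests in flight: #{required_concurrency(rate, LATENCY_SECONDS)}"  # 14
```

Under the same assumptions, the earlier ~11s latency would have needed about 51 requests in flight to meet the same target.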

Performance

  • dor_indexing_app

    • Added third node to cluster to improve throughput
    • Tuned concurrency parameters between ActiveMQ and reindexer
    • Installed New Relic for performance analysis and instrumentation
    • Added instrumentation metrics to indexing logs
    • Upgraded VM performance by improving hosting configuration
  • dor-services

    • Reduced redundant calls to the workflow services
    • Refactored to avoid unnecessary calls to retrieve collection information
    • Parsed XSLT scripts only once, rather than on every request
    • Avoided unnecessary reloads of collection objects
    • Reduced query traffic to Solr cloud
    • Cached the Fedora client certificate store to avoid unnecessary reinitialization
  • DOR (Fedora)

    • Removed traffic from fedora.apim.access messaging (unused but was high volume)
    • PENDING: Upgrade NFS storage appliance
    • PENDING: Upgrade VM performance by improving hosting configuration
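
Several of the dor-services items above (parsing XSLT once, caching the certificate store) share one pattern: build an expensive object once per process and reuse it across requests. The sketch below illustrates that pattern in plain Ruby. It is not the dor-services code; `OnceCache`, `STYLESHEETS`, the `to_html.xsl` key, and the stand-in `compile` lambda are all hypothetical names chosen for illustration.

```ruby
# Hypothetical sketch: cache an expensive-to-build object per key so it is
# constructed once per process instead of once per request.
class OnceCache
  def initialize
    @values = {}
    @lock = Mutex.new
  end

  # Returns the cached value for +key+, running the block only on first use.
  # The lock is not reentrant, so the block must not call #fetch itself.
  def fetch(key)
    @lock.synchronize do
      @values.fetch(key) { @values[key] = yield }
    end
  end
end

STYLESHEETS = OnceCache.new
parse_count = 0

# Stand-in for an expensive parse (for example, compiling an XSLT file).
compile = lambda do |source|
  parse_count += 1
  source.upcase
end

# Three "requests" ask for the same stylesheet; only the first one parses it.
3.times { STYLESHEETS.fetch("to_html.xsl") { compile.call("<xsl:stylesheet/>") } }
puts parse_count # 1
```

A per-process cache like this trades memory and staleness for speed: a changed stylesheet or certificate is only picked up after a restart unless the cache is explicitly invalidated.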

Stability

  • dor_indexing_app

    • Full stack monitoring (Fedora, Workflow Service, Sul-Solr, SulMQ)
    • OkComputer-based monitoring
    • Clarified and documented API, deprecated GET routes
    • Upgraded to Rails 5 and started using Honeybadger
  • dor-services

    • Refactored model hierarchy and upgraded ActiveFedora to 8.x
    • Used identityMetadata as authoritative model definition
    • Relied more on stanford-mods for metadata extraction
    • Removed unused and unmaintained code
  • Argo

    • Delegated reindexing from internals to dor_indexing_app services
    • Removed dead /dor routes and associated code
  • DOR (Fedora)

    • Direct monitoring
  • Workflow Service

    • Direct monitoring
    • Changed performance configuration of Oracle database server
    • PENDING: Isolate service onto its own VMs
  • ActiveMQ

    • Direct monitoring
    • Background reindexing when idle -- takes ~3 days for 1.2M objects using this method

Current bottlenecks

  • ~70% of the time is spent in the to_solr method running application logic
  • ~30% of the time is spent in calls to external services: DOR, Solr, and Workflow, in that order
  • Sensitive to VM CPU availability and DOR performance
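
To see what the split above implies for further tuning, the sketch below applies Amdahl's law to a request. It assumes the ~70/30 split applies to the ~3s per-request latency quoted in the summary, which this page does not state explicitly; `projected_latency` is an illustrative helper, not part of any of the systems listed.

```ruby
# Amdahl's law: speeding up only a fraction of the work leaves the rest fixed.
# Assumption (not from the page): the ~70/30 split applies to ~3s per request.
TOTAL_SECONDS     = 3.0
TO_SOLR_FRACTION  = 0.7 # application logic in to_solr
EXTERNAL_FRACTION = 0.3 # DOR, Solr, and Workflow calls

# Latency after making +fraction+ of the request +speedup+ times faster.
def projected_latency(total, fraction, speedup)
  total * ((1 - fraction) + fraction / speedup)
end

puts format("to_solr 2x faster:  %.2fs", projected_latency(TOTAL_SECONDS, TO_SOLR_FRACTION, 2.0))  # 1.95s
puts format("external 2x faster: %.2fs", projected_latency(TOTAL_SECONDS, EXTERNAL_FRACTION, 2.0)) # 2.55s
```

Under these assumptions, doubling the speed of to_solr saves about 1.05s per request, while doubling the speed of every external call saves about 0.45s.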