
WeeklyTelcon_20160426

Geoff Paulsen edited this page Apr 26, 2016 · 8 revisions

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Jeff Squyres
  • Todd Kordenbrock
  • Sylvain Jeaugey
  • Ralph
  • Nysal
  • Nathan Hjelm
  • Joshua Ladd
  • Howard
  • Geoff Paulsen

Agenda

Review 1.10

  • Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.3
    • PR 1097 - for 1.10 may be moot.
    • PSM2 issue, short version: the PSM2 API uses a fixed UUID, so all jobs across the cluster use the same UUID (bad).
    • Jeff will check the 1.10.3 library versions. Ralph already updated them for 1.10.3, but Jeff will double-check.
    • 1.10 hangs if it doesn't get enough slots. Ralph will look at it.

Review 2.0.x

Review Master MTT testing (https://mtt.open-mpi.org/)

  • Widespread mpool / rcache failures on usNIC last night.
  • Ralph is seeing a bunch of attribute failures on 1.10.
    • Jeff is passing in BTL parameters that limit him to a shared-memory component, but the job is running across nodes. So the attribute test reports a failure, because some of the processes can't communicate.

MTT Dev status:

  • Whatever happened to "better, faster"?
    • Ralph needed an interface to submit results. That's there today. They can send a JSON structured packet, and it will submit correctly.
      • The other side (web side) still has a few values that would need to change.
    • Submitter side is there.
    • Reporter side would need some work.
      • Josh had a student write something here in JavaScript. The long-term plan would be to port that work to the framework.
    • Perl client uses REST storage.
    • Should Jeff change his client to use new interface?
      • Josh would recommend waiting until Ralph got his changes in.
    • Right now we don't yet do cross-product expansion. This would be good for Howard's intern to look at.
      • This is the week for Howard / Josh / Ralph to sync up on this work.
    • Jeff would like to be able to filter out certain failure signatures to see other bugs. This is probably not available even in the new reporter.
  • On Master, you don't need to do ENABLE_THREAD_MULTIPLE anymore.
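
For readers unfamiliar with the term, the cross-product expansion mentioned above can be sketched roughly as follows. This is only an illustration of the general idea (one test run per combination of parameter values), not how the MTT client implements or plans to implement it. The function name `expand_cross_product` and the option names and values in `matrix` are hypothetical.

```python
from itertools import product


def expand_cross_product(params):
    """Expand a dict of option -> list of values into one dict per combination."""
    names = list(params)
    value_lists = [params[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Hypothetical test matrix; the option names and values are illustrative only.
matrix = {
    "np": [2, 4],
    "btl": ["self,sm", "self,tcp"],
}

runs = expand_cross_product(matrix)
for run in runs:
    print(run)
```

Each entry in `runs` is one concrete configuration, so two values for each of two options yield four runs.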

Status Updates: (skip until next week)

  • Mellanox
  • Sandia
  • Intel

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM
  3. Cisco, ORNL, UTK, NVIDIA

Back to 2016 WeeklyTelcon-2016
