
WorkShop12_WorkshopSummary

Vitalii Koshura edited this page Feb 26, 2023 · 2 revisions

Replacing the heartbeat mechanism

Problem: while one app is doing I/O-intensive work, other apps can miss heartbeat messages and exit with a no-heartbeat error.

  • I changed the client and API so that the client passes its PID to the app, and the app periodically checks whether the client is alive, instead of using heartbeat messages. This mechanism will be used only with new (7.0.37+) clients and new app versions. Other combinations will continue to use heartbeats.
  • We discussed having the client send heartbeat messages in a separate thread. I propose not doing this because the problem should be solved by the above.
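
The PID-based liveness check described above can be sketched as follows. This is only an illustration, not the BOINC API code: the function name `client_is_alive` is hypothetical, and the approach shown (sending signal 0) works on POSIX systems only and should not be used on Windows.

```python
import os


def client_is_alive(client_pid: int) -> bool:
    """Return True if a process with this PID appears to exist (POSIX only)."""
    try:
        # Signal 0 delivers nothing; the kernel only checks that the
        # target process exists and that we may signal it.
        os.kill(client_pid, 0)
    except ProcessLookupError:
        # No process with this PID: the client has exited.
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

An app would call such a check periodically with the PID it received from the client and exit once it returns False.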

Handling long non-checkpoint jobs

Problem: need a mechanism for sending long jobs that don't checkpoint only to hosts that are likely to finish them.

  • Have the client send its current uptime and the duration of its previous session in the scheduler request message.
  • On the server, allow flagging app versions as non-checkpointing.
  • Scheduler: if an app version is flagged as non-checkpointing, send a job to a host only if the job's expected runtime is less than the host's current uptime or its previous session's uptime.
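
A minimal sketch of the scheduler rule above, assuming all times are in seconds. The names `HostUptime` and `may_send_job` are hypothetical and do not come from the BOINC scheduler. The sketch reads "uptime or previous uptime" as "the larger of the two", which is one possible interpretation.

```python
from dataclasses import dataclass


@dataclass
class HostUptime:
    # Hypothetical field names for the two values the client would report.
    current_uptime: float   # seconds the current client session has been running
    previous_uptime: float  # duration in seconds of the previous session


def may_send_job(expected_runtime: float, host: HostUptime,
                 non_checkpointing: bool) -> bool:
    """Decide whether a job may be sent to a host under the proposed rule."""
    if not non_checkpointing:
        # Checkpointing app versions are not restricted by this rule.
        return True
    # Send only if the job is expected to fit within an uptime the host
    # has already demonstrated (current or previous session).
    return expected_runtime < max(host.current_uptime, host.previous_uptime)
```

For example, a two-hour non-checkpointing job would be sent to a host whose previous session lasted a day, but not to a host that has never stayed up longer than an hour.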

Server software testing and release management

Goals include:

  • Increase the quality and frequency of server software releases.
  • Increase the stability of the server software in trunk.

We discussed the following:

  • Automated system-level testing of server software. We used to have frameworks for this (boinc/test/) but they're not maintained. We lack the manpower to do this; volunteers are needed.
  • How to test server software? When to do releases? Automated testing would help, but a large number of features can feasibly be tested only in live use. I think we need projects to help as follows:
    • Operate test projects for testing new server software.
    • Use these projects to beta-test server software.
    • When we have a release candidate, create a new branch, test it using these projects, and release it when all bugs are fixed.
  • Unit testing of server software. I'm not sure if this has good cost/benefit; few if any bugs would be detected. But if a volunteer wants to write unit tests, I'd be happy to add them to the tree.
  • Automated nightly builds. Rom will look into this. How do we do this for Windows and Mac? http://jenkins-ci.org supports build slaves running on any OS that supports Java.
  • Automated system testing of web software. We lack the manpower to do this; volunteer help is needed. Hint: take a look at http://seleniumhq.org.
  • Improved SCM workflow: We need to introduce code branches to isolate ongoing development from release and maintenance processes in order to stabilise the codebase and facilitate stable releases.
    • Develop new features in dedicated "feature branches", branching off master. Merge back into master once developer testing succeeds (features can be pretty small; merge often).
    • Create a "next" or "release candidate" branch for the upcoming release, branching off master. Test and fix it until it is ready for release, and merge fixes back to master.
    • Maintain each release in its dedicated branch to allow for maintenance. Merge fixes back to master.
    • Alternatively, go for "the thing" (http://nvie.com/posts/a-successful-git-branching-model) using https://github.com/nvie/gitflow.

Remote job submission

Some changes were proposed but I forget what they were. Wenjing?

Francisco Sanz described the system developed by Ibercivis. Key features:

  • "Subproject": the unit of access control; a set of apps
  • "Scientist" and "batch" tables
  • Scientists submit/control jobs using "mini-shell"
  • The WU generator limits outstanding WUs per batch (to limit DB size)

Server scheduling (user quotas, accelerated batch completion)

Several people expressed interest in these features. We will work on them, hopefully in the 2-3 month timeframe. Design docs are here: JobPrioritization, PortalFeatures

Comments (on boinc_dev) are welcome.

Python framework for validation and assimilation

David Coss worked on documentation for this. David, please add to the Wiki or send to me.

Support for job DAGs

David Coss presented this. I think it would be a useful feature, although no project other than David's had an immediate need for it. We should document it and add it to the source tree.

In David's system, the DAG is generated from a command file, with dependencies determined by the names of input/output files. We discussed the ideas of:

  • Deciding when some of these (small jobs) can be done on the server
  • Deciding when jobs that are small but have large intermediate files can be grouped together and done (using the wrapper) on a single client
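
To illustrate how dependencies can be derived from input/output file names, here is a minimal sketch. It is not David's implementation: the job table format, the function `build_dag`, and the example job and file names are all hypothetical. It uses `graphlib` from the Python standard library (Python 3.9 or later) to produce a valid execution order.

```python
from graphlib import TopologicalSorter


def build_dag(jobs):
    """Map each job name to the set of jobs that produce its input files.

    jobs: dict of job name -> (list of input files, list of output files).
    """
    # Record which job produces each output file.
    producer = {}
    for name, (_inputs, outputs) in jobs.items():
        for filename in outputs:
            producer[filename] = name

    # A job depends on the producer of each of its inputs; inputs that no
    # job produces are treated as pre-existing files.
    deps = {}
    for name, (inputs, _outputs) in jobs.items():
        deps[name] = {producer[f] for f in inputs
                      if f in producer and producer[f] != name}
    return deps


# Hypothetical three-job workflow.
jobs = {
    "align": (["raw.dat"], ["aligned.dat"]),
    "stats": (["aligned.dat"], ["stats.txt"]),
    "plot": (["aligned.dat", "stats.txt"], ["plot.png"]),
}
deps = build_dag(jobs)
order = list(TopologicalSorter(deps).static_order())
print(order)
```

A graph in this form also exposes what the two ideas above would need: the size of each job and of the intermediate files on each edge, which could inform whether to run a job on the server or group a chain of jobs on one client.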

Drupal/BOINC integration

Oliver demonstrated this. My impression is that it's about 90% complete. When done we can potentially add it to BOINC.

Locality scheduling

This is on hold until someone (e.g. Einstein@home) needs it.

More info: LocalityNew

BOINC on Android

Current work items:

  • Make sure that everything needed to build BOINC/Android, and test apps, is in the BOINC tree and documented (Rom).
  • Finish the GUI. Main items:
    • Add interface for adding/removing projects and account managers.
    • Show graphics of some sort (BOINC and/or project-specific)
  • Get some projects to add Android/ARM app versions.

VM Apps

Nils Hoimyr expressed a wish for including VBox in the BOINC installer.
