Home
The qthreads API is designed to make using large numbers of threads convenient and easy, and to allow portable access to threading constructs used in massively parallel shared memory environments. The API maps well to both MTA-style threading and PIM-style threading, and we provide an implementation of this interface in both a standard SMP context and the SST context. The qthreads API provides access to full/empty-bit (FEB) semantics, where every word of memory can be marked either full or empty, and a thread can wait for any word to attain either state.
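The full/empty-bit idea can be illustrated with a small emulation built on POSIX threads. This is only a sketch of the semantics, not how Qthreads implements them and not the Qthreads API: the `feb_word` type and the `feb_init`, `feb_write_ef`, `feb_read_fe`, and `feb_demo` names are hypothetical, and a mutex plus condition variable stands in for the per-word full/empty state.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical emulation of one FEB-tagged word; not a qthreads type. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;  /* signalled whenever the full/empty state flips */
    int             full;     /* 1 = full, 0 = empty */
    uint64_t        value;
} feb_word;

static void feb_init(feb_word *w) {
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->changed, NULL);
    w->full  = 0;             /* words start out empty in this sketch */
    w->value = 0;
}

/* Wait until the word is empty, store v, then mark the word full. */
static void feb_write_ef(feb_word *w, uint64_t v) {
    pthread_mutex_lock(&w->lock);
    while (w->full)
        pthread_cond_wait(&w->changed, &w->lock);
    w->value = v;
    w->full  = 1;
    pthread_cond_broadcast(&w->changed);
    pthread_mutex_unlock(&w->lock);
}

/* Wait until the word is full, read it, then mark the word empty. */
static uint64_t feb_read_fe(feb_word *w) {
    uint64_t v;
    pthread_mutex_lock(&w->lock);
    while (!w->full)
        pthread_cond_wait(&w->changed, &w->lock);
    v = w->value;
    w->full = 0;
    pthread_cond_broadcast(&w->changed);
    pthread_mutex_unlock(&w->lock);
    return v;
}

/* Producer thread: hands the values 1..10 over through a single word. */
static void *feb_producer(void *arg) {
    feb_word *w = arg;
    for (uint64_t i = 1; i <= 10; i++)
        feb_write_ef(w, i);
    return NULL;
}

/* Consumes ten values through one FEB word and returns their sum. */
static uint64_t feb_demo(void) {
    feb_word  w;
    pthread_t t;
    uint64_t  sum = 0;

    feb_init(&w);
    if (pthread_create(&t, NULL, feb_producer, &w) != 0)
        return 0;
    for (int i = 0; i < 10; i++)
        sum += feb_read_fe(&w);
    pthread_join(t, NULL);
    pthread_mutex_destroy(&w.lock);
    pthread_cond_destroy(&w.changed);
    return sum;
}
```

Calling `feb_demo()` from a `main` function runs one producer and one consumer that synchronize purely through the word's full/empty state: each write blocks until the previous value has been consumed, and each read blocks until a new value is available.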
The qthreads library on an SMP (i.e., the POSIX implementation) is essentially a library for spawning and controlling tasks: user-level (non-kernel) threads with small (4k) stacks. The threads are entirely in user-space and use their blocked/unblocked status as part of their scheduling. The library's metaphor is that there are many qthreads and several "shepherds". Shepherds can be thought of as thread mobility domains; they map to specific processors or memory regions, and define where a qthread can, must, or would prefer to execute. Qthreads can be assigned to specific shepherds. A qthread does not migrate unless it is directed to migrate, its shepherd is disabled, or, if it is unassigned, it is stolen by another shepherd in search of work. This implementation supports both OpenMP (via the ROSE compiler) and Chapel, and can also be used directly.
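The shepherd metaphor can be made concrete with a toy, single-threaded model. It is a sketch under stated assumptions, not the Qthreads scheduler: the `task` and `shepherd` types and the `shepherd_push`, `shepherd_steal`, and `steal_demo` functions are hypothetical, and the model covers only one of the migration rules described above, namely that an idle shepherd may steal unassigned tasks but never tasks assigned to another shepherd.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TASKS 16

/* Hypothetical toy types; none of these are qthreads identifiers. */
typedef struct {
    int id;
    int pinned;               /* 1 = assigned to a specific shepherd */
} task;

typedef struct {
    task   queue[MAX_TASKS];
    size_t count;
} shepherd;

/* Append a task to a shepherd's queue; returns -1 if the queue is full. */
static int shepherd_push(shepherd *s, task t) {
    if (s->count == MAX_TASKS)
        return -1;
    s->queue[s->count++] = t;
    return 0;
}

/* Move the first unpinned task from victim to thief.
 * Returns 1 if a task was moved, 0 if nothing could be stolen. */
static int shepherd_steal(shepherd *thief, shepherd *victim) {
    for (size_t i = 0; i < victim->count; i++) {
        if (victim->queue[i].pinned)
            continue;         /* assigned tasks stay where they are */
        if (shepherd_push(thief, victim->queue[i]) != 0)
            return 0;
        /* Close the gap left by the stolen task. */
        for (size_t j = i + 1; j < victim->count; j++)
            victim->queue[j - 1] = victim->queue[j];
        victim->count--;
        return 1;
    }
    return 0;
}

/* Shepherd 0 starts with two pinned and two unpinned tasks; the idle
 * shepherd 1 steals until nothing stealable is left.
 * Returns the number of tasks remaining on shepherd 0. */
static size_t steal_demo(void) {
    shepherd s[2];
    memset(s, 0, sizeof s);

    for (int i = 0; i < 4; i++) {
        task t = { i, i < 2 };   /* tasks 0 and 1 are pinned */
        shepherd_push(&s[0], t);
    }
    while (shepherd_steal(&s[1], &s[0])) {
        /* keep stealing while unpinned work remains */
    }
    return s[0].count;
}
```

In this model the two pinned tasks remain on shepherd 0 and the two unpinned tasks end up on shepherd 1. Directed migration and disabled shepherds are left out to keep the sketch short.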
The Qthreads OpenMP implementation has proven to load-balance and scale better than both the GCC and Intel compiler OpenMP implementations for single-address-space computation (see paper below, Scheduling Task Parallelism on Multi-Socket Multicore Systems).
- Data Structures - information about C-based lock-free data structures provided by the Qthreads library.
- qtCnC - usage information for the Qthreads implementation of the Concurrent Collections model.
POSIX Qthreads supports most POSIX-style machines, including Linux, Solaris, and MacOS X, running on a variety of architectures. It has been tested on:
Architecture | Linux | Solaris | MacOS X | SST | Cygwin |
---|---|---|---|---|---|
PPC32 | + | + | + | | |
PPC64 | + | + | | | |
IA32 | + | + | + | | |
IA64 | + | | | | |
ARM | + | | | | |
AMD64/x86_64 | + | + | | | |
SparcV9+ | + | | | | |
TilePro (MIPS) | + | | | | |
TileGX (MIPS-like) | + | | | | |
Qthreads has been tested with:
Compiler | Status |
---|---|
GCC 3.x | Works (not on PPC) |
GCC 4.x | Works (PPC requires 4.2+) |
Apple Clang 3.0 | Works with pthread spinlocks, not built-in spinlocks; C++ support does not work |
Clang 2.9 | Works with pthread spinlocks, not built-in spinlocks; C++ support does not work |
Clang 3.0+ | Works; C++ support does not work |
PGI 9.0 | Works |
PGI 10.0 | Works |
PGI 11.x | Works |
Intel ICC 11.1.x | Works; does not support inline assembly on IA64 |
Intel ICC 12.x | Works |
Intel ICC 13.x | Works |
TileraMDE 2.0.0.77314 | Works; requires -O0 |
TileraMDE 4.0.alpha11.134874 | Works |
SunStudio 12 | Causes internal compiler errors ("Wasted space") |
To compile and run POSIX Qthreads, you will need:
- A UNIX-like shell (Qthreads uses the GNU Autotools)
- A C compiler (Qthreads versions earlier than 1.5 also require either C++ or the [Cprops library](http://cprops.sf.net/))
To compile and run SST Qthreads, you will also need:
- A PPC C compiler
- A full complement of static libraries (available [here](http://www.cs.sandia.gov/sst/allStatLibs.tar.gz))
Detailed installation directions are included in the INSTALL file in the distribution. Generally, we use GNU autotools and the standard configuration and installation behavior.
To cite qthreads, please use:
- Qthreads: An API for Programming with Millions of Lightweight Threads
  Kyle Wheeler, Richard Murphy, Douglas Thain
  In the Proceedings of the 22nd IEEE International Parallel & Distributed Processing Symposium (IPDPS '08, in the MTAAP '08 workshop), IEEE Press, 2008.
To cite sherwood, please use:
- OpenMP Task Scheduling Strategies for Multicore NUMA Systems
  Stephen Olivier, Allan Porterfield, Kyle Wheeler, Michael Spiegel, and Jan Prins
  The International Journal of High Performance Computing Applications, 26(2):110–124, May 2012.
Additional related publications:
- Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications
  Dylan Stark, Richard Barrett, Ryan Grant, Stephen Olivier, Kevin Pedretti, and Courtenay Vaughan
  In the Proceedings of the 2014 Workshop on Exascale MPI (ExaMPI), IEEE Press, 2014.
- Adaptive Scheduling Using Performance Introspection
  Allan Porterfield, Rob Fowler, Anirban Mandal, David O’Brien, Stephen L. Olivier, Michael Spiegel
  RENCI Technical Report TR-12-02, December 2012.
- The Chapel Tasking Layer Over Qthreads
  Kyle B. Wheeler, Richard C. Murphy, Dylan Stark, Bradford L. Chamberlain
  In the Proceedings of the Cray User Group 2011, June 2011.
- Scheduling Task Parallelism on Multi-Socket Multicore Systems
  Stephen Olivier, Allan Porterfield, Kyle Wheeler, and Jan Prins
  In the Proceedings of the 25th International Conference on Supercomputing (ICS '11, in the ROSS '11 workshop), ACM Press, 2011.
- Implementing a Portable Multi-threaded Graph Library: the MTGL on Qthreads
  Brian Barrett, Jonathan Berry, Richard Murphy, Kyle Wheeler
  In the Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium (IPDPS '09, in the MTAAP '09 workshop), IEEE Press, 2009.
- Portable Performance from Workstation to Supercomputer: Distributing Data Structures with Qthreads
  Kyle Wheeler, Douglas Thain, Richard Murphy
  In the Proceedings of the First Workshop on Programming Models for Emerging Architectures (PMEA), IEEE Press, 2009.