Releases: SciCompKL/OpDiLib
v2.0.0
Interface Changes
Macros
- OPDI_CRITICAL_NAME macro no longer accepts arguments other than name
- new OPDI_CRITICAL_NAME_ARGS macro
- new OPDI_MASKED, OPDI_END_MASKED macros
- new OPDI_SINGLE_COPYPRIVATE, OPDI_SINGLE_COPYPRIVATE_NOWAIT macros
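For orientation, the OpenMP construct that the new OPDI_SINGLE_COPYPRIVATE macros relate to can be sketched in plain OpenMP. This is only an illustration of a single copyprivate broadcast. It does not use OpDiLib's macros, whose exact syntax is not shown in these notes. The function name broadcastReachesAllThreads and the value 42 are made up for the example. The code also compiles without OpenMP support, in which case the pragmas are ignored and the region runs serially.

```cpp
#include <cassert>

// Hypothetical demo function (not part of OpDiLib): one thread computes a
// value inside a single construct, and copyprivate copies it into the
// private variable of every other thread in the team.
bool broadcastReachesAllThreads() {
  int threadsTotal = 0;
  int threadsWithValue = 0;

  #pragma omp parallel
  {
    int value = 0;  // private to each thread

    #pragma omp single copyprivate(value)
    {
      value = 42;  // executed by exactly one thread
    }
    // After the construct, every thread's copy of value holds 42.

    #pragma omp atomic
    ++threadsTotal;

    if (value == 42) {
      #pragma omp atomic
      ++threadsWithValue;
    }
  }

  return threadsTotal > 0 && threadsTotal == threadsWithValue;
}
```

Calling broadcastReachesAllThreads() should return true regardless of the team size, since every thread observes the broadcast value.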
Layer Interfaces
- introduce OmpLogicInterface::beginSkippedParallelRegion, OmpLogicInterface::endSkippedParallelRegion
- refactor OmpLogicInterface::resetTask -> OmpLogicInterface::resetImplicitTask
- changed offset in enums OmpLogicInterface::ScopeEndpoint and OmpLogicInterface::SyncRegionKind
- introduce BackendInterface::getCriticalIdentifier, BackendInterface::getReductionIdentifier, BackendInterface::getOrderedIdentifier
- refactor BackendInterface::getNestedLockIdentifier -> BackendInterface::getNestLockIdentifier
- refactor BackendInterface::getTaskData -> BackendInterface::getImplicitTaskData
Compiler Options
- introduce OPDI_BACKEND_GENERATE_MASKED_EVENTS, OPDI_BACKEND_GENERATE_WORK_EVENTS, OPDI_OMPT_BACKEND_IMPLICIT_TASK_END_SOURCE, OPDI_OMPT_BACKEND_BARRIER_IMPLEMENTATION_BEHAVIOUR, OPDI_OMP_LOGIC_CLEAR_ADJOINTS, OPDI_SYNC_REGION_BARRIER_BEHAVIOUR, OPDI_SYNC_REGION_BARRIER_IMPLICIT_BEHAVIOUR, OPDI_SYNC_REGION_BARRIER_EXPLICIT_BEHAVIOUR, OPDI_SYNC_REGION_BARRIER_IMPLEMENTATION_BEHAVIOUR, OPDI_SYNC_REGION_BARRIER_REVERSE_BEHAVIOUR
Instrumentation
- all functions in OmpLogicInstrumentInterface receive AD data as parameters
- calls to instrumentation occur under the same conditions as AD treatment, like tape activity
Features
- support for custom mutexes
- support for masked construct
- support for broadcasts via single copyprivate
- provide EmptyTool as a default tool layer implementation
- support for disabling OpDiLib at runtime: set opdi::tool to an instance of EmptyTool (preferred) or nullptr
- endpoints of sync regions that result in reverse barriers become configurable
- optionally use the implicit barriers at the end of parallel regions to produce ImplicitTaskEnd events of non-primary threads
- expose mechanism to skip AD treatment of parallel regions
- backend-independent solution for copy operations at the beginning of parallel regions
- optionally clear adjoints as part of tape resets
- the OMPT backend's implementation of opdi_* functions can be included as a default runtime implementation
- syntax checking supports local enabling/disabling via // opdi-syntax-on, // opdi-syntax-off
- additions to syntax definition
- modernize declaration of custom reductions
- faster tests, extended test suite and CI pipeline (see below)
- changes of OmpLogicOutputInstrument in accordance with other changes in OpDiLib
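The runtime switch mentioned in the feature list (an EmptyTool instance or nullptr in opdi::tool) resembles the null-object pattern. The sketch below is not OpDiLib code: ToolSketch, EmptyToolSketch, CountingToolSketch, activeTool, and enterParallelRegion are hypothetical stand-ins that only mirror the roles of the tool layer, EmptyTool, and opdi::tool.

```cpp
#include <cassert>

// Hypothetical stand-in for a tool layer interface (not OpDiLib's own).
struct ToolSketch {
  virtual ~ToolSketch() = default;
  virtual void onParallelBegin() = 0;
};

// Does nothing: plays the role that EmptyTool plays in the notes.
struct EmptyToolSketch : ToolSketch {
  void onParallelBegin() override {}
};

// Counts calls so that the effect of swapping tools is observable.
struct CountingToolSketch : ToolSketch {
  int calls = 0;
  void onParallelBegin() override { ++calls; }
};

// Hypothetical global pointer, mirroring the role of opdi::tool.
ToolSketch* activeTool = nullptr;

// Hypothetical call site: the nullptr case needs an explicit check,
// whereas a no-op tool can always be called.
void enterParallelRegion() {
  if (activeTool != nullptr) {
    activeTool->onParallelBegin();
  }
}

int runDemo() {
  CountingToolSketch counting;
  EmptyToolSketch empty;

  activeTool = &counting;
  enterParallelRegion();  // counted

  activeTool = &empty;    // disabled via a no-op tool
  enterParallelRegion();  // not counted

  activeTool = nullptr;   // disabled via nullptr
  enterParallelRegion();  // not counted, thanks to the null check

  return counting.calls;
}
```

The notes do not say why the EmptyTool variant is preferred. One plausible reading, shown here as an assumption, is that a no-op object keeps call sites free of null checks.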
Performance Improvements
- implicit barriers on parallel regions do not produce reverse barriers
- generating events for masked constructs and worksharing constructs becomes optional
- MutexOmpLogic::onMutexReleased no longer acquires an internal mutex
- reductions in the macro backend no longer require additional internal locks, use address of parallel data as wait identifier
- ordered constructs in the macro backend no longer share their wait identifier, use address of parallel data as wait identifier
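The idea of using the address of per-region data as a wait identifier can be illustrated outside of OpDiLib. In this sketch, ParallelDataSketch and waitIdentifier are hypothetical names, and the code is not taken from the macro backend. It only shows that distinct live objects yield distinct identifiers without a shared counter or an extra lock.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the data object of one parallel region.
struct ParallelDataSketch {
  int placeholder = 0;
};

// Derive an identifier from the object's address. Two objects that are alive
// at the same time occupy different storage, so their identifiers differ.
std::uintptr_t waitIdentifier(ParallelDataSketch const* data) {
  return reinterpret_cast<std::uintptr_t>(data);
}

bool identifiersAreDistinct() {
  ParallelDataSketch regionA;
  ParallelDataSketch regionB;

  bool const differentRegionsDiffer =
      waitIdentifier(&regionA) != waitIdentifier(&regionB);
  bool const sameRegionIsStable =
      waitIdentifier(&regionA) == waitIdentifier(&regionA);

  return differentRegionsDiffer && sameRegionIsStable;
}
```

Because the identifier is derived from data that each region already owns, constructs in different parallel regions do not end up waiting on one common identifier.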
Refactoring and Code Quality
- refactor MutexOmpLogic, more expressive names and reduced boilerplate, comments regarding thread-safety
- refactor ParallelOmpLogic and ImplicitTaskOmpLogic, more expressive names and fewer void* pointers
- transition from master to masked throughout OpDiLib
- remove ProbeScopeStatus, resolve order of ImplicitTaskProbe and ReductionProbe via omp_get_level
- dedicated header file for macros related to thread sanitizer
- refactoring of further smaller aspects throughout OpDiLib
- add warnings, errors, and assertions throughout OpDiLib
Fixes
- ensure that setAdjointAccessMode has no effect when called in the reverse pass
- address compiler warnings
- fix bug in application of syntax checking to a single file
- fix ranges in TestExternalFunctionLocal for small data and/or many threads
- fix memory leak: implicit tasks free allocated tape positions
Tests
Test Suite
- test suite moves from C++11 to C++17
- testing with strict warning levels
- updated image versions
- new tests: TestCriticalNameHint, TestExternalFunctionLogicCalls, TestCustomMutex, TestSingleCopyprivate, TestSingleCopyprivateNowait, TestParallelCopyin, TestParallelFirstprivate, TestParallelFirstprivate2, TestParallelForLastprivate, TestOrderedMultiple, TestOrderedNowait, TestOrderedReduction, TestLock2
- new drivers
  - DriverPrimal for primal computations in the presence of OpDiLib
  - DriverFirstOrderForward for the tapeless forward mode of AD in the presence of OpDiLib
  - DriverFirstOrderReverseVector for vector mode tests
- drivers specify associated tests explicitly
- introduce driver-specific flags
- where possible, prefer OpDiLib's tool interface over direct interaction with the OO AD tool
- faster tests
  - smaller internal data sets
  - default to -O1
  - automatic deduction of appropriate parallelism
  - tests do not use atomics for reversal of serial parts
- instrumentation no longer tied to debug mode
- support for excluding specific tests and drivers from testing
- container file for thread sanitizer tests
CI Pipeline
- test different configurations of OpDiLib
  - source of ImplicitTaskEnd events
  - generation of events for masked constructs, worksharing constructs
- automated syntax checking
- tests with address sanitizer
- tests with thread sanitizer
Miscellaneous
- parallel regions suspend recording on the encountering task's tape
- barriers always produce events for both endpoints
- backend layer becomes responsible for producing events for both reduction-related barriers
- new newsletter address
- Zenodo publication
v1.7.1
v1.7
Restructure data of parallel regions and implicit tasks
- associate tapes, positions, and adjoint access modes with tasks instead of parallel regions
- extend data of parallel regions by pointers to the parent task and child tasks
- backend support for querying the current task's data
- create task data for initial implicit tasks
- track adjoint access mode entirely via tasks
- support resetting tasks according to a tape position, to recover previous adjoint access modes
Features
- track adjoint access mode of initial implicit task
- support additional sync region and worksharing types
- support use of mutexes before tool initialization (OMPT backend)
- support posting messages that are printed during the reverse pass
Instrument
- implement onSetAdjointAccessMode in OmpLogicOutputInstrument
- adapt OmpLogicInstrumentInterface and OmpLogicOutputInstrument according to the other changes
Tests
- run build tests with enabled output instrument
- tests treat non-empty output on stderr as error (can be disabled)
- add tests' error output files to gitignore
- add assertions to adjoint access control tests
- add TestAdjointAccessControlNested2, TestTaskReset
Fixes
- fix transport of adjoint access mode into and out of parallel regions
- fix using directive in second order test driver
- fix placement of skipParallelHandling decrement
Chore
- update license headers for 2025
- update OpDiLib-related publications in README
v1.6
Features
- Gitlab CI pipeline
- tracking of passive parallel regions
- tsan annotations for mutex synchronization in the reverse pass
- postEvaluate() method in the logic interface
Fixes
- build flags in the test system for producing reference files
- memory leaks
  - in the OPDI_SINGLE macro
  - in tests for external functions
- clearing the tape pool
Chore
- update links