Methodology
- Virtual Address Layout
- ASLR
- Stack Smashing
- Pointer Encryption
- Process failure
- Intro
- Failure Prediction
- Checkpoint [Recovery]
- Replication
- Redundant Computation
- ULFM
- What is ULFM?
- What functionalities are supported?
- Explain different functions in detail
- How ULFM is used
- Adaptive Process Replication [talk about different mapped regions and how user segment is totally different]
Data Seg Replication
The data segment is the memory where global variables, both initialized and uninitialized, reside. Memory is allocated and mapped to a variable name at compile and link time, and this mapping does not change during program execution. As mentioned above, this segment is divided into two parts: the initialized segment (`.data`) and the uninitialized segment (`.bss`). Variables initialized with a non-zero value go into the initialized segment, while variables that are uninitialized or initialized with `0` go into the uninitialized segment. Looking at the symbol table of an executable (using `nm`), one finds four symbols associated with the data segment that are of interest: `__data_start` marks the start of the initialized data section, `_edata` marks its end, `__bss_start` marks the start of the uninitialized data section, and `_end` marks its end. These symbols cannot be used as variables; they are only used to extract the boundaries of the memory they delimit, via the `&` operator, e.g. `void *a = &__data_start;`.
Stack Seg Replication
The stack segment is used for function calls and for storing local variables, each of which belongs only to a particular function call.
Heap Seg Replication
- Pointers by value to functions
- Integrating ULFM
- Basic Design
- comm err handler
- MPI_* functions pseudocode [Algorithm design] [imp]
- Explain above in detail [imp]
- Handling Collectives
- Special case for send/recv
- Single-process failure [if the job has more than one rank]
- Failure of all processes [ranks] in a job [Fail Stop]
- Longer interval between checkpoints because of replication
- Fortran wrapper
- Process Manager
- Setup
- Computation configuration
- Failure Model [Fault Injection]
- Plain run with n order redundancy [n = 1, 2, 3]
- n order redundancy with fault injection
- random faults [kill any]
- kill a job with more than one rank [no fail stop]
- n order redundancy with replication map update
- n order redundancy with replication map update and fault injection
- Failure Predictions
- More intelligent process manager
- Handling communicator split
- Compatibility with more asynchronous (non-blocking) MPI_I* functions
- Optimizing heap segment container storage for fast insertion/deletion.
- Efficient MPI_ANY_SOURCE
- Terminologies
- Job
- Node
- Constraints
- ASLR [drawback]
- Stack Smashing problem [drawback]
- Out of the box support
- Compiler
How this framework supports different Fault Tolerance mechanisms.
© Abhishek Upperwal