[TOC]
- Figure 1: High-Level Decoder Architecture
- Figure 2: Parse stage Flow
- Figure 3: Reconstruction stage Flow
- Figure 4: Loop Filter stage Flow
- Figure 5: LF Vertical Stage
- Figure 6: LF Horizontal Stage
- Figure 7: CDEF stage Flow
- Figure 8: CDEF Filter Flow
- Figure 9: Loop Restoration stage Flow
- Figure 10: Main Thread Flow Chart
- Figure 11: Worker Thread Flow Chart
- Figure 12: Tile Parallelism (L > T)
- Figure 13: Tile Parallelism (L < T)
- Figure 14: Illustration for Tile Row MT for 4 Tiles and 9 Threads
- Figure 15: Frame_i with 5 Threads
- Figure 16: Sample SB with Blocks numbered (BlockModeInfo)
- Figure 17: Map with offset for each Block (p_mi_offset)
- Table 1: Important Frame level buffers
This document describes the Intel SVT-AV1 decoder design, in particular the decoder block diagram and its multi-threading aspects. It also briefly describes the SVT-AV1 decoder modules such as parse and reconstruction. It is meant to accompany the "C Model" source code, which contains the more specific details of the inner workings of the decoder.
The high-level decoder pipeline is shown in Figure 1. Further details on individual stages are given in subsequent sections. The multi-threading aspect of the decoder is also explained in detail in a separate section. The details of high-level data structures and frame buffers used in the decoder are also covered briefly later.
The major modules of the SVT-AV1 decoder are:
- Parse
- Reconstruction
- Loop filter
- CDEF
- Loop Restoration
The Parse stage does all tasks related to OBU reading, arithmetic decoding and related mv prediction (such as find_warp_samples, etc.) that produce the necessary mode info and residual data info.
The header level parsing of sequence parameters and frame parameters happens separately via the read_sequence_header_obu and read_frame_header_obu functions. The Parse Tile module does the parsing of tile group OBU data. Figure 2 shows a typical flow of the parse stage.
Input : Bitstream buffer
Output : ModeInfo buffer, TransformInfo buffer, Coeff buffer, picture buffer with predicted values filled for blocks with palette mode, and, in single-thread mode, the reconstructed picture buffer without post-processing filters applied
parse_frame_tiles() is the function that triggers the parsing module in the SVT-AV1 decoder. Parsing of each tile is started by start_parse_tile(). Parsing then happens for each superblock in a tile by calling the function parse_super_block().
Note: The prediction for palette mode happens during the Parse stage itself.
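The call hierarchy described above can be pictured as three nested loops. The following is a minimal sketch only: `TileSketch` and the `*_sketch` functions are hypothetical stand-ins introduced for illustration and do not reflect the decoder's actual types or signatures.

```c
#include <assert.h>

/* Hypothetical, simplified tile description (not the decoder's real type). */
typedef struct {
    int sb_rows; /* superblock rows in this tile */
    int sb_cols; /* superblock columns in this tile */
} TileSketch;

/* Stand-in for parse_super_block(): here it only counts the visit. */
static void parse_super_block_sketch(int *sb_count) { (*sb_count)++; }

/* Stand-in for start_parse_tile(): visits every SB of one tile in raster order. */
static void start_parse_tile_sketch(const TileSketch *tile, int *sb_count) {
    for (int r = 0; r < tile->sb_rows; r++)
        for (int c = 0; c < tile->sb_cols; c++)
            parse_super_block_sketch(sb_count);
}

/* Stand-in for parse_frame_tiles(): visits every tile of the frame and
 * returns the total number of superblocks that were visited. */
static int parse_frame_tiles_sketch(const TileSketch *tiles, int num_tiles) {
    assert(tiles != 0 && num_tiles >= 0);
    int sb_count = 0;
    for (int t = 0; t < num_tiles; t++)
        start_parse_tile_sketch(&tiles[t], &sb_count);
    return sb_count;
}
```

In the multi-threaded decoder the outer loop is not a plain `for` loop: each thread picks its tile as a separate job.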
This stage performs prediction, coefficient inverse scan, residual inverse quantization and inverse transform, and finally generates the reconstructed data without applying post-processing filters. Figure 3 shows a typical flow of the reconstruction stage.
Input : ModeInfo buffer, TransformInfo buffer, Coeff buffer
Output : Reconstructed picture buffer without applying the post-processing filters.
The decode_frame_tiles() function starts reconstruction at frame level. Then decode_tile_job() is called for each tile.
For each Superblock in a tile, decode_super_block() function will be called.
The total number of blocks inside a superblock and their corresponding mode_info structures are stored while parsing. This avoids having to call the decode_block() function recursively.
Note: The prediction for palette mode happens during the Parse stage itself.
Note: In single thread mode, decode_super_block() will be called immediately after every parse_super_block() for better cache efficiency. In this mode decode_frame_tiles() will be completely avoided.
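Because the block count and the per-block mode info are stored during parsing, reconstruction of a superblock can be a flat loop instead of a walk over the partition tree. The sketch below illustrates that idea under assumptions: `BlockInfoSketch`, `SbInfoSketch` and both functions are hypothetical simplifications, not the decoder's real structures.

```c
#include <assert.h>

/* Hypothetical per-block record written by the parse stage. */
typedef struct {
    int mi_row, mi_col;  /* block position in 4x4 units */
    int width4, height4; /* block size in 4x4 units */
} BlockInfoSketch;

/* Hypothetical per-SB record: block count plus the stored block records. */
typedef struct {
    int num_blocks;
    const BlockInfoSketch *blocks; /* contiguous, in decode order */
} SbInfoSketch;

/* Stand-in for decode_block(): returns the area covered, in 4x4 units. */
static int decode_block_sketch(const BlockInfoSketch *b) {
    return b->width4 * b->height4;
}

/* Flat loop over the stored blocks. No partition-tree recursion is needed
 * because parsing already resolved the tree into a linear list. */
static int decode_super_block_sketch(const SbInfoSketch *sb) {
    assert(sb != 0 && sb->num_blocks >= 0);
    int covered = 0;
    for (int i = 0; i < sb->num_blocks; i++)
        covered += decode_block_sketch(&sb->blocks[i]);
    return covered;
}
```

For a 64x64 superblock split into four 32x32 blocks, the loop runs four times and covers all 256 4x4 units.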
The loop filter function is to eliminate (or at least reduce) visually objectionable artifacts associated with the semi-independence of the coding of super blocks and their constituent sub-blocks as per section 7.14 of AV1 spec. This stage applies the loop filter for the entire frame. Flow diagram for Loop Filter is below in Figure 4.
Input : Reconstructed picture buffer
Output : Loop Filtered frame.
- eb_av1_loop_filter_frame_init(): Initialization of loop filter parameters is performed here.
- Dec_loop_filter_sb()
  - Apply dec_av1_filter_block_plane_vert().
    - Loop through each block in the SB.
      - Loop through each TU in the block.
        - Calculate the LF params.
        - Apply LF for each vertical TU edge.
  - Then apply dec_av1_filter_block_plane_horz().
    - Loop through each block in the SB.
      - Loop through each TU in the block.
        - Calculate the LF params.
        - Apply LF for each horizontal TU edge.
The CDEF performs deringing based on the detected direction of blocks as per section 7.15 of AV1 spec. This stage applies the CDEF for the entire frame. The flow diagram for the CDEF is shown in Figure 7.
Input : Loop filtered buffer.
Output : CDEF filtered buffer.
Steps involved in CDEF:
- The svt_cdef_frame() function is called to start CDEF for a frame.
- For each 64x64 superblock, the function svt_cdef_block() is called.
- The number of non-skip 8x8 blocks is calculated.
- The 3-pixel rows of the next SBs are stored in the line and column buffers so that CDEF operates on the original pixels.
- eb_cdef_filter_fb() is called for each 8x8 non-skip block.
  - Find the direction of each 8x8 block.
  - Filter the 8x8 block according to the identified direction using eb_cdef_filter_block_c().
  - Store the results in the destination buffer.
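The counting step in the list above can be sketched as follows. This is an illustration under assumptions: the layout of the skip flags (one flag per 8x8 block, row-major) and the function name `count_non_skip_8x8` are hypothetical, and the way the real decoder derives the skip condition from its mode info is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define CDEF_UNIT_8X8 8 /* a 64x64 unit is 8 blocks wide and 8 blocks high */

/* Hypothetical input: one skip flag per 8x8 block of the frame, row-major,
 * with `stride8` flags per row. Returns how many 8x8 blocks inside the
 * 64x64 unit at (unit_row, unit_col) are non-skip and lie inside the frame. */
static int count_non_skip_8x8(const uint8_t *skip8, int rows8, int cols8, int stride8,
                              int unit_row, int unit_col) {
    assert(skip8 != 0 && stride8 >= cols8);
    int count = 0;
    for (int r = 0; r < CDEF_UNIT_8X8; r++) {
        for (int c = 0; c < CDEF_UNIT_8X8; c++) {
            int row = unit_row * CDEF_UNIT_8X8 + r;
            int col = unit_col * CDEF_UNIT_8X8 + c;
            if (row >= rows8 || col >= cols8) continue; /* outside the frame */
            if (!skip8[row * stride8 + col]) count++;
        }
    }
    return count;
}
```

A unit whose count is zero needs no filtering at all, which is why the count is computed before any per-block work.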
This stage applies the Loop Restoration for the entire frame and the process is defined as per section 7.17 of AV1 spec. The flow diagram for the LR is shown in Figure 9.
Input : CDEF filtered reconstructed buffer
Output : LR filtered reconstructed buffer
Loop Restoration for a frame starts from the function dec_av1_loop_restoration_filter_frame().
The steps involved are:
- Call dec_av1_loop_restoration_filter_row() for each row of height sb_size.
- Call eb_dec_av1_loop_restoration_filter_unit() for each LR unit of size 64x64.
- Call setup_processing_stripe_boundary() to use the stored CDEF or LF above/below boundaries from the neighbor block, depending on whether the processing row is an outer or inner row, respectively.
- Apply the LR filter (stripe_filter) based on the type of unit_lrtype.
- Restore the LR filtered data back to stripe_buffer by function restore_processing_stripe_boundary().
Parallelism in the decoder could be achieved at multiple levels. Each thread could, for example, be performing a different task in the decoding pipeline. The decoder will use tile level parallelism for tile parsing jobs. Decoder reconstruction jobs will use tile row-level parallelism, whereas all the post-processing filter jobs will use frame row-level parallelism.
Let N be the number of threads configured for the decoder. The decoder library creates (N-1) threads, which are called worker threads. The application thread that calls the decode process is called the main thread. Together, the decoder has N working threads.
The main thread will perform the following inside the decoder:
- Parse all OBUs completely other than OBU_TILE_GROUP
- Do MV Projection Frame Row job if anything is pending
- Wait till all threads have finished the stage
- Parse Tile data if any tile parsing is pending
- Do Reconstruct Tile Row job if anything is pending
- Do Loop Filter Frame Row job if anything is pending
- Do CDEF Frame Row job if anything is pending
- If upscale is enabled, wait till all threads have finished the stage
- Do LR Frame Row job if anything is pending
- Wait till all threads have finished the stage
- Return the control back to the caller (application)
Figure 10 shows the flow chart of the main thread.
The worker thread will perform the following:
- Wait for the start frame processing flag
- Do MV Projection Frame Row job if anything is pending
- Wait till all threads have finished the stage
- Parse Tile data if any tile parsing is pending
- Do Reconstruct Tile Row job if anything is pending
- Do Loop Filter Frame Row job if anything is pending
- Do CDEF Frame Row job if anything is pending
- If upscale is enabled, wait till all threads have finished the stage
- Do LR Frame Row job if anything is pending
- Wait till all threads have finished the stage
Figure 11 shows the flow chart of the worker thread.
The decoder will use tile level parallelism for tile parsing jobs. Let T be the number of tiles present in Frame_i and let L be the number of threads working on this frame. Each thread will try to pick up a tile parsing job and execute it as shown in Figure 12 and Figure 13 below.
Please note that the thread number and tile number need not match. Each thread can pick any tile based on job availability. The figures are for illustration only.
Decoder reconstruction uses tile row-level parallelism. Wavefront Processing (WPP) will be used to handle data dependencies. Figure 14 shows 9 threads reconstructing 4 Tiles in Frame_i with Tile Row-level parallelism. Each thread picks a Tile row MT job and works in a WPP manner.
Each thread will try to pick a unique tile that has not yet processed any row and continues to pick the tile-row jobs from the same tile until no more jobs are present in the same tile. If all the jobs in current tile are picked, it switches to the new tile with maximum number of jobs to be processed. If a unique tile that has not yet processed any row is not found, it picks the tile with maximum number of jobs to be processed.
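The tile selection policy described above can be sketched as a small function. This is one possible reading of the rule, written under assumptions: `TileJobsSketch` and `pick_tile_sketch` are hypothetical names, and the caller is assumed to hold whatever lock protects the shared job state.

```c
#include <assert.h>

/* Hypothetical per-tile job state shared by all threads. */
typedef struct {
    int total_rows;  /* SB rows in the tile */
    int rows_picked; /* rows already handed out as jobs */
} TileJobsSketch;

static int rows_left(const TileJobsSketch *t) {
    return t->total_rows - t->rows_picked;
}

/* Returns the tile to take the next row job from, or -1 if no jobs remain.
 * `current` is the tile the thread worked on last, or -1 if it has none. */
static int pick_tile_sketch(const TileJobsSketch *tiles, int num_tiles, int current) {
    assert(tiles != 0 && current < num_tiles);
    /* 1. Stay on the current tile while it still has row jobs. */
    if (current >= 0 && rows_left(&tiles[current]) > 0) return current;
    /* 2. A thread with no tile yet prefers a tile nobody has started. */
    if (current < 0)
        for (int t = 0; t < num_tiles; t++)
            if (tiles[t].rows_picked == 0 && tiles[t].total_rows > 0) return t;
    /* 3. Otherwise take the tile with the most row jobs left. */
    int best = -1, best_left = 0;
    for (int t = 0; t < num_tiles; t++) {
        if (rows_left(&tiles[t]) > best_left) {
            best_left = rows_left(&tiles[t]);
            best = t;
        }
    }
    return best;
}
```

Keeping a thread on one tile for as long as possible favors cache locality, while the fallback to the tile with the most remaining rows balances the load when there are more threads than tiles.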
All the post-processing filter jobs will use frame row-level parallelism. Wavefront Processing (WPP) will be used to handle data dependencies if required. LF, CDEF, and LR may work with different unit sizes depending on the available parallelism unit instead of SB.
Figure 15 shows 5 threads applying post-processing filters on Frame_i. Each thread picks a Frame row MT job and works in a WPP manner.
The job selection is controlled using shared memory and a mutex. The DecMtRowInfo (for Parse Tile, Recon Tile, CDEF Frame, LR Frame), DecMtMotionProjInfo (for Motion Projection), DecMtParseReconTileInfo (for Frame Recon) and DecMtlfFrameInfo (for LF Frame) data structures hold this memory for job selection.
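A minimal sketch of mutex-protected job selection is shown below. It uses POSIX mutexes directly as an assumption for illustration; `RowJobsSketch`, `row_jobs_init` and `row_jobs_pick` are hypothetical names and do not mirror the fields of DecMtRowInfo or the decoder's own mutex wrappers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified row-job state shared by all threads. */
typedef struct {
    pthread_mutex_t lock;
    int num_rows; /* total row jobs in the stage */
    int next_row; /* first row not yet handed out */
} RowJobsSketch;

static void row_jobs_init(RowJobsSketch *jobs, int num_rows) {
    assert(jobs != NULL && num_rows >= 0);
    pthread_mutex_init(&jobs->lock, NULL);
    jobs->num_rows = num_rows;
    jobs->next_row = 0;
}

/* Hands out each row index exactly once, even when called concurrently.
 * Returns -1 when no job is left. */
static int row_jobs_pick(RowJobsSketch *jobs) {
    int row = -1;
    pthread_mutex_lock(&jobs->lock);
    if (jobs->next_row < jobs->num_rows) row = jobs->next_row++;
    pthread_mutex_unlock(&jobs->lock);
    return row;
}
```

Every thread, main or worker, would call the pick function in a loop and move on to the next stage once it returns -1.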
The sync points are controlled using shared memory and a mutex. The following shared variables are used for the various syncs inside the decoder stages, such as the top-right sync.
- sb_recon_row_parsed: Array that stores which SB recon rows in the tile have completed parsing. It is used to decide when decoding of an SB row can start. It is updated after the parsing of each SB row in a tile finishes. If the value of this variable is set, the recon of an SB row starts. This check is done before decoding of an SB row in a tile starts inside decode_tile().
- sb_recon_completed_in_row: Array that stores the number of SBs completed in every SB row of the Recon stage. Used for top-right sync. It is updated with the number of SBs reconstructed after the recon of each SB finishes. Decoding of the current SB starts only if the recon of the 'Top SB' and 'top right SB' in the previous row is done. This check is done before the decoding of an SB starts inside the function decode_tile_row().
- sb_recon_row_map: This map stores whether the recon of an SB row of a tile is finished. Its value is updated after the recon of a tile row is done inside the decode_tile() function. LF of the current row starts only if the recon of the 'top, top right, current and bottom SB row' is done. This check is done before starting LF inside the function dec_av1_loop_filter_frame_mt().
- lf_row_map: Array over SB rows that stores whether the LF of a row is done. It is set after the LF of the current row is done. CDEF of the current row starts only if the LF of the current and next row is done. This check is done before CDEF of the current row starts inside the function svt_cdef_frame_mt().
- cdef_completed_for_row_map: Array that stores whether CDEF of a row is done. It is set after the CDEF of the current row is done. LR of the current row starts only if the CDEF of the current row is done. This check is done before LR of the current row starts inside the function dec_av1_loop_restoration_filter_frame_mt().
- Hard-Syncs: A hard sync is a point where all threads wait for the completion of a particular stage before going to the next stage. Hard syncs happen at the following points in the decoder:
  - Hard Sync after MV Projection. svt_setup_motion_field() is the function where this hard-sync happens.
  - Hard Sync after CDEF, only when the upscaling flag is present. svt_cdef_frame_mt() is the function where this hard-sync happens.
  - Hard Sync after LR. dec_av1_loop_restoration_filter_frame_mt() is the function where this hard-sync happens.
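Two of the soft sync conditions listed above, the top-right sync in Recon and the LF-to-CDEF dependency, can be expressed as simple predicates. The sketch below is an assumption-laden illustration: the function names and array element types are hypothetical, and the real decoder reads these arrays under a mutex and waits until the condition holds rather than just returning a flag.

```c
#include <assert.h>
#include <stdint.h>

/* Top-right sync: the SB at (sb_row, sb_col) may start once the previous row
 * has completed its top and top-right neighbors, i.e. sb_col + 2 SBs
 * (clamped to the row length). The first row has no dependency. */
static int top_right_ready(const uint32_t *completed_in_row, int sb_row, int sb_col,
                           int sbs_in_row) {
    assert(completed_in_row != 0 && sb_col < sbs_in_row);
    if (sb_row == 0) return 1;
    int needed = sb_col + 2;
    if (needed > sbs_in_row) needed = sbs_in_row;
    return completed_in_row[sb_row - 1] >= (uint32_t)needed;
}

/* CDEF of a row may start once LF of that row and of the next row is done.
 * The last row has no next row to wait for. */
static int cdef_row_ready(const uint8_t *lf_row_done, int sb_row, int num_sb_rows) {
    assert(lf_row_done != 0 && sb_row < num_sb_rows);
    if (!lf_row_done[sb_row]) return 0;
    if (sb_row + 1 < num_sb_rows && !lf_row_done[sb_row + 1]) return 0;
    return 1;
}
```

These row-level dependencies are what allow the stages to overlap in a wavefront manner without a hard sync between them.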
The following are some important buffers used in the decoder.
Structure | Granularity |
---|---|
BlockModeInfo | 4x4 |
SB info | SB |
TransformInfo | 4x4 |
Coeff | 4x4 |
Delta Q & Delta LF Params | SB |
cdef_strength | 64x64 |
p_mi_offset | 4x4 |
Table 1: Important Frame level buffers
The BlockModeInfo buffer contains block info required for Recon. It is allocated for the worst case, that is, for every 4x4 block of the entire frame.
Even though the buffer is allocated for every 4x4 in the frame, the structure is not replicated for every 4x4 block. Instead, each block is associated with only one structure, even if the block size is more than 4x4. A map with an offset from the start is used for neighbor access purposes. This reduces the need to replicate the data structure and makes cache usage more efficient.
Figure 16 shows a sample superblock split into multiple blocks, numbered from 0 to 18. So 19 BlockModeInfo structures are populated contiguously from the SB start location, one for each block (instead of replicating the structures for all the 1024 4x4 blocks). Assuming this is the first SB in the picture, Figure 17 shows the map with the offset for each location in the SB, as stored in the p_mi_offset buffer. This map is used for deriving the neighbor BlockModeInfo structure at any location if needed.
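The lookup through the offset map can be sketched as follows. The types and names here (`BlockModeInfoSketch`, `ModeInfoMapSketch`, `get_mode_info`) are hypothetical, the `uint16_t` offset type is an assumption, and offsets are measured from the start of the packed array for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for BlockModeInfo: only an id for illustration. */
typedef struct {
    int block_id;
} BlockModeInfoSketch;

typedef struct {
    const BlockModeInfoSketch *mode_info; /* packed, one entry per coded block */
    const uint16_t *p_mi_offset;          /* one entry per 4x4 unit */
    int mi_stride;                        /* 4x4 units per map row */
} ModeInfoMapSketch;

/* Returns the mode info of the block that covers the 4x4 unit at
 * (mi_row, mi_col). Every 4x4 unit of a block maps to the same entry. */
static const BlockModeInfoSketch *get_mode_info(const ModeInfoMapSketch *map,
                                                int mi_row, int mi_col) {
    assert(mi_row >= 0 && mi_col >= 0 && mi_col < map->mi_stride);
    return &map->mode_info[map->p_mi_offset[mi_row * map->mi_stride + mi_col]];
}
```

A neighbor lookup, for example the block above the current one, is then a single map access at `(mi_row - 1, mi_col)` followed by one indexed read.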
The SB info buffer stores SB related data. It is allocated for each SB for the entire frame.
The TransformInfo buffer stores the transform info of each TU. It is allocated for each TU, which in the worst case means for each 4x4 in a frame.
The Coeff buffer contains the coefficients of each mi_unit (4x4). Each mi_unit contains 16 coefficients. In single-thread (ST) mode it is allocated for each 4x4 unit of one SB, whereas in multi-thread (MT) mode it is allocated for each 4x4 unit of the entire frame.
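The difference between the single-thread and multi-thread coefficient buffer sizes can be worked out with simple arithmetic. The sketch below counts coefficients for one plane only and ignores any alignment or padding the real allocator may apply; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define COEFFS_PER_MI_UNIT 16 /* one 4x4 unit holds 4 * 4 coefficients */

/* Single-thread mode: storage for the 4x4 units of one superblock. */
static size_t coeff_count_single_thread(int sb_size) {
    assert(sb_size == 64 || sb_size == 128);
    size_t mi_per_side = (size_t)sb_size / 4;
    return mi_per_side * mi_per_side * COEFFS_PER_MI_UNIT;
}

/* Multi-thread mode: storage for the 4x4 units of the whole frame. */
static size_t coeff_count_multi_thread(int frame_width, int frame_height) {
    assert(frame_width > 0 && frame_height > 0);
    size_t mi_cols = ((size_t)frame_width + 3) / 4;
    size_t mi_rows = ((size_t)frame_height + 3) / 4;
    return mi_cols * mi_rows * COEFFS_PER_MI_UNIT;
}
```

For a 1920x1080 frame with 128x128 superblocks this gives 16384 coefficients in single-thread mode against 2073600 in multi-thread mode. The larger buffer is needed in multi-thread mode because parsing of the whole frame can run ahead of reconstruction.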
The Delta Q buffer is used to store delta_q params and is allocated at the SB level for the entire frame.
The Delta LF buffer is allocated at the SB level for the entire frame.
The cdef_strength buffer is allocated at the 64x64 level for the entire frame.
The following are the high-level data structures in the decoder. Major elements in the structure are explained below.
- DecConfiguration :
- uint32_t active_channel_count
- uint32_t channel_id ID assigned to each channel when multiple instances are running within the same application.
- uint32_t compressed_ten_bit_format Offline packing of the 2bits: requires two bits packed input. Default is 0.
- EbBool eight_bit_output Outputs 8-bit pictures even if the bitstream has higher bit depth. Ignored if the bitstream is 8-bit. Default is 0.
- uint64_t frames_to_be_decoded Maximum number of frames in the sequence to be decoded. 0 = decodes the full bitstream. Default is 0.
- EbBitDepth max_bit_depth
- EbColorFormat max_color_format
- uint32_t max_picture_height Picture parameters -height
- uint32_t max_picture_width Picture parameters -width
- uint32_t num_p_frames Number of frames that can be processed in parallel. Default is 1.
- int32_t operating_point Default is -1, the highest operating point present in the bitstream. A value higher than the maximum number of operating points present returns the highest available operating point.
- uint32_t output_all_layers When set to 1, returns output pictures from all scalable layers present in the bitstream. Default is 0, only one output layer is returned, defined by operating_point parameter
- EbBool skip_film_grain Skip film grain synthesis if it is present in the bitstream. Can be used for debugging purpose. Default is 0.
- uint64_t skip_frames Skip N output frames in the display order. 0 : decodes from the start of the bitstream. Default is 0.
- uint32_t stat_report
- uint32_t threads Number of threads used by the decoder. Default is 1.
- SeqHeader :
- EbColorConfig color_config Colour Configuration structure
- DecoderModelInfo decoder_model_info Decoder Model Information structure
- uint8_t decoder_model_info_present_flag Specifies whether decoder model information is present in the coded video sequence
- uint8_t delta_frame_id_length Specifies the number of bits used to encode delta_frame_id syntax elements
- uint8_t enable_cdef 1: Specifies that cdef filtering may be enabled. 0: specifies that cdef filtering is disabled
- uint8_t enable_dual_filter
- 1: Indicates that the inter prediction filter type may be specified independently in the horizontal and vertical directions.
- 0: Indicates only one filter type may be specified, which is then used in both directions.
- uint8_t enable_filter_intra
- 1: Specifies that the use_filter_intra syntax element may be present.
- 0: Specifies that the use_filter_intra syntax element will not be present
- uint8_t enable_interintra_compound
- 1: Specifies that the mode info for inter blocks may contain the syntax element interintra.
- 0: Specifies that the syntax element interintra will not be present
- uint8_t enable_intra_edge_filter Specifies whether the intra edge filtering process should be enabled
- uint8_t enable_masked_compound
- 1: Specifies that the mode info for inter blocks may contain the syntax element compound_type
- 0: Specifies that the syntax element compound_type will not be present
- uint8_t enable_restoration
- 1: Specifies that loop restoration filtering may be enabled.
- 0: Specifies that loop restoration filtering is disabled
- uint8_t enable_superres
- 1: Specifies that the use_superres syntax element will be present in the uncompressed header.
- 0: Specifies that the use_superres syntax element will not be present
- uint8_t enable_warped_motion
- 1: Indicates that the allow_warped_motion syntax element may be present
- 0: Indicates that the allow_warped_motion syntax element will not be present
- uint8_t film_grain_params_present Specifies whether film grain parameters are present in the coded video sequence
- uint8_t frame_height_bits Specifies the number of bits minus 1 used for transmitting the frame height syntax elements
- uint8_t frame_id_length Used to calculate the number of bits used to encode the frame_id syntax element.
- uint8_t frame_id_numbers_present_flag Specifies whether frame id numbers are present in the coded video sequence
- uint8_t frame_width_bits Specifies the number of bits minus 1 used for transmitting the frame width syntax elements
- uint8_t initial_display_delay_present_flag Specifies whether initial display delay information is present in the coded video sequence.
- uint16_t max_frame_height Specifies the maximum frame height minus 1 for the frames represented by this sequence header
- uint16_t max_frame_width Specifies the maximum frame width minus 1 for the frames represented by this sequence header
- EbAv1OperatingPoint operating_point[MAX_NUM_OPERATING_POINTS] Operating Point Param structure
- uint8_t operating_points_cnt_minus_1 Indicates the number of operating points minus 1 present in the coded video sequence
- OrderHintInfo order_hint_info Order Hint Information structure
- uint8_t reduced_still_picture_header Specifies that the syntax elements not needed by a still picture are omitted
- uint8_t sb_mi_size Superblock size in 4x4 MI unit
- BlockSize sb_size
- uint8_t sb_size_log2 Superblock size in log2 units
- uint8_t seq_force_integer_mv
- Equal to SELECT_INTEGER_MV indicates that the force_integer_mv syntax element will be present in the frame header (providing allow_screen_content_tools is equal to 1).
- Otherwise, seq_force_integer_mv contains the value for force_integer_mv
- uint8_t seq_force_screen_content_tools
- Equal to SELECT_SCREEN_CONTENT_TOOLS, indicates that the allow_screen_content_tools syntax element will be present in the frame header.
- Otherwise, seq_force_screen_content_tools contains the value for allow_screen_content_tools
- EbAv1SeqProfile seq_profile Specifies the features that can be used in the coded video sequence
- uint8_t still_picture
- 1: Specifies that the coded video sequence contains only one coded frame
- 0: Specifies that the coded video sequence contains one or more coded frames
- EbTimingInfo timing_info Timing Information structure
- uint8_t use_128x128_superblock
- 1: Indicates that superblocks contain 128x128 luma samples
- 0: Indicates that superblocks contain 64x64 luma samples.
- FrameHeader :
- uint8_t all_lossless Indicates that the frame is fully lossless at the upscaled resolution
- uint8_t allow_high_precision_mv
- 0: Specifies that motion vectors are specified to quarter pel precision
- 1: Specifies that motion vectors are specified to eighth pel precision
- uint8_t allow_intrabc
- 1: Indicates that the Intra block copy may be used in this frame.
- 0: Indicates that the Intra block copy is not allowed in this frame
- uint8_t allow_screen_content_tools
- 1: Indicates that intra blocks may use palette encoding
- 0: Indicates that palette encoding is never used
- uint8_t allow_warped_motion
- 1: Indicates that the syntax element motion_mode may be present
- 0: Indicates that the syntax element motion_mode will not be present
- uint32_t buffer_removal_time[MAX_NUM_OPERATING_POINTS] Specifies the frame removal time in units of DecCT clock ticks counted from the removal time of the last random access point for operating point op_num
- uint8_t buffer_removal_time_present_flag
- 1: Specifies that buffer_removal_time is present.
- 0: Specifies that buffer_removal_time is not present
- CdefParams cdef_params Constrained Directional Enhancement Filter
- uint8_t coded_lossless Indicates that the frame is fully lossless at the coded resolution of FrameWidth by FrameHeight
- uint32_t current_frame_id Specifies the frame id number for the current frame
- DeltaLfParams delta_lf_params Delta Loop Filter Parameters
- DeltaQParams delta_q_params Delta Quantization Parameters
- uint8_t disable_cdf_update Specifies whether the CDF update in the symbol decoding process should be disabled
- uint8_t disable_frame_end_update_cdf
- 1: Indicates that the end of frame CDF update is disabled
- 0: Indicates that the end of frame CDF update is enabled
- uint8_t error_resilient_mode
- 1: Indicates that error resilient mode is enabled
- 0: Indicates that error resilient mode is disabled
- AomFilmGrain film_grain_params Film Grain Parameters
- uint8_t force_integer_mv
- 1: Specifies that motion vectors will always be integers
- 0: Specifies that motion vectors can contain fractional bits
- uint32_t frame_presentation_time Specifies the presentation time of the frame in clock ticks DispCT counted from the removal time of the last random access point for the operating point that is being decoded
- int32_t frame_refs_short_signaling
- FrameSize frame_size Frame Size structure
- FrameType frame_type Specifies the type of the frame
- InterpFilter interpolation_filter Specifies the filter selection used for performing inter prediction
- uint8_t is_motion_mode_switchable
- 0: Specifies that only the SIMPLE motion mode will be used
- struct LoopFilter loop_filter_params Loop Filter Parameters
- uint8_t lossless_array[MAX_SEGMENTS] Indicates the flag to set coded_lossless variable
- LrParams lr_params[MAX_MB_PLANE] Loop Restoration Parameters
- uint32_t mi_cols
- uint32_t mi_rows
- uint32_t mi_stride
- uint32_t order_hint Used to compute OrderHint
- uint32_t order_hints[REF_FRAMES] Specifies the expected output order for each reference frame
- uint8_t primary_ref_frame Specifies which reference frame contains the CDF values and other states that should be loaded at the start of the frame
- QuantizationParams quantization_params Quantization Parameters
- uint8_t reduced_tx_set
- 1: specifies that the frame is restricted to a reduced subset of the full set of transform types
- uint8_t ref_frame_idx[REF_FRAMES] Specifies which reference frames are used by inter frames
- uint32_t ref_frame_sign_bias[TOTAL_REFS_PER_FRAME] Specifies the intended direction of the motion vector in time for each reference frame
- 0: Indicates that the reference frame is a forwards reference
- 1: Indicates that the reference frame is a backwards reference
- uint32_t ref_order_hint[REF_FRAMES] Specifies the expected output order hint for each reference frame
- uint32_t ref_valid[REF_FRAMES] An array which is indexed by a reference picture slot number
- 1: Signifies that the corresponding reference picture slot is valid for use as a reference picture
- 0: Signifies that the corresponding reference picture slot is not valid for use as a reference picture
- ReferenceMode reference_mode Reference Mode structure
- uint8_t refresh_frame_flags Contains a bitmask that specifies which reference frame slots will be updated with the current frame after it is decoded
- SegmentationParams segmentation_params Segmentation Parameters
- uint8_t show_existing_frame
- 1: Indicates the frame indexed by frame_to_show_map_idx is to be output.
- 0: Indicates that further processing is required
- uint8_t show_frame
- 1: Specifies that this frame should be immediately output once decoded
- 0: Specifies that this frame should not be immediately output
- uint8_t showable_frame
- 1: Specifies that the frame may be output using the show_existing_frame mechanism
- 0: Specifies that this frame will not be output using the show_existing_frame mechanism
- SkipModeInfo skip_mode_params Skip Mode Parameters
- TilesInfo tiles_info Tile information
- TxMode tx_mode Specifies how the transform size is determined
- uint8_t use_ref_frame_mvs
- 1: Specifies that motion vector information from a previous frame can be used when decoding the current frame
- 0: Specifies that this information will not be used
- DecHandle :
- struct Av1Common cm
- EbDecPicBuf* cur_pic_buf[DEC_MAX_NUM_FRM_PRLL]
- uint32_t dec_cnt
- EbSvtAv1DecConfiguration dec_config
- EbHandle* decode_thread_handle_array
- FrameHeader frame_header
- uint8_t is_lf_enabled
- MainFrameBuf main_frame_buf Main Frame Buffer containing all frame level bufs like ModeInfo for all the frames in parallel
- int32_t mem_init_done Flag to signal decoder memory init is done
- EbMemoryMapEntry* memory_map
- uint32_t memory_map_index
- EbMemoryMapEntry* memory_map_init_address
- EbDecPicBuf* next_ref_frame_map[REF_FRAMES]
- int32_t num_frms_prll Num frames in parallel
- void* pv_dec_mod_ctxt
- void* pv_lf_ctxt
- void* pv_lr_ctxt
- void* pv_main_parse_ctxt
- void* pv_pic_mgr Pointer to Picture manager structure
- EbDecPicBuf* ref_frame_map[REF_FRAMES]
- struct ScaleFactors ref_scale_factors[REF_FRAMES]
- int32_t remapped_ref_idx[REF_FRAMES]
- uint8_t seen_frame_header
- SeqHeader seq_header
- int32_t seq_header_done Flag to signal seq_header done
- struct ScaleFactors sf_identity Scale of the current frame with respect to itself.
- uint8_t show_existing_frame
- uint8_t show_frame
- uint8_t showable_frame
- uint32_t size
- EbBool start_thread_process
- struct DecThreadCtxt* thread_ctxt_pa
- EbHandle thread_semaphore
- uint64_t total_lib_memory
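Several of the SeqHeader and FrameHeader fields listed above are related by simple arithmetic: use_128x128_superblock determines sb_size_log2 and sb_mi_size, and the frame dimensions determine mi_cols and mi_rows. The sketch below shows these relationships using the MiCols/MiRows formula from the AV1 specification; `FrameGeometrySketch` and `derive_geometry` are hypothetical names, and the superblock row and column counts are added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the derived values. */
typedef struct {
    uint8_t  sb_size_log2;     /* 6 for 64x64, 7 for 128x128 */
    uint8_t  sb_mi_size;       /* superblock size in 4x4 MI units */
    uint32_t mi_cols, mi_rows; /* frame size in 4x4 MI units */
    uint32_t sb_cols, sb_rows; /* frame size in superblocks */
} FrameGeometrySketch;

static FrameGeometrySketch derive_geometry(uint8_t use_128x128_superblock,
                                           uint32_t frame_width, uint32_t frame_height) {
    assert(frame_width > 0 && frame_height > 0);
    FrameGeometrySketch g;
    g.sb_size_log2 = use_128x128_superblock ? 7 : 6;
    g.sb_mi_size   = (uint8_t)(1u << (g.sb_size_log2 - 2));
    /* AV1 spec: MiCols = 2 * ((FrameWidth + 7) >> 3), likewise for rows. */
    g.mi_cols = 2 * ((frame_width + 7) >> 3);
    g.mi_rows = 2 * ((frame_height + 7) >> 3);
    /* Round up so that partial superblocks at the edges are counted. */
    g.sb_cols = (g.mi_cols + g.sb_mi_size - 1) / g.sb_mi_size;
    g.sb_rows = (g.mi_rows + g.sb_mi_size - 1) / g.sb_mi_size;
    return g;
}
```

For a 1920x1080 frame this yields 480x270 MI units, which is 30x17 superblocks of 64x64 or 15x9 superblocks of 128x128.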