@TheAssembler1 (Collaborator) commented Aug 6, 2025

Adds the following enum so the storage strategy can be selected:

typedef enum pdc_region_writeout_strategy {
    /**
     * Store data as multiple regions inside a single file.
     * Overlapping writes that are not fully contained append new regions
     * to the end of the file, with metadata tracking region locations.
     * Supports incremental updates without rewriting large parts of the file.
     */
    STORE_REGION_BY_REGION_SINGLE_FILE = 0,

    /**
     * Store the entire object as a single flat file.
     * Reads and writes operate by seeking directly within the file.
     * No region metadata bookkeeping; simpler but less flexible for partial updates.
     */
    STORE_FLATTENED_SINGLE_FILE,

    /**
     * Store each flattened region in its own separate file.
     * Enables independent file management per region.
     */
    STORE_FLATTENED_REGION_PER_FILE
} pdc_region_writeout_strategy;
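
As an illustration of "seeking directly within the file" for STORE_FLATTENED_SINGLE_FILE, here is a minimal sketch of how an element's byte offset could be computed. It assumes a row-major layout (last dimension varies fastest), which this PR description does not state; flat_byte_offset is a hypothetical helper and not part of PDC.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from this PR): byte offset of one element in a
 * flattened file, ASSUMING row-major order and a fixed element size `unit`.
 */
static uint64_t
flat_byte_offset(const uint64_t *coord, const uint64_t *obj_dims, uint8_t ndim, uint64_t unit)
{
    uint64_t index = 0;

    /* Horner-style accumulation of the linear element index. */
    for (uint8_t i = 0; i < ndim; i++)
        index = index * obj_dims[i] + coord[i];

    return index * unit;
}
```

With such an offset, a read or write would reduce to a single seek followed by the transfer, which is why this strategy needs no region metadata.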

STORE_REGION_BY_REGION_SINGLE_FILE is the default strategy. STORE_FLATTENED_REGION_PER_FILE is the new strategy, which stores each region of an object in a separate file. The size of the regions the object is sliced into is decided in:

/**
 * Used to decide how to split an object into chunks, each of which will be a file on disk.
 */
static perr_t
PDC_shrink_file_dims(uint64_t *temp_file_dims, const uint64_t *obj_dims, uint8_t obj_ndim, size_t unit)

By default, it tries to slice the object into regions of at most 4 MB (4 × 1024 × 1024 bytes) by iteratively halving the largest dimension of the object until a region fits within that limit.

The limit is set within the PDC_shrink_file_dims function by the line: uint64_t max_bytes_per_file = 4ULL * 1024 * 1024;
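
The halving described above can be sketched roughly as follows. This is not the code from the PR: shrink_file_dims_sketch is a hypothetical stand-in, and the round-up on odd sizes, the tie-breaking toward the first largest dimension, and the int return code are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for PDC_shrink_file_dims (assumed behavior only).
 * Starts from the full object shape and halves the largest dimension until
 * one region occupies at most max_bytes_per_file bytes.
 * Note: the byte count is not guarded against uint64_t overflow.
 */
static int
shrink_file_dims_sketch(uint64_t *file_dims, const uint64_t *obj_dims, uint8_t ndim, size_t unit)
{
    const uint64_t max_bytes_per_file = 4ULL * 1024 * 1024;

    if (file_dims == NULL || obj_dims == NULL || ndim == 0 || unit == 0)
        return -1;

    for (uint8_t i = 0; i < ndim; i++)
        file_dims[i] = obj_dims[i];

    for (;;) {
        uint64_t bytes   = unit; /* bytes covered by one region */
        uint8_t  largest = 0;    /* index of the first largest dimension */

        for (uint8_t i = 0; i < ndim; i++) {
            bytes *= file_dims[i];
            if (file_dims[i] > file_dims[largest])
                largest = i;
        }

        /* Stop once the region fits, or when nothing is left to halve. */
        if (bytes <= max_bytes_per_file || file_dims[largest] <= 1)
            break;

        /* Halve the largest dimension, rounding up (an assumption). */
        file_dims[largest] = (file_dims[largest] + 1) / 2;
    }
    return 0;
}
```

Under these assumptions, a 4096 × 4096 object with 4-byte elements (64 MiB) would be reduced to 1024 × 1024 regions of exactly 4 MiB each, giving 16 files on disk.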

@TheAssembler1
Collaborator Author

We might want to compare the performance between the storage strategies before merging.

@TheAssembler1 marked this pull request as ready for review on September 18, 2025 19:29
@TheAssembler1 requested a review from a team as a code owner on September 18, 2025 19:29
@TheAssembler1 changed the title from "Draft: Region Per File" to "Region Per File" on Sep 18, 2025
@TheAssembler1 changed the title from "Region Per File" to "Region per file storage strategy" on Sep 18, 2025
@jeanbez changed the title from "Region per file storage strategy" to "Draft: Region per file storage strategy" on Oct 21, 2025
@jeanbez assigned jeanbez and unassigned jeanbez on Oct 21, 2025
@jeanbez added the "type: enhancement" (New feature or request) label on Oct 21, 2025