DFC requirements
To improve the DFC and make it satisfactory for as many VOs as possible, this RFC aims to be an exhaustive list of requirements. Some items on this list are already fulfilled by the current DFC, but it is important to have them listed.
The DFC should perform at least as well as the LFC that LHCb is currently using:
- Operation time < 100 ms
- Minimum of 50 operations per second
- Horizontally scalable (multiple service instances against one DB)
- 100M files, 200M replicas and 20M directories
- No hard limit on the LFN depth (the current limit of 15 is already reached by LHCb)
It is important that the DFC data remain consistent under all circumstances. Power cuts, network interruptions and similar failures should not alter the DFC content.
- It should be easy to move files and transfer their ownership to another user
- Proper permission management (detailed criteria to be defined)
- Accounting functionality: files/user, size/user, files/SE, size/SE, files/directory, size/directory
- It should be possible to set a replica as temporarily unavailable
- Advanced search features: regex search (see https://github.com/DIRACGrid/DIRAC/issues/1793, https://github.com/DIRACGrid/DIRAC/issues/1098, https://github.com/DIRACGrid/DIRAC/issues/1037) and listing all files (LFNs) on a given SE
Some gains in performance and stability can already be obtained easily:
- Avoid mixed engines and use only InnoDB
- Some primary keys are missing (FC_metaDatasetFiles, FC_metaSetNames, FC_metaSetsFileAncestors)
- Add foreign keys within managers (e.g. FC_Files and FC_FileInfo should be linked with a FK)
Putting these keys in place will also act as a consistency check on the data already stored in the DFC.
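The consistency check mentioned above can be sketched as a search for orphaned rows, that is, rows that would violate a foreign key once it is added. This is only an illustration: it uses Python's `sqlite3` as a stand-in for the MySQL/InnoDB backend, the `FileID`, `FileName` and `Checksum` columns are assumptions made for the example, and `find_orphans` is a hypothetical helper, not part of DIRAC.

```python
import sqlite3


def find_orphans(conn, child, parent, key):
    """Return the key values present in `child` but missing from `parent`.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    query = (
        f"SELECT c.{key} FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON p.{key} = c.{key} "
        f"WHERE p.{key} IS NULL ORDER BY c.{key}"
    )
    return [row[0] for row in conn.execute(query)]


# Toy data: column names are assumed for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FC_Files (FileID INTEGER PRIMARY KEY, FileName TEXT);
    CREATE TABLE FC_FileInfo (FileID INTEGER, Checksum TEXT);
    INSERT INTO FC_Files VALUES (1, 'a.dst'), (2, 'b.dst');
    INSERT INTO FC_FileInfo VALUES (1, 'aa11'), (2, 'bb22'), (3, 'cc33');
""")

# FileID 3 has file info but no matching file, so a foreign key from
# FC_FileInfo to FC_Files could not be created until it is cleaned up.
orphans = find_orphans(conn, "FC_FileInfo", "FC_Files", "FileID")
```

Running such a query for each planned foreign key before altering the schema would show how much existing data needs cleaning.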
Some changes in the code and in the DB will be necessary to address all the items of the previous list. The currently considered direction is the following:
- Define sets of managers meant to work together. This would make it possible to define foreign keys across managers.
- Create a directory manager based on the Closure table principle
- Space usage should be updated using triggers
- Use atomic low-level procedures. This would improve performance by having the atomic, unitary operations compiled in the DB, while still keeping all the logic in the code
- The current DB schema in the SQL file is very messy and contains unused and unusable tables. We should start from a clean one
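As a rough illustration of the closure table principle mentioned in the list above, the sketch below stores one row per (ancestor, descendant) pair, so that a whole subtree can be read with a single join and the schema imposes no limit on directory depth. It uses Python's `sqlite3` rather than the real backend, and the `Dirs` and `DirClosure` tables and the `add_dir` and `subtree` helpers are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dirs (DirID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    -- One row per (ancestor, descendant) pair, including each directory
    -- paired with itself at depth 0.
    CREATE TABLE DirClosure (
        ParentID INTEGER NOT NULL REFERENCES Dirs(DirID),
        ChildID  INTEGER NOT NULL REFERENCES Dirs(DirID),
        Depth    INTEGER NOT NULL,
        PRIMARY KEY (ParentID, ChildID)
    );
""")


def add_dir(conn, name, parent_id=None):
    """Create a directory and link it to itself and to all its ancestors."""
    dir_id = conn.execute("INSERT INTO Dirs (Name) VALUES (?)", (name,)).lastrowid
    conn.execute("INSERT INTO DirClosure VALUES (?, ?, 0)", (dir_id, dir_id))
    if parent_id is not None:
        # Every ancestor of the parent (the parent included) becomes an
        # ancestor of the new directory, one level further away.
        conn.execute(
            "INSERT INTO DirClosure (ParentID, ChildID, Depth) "
            "SELECT ParentID, ?, Depth + 1 FROM DirClosure WHERE ChildID = ?",
            (dir_id, parent_id),
        )
    return dir_id


def subtree(conn, dir_id):
    """Return (name, depth) for a directory and everything below it."""
    return conn.execute(
        "SELECT d.Name, c.Depth FROM DirClosure AS c "
        "JOIN Dirs AS d ON d.DirID = c.ChildID "
        "WHERE c.ParentID = ? ORDER BY c.Depth, d.Name",
        (dir_id,),
    ).fetchall()


root = add_dir(conn, "lhcb")
data = add_dir(conn, "data", root)
year = add_dir(conn, "2012", data)
mc = add_dir(conn, "mc", root)
```

The cost of this layout is extra rows: a directory at depth d adds d + 1 closure entries, which would have to be weighed against the 20M directories target.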
These proposals of course need to be tested, and are entirely open to discussion.
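To make the trigger-based space usage proposal more concrete, here is a minimal sketch in which the database itself keeps per-directory file counts and sizes up to date. It is written against Python's `sqlite3` for convenience, so the trigger syntax differs from what MySQL/InnoDB would need, and the `Files` and `DirUsage` tables, their columns and the `directory_usage` helper are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Files (
        FileID INTEGER PRIMARY KEY,
        DirID  INTEGER NOT NULL,
        Size   INTEGER NOT NULL
    );
    CREATE TABLE DirUsage (
        DirID     INTEGER PRIMARY KEY,
        NFiles    INTEGER NOT NULL DEFAULT 0,
        TotalSize INTEGER NOT NULL DEFAULT 0
    );

    -- Keep the counters in step with every file insertion...
    CREATE TRIGGER file_added AFTER INSERT ON Files
    BEGIN
        INSERT OR IGNORE INTO DirUsage (DirID) VALUES (NEW.DirID);
        UPDATE DirUsage
           SET NFiles = NFiles + 1, TotalSize = TotalSize + NEW.Size
         WHERE DirID = NEW.DirID;
    END;

    -- ...and every file removal.
    CREATE TRIGGER file_removed AFTER DELETE ON Files
    BEGIN
        UPDATE DirUsage
           SET NFiles = NFiles - 1, TotalSize = TotalSize - OLD.Size
         WHERE DirID = OLD.DirID;
    END;
""")


def directory_usage(conn):
    """Return (DirID, NFiles, TotalSize) rows maintained by the triggers."""
    return conn.execute(
        "SELECT DirID, NFiles, TotalSize FROM DirUsage ORDER BY DirID"
    ).fetchall()


conn.execute("INSERT INTO Files (DirID, Size) VALUES (1, 100)")
conn.execute("INSERT INTO Files (DirID, Size) VALUES (1, 250)")
conn.execute("INSERT INTO Files (DirID, Size) VALUES (2, 40)")
conn.execute("DELETE FROM Files WHERE Size = 100")
usage = directory_usage(conn)
```

Because the counters are updated in the same transaction as the file change, the client code does not need to issue a separate accounting update.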