
DsWriter should create ZIP files directly in archive storage #58

Open
RKrahl opened this issue Oct 21, 2016 · 3 comments
Labels
enhancement New feature or request performance Issues related to poor performance of the program

Comments

@RKrahl
Member

RKrahl commented Oct 21, 2016

Consider an ids.server configured as two level storage with storage unit Dataset. When the DsWriter creates a ZIP file to be stored in archive storage, it creates the ZIP file in the cache first and then copies this file to archive storage. The intermediate step in the cache is not needed and causes all data to hit the disk once more than necessary. The DsWriter should write the ZIP file directly into archive storage instead.
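A minimal sketch of the proposed direct write. Plain `Path` and `OutputStream` arguments stand in for the real main and archive storage plugin interfaces, which are not shown here; `DirectZipWriter` and `writeZip` are hypothetical names, not part of ids.server:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class DirectZipWriter {

    // Stream every datafile from main storage straight into the archive ZIP.
    // There is no intermediate copy in the dataset cache, so each byte hits
    // the disk only once on the way to archive storage.
    public static void writeZip(Path mainDir, OutputStream archiveOut) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(archiveOut)) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(mainDir)) {
                for (Path file : files) {
                    zos.putNextEntry(new ZipEntry(file.getFileName().toString()));
                    Files.copy(file, zos);
                    zos.closeEntry();
                }
            }
        }
    }
}
```

The key point is only that `archiveOut` would be obtained from the archive storage plugin rather than from a file in the cache.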

@fisherab
Contributor

The DsWriter and DsRestorer both make use of the datasetCache. This is not really a cache but just some temporary storage. It used to be a real cache in a much earlier version of the code. It was then kept to reduce the risk of getting the main or archive storage in an inconsistent state due to I/O errors. If this code is removed then code will need to be added to clean up any half built structures if an exception is thrown.

What do you think?
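The cleanup code mentioned above could be as simple as deleting the half-built archive file whenever the write fails. A hedged sketch, assuming the archive entry is addressable as a `Path` (the names `SafeArchiveWrite`, `ZipProducer` and `writeOrCleanUp` are invented for illustration):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeArchiveWrite {

    @FunctionalInterface
    public interface ZipProducer {
        void writeTo(OutputStream out) throws IOException;
    }

    // Write the ZIP directly to archive storage; on any failure, remove the
    // half-built file so archive storage is never left in an inconsistent
    // state, then rethrow so the caller still sees the error.
    public static void writeOrCleanUp(Path archiveFile, ZipProducer producer) throws IOException {
        try (OutputStream out = Files.newOutputStream(archiveFile)) {
            producer.writeTo(out);
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(archiveFile);
            throw e;
        }
    }
}
```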

@RKrahl
Member Author

RKrahl commented Oct 31, 2016

Dealing with I/O errors is a somewhat broader issue than this one. The question is what should be done in case of errors on the one hand, and what the current code in ids.server does on the other. I'm afraid the two are not the same in all cases.

Considering the particular case of the DsWriter: if I understand things correctly, in the case of an error the DsWriter aborts, writes an error message to the log, and that's it. The data from main storage will not be written to archive in this case. If the next action in the process queue concerning this dataset is an ARCHIVE request, the data will be erased from main storage and is gone forever, without any notice. If the next action happens to be a WRITE request and the DsWriter succeeds this time, everything is fine and the previous error will have had no effect at all.

I agree that writing the ZIP file directly into archive storage doesn't make things any better in the case of an I/O error, because the half-written ZIP file will most likely be corrupt then. But the intermediate step of writing into cache space isn't sufficient to save the data either. I guess we should reconsider error handling more generally and postpone this issue until we come to a conclusion there.
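Neither variant saves the data on an I/O error, but the archive could at least be kept free of corrupt half-written ZIPs with the usual write-to-temporary-then-rename pattern. A sketch assuming the archive is a plain filesystem directory, which holds for the file storage plugin but not necessarily for other back ends; `AtomicArchiveWrite` and its method names are hypothetical:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicArchiveWrite {

    @FunctionalInterface
    public interface ZipProducer {
        void writeTo(OutputStream out) throws IOException;
    }

    // Build the ZIP under a temporary name in the same directory, then move
    // it into place atomically. The final name never refers to a partial
    // file; an aborted write leaves at most a stray temporary to sweep up.
    public static void writeAtomically(Path finalFile, ZipProducer producer) throws IOException {
        Path tmp = Files.createTempFile(finalFile.getParent(), "partial-", ".zip");
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                producer.writeTo(out);
            }
            Files.move(tmp, finalFile, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp);  // no-op after a successful move
        }
    }
}
```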

@RKrahl
Member Author

RKrahl commented Jan 3, 2018

Depends on #79.

@RKrahl RKrahl added enhancement New feature or request performance Issues related to poor performance of the program labels Mar 15, 2021