copy_files should have an option to skip files if they already exist #17

JohannesWiesner · 2022-11-30T08:38:43Z

Use case: I use copy_files mostly for copying files from a server to my local PC. Since the connection often gets lost, I have to restart the copying multiple times. It would be nice if copy_files had an option to check whether a file already exists (and, optionally, that it is not corrupted) and to copy only files that haven't been copied yet.
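A minimal sketch of what such a skip option could look like, using only the standard library. The helper name copy_if_missing and its check_size parameter are hypothetical and not part of nisupply; comparing file sizes is only a cheap stand-in for a real corruption check (it catches truncated files, nothing more).

```python
import shutil
from pathlib import Path


def copy_if_missing(src, dst, check_size=True):
    """Copy src to dst unless dst already exists; return True if a copy was made.

    Hypothetical helper for illustration, not part of nisupply.
    """
    src, dst = Path(src), Path(dst)
    if dst.is_file():
        # A size comparison only detects truncated files, not all corruption.
        if not check_size or dst.stat().st_size == src.stat().st_size:
            return False
    # Create the destination folder if needed, then copy data and metadata.
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return True
```

Re-running the copy after a lost connection would then skip every file that already arrived completely and redo only the missing or truncated ones.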

JohannesWiesner · 2023-03-09T20:04:59Z

Could it make sense to call the rsync command-line tool here?

import subprocess

# Copy with rsync so that only new or changed files are transferred
# (this saves time by skipping files that are already up to date).
# Passing the arguments as a list keeps paths that contain spaces intact.
cmd = ["rsync", "-av", "--exclude", ".*", "--copy-links",
       f"{src_dataset_path}/", str(dst_dataset_path)]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
output = result.stdout

JohannesWiesner · 2023-03-09T20:12:54Z

But rsync is only available on Unix-like systems (Linux, macOS), not natively on Windows?

https://stackoverflow.com/questions/4260767/looking-for-cross-platform-rsync-like-functionality-in-python-such-as-rsync-py

This might help?
https://github.com/gchamon/sysrsync

JohannesWiesner · 2024-01-29T15:24:13Z

It should be possible to sync files with sysrsync. Specify option='copy' or option='sync' in nisupply.io.copy_files (the latter is only possible on Linux systems and only if rsync is preinstalled).
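A rough sketch of how such an option switch might behave. The function copy_file and its signature are hypothetical (this is not the actual nisupply.io.copy_files), and it calls rsync directly via subprocess rather than through sysrsync to keep the example dependency-free.

```python
import shutil
import subprocess
from pathlib import Path


def copy_file(src, dst_dir, option="copy"):
    """Hypothetical dispatch between plain copying and rsync-based syncing."""
    if option not in ("copy", "sync"):
        raise ValueError("option must be 'copy' or 'sync'")
    # Fail early if syncing is requested but rsync is not on the PATH.
    if option == "sync" and shutil.which("rsync") is None:
        raise RuntimeError("option='sync' requires rsync to be installed")
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    if option == "copy":
        shutil.copy2(src, dst_dir / Path(src).name)
    else:
        # The trailing slash makes rsync treat the destination as a directory.
        subprocess.run(["rsync", "-a", str(src), f"{dst_dir}/"], check=True)
    return dst_dir / Path(src).name
```

Checking for rsync with shutil.which up front gives a clear error on systems where it is missing instead of a failure in the middle of a transfer.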

This would be the code for sysrsync:

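import sysrsync

for file, dst_dir in zip(dti_df['filepath'], dti_df['dst_dir']):
    # sysrsync strips one trailing slash from the destination, so append two:
    # one survives and rsync treats the destination as a directory
    dst_dir = dst_dir + '//'
    # --mkpath lets rsync create missing destination directories
    sysrsync.run(source=file,
                 destination=dst_dir,
                 options=['-a', '--mkpath'],
                 sync_source_contents=False)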

Important: sysrsync removes the trailing slash of dst_dir by default, but we need it so that rsync syncs the file into a folder.

The resulting command would be: rsync -a --mkpath /foo/bar.txt /dst/bar/

Basic rules of sysrsync:
- Syncs source contents by default, so it adds a trailing slash to the end of source, unless sync_source_contents=False is specified
- Removes trailing slash from destination
- Extra arguments are put right after rsync
- Breaks if source_ssh and destination_ssh are both set
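To make the trailing-slash rule concrete, here is a small pure-Python model of it. Both function names are hypothetical, and the model assumes that exactly one trailing slash is stripped from the destination, which is what the '//' workaround above relies on; it does not call sysrsync itself.

```python
def strip_one_trailing_slash(path):
    """Model of the rule: drop a single trailing slash, if there is one."""
    return path[:-1] if path.endswith("/") else path


def destination_for_file_sync(dst_dir):
    """Return a destination that still ends in '/' after one slash is stripped."""
    # Normalize first so the result is the same with or without a trailing slash.
    return dst_dir.rstrip("/") + "//"
```

Under that assumption, destination_for_file_sync('/dst/bar') is passed as '/dst/bar//' and reaches rsync as '/dst/bar/'.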
