DDNStorage/go-copy

go-copy

A slimmed-down S3 copy utility similar to rclone, designed for efficient file copying between S3-compatible storage systems.

Features

  • rclone compatibility: Reads credentials from ~/.config/rclone/rclone.conf
  • Same syntax: Uses config:bucket/path syntax like rclone
  • Intelligent copying: Compares ETag and size to avoid unnecessary transfers
  • Multipart ETag handling: Falls back to comparing SHA1 hashes when the source ETag comes from a multipart upload (contains a hyphen)
  • Concurrent workers: Configurable number of copy workers for performance
  • Progress tracking: Real-time progress bar showing copied/skipped/failed counts and current bandwidth
  • File list support: Copy specific files using --files-from parameter
  • Verification: Optional post-copy verification by downloading and comparing SHA1 hashes

Installation

go build -o go-copy .

Configuration

go-copy reads S3 credentials from your existing rclone configuration file at ~/.config/rclone/rclone.conf.

Example rclone.conf:

[myremote]
type = s3
access_key_id = AKIAIOSFODNN7EXAMPLE
secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
region = us-west-2
endpoint = https://s3.amazonaws.com
no_check_certificate = false

[minio]
type = s3
access_key_id = minioadmin
secret_access_key = minioadmin
endpoint = http://localhost:9000
region = us-east-1
no_check_certificate = true

Usage

go-copy [flags] <source> <target>

Flags

  • --files-from <file>: Read list of files to copy from specified file (required)
  • --worker-count <n>: Number of concurrent copy workers (default: 4)
  • --verbose, -v: Enable verbose output
  • --verify: Verify copied files by downloading and comparing SHA1 hashes

Examples

# Copy files listed in files.txt from source to target
go-copy --files-from files.txt myremote:mybucket/source/ myremote:mybucket/target/

# Use 10 workers for faster copying
go-copy --worker-count 10 --files-from list.txt remote1:bucket1/ remote2:bucket2/

# Verbose output for debugging
go-copy -v --files-from files.txt myremote:source-bucket/ myremote:target-bucket/

# Verify copied files by comparing SHA1 hashes
go-copy --verify --files-from files.txt myremote:source/ myremote:target/
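
The source and target arguments in these examples follow the config:bucket/path form. As a rough sketch of how such an argument could be split into its parts (the `Location` type and `parseLocation` function are hypothetical names, not taken from go-copy):

```go
package main

import (
	"fmt"
	"strings"
)

// Location is a parsed "remote:bucket/prefix" argument (hypothetical type).
type Location struct {
	Remote string // section name in rclone.conf
	Bucket string
	Prefix string // may be empty
}

// parseLocation splits an argument such as "myremote:mybucket/source/".
func parseLocation(arg string) (Location, error) {
	remote, rest, ok := strings.Cut(arg, ":")
	if !ok || remote == "" {
		return Location{}, fmt.Errorf("missing remote name in %q", arg)
	}
	bucket, prefix, _ := strings.Cut(rest, "/")
	if bucket == "" {
		return Location{}, fmt.Errorf("missing bucket in %q", arg)
	}
	return Location{Remote: remote, Bucket: bucket, Prefix: prefix}, nil
}

func main() {
	loc, err := parseLocation("myremote:mybucket/source/")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", loc)
}
```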

File List Format

The --files-from file contains one file path per line. Lines starting with # are treated as comments:

file1.txt
path/to/file2.txt
deep/directory/file3.txt
# Comments are supported
another-file.txt

Progress Bar Format

The progress bar displays real-time statistics in this format:

C:15 S:3 F:0 2.5 MiB/s 45.2 MB/1.2 GB ETA:5m30s |████████████████████████████████████████| (18/100, 18 it/s)

Where:

  • C:15 - 15 files copied successfully
  • S:3 - 3 files skipped (already up-to-date)
  • F:0 - 0 files failed
  • 2.5 MiB/s - Current transfer speed (adaptive units: B/s, KiB/s, MiB/s, GiB/s)
  • 45.2 MB/1.2 GB - Data transferred / Total data size (adaptive units: B, KB, MB, GB, TB)
  • ETA:5m30s - Estimated time to completion (adaptive format: seconds, minutes, hours, days)
  • (18/100, 18 it/s) - 18 out of 100 files processed, 18 files per second
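
The adaptive speed units can be produced by dividing by 1024 until the value fits the next unit. A minimal sketch, assuming one decimal place as in the example above (`formatRate` is a hypothetical name, not go-copy's function):

```go
package main

import "fmt"

// formatRate renders a byte rate using B/s, KiB/s, MiB/s, or GiB/s.
// Hypothetical helper; the one-decimal format is an assumption.
func formatRate(bytesPerSec float64) string {
	units := []string{"B/s", "KiB/s", "MiB/s", "GiB/s"}
	i := 0
	for bytesPerSec >= 1024 && i < len(units)-1 {
		bytesPerSec /= 1024
		i++
	}
	return fmt.Sprintf("%.1f %s", bytesPerSec, units[i])
}

func main() {
	fmt.Println(formatRate(2.5 * 1024 * 1024)) // prints "2.5 MiB/s"
}
```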

Note: The application calculates total data size using concurrent HEAD requests before starting the copy operation. This provides accurate progress tracking and ETA calculations. For large file lists (millions of files), this initial calculation shows its own progress bar.

How It Works

  1. Configuration Loading: Reads S3 credentials from rclone.conf
  2. File List Parsing: Loads the list of files to copy from the specified file
  3. Object Comparison: For each file, compares source and target objects:
    • Compares file size first (quick check)
    • Compares ETag if both objects exist
    • If source ETag contains a hyphen (multipart upload), downloads both objects and compares SHA1 hashes
  4. Concurrent Copying: Uses a worker pool to copy files concurrently
  5. Progress Tracking: Shows real-time progress with detailed statistics:
    • C:X - Files copied successfully
    • S:X - Files skipped (already up-to-date)
    • F:X - Files failed to copy
    • Current bandwidth in KiB/s, MiB/s, or GiB/s
  6. Summary Report: Displays final statistics including files copied, skipped, and failed
  7. Optional Verification: When --verify is used, downloads both source and target objects and compares SHA1 hashes

Comparison Logic

  • Size mismatch: File is copied
  • Target doesn't exist: File is copied
  • ETag match (non-multipart): File is skipped
  • ETag mismatch (non-multipart): File is copied
  • Multipart ETag: Downloads both files, computes SHA1, compares hashes
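
These rules can be written as a single decision function. The following is a sketch of the logic as described here, with hypothetical names (`Object`, `Action`, `decide`); it assumes the existence and size checks run before any ETag inspection, matching the "size first" order in How It Works.

```go
package main

import (
	"fmt"
	"strings"
)

// Object is the metadata needed for comparison (hypothetical type).
type Object struct {
	Size int64
	ETag string
}

// Action is the outcome of comparing a source object with its target.
type Action int

const (
	Copy        Action = iota // transfer the file
	Skip                      // target is already up to date
	CompareSHA1               // download both sides and compare hashes
)

func (a Action) String() string {
	switch a {
	case Copy:
		return "copy"
	case Skip:
		return "skip"
	case CompareSHA1:
		return "compare-sha1"
	}
	return fmt.Sprintf("Action(%d)", int(a))
}

// decide applies the comparison rules; dst is nil when the target is missing.
func decide(src Object, dst *Object) Action {
	if dst == nil || src.Size != dst.Size {
		return Copy
	}
	if strings.Contains(src.ETag, "-") {
		return CompareSHA1 // multipart ETag is not a plain content hash
	}
	if src.ETag == dst.ETag {
		return Skip
	}
	return Copy
}

func main() {
	src := Object{Size: 10, ETag: "abc"}
	fmt.Println(decide(src, nil))
	fmt.Println(decide(src, &Object{Size: 10, ETag: "abc"}))
	fmt.Println(decide(Object{Size: 10, ETag: "abc-2"}, &Object{Size: 10, ETag: "xyz-2"}))
}
```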

Verification Mode

When the --verify flag is used, go-copy performs an additional verification step after copying:

  1. Post-Copy Verification: After all copy operations complete, verification runs automatically
  2. Complete Download: Both source and target objects are downloaded in full
  3. SHA1 Comparison: SHA1 hashes are computed for both objects and compared
  4. Separate Progress Bar: Verification shows its own progress bar with statistics
  5. Detailed Results: Verification failures are reported with specific hash mismatches
  6. Performance Impact: Verification downloads every object in full from both source and target, which significantly increases transfer time

When to Use Verification:

  • Critical data transfers where integrity is paramount
  • Suspected network issues or unreliable connections
  • Compliance requirements that mandate data verification
  • One-time migrations where you want absolute certainty

Performance

  • Concurrent operations: Uses configurable worker pools for all operations (copy, verify, size calculation)
  • Intelligent comparison: Avoids unnecessary transfers through ETag/SHA1 comparison
  • Memory efficient: Streams file content without storing in memory
  • Scalable: Tested with millions of files; memory usage is 6 GiB for a list of 15M files
  • Fast size calculation: Uses concurrent HEAD requests with progress tracking
  • Optimized for large datasets: Minimum 8 workers for size calculation, graceful error handling

Error Handling

  • Comprehensive error reporting with context
  • Failed files are tracked and reported in the summary
  • Non-zero exit code if any files fail to copy
  • Verbose mode provides detailed error information

Common Warnings

You may see this warning during operation:

SDK 2025/10/05 17:31:02 WARN Response has no supported checksum. Not validating response payload.

This is a harmless warning from the AWS SDK indicating that the S3 response doesn't include checksum headers for payload validation. This commonly occurs with:

  • Older S3-compatible storage systems (MinIO, Ceph, etc.)
  • Objects uploaded without checksum metadata
  • Custom S3 implementations

In go-copy v1.1 and later this warning is suppressed automatically: the application filters out this specific AWS SDK message while preserving other log messages. Your data is still transferred correctly, and go-copy performs its own integrity checks using ETag comparison and, when needed, SHA1 hashing.

Limitations

  • Does not support multipart uploads (files are copied as single objects)
  • No resume capability for interrupted transfers
  • Limited to S3-compatible storage systems
  • Requires existing rclone configuration

License

Copyright 2025 Data Direct Networks

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
