
Windows: git clone fails — invalid paths with trailing space in datascan/bulk-creation-scripts/dataquality  #26

@mtopps


What happened

Cloning the repository on Windows fails at checkout because one directory, 'datascan/bulk-creation-scripts/dataquality ', has a name that ends with a space. Windows does not accept file or folder names that end with a space, so Git cannot create the files under that directory and the checkout aborts.

Reproduction steps

  1. On a Windows machine (PowerShell or Git Bash), run:

git clone https://github.com/GoogleCloudPlatform/cloud-dataplex.git

  2. Observe that the clone succeeds but the checkout fails with errors like:

error: invalid path 'datascan/bulk-creation-scripts/dataquality /datascan.py'
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.

  3. Attempting to restore the working tree reports the other invalid paths as well:

git restore --source=HEAD :/
error: invalid path 'datascan/bulk-creation-scripts/dataquality /datascan.py'
error: invalid path 'datascan/bulk-creation-scripts/dataquality /lib.py'
error: invalid path 'datascan/bulk-creation-scripts/dataquality /main.py'
error: invalid path 'datascan/bulk-creation-scripts/dataquality /readme.md'
error: invalid path 'datascan/bulk-creation-scripts/dataquality /sample_config.yaml'

Environment

OS: Windows 11 Pro 24H2 Build 26100.6584 (tested on PowerShell and Git Bash)
Git: 2.51.0
Repository: https://github.com/GoogleCloudPlatform/cloud-dataplex

Affected paths (examples)

datascan/bulk-creation-scripts/dataquality /datascan.py
datascan/bulk-creation-scripts/dataquality /lib.py
datascan/bulk-creation-scripts/dataquality /main.py
datascan/bulk-creation-scripts/dataquality /readme.md
datascan/bulk-creation-scripts/dataquality /sample_config.yaml
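
To find every tracked path that Windows would reject, not just the examples above, a small filter over the committed file list should be enough. This is a minimal sketch in POSIX shell; the function name find_windows_invalid_paths is hypothetical. It flags any path component ending in a space, and also one ending in a period, since Microsoft's file naming guidance rules out both. It reads one path per line on stdin and returns non-zero when it finds any, so it could also serve as a CI check.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print tracked paths that Windows cannot
# check out because a file or directory name ends in a space or a period.
# Reads one path per line on stdin, prints only the offending paths, and
# returns 1 if any were found, 0 otherwise.
find_windows_invalid_paths() {
  # A component ends in ' ' or '.' when that character is followed by
  # '/' (a directory name) or by the end of the line (a file name).
  if grep -E '[ .](/|$)'; then
    return 1   # offending paths were printed
  fi
  return 0     # nothing found
}
```

Usage, assuming it is run from inside a clone (a --no-checkout clone also works, because git ls-tree reads the committed tree rather than the working tree):

git ls-tree -r --name-only HEAD | find_windows_invalid_paths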

Why this is a problem

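Windows does not support file or folder names that end with a space: although the underlying file system may be able to store such names, the Windows shell and APIs do not handle them, and Git for Windows rejects these paths as invalid instead of creating them. The checkout therefore fails, and Windows users cannot get a working clone of the repository.
This also affects cross-platform contributors and any CI that runs on Windows.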

Clone without checking out files:

git clone --no-checkout https://github.com/GoogleCloudPlatform/cloud-dataplex.git
Cloning into 'cloud-dataplex'...
remote: Enumerating objects: 586, done.
remote: Counting objects: 100% (212/212), done.
remote: Compressing objects: 100% (126/126), done.
remote: Total 586 (delta 137), reused 86 (delta 86), pack-reused 374 (from 1)
Receiving objects: 100% (586/586), 984.29 KiB | 4.45 MiB/s, done.
Resolving deltas: 100% (303/303), done.

cd cloud-dataplex

Attempt to remove the folder from the index (fails):

git rm -r --cached "datascan/bulk-creation-scripts/dataquality "
fatal: pathspec 'datascan/bulk-creation-scripts/dataquality ' did not match any files
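
This error is consistent with how --no-checkout works: the clone skips writing the index entirely, so there is nothing in the index for the pathspec to match, even though the commit's tree is fully available. A minimal sketch for confirming that in POSIX shell; count_index_and_head is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print how many entries are in the index
# and in the HEAD tree of the repository in the current directory, as
# "index=N head=M". Right after `git clone --no-checkout`, the index has
# not been written yet, so N is expected to be 0 while M is not.
count_index_and_head() {
  index_count=$(git ls-files | wc -l | tr -d ' ')
  head_count=$(git ls-tree -r --name-only HEAD | wc -l | tr -d ' ')
  printf 'index=%s head=%s\n' "$index_count" "$head_count"
}
```

Running count_index_and_head inside the fresh --no-checkout clone should report index=0 alongside a non-zero head count.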

Attempting to check out still fails because of the invalid paths:

git checkout
error: invalid path 'datascan/bulk-creation-scripts/dataquality /datascan.py'
error: invalid path 'datascan/bulk-creation-scripts/dataquality /lib.py'
error: invalid path 'datascan/bulk-creation-scripts/dataquality /main.py'
error: invalid path 'datascan/bulk-creation-scripts/dataquality /readme.md'
error: invalid path 'datascan/bulk-creation-scripts/dataquality /sample_config.yaml'

On a non-Windows machine, the same clone completes with no checkout errors:

git clone https://github.com/GoogleCloudPlatform/cloud-dataplex.git
Cloning into 'cloud-dataplex'...
remote: Enumerating objects: 586, done.
remote: Counting objects: 100% (212/212), done.
remote: Compressing objects: 100% (126/126), done.
remote: Total 586 (delta 137), reused 86 (delta 86), pack-reused 374 (from 1)
Receiving objects: 100% (586/586), 985.03 KiB | 4.21 MiB/s, done.
Resolving deltas: 100% (302/302), done.
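
Since the checkout works outside Windows, one possible fix is to rename the directory from a Linux or macOS clone and commit the result. A minimal sketch, assuming the new name should simply be the old one without its trailing spaces and that nothing with that name exists yet; rename_trailing_space_dir is a hypothetical helper.

```shell
#!/bin/sh
# Sketch (hypothetical helper): rename a tracked path whose name ends in
# spaces and stage the rename with git mv. Run from the repository root on
# a system where the checkout succeeds (Linux or macOS).
# Assumption: the fixed name is the old name with trailing spaces removed.
rename_trailing_space_dir() {
  old=$1
  # Strip all trailing spaces from the last path component.
  new=$(printf '%s' "$old" | sed 's/ *$//')
  if [ "$new" = "$old" ]; then
    echo "no trailing space in: '$old'" >&2
    return 1
  fi
  # Without this check, git mv would move the directory *into* an
  # existing directory of the same name instead of renaming it.
  if [ -e "$new" ]; then
    echo "refusing to overwrite existing path: '$new'" >&2
    return 1
  fi
  git mv "$old" "$new"
}
```

Usage (pass the path without a trailing slash; the commit message is only an example):

cd cloud-dataplex
rename_trailing_space_dir "datascan/bulk-creation-scripts/dataquality "
git commit -m "Remove trailing space from dataquality directory name"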
