Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for 'git::' force token on local filepaths, both absolute and relative #269

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

salewski
Copy link

@salewski salewski commented Aug 28, 2020

New feature: allow git:: forced filepaths, both absolute and relative

Fixes #268

This series of changesets introduces a feature that allows the git:: forcing token to be used on local file system paths to reference Git repositories. Both absolute paths and relative paths are supported. For example:

    git::./some/relative/path/to/a/git-repo//some-subdir?ref=v1.2.3

or:

    git::../../some/relative/path/to/a/git-repo//some-subdir?ref=v1.2.3

or:

    git::/some/absolute/path/to/a/git-repo//some-subdir?ref=v4.5.6

Only filepaths that are prefixed with the git:: forcing token are considered for processing.

Internally, go-getter transforms the provided string into a file:// URI with an absolute filepath, with query string params and subdirectory retained.

The rationale for using a file:// URI internally is that the Git clone operation can already work with file:// URIs, and using them for this feature allows us to leverage the existing go-getter URI-handling machinery. That gets us support for query params (to clone a specific git ref (tag, commit hash, ...)) "for free".

The rationale for using an absolute filepath (even when the provided string is a relative filepath) is that (per RFC 1738 and RFC 8089) only absolute filepaths are legitimate in file:// URIs. But more importantly here, the Git clone operation only supports file:// URIs with absolute paths.

  • Q: Why support this functionality at all?

    Why not just require that a source location use an absolute path in a file:// URI explicitly if that's what is needed?

  • A: The primary reason is to allow support for relative filepaths to Git repos.

    There are use cases in which the absolute path cannot be known in advance, but a relative path to a Git repo is known.

    For example, when a Terraform project (or any Git-based project) uses Git submodules, it will know the relative location of the Git submodule repos, but cannot know the absolute path in advance because it will vary based on where the "superproject" repo is cloned. Nevertheless, those relative paths should be usable as clonable Git repos, and this mechanism would allow for that.

    Support for filepaths that are already absolute is provided mainly for symmetry. It would be surprising for the feature to work with relative file paths, but not for absolute filepaths.

For projects using Terraform, in particular, this feature (along with a small change in the Terraform code to leverage it) enables the non-fragile use of relative paths in a module "call" block, when combined with Git submodules:

    module "my_module" {
        source = "git::../git-submodules/tf-modules/some-tf-module?ref=v0.1.0"
        // ...
    }

In the above example "superproject" Git repo (the one "calling" the terraform module) knows the relative path to its own Git submodules because they are embedded in a subdirectory beneath the top-level of the "superproject" repo.

Two downstream Terraform issues that would require go-getter support for this feature (or something like it) are:

Design Notes

In order for this feature to work, additional contextual information is needed by the Git detector than can be provided using the existing Detector API.

Internally, the Detector's Detect method does not pass along to the Detector implementations all of the contextual information that it has available. In particular, the forcing token and go-getter subdir component are stripped out of the source string before invoking the implementation's Detect method. In the particular case of the Git detector, that means it cannot know that a git:: forcing token was provided on an input string that otherwise looks like a file system path. And that means that it is not correct or safe for it to identify any filepath string value as a Git repository.

Externally, callers (such as Terraform) already provide a value for the pwd parameter of Detect, but it is not (necessarily) the location from which a relative path in a git:: string should be resolved. In a Terraform module (which may be in an arbitrary subdirectory from the process current working directory), module "source" references that contain relative paths must be interpreted relative to the location of the module source file. Terraform has that information available, but in the existing Detect API there is no way to convey it to go-getter.

Constraints

Additional Detector methods cannot be added without burdening all existing detectors (both internal and in the wild) with the need to support them.

Additional Detect method params cannot be added without breaking all existing Detector implementations (internal, wild).

Additional parameters cannot be added to the Detect dispatching function without affecting all callers.

Approach

The goal is to provide the feature in a way that is as minimally invasive as possible. But above all else it needs to avoid breaking backward compatibility in any way.

Given that, the approach taken by this changeset series is to introduce the concept of a "Contextual Detector". It is structured in the same way as the current Detector framework, but works through a new CtxDetector interface that is not constrained by the existing API.

The only callers affected by this change would be those that wish to take advantage of the additional capabilities. And for those, the migration path straight-forward because the new API is structured like the existing one.

In particular, this changeset series introduces four new elements:

  1. CtxDetector interface

  2. CtxDetect dispatching function

  3. CtxDetect method on the CtxDetector interface

  4. Full suite of CtxDetector implementations that are analogues of the existing detectors (most of which (currently) just delegate to the existing Detector implementations).

There is also a global ContextualDetectors list that serves a function analogous to the existing Detectors list.

The centerpiece of the seris is GitCtxDetector, which provides a superset of the functionality of GitDetector, with which it is mainly backward compatible. In fact, GitCtxDetector still delegates Git SSH URL detection to GitDetector.

However, because CitCtxDetector has contextual information available to it, it provides the following advantages:

  • It can safely process git::<filepath> strings as Git repositories;

  • It can avoid attempting detection on strings that have force tokens intended for other detectors; and

  • It can provide error messages when it was unable to process a git:: forced string (it provides this last capability "around" the wrapped delegation to GitDetector, too).


Tentative changes that show how this change can be incorporated into Terraform are in the branch ads/github-issue-25488 here:

There is a related draft PR for those changes here:

While there is a fair amount of new code added in go-getter, the changes required in terraform are quite modest.

@salewski
Copy link
Author

In the wake of the discussion from earlier today, I've improved my testing setup for this issue and have one that demonstrates the main brokenness of the ads/git-forced-rel-path-to-abs branch as it currently exists (at commit: 31e197d84f27):

Here's the repo structure in which I've produced the problem:

  1    $ tree ads-exp-tf-parent/
  2    ads-exp-tf-parent/                <= 'terraform init' invoked here
  3    ├── README.md
  4    ├── git-submodules
  5    │   └── tf-modules
  6    │       └── ads-exp-tf-module     <= Git repo (git submodule)
  7    │           ├── README.md
  8    │           ├── main.tf
  9    │           ├── outputs.tf
 10    │           └── variables.tf
 11    ├── local-tf-modules
 12    │   └── foo
 13    │       ├── local-tf-modules
 14    │       │   └── bar
 15    │       │       ├── main.tf       <= FAILS: "git::../../../../git-submodules/tf-modules/ads-exp-tf-module?ref=v0.1.1"
 16    │       │       ├── outputs.tf
 17    │       │       └── variables.tf
 18    │       ├── main.tf
 19    │       ├── outputs.tf
 20    │       └── variables.tf
 21    ├── main.tf                       <= WORKS: "git::./git-submodules/tf-modules/ads-exp-tf-module?ref=v0.1.0"
 22    ├── outputs.tf
 23    ├── terraform.tfstate
 24    ├── terraform.tfstate.backup
 25    └── variables.tf

The directory ads-exp-tf-parent is a Git repository. It is a "superproject" that contains Git submodules (see git-submodule(1)).

The Git repo reference on line 15 fails because the specified relative path is correct relative to the location of the bar/main.tf file, but incorrect relative the $PWD of the terraform process.

The Git repo reference on line 21 works because the specified path happens to be correct relative to the $PWD of the terraform process.

For reference, the specific error emitted by (my locally modified) terraform is:

    e:
    Error: Failed to download module

    Could not download module "bar_breaker"
    (local-tf-modules/foo/local-tf-modules/bar/main.tf:7) source code from
    "git::../../../../git-submodules/tf-modules/ads-exp-tf-module?ref=v0.1.1":
    error downloading
    'file:///git-submodules/tf-modules/ads-exp-tf-module?ref=v0.1.1': /usr/bin/git
    exited with 128: Cloning into
    '.terraform/modules/aljunk_sub1.aljunk_sub2.bar_breaker'...
    fatal: '/git-submodules/tf-modules/ads-exp-tf-module' does not appear to be a
    git repository
    fatal: Could not read from remote repository.

    Please make sure you have the correct access rights
    and the repository exists.

@salewski salewski force-pushed the ads/git-forced-rel-path-to-abs branch 2 times, most recently from eeb7134 to d790d28 Compare August 31, 2020 08:56
salewski added a commit to salewski/terraform that referenced this pull request Aug 31, 2020
The go-getters library now has support for local file system paths to
Git repositories, specified with the 'git::' forcing token. The feature
works for both absolute and relative filepaths, and supports all the
usual go-getter goodies including '//' delimited subdirs and URI-style
query parameters.[0][1]

We incorporate that capability into Terraform, which allows users to
specify paths to locally present Git repositories from which to clone
other Terrform modules on which they are dependent. When coupled with
Git submodules, this creates a powerful way to manage Terraform modules
at specific versions without requiring those modules to be available on
the network (e.g., on GitHub):

    module "my_module" {
        source = "git::../git-submodules/tf-modules/some-tf-module?ref=v0.1.0"
        // ...
    }

From the perspective of Terraform, such Git repositories are "remote" in
the same way that repositories on GitHub are.

Note that within a Terraform module "call" block, the filepaths
specified are relative to the directory in which the *.tf file lives,
not relative to the current working directory of the Terraform
process. In order to support this feature, Terraform needs to supply
that contextual information to go-getter to allow relative filepath
resolution to work. In order to do so, we needed to switch over to using
go-getter's new "Contextual Detector" API. It works in the same basic
way as the traditional Detector API, but allows us to provide this
additional information.

In keeping with the "keep things simple" comment in the commit message
of 2b2ac1f, we are here maintaining our custom go-getter detectors
in two places. Only now each is called FooCtxDetector rather than
FooDetector. Nevertheless, all except the GitCtxDetector do little more
than "pass through" delegation to its analogous FooDetector counterpart.

Fixes hashicorp#25488
Fixes hashicorp#21107

[0] hashicorp/go-getter#268
[1] hashicorp/go-getter#269
@salewski salewski marked this pull request as ready for review August 31, 2020 11:19
@mdeggies mdeggies changed the base branch from master to main October 23, 2020 00:53
@lae
Copy link

lae commented Jun 15, 2021

Any chance we could get a review of this soon?

@lorengordon
Copy link

I love this idea!

@hashicorp-cla
Copy link

hashicorp-cla commented Mar 12, 2022

CLA assistant check
All committers have signed the CLA.

@salewski salewski marked this pull request as draft June 5, 2023 00:15
@salewski salewski force-pushed the ads/git-forced-rel-path-to-abs branch from d790d28 to 9a40be6 Compare June 5, 2023 01:36
salewski added 7 commits June 5, 2023 19:58
…nd relative

This series of changesets introduces a feature that allows the 'git::'
forcing token to be used on local file system paths to reference Git
repositories. Both absolute paths and relative paths are supported. For
example:
    git::./some/relative/path/to/a/git-repo//some-subdir?ref=v1.2.3
or:
    git::../../some/relative/path/to/a/git-repo//some-subdir?ref=v1.2.3
or:
    git::/some/absolute/path/to/a/git-repo//some-subdir?ref=v4.5.6

Only filepaths that are prefixed with the 'git::' forcing token are
considered for processing.

Internally, go-getter transforms the provided string into a 'file://'
URI with an absolute filepath, with query string params and subdirectory
retained.

The rationale for using a 'file://' URI internally is that the Git clone
operation can already work with 'file://' URIs, and using them for this
feature allows us to leverage the existing go-getter URI-handling
machinery. That gets us support for query params (to clone a specific
git ref (tag, commit hash, ...)) "for free".

The rationale for using an absolute filepath (even when the provided
string is a relative filepath) is that (per RFC 1738 and RFC 8089) only
absolute filepaths are legitimate in 'file://' URIs. But more
importantly here, the Git clone operation only supports 'file://' URIs
with absolute paths.

Q: Why support this functionality at all?

   Why not just require that a source location use an absolute path in a
   'file://' URI explicitly if that's what is needed?

A: The primary reason is to allow support for relative filepaths to Git
   repos.

   There are use cases in which the absolute path cannot be known in
   advance, but a relative path to a Git repo is known.

   For example, when a Terraform project (or any Git-based project) uses
   Git submodules, it will know the relative location of the Git
   submodule repos, but cannot know the absolute path in advance because
   it will vary based on where the "superproject" repo is
   cloned. Nevertheless, those relative paths should be usable as
   clonable Git repos, and this mechanism would allow for that.

   Support for filepaths that are already absolute is provided mainly
   for symmetry. It would be surprising for the feature to work with
   relative file paths, but not for absolute filepaths.

For projects using Terraform, in particular, this feature (along with a
small change in the Terraform code to leverage it) enables the
non-fragile use of relative paths in a module "call" block, when
combined with Git submodules:

    module "my_module" {
        source = "git::../git-submodules/tf-modules/some-tf-module?ref=v0.1.0"
        // ...
    }

In the above example "superproject" Git repo (the one "calling" the
terraform module) knows the relative path to its own Git submodules
because they are embedded in a subdirectory beneath the top-level of the
"superproject" repo.

Two downstream Terraform issues that would require go-getter support for
this feature (or something like it) are at [0] and [1].

This first changeset in the series updates the README.md documentation
to note the new feature and provide examples.

[0] "Unable to use relative path to local Git module"
    hashicorp/terraform#25488

[1] "In 0.12, modules can no longer be installed from local git repositories at relative paths"
    hashicorp/terraform#21107

Design Notes
------------
In order for this feature to work, additional contextual information is
needed by the Git detector than can be provided using the existing
Detector API.

Internally, the Detector's Detect method does not pass along to the
Detector implementations all of the contextual information that it has
available. In particular, the forcing token and go-getter subdir
component are stripped out of the source string before invoking the
implementation's Detect method. In the particular case of the Git
detector, that means it cannot know that a 'git::' forcing token was
provided on an input string that otherwise looks like a file system
path. And /that/ means that it is not correct or safe for it to identify
any filepath string value as a Git repository.

Externally, callers (such as Terraform) already provide a value for the
'pwd' parameter of Detect, but it is not (necessarily) the location from
which a relative path in a 'git::' string should be resolved. In a
Terraform module (which may be in an arbitrary subdirectory from the
process current working directory), module "source" references that
contain relative paths must be interpreted relative to the location of
the module source file. Terraform has that information available, but in
the existing Detect API there is no way to convey it to go-getter.

Constraints
-----------
Additional Detector methods cannot be added without burdening all
existing detectors (both internal and in the wild) with the need to
support them.

Additional Detect method params cannot be added without breaking all
existing Detector implementations (internal, wild).

Additional parameters cannot be added to the Detect dispatching function
without affecting all callers.

Approach
--------
The goal is to provide the feature in a way that is as minimally
invasive as possible. But above all else it needs to avoid breaking
backward compatibility in any way.

Given that, the approach taken by this changeset series is to introduce
the concept of a "Contextual Detector". It is structured in the same way
as the current Detector framework, but works through a new CtxDetector
interface that is not constrained by the existing API.

The only callers affected by this change would be those that wish to take
advantage of the additional capabilities. And for those, the migration path
straight-forward because the new API is structured like the existing one.

In particular, this changeset series introduces four new elements:

    1. CtxDetector interface

    2. CtxDetect dispatching function

    3. CtxDetect method on the CtxDetector interface

    4. Full suite of CtxDetector implementations that are analogues of
       the existing detectors (most of which (currently) just delegate
       to the existing Detector implementations).

There is also a global 'ContextualDetectors' list that serves a function
analogous to the existing 'Detectors' list.

Signed-off-by: Alan D. Salewski <ads@salewski.email>
Once the Detect(...) function finds a detector for a source string it
combines the obtained result string with other data bits to produce the
result returned to the caller. That processing is here extracted into a
new handleDetected(...) function, which will be called from an
additional context in an upcoming commit.

The new handleDetected(...) function lives in a new file:

    detect_common.go

The move emphasizes the function's slightly wider use now by both the
Detect(...) and (soon-to-be-introduced) CtxDetect(...) dispatch
functions. With the introduction of CtxDetect, the logic in
handleDetected is no longer used exclusively by Detect.

This changeset provides an implementation modification, but no
behavioral change.

Signed-off-by: Alan D. Salewski <ads@salewski.email>
The existing Detector interface cannot be extended in a backward
compatible way to support new features desired for the GitDetector
implementation[0]. The new CtxDetector interface is introduced to allow
such extension without breaking backward compatibility with existing
users.

The new interface also avoids adding new methods to the existing
Detector interface that would then need to be supported by all Detector
implementations, both in-library and in the wild.

A CtxDetector is slightly more cumbersome to use than Detector. Callers
can (and should) continue to use Detector unless the enhanced
capabilities of one or more of the CtxDetector implementation modules is
needed. At the time of writing (2020-08), the only CtxDetector with such
extra mojo is the forthcoming GitCtxDetector.

Existing Detector implementations can easily be wrapped by CtxDetector
implementations. The information available to a CtxDetector impl. is a
strict superset of the information provided to a Detector. Where there
is no need for the additional context info provided by the CtxDetect
dispatch function, impls. can simply pass through the common subset to
the Detect method of the analogous Detector impl.

CAVEAT: In this changeset the list of ContextualDetectors is commented-
        out. This is intended to make a clear introduction of the
        interface type prior to introducing any implementations of it.

A forthcoming change will provide such wrapping for all in-tree Detector
impls, followed by the introduction of specialization for the
GitCtxDetector impl.

[0] C.f., hashicorp#268

Signed-off-by: Alan D. Salewski <ads@salewski.email>
...and uncomment the 'ContextualDetectors' defult list of them in
CtxDetector.

The CtxDetector implementations satisfy the CtxDetector interface
for all of the built-in detectors (bitbucket, github, gitlab, file,
git, GCS, and S3), but do not (yet) take advantage of the the
additional contextual information available. These implementations
all just pass-through a subset of their arguments to the existing
Detector impl. analogue.

These are intended to make it comfortable to swap-in use of the new
"Contractual Detector" without having to change much code, and also
provide a place to hang enhancement code that can take advantage of
the contextual information made available by the CtxDetect dispatch
function.

Signed-off-by: Alan D. Salewski <ads@salewski.email>
...except for GitCtxDetector, which will be getting special treatment
since it is the module behind the motivation for this series of changes.

The unit tests committed here are all just minimal adaptations of the
existing detector unit tests to satisfy the CtxDetect method (which
takes three more params than Detect). These tests basically test the
"pass through" behavior of their respective implementations, but that's
useful to demonstrate that they do not interfere with the traditional
behaviors.

Signed-off-by: Alan D. Salewski <ads@salewski.email>
GitCtxDetector is able to process 'git::' forced filepath strings, when
specified with both absolute and relative filepaths. The following all work
now:
    Absolute:
        git::/path/to/some/git/repo
        git::/path/to/some/git/repo//some/subdir
        git::/path/to/some/git/repo//some/subdir?ref=v1.2.3

    Relative (subdir):
        git::./path/to/some/git/repo//some/subdir?ref=v1.2.3

    Relative (parent dir):
        git::../../path/to/some/git/repo//some/subdir?ref=v1.2.3

GitCtxDetector provides a superset of the functionality of GitDetector, with
which it is mainly backward compatible. In fact, GitCtxDetector still
delegates Git SSH URL detection to GitDetector.

However, because CitCtxDetector has contextual information available to it, it
provides the following advantages:

    * It can safely process 'git::<filepath>' strings as Git repositories;

    * It can avoid attempting detection on strings that have force tokens
      intended for other detectors; and

    * It can provide error messages when it was unable to process a 'git::'
      forced string (it provides this last capability "around" the wrapped
      delegation to GitDetector, too).

Signed-off-by: Alan D. Salewski <ads@salewski.email>
Signed-off-by: Alan D. Salewski <ads@salewski.email>
@aellwein
Copy link

any progress here??

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add support for 'git::' force support on local filepaths, both absolute and relative
5 participants