
feat: remote taskfiles (HTTP) #1152

Merged
merged 17 commits into from
Sep 12, 2023

Conversation

pd93
Member

@pd93 pd93 commented May 5, 2023

#770 is now the most requested feature in Task by quite some margin and there have been frequent requests/questions around it in recent weeks. Previous requests have been rejected based on complexity, but I believe that the way this draft is implemented might actually reduce the Task reader complexity as it abstracts the logic for dealing with files away from the recursive reader.

This PR is intended to open a dialogue about how this feature might look if it is added to Task. Please note that this is a first draft and is open to significant changes or rejection entirely.

The PR

The first commit only contains changes to the underlying code to allow it to be more extensible via a new Node interface. This interface allows for multiple implementations of the Taskfile reader. This change also reimplements the existing filesystem based reader as a FileNode.

The second commit adds a very crude example of how a new HTTPNode might look. I've made a (temporary) change in this commit to the root Taskfile which changes the docs include to use https:// (it just points to the same file on GitHub). You should be able to run task --list and see no difference as it will now download the remote file into memory and run exactly the same way as if the file was local!

Discussion points

  1. API design - Currently, the way this is implemented will allow you to specify any scheme by adding myscheme:// to the start of the Taskfile URI when including a file. Is this flexible enough for everyone's needs? It should be easy enough to add ssh:// or git:// schemes etc. in the future.
  2. CLI - It should be trivial to allow running a remote Taskfile from the CLI. i.e. where your root Taskfile is not on your filesystem. How should this look in terms of args/flags?
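As a sketch of how that scheme-based dispatch could work (the Node interface comes from this PR, but the method set and constructor names here are illustrative guesses, not the actual implementation):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Node abstracts where a Taskfile comes from. The PR reimplements the
// filesystem reader as FileNode and adds HTTPNode behind an interface
// roughly like this (the method set here is a guess).
type Node interface {
	Location() string
}

type FileNode struct{ Path string }
type HTTPNode struct{ URL *url.URL }

func (n *FileNode) Location() string { return n.Path }
func (n *HTTPNode) Location() string { return n.URL.String() }

// NewNode picks an implementation based on the URI scheme, so new
// schemes (ssh://, git://, ...) can be added later without touching
// the recursive Taskfile reader.
func NewNode(uri string) (Node, error) {
	switch {
	case strings.HasPrefix(uri, "http://"), strings.HasPrefix(uri, "https://"):
		u, err := url.Parse(uri)
		if err != nil {
			return nil, err
		}
		return &HTTPNode{URL: u}, nil
	default:
		return &FileNode{Path: uri}, nil
	}
}

func main() {
	for _, uri := range []string{"https://example.com/Taskfile.yml", "./Taskfile.yml"} {
		n, _ := NewNode(uri)
		fmt.Printf("%T %s\n", n, n.Location())
	}
}
```

The point of the indirection is that the reader only ever sees a Node, so adding a new scheme is a new type plus one case in the constructor.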


@andreynering andreynering left a comment


Looks like a good initial proposal.

To be complete, we'll need to address topics like:

  1. Caching. Fetching the file on every run is probably not a good idea. The right place to store it is probably in .task/something/. (Storing it there has the benefit of allowing the user to see exactly what was downloaded.)
  2. Security. In the previous proposals, some users suggested that we should allow (require?) the user to provide the sha256 checksum of the file, so execution won't happen if it doesn't match.

Perhaps we should take some inspiration from Go itself and have a task --download flag (analogous to go mod download) that will download the remote Taskfiles. If they weren't downloaded, Task won't run and will output an error. If the checksum changed, it means the file was changed and task --download will need to be run again. This would prevent anything from being downloaded or run without the user's explicit permission.

As an alternative, we could store checksums in a .task-sum.yml file on --download, which would download new files; without that flag, execution would error if the checksum doesn't match. (Inspired by go.sum.)
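For illustration only, such a lock file might look like this (the file name and fields are hypothetical, in the spirit of go.sum):

```yaml
# Hypothetical .task-sum.yml — maps each remote include URL to the
# checksum recorded at `task --download` time.
version: "1"
sums:
  https://raw.githubusercontent.com/some-org/some-repo/main/Taskfile.yml: sha256:3f2a...
```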

I may be overthinking. All this is just me dumping some thoughts so we can discuss how to handle this, and we may end up doing something different...

This extra complexity is part of why this was delayed a couple of times in the past. 🙂

Comment on lines 39 to 57
resp, err := http.Get(node.URL.String())
if err != nil {
	return nil, errors.TaskfileNotFoundError{URI: node.URL.String()}
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
	return nil, errors.TaskfileNotFoundError{URI: node.URL.String()}
}

Perhaps consider adding more information to the error? (err.Error() or the status code).

Otherwise, if the request fails, the user will have a hard time debugging it.

@pd93 pd93 force-pushed the remote-taskfiles branch 2 times, most recently from 9907896 to 6d6b119 on May 16, 2023 00:31
@pd93
Member Author

pd93 commented May 16, 2023

@andreynering Thanks for the comments. I've given this some thought and done a bit of investigating.

  1. Caching. Fetching the file on every run is probably not a good idea. The right place to store it is probably on .task/something/. (Storing there has the benefit of allowing the user to see what exactly was downloaded).

Caching initially seems like a good idea, but when combined with fingerprinting, it can also cause a ton of desync issues. Suppose the following scenario:

  1. A user writes a Taskfile.yml with the following contents and merges it to their project's main branch:
    version: "3"
    
    includes:
      docs:
        taskfile: https://raw.githubusercontent.com/some-remote-repo/Taskfile.yml
        checksum: abcdefg
  2. The user then runs task docs and the remote Taskfile is downloaded and stored in .task/abcdefg/.
  3. The remote Taskfile is updated by some 3rd party.
  4. The user will continue to be able to run task docs without any issues, because the checksum in the Taskfile.yml matches the checksum of the cached file.
  5. However, any new user that now clones the project, or does not have the cached file, will not be able to run task docs because the checksum in the Taskfile.yml does not match the checksum of the new remote file. This will remain broken until a PR is merged into the project that updates the checksum in the Taskfile.yml.

I think that making users regularly update a hash every time a remote file is updated is going to get extremely tedious. Removing the checksum also doesn't solve the issue, because we then have the problem of cache invalidation (i.e. how do we know when to update the cached file?). For this reason, I think the best solution is to download the remote file on every run. This might not be efficient, but I think it's better than having to constantly maintain a checksum. I also think that by using a remote file, a user is accepting that they will need an internet connection to run their tasks.

  1. Security. On the previous proposals, some users suggested that we should allow (require?) the user to inform the sha256 checksum of the file, so execution won't happen if it doesn't match.

Security, on the other hand, is super important. However, my suggestion would be to draw inspiration from SSH. When you first SSH into a server, you are prompted to verify the fingerprint of the server. If you accept the fingerprint, it is stored in your known_hosts file. If the fingerprint changes, you are warned that the fingerprint has changed and that you should verify the fingerprint before proceeding.

I have implemented something similar in my last commit(s) where the first time a remote file is downloaded, the user is prompted to verify the fingerprint. If the user accepts the fingerprint, it is stored in the cache and the user is not prompted again until the fingerprint changes. This has the advantage that it requires zero maintenance from the user, while still providing a reasonable level of security and some degree of familiarity to command line users. Very interested to hear your thoughts.

@sivy

sivy commented May 31, 2023

As someone looking into Task for the second time, this is looking very promising. A couple of notes:

  • I don't see any tests for the new features
  • I'd like to see examples of the new syntax added to the docs as part of the PR
  • Remote files using http is a great first step; but I suspect 75% - 90% of folks wanting to use this will want a git solution. I would consider not worrying about caching or checksum-ing files downloaded over http, and then let git be the natural next step, where you can pin dependencies using features built into git.

@lorengordon

Would there be interest in using a library that supports retrieving files from many different types of remote hosts? I've used a number of Hashicorp products over the years, and they've built up a library they all seem to rely on for this kind of thing, hashicorp/go-getter ...

@sivy

sivy commented May 31, 2023

@lorengordon this makes a huge amount of sense to me and addresses my third point above.

@pd93
Member Author

pd93 commented May 31, 2023

  • I don't see any tests for the new features
  • I'd like to see examples of the new syntax added to the docs as part of the PR

@sivy This PR is still in a proposal phase. Tests and docs will be written once we have agreed on a design as there is not much point in writing them up if things are going to potentially change.

Remote files using http is a great first step; but I suspect 75% - 90% of folks wanting to use this will want a git solution

This is indeed a first step and by no means the only type of remote file we will support. This PR is more about making the implementation extensible than it is about the actual functionality. More functionality can be easily added later if the design is well thought out. The nice thing about the approach in this PR is that new implementations of the Node interface should be very easy to add in the future.

I would consider not worrying about caching or checksum-ing files downloaded over http

It is super important that we are not enabling/encouraging our users to run remote scripts on their machines that they have not inspected and/or are unaware of what they do. Making sure that we have a sane approach to security is far more important to me than getting an MVP out. This is why this topic is being discussed now rather than later.

Edit:

where you can pin dependencies using features built into git

This seems like a great idea for the Git implementation and might mean that we don't need to do checksums for pinned commits. However, we need to be careful what reference types we allow as branch names and tags could still change.

@sivy

sivy commented May 31, 2023

@sivy This PR is still in a proposal phase. Tests and docs will be written once we have agreed on a design as there is not much point in writing them up if things are going to potentially change.

Fair point! Really glad someone is looking into this.

This is indeed a first step and by no means the only type of remote file we will support. This PR is more about the making the implementation extensible than it is about the actual functionality. More functionality can be easily added later if the design is well thought out. The nice thing about the approach in this PR is that new implementations of the Node interface should be very easy to add in the future.

I understand this is a "proposal" PR; perhaps this PR should be restricted to the Node refactor, and a followup PR (possibly using go-getter, see @lorengordon 's comments) could bring in support for actual remote Nodes/Taskfiles?

It is super important that we are not enabling/encouraging our users to run remote scripts on their machines that they have not inspected and/or are unaware of what they do. Making sure that we have a sane approach to security is far more important to me than getting an MVP out. This is why this topic is being discussed now rather than later.

Agreed, another reason I like go-getter - it supports checksums out of the box for most if not all URL types.

@pd93
Member Author

pd93 commented May 31, 2023

Would there be interest in using a library that supports retrieving files from many different types of remote hosts?

Agreed, another reason I like go-getter - it supports checksums out of the box for most if not all URL types.

@sivy @lorengordon Definitely worth looking into this. It looks like they've done a really good job of making a very flexible getter. It could remove some of the heavy lifting from Task. I'll definitely have a look into whether this is a viable option over the current custom implementation. Thanks for raising it.

@pd93 pd93 added the labels "state: wip" (a work in progress) and "type: proposal" (a ticket that proposes a new feat/enhancement) on May 31, 2023
@caphrim007

I'd +1 go-getter not because I have much familiarity with how it is used from within code, but how it's been useful from the tool-user/operator perspective.

Some of the features that are exceptionally useful to me include things like this (from terragrunt)

terraform {
  source = "git::https://gitlab.foo.com/org1/repo1.git//my-tf-module/wrappers?ref=v3.16.2"
}

or something like this from helmfile

I can't stress enough what a killer feature this is: as an operator/tool-user, or a member of a team that provides tooling to other engineering staff, I can "just" cherry-pick content from numerous different data sources without filing a feature request for "can you please add the X data source to your product".

In our environments, I'm able to offer remote libraries to my team members without them needing to install anything. It feels, to them, like it's all local even though "magic" is happening via go-getter.

If task supported such a feature-set, I could remove a lot of documentation that otherwise goes unread and reduce the burden of support I carry by "just" having users bump a tag version as needed.

Looking forward to any progress this PR makes!

@ThomasSanson

ThomasSanson commented Jul 16, 2023

Hello @pd93 and @andreynering,

I am writing to propose an enhancement, inspired by the 'include' feature utilised by GitLab (https://docs.gitlab.com/ee/ci/yaml/includes.html). This proposal is a continuation of the discussion initiated by @shadiramadan regarding the use of Kustomize (#770 (comment))

The feature I am suggesting would allow us to incorporate a versioning system, amalgamating the options remote, local, template, and particularly project

The primary advantage of this proposal is its ability to facilitate the incremental progression of our project's evolution. For instance, we could initially create a pull request for the 'remote' component, followed by subsequent requests for other components. This approach would allow for a more systematic and manageable development process

In the final iteration of Project A, we could implement an 'include' with a remote URL, as illustrated below:

---
include:
  - remote: https://{github.com,gitlab.com,etc}/project_ref/-/raw/{tag}/.config/taskfiles/main.yml

This would enable us to freeze the version via the tag, providing a stable reference point

Moreover, in the 'main.yml' file, we could incorporate 'includes' with 'project', 'ref', and 'file', as demonstrated below:

include:
  - project: project_ref
    ref: {tag}
    file: .config/taskfiles/ansible/Taskfile.yml
  - project: project_ref
    ref: {tag}
    file: .config/taskfiles/ci/Taskfile.yml
...

@campbel

campbel commented Jul 29, 2023

Hey folks, this feature is extremely exciting for me as I think there is a lot of potential in using task for a platform akin to GitHub actions where sharing components is simple and easy.

I've got a working prototype of a design pattern that enables this today utilizing go-getter and a couple of infrastructure tasks that can be seen here https://github.com/campbel/task-lib-example. This prototype fetches dependencies as a "pre-flight" task, and then recursively calls back into task with the desired task. The end result is dynamically fetching dependencies and then executing the tasks on a single call from the end user, e.g. task run -- setup.

I'm curious what folks think about this and if the pattern could help inform design decisions on the native implementation in task.

@Gowiem

Gowiem commented Jul 30, 2023

@campbel just chiming in to say that I like your idea -- there is something there and I would love to see something like a "pre-flight" setup task implemented natively. That way we could avoid executions like task run -- <TASK NAME> since that will cause downstream negative impacts if other tasks want to use CLI_ARGS.

@andreynering
Member

@pd93 Some thoughts.

Firstly, I found a small bug:

open /Users/andrey/Developer/andrey/task/tmp/.task/vRma_iPJM55JsckfKEXuArHXXHF4Gs_5xPGRgceeWao=: no such file or directory

We need to call os.MkdirAll to ensure the .task directory is created. Also, I think this should be in a subdirectory like .task/remote.


I'm assuming this feature will take a bit longer than usual to be ready, still requiring a couple of iterations, considering it has some decisions to be taken here. I'm willing to make it happen, though. 🙂

We should probably list the decisions we need to make here:

  • Should we use go-getter or just allow basic HTTP for now?

No strong opinion, I think. As some said, referencing Git would be really interesting. The fact is, though, that assuming public repositories, that's already possible. I think maybe we should release it with basic HTTP support first and evaluate the decision to use go-getter later. It shouldn't be a breaking change, given the prefix https:// is common between both.

  • How to handle cache, versioning and security?

That's probably the trickiest part to decide. I think having a way to vendor and run stuff offline would be interesting. Some users may want to avoid the scenario of a Taskfile being removed remotely or changed unexpectedly. That would also help with debugging, in case the user wants to do a quick edit on that Taskfile locally. On CI, people may want to avoid downloading and running stuff from remote as well, for example.

That's why I think getting some inspiration from Go could be interesting. task --download would fetch remote Taskfiles and place them on, say, .task/remote/{url_sha1}.yml. Users can optionally decide to commit this directory if they want to ensure to have a copy of remotes. Running tasks would actually read from the local copy.

I think the option above is good enough as I wouldn't expect remotes to change that often. We could have a flag to automatically download before running. (If we thought downloading automatically would be desirable we could have an --offline flag instead, so users could at least force that if they want).
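The proposed layout could be derived deterministically from the include URL; a sketch (Task's actual naming may differ):

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// cachePath maps a remote Taskfile URL to a stable local file name,
// in the spirit of the proposed .task/remote/{url_sha1}.yml layout.
func cachePath(url string) string {
	sum := sha1.Sum([]byte(url))
	return filepath.Join(".task", "remote", hex.EncodeToString(sum[:])+".yml")
}

func main() {
	fmt.Println(cachePath("https://example.com/Taskfile.yml"))
}
```

Because the name is a pure function of the URL, re-running task --download for an unchanged URL always lands on the same file, which is what makes committing the directory viable.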

I think that making users regularly update a hash every time a remote file is updated is going to get extremely tedious. Removing the checksum also doesn't solve the issue, because we then have the problem of cache invalidation (i.e. how do we know when to update the cached file?)

I think to solve this it will help if we assume the URL content to be immutable. We can explain this in the documentation, and even have a warning if the user uses github.com/blob/{branch}/{file} instead of github.com/blob/{commit}/{file}. If we can assume the content won't change, we just need to detect new URLs that weren't downloaded yet and ask the user to run task --download. We won't need to worry about the content potentially changing.

If we wanted the security of avoiding remote files to change, we could store the checksums in .task/sum.yml and show an error if that changes, but this may be a bit of over engineering, just an idea for now. I think we could consider this feature an experiment and then we'll have the flexibility to do changes along the way.

Again, nothing about this is written in stone or a final plan. It's more like a brain dump and I want to get some opinions on this (also from users!). Sorry if I'm making things a bit more complex. I'm open to change opinions or find a middle ground here if needed.

@pd93 pd93 mentioned this pull request Aug 29, 2023
@pd93 pd93 changed the base branch from main to node-refactor August 29, 2023 10:05
@pd93 pd93 mentioned this pull request Aug 29, 2023
@pd93
Member Author

pd93 commented Aug 29, 2023

@andreynering Thanks for taking the time to look at this!

I found a small bug

Good catch, I've pushed a fix

I'm assuming this feature will take a bit longer than usual to be ready, still requiring a couple of iterations

If we want to be able to release something to test it, gather community feedback and potentially roll it back if it doesn't work, then this sounds like a good candidate for an experiment IMO. This takes the pressure off us to make the perfect implementation first time.

We should probably list the decisions we need to make here

If we're going to make this an experiment, then we need an issue to track it. I've created #1317 to do this and closed #770. I'll create a decision log over there since this might take multiple PRs.

Go-Getter vs Native implementation

I think maybe we should release it with basic HTTP support first and evaluate the decision to use go-getter later.

I agree with this. I've come back to this PR a couple of times since my last comment and both times I have run into issues with go-getter not working how I want it to. I can go into the specifics of this down the line if we decide to look at it again, but at a very high level:

It makes a bunch of decisions about how to handle certain things (like caching and file IO). This means we have less control over our implementation. Perhaps the go-getter way is the best way, but I actually find it less usable than some of the other suggestions in this thread. It's a nice "out-the-box" solution, but maybe we want something a bit more customised to Task and perhaps more lightweight too. A custom solution isn't actually that hard for us to create.

Vendoring

I think having a way to vendor and run stuff offline would be interesting

I agree that vendoring is a useful feature and a --download flag makes a lot of sense. It would still be my preference that everything runs remotely unless a local file is detected. An --offline flag could be used to stop it ever trying to download files and this would error if the local files are not found.

Security

I think to solve this it will help if we assume the URL content to be immutable. We can explain this in the documentation, and even have a warning if the user uses github.com/blob/{branch}/{file} instead of github.com/blob/{commit}/{file}.

I don't think this is a good idea. While it is pretty much guaranteed that github.com/blob/{commit}/{file} will not change, we cannot assume that our users will use an immutable URL like this and it's not feasible to keep a list of all possible mutable/immutable URLs in order to produce warnings. For example, would we support VCS other than GitHub? (GitLab/Bitbucket etc). What about S3/GCS buckets etc? What about (god forbid) a Dropbox URL? Users do weird things sometimes and I would rather always warn a user when downloading a remote file if it has changed.

If we wanted the security of avoiding remote files to change, we could store the checksums in .task/sum.yml and show an error if that changes

This is essentially what this PR currently does. The first time you run a remote Taskfile, it will ask if you trust it. If you say yes, it will store the checksum. The next time it will run without a prompt unless it has changed. In which case, it will prompt you to say that it has changed and if you accept, the checksum will be updated.

I think that this in combination with the vendoring changes you mentioned would be a great first implementation. If you agree then I can see about adding these changes and we can get something out there for people to try.

@btpemercier

Hi @pd93 ,

To manage dependencies and versioning with git, you could do it the way Terraform does with modules; it's easy to use:

module "my_module" {
  source = "git@github.com:my-orga/ma-shared-repo.git//my-folder/my-file?ref=mytagOrBranch"
}

@ryancurrah

ryancurrah commented Aug 30, 2023

I feel like a lot of people would be happy or looking for a solution that uses git. Here's a rough draft of my thoughts...

Enhanced Task Importing Feature for go-task using go-git

Overview:

The feature enhancement for go-task allows users to seamlessly import and execute tasks from external Git repositories. This includes both public and private repositories with a streamlined authentication process based on the URL type and environment variables.

Configuration:

Within the primary Taskfile.yml, an imports section has been added:

imports:
  - url: https://github.com/user/repo.git
    ref: branch_or_tag_or_commit
    files:
      - path_to_Taskfile.yml
    auth:
      https_username: your_username
      https_password_env: ENV_VARIABLE_NAME

Implementation:

1. URL Type Determination:

To decide the authentication type based on the provided URL:

var authType string
if strings.HasPrefix(url, "https://") {
    authType = "https"
} else if strings.HasPrefix(url, "git@") || strings.HasPrefix(url, "ssh://") {
    authType = "ssh"
}

2. Handling Authentication:

Based on the URL type, derive the necessary authentication credentials:

For HTTPS:

var httpsAuth *http.BasicAuth
if authType == "https" {
	// https_username and https_password_env come from the `auth` block above
	password := os.Getenv(https_password_env) // fetch the password from the environment variable
	httpsAuth = &http.BasicAuth{
		Username: https_username,
		Password: password,
	}
}

For SSH use the ssh-agent:

import (
	"log"
	"os"

	"github.com/go-git/go-git/v5"
	"github.com/go-git/go-git/v5/plumbing/transport/ssh"
)
// ...
sshAuth, err := ssh.NewSSHAgentAuth("git")
if err != nil {
    log.Fatalf("Failed to create ssh auth: %v", err)
}

3. Cloning with go-git:

Once you have the authentication details, you can proceed with cloning the repository:

cloneOpts := &git.CloneOptions{
    URL:      url,
    Progress: os.Stdout,
    ReferenceName: plumbing.ReferenceName(ref), // Default to `main`
}
if authType == "https" {
    cloneOpts.Auth = httpsAuth
} else if authType == "ssh" {
    cloneOpts.Auth = sshAuth
}
_, err := git.PlainClone(directory, false, cloneOpts)
if err != nil {
    if err == git.ErrRepositoryAlreadyExists {
        // The repository has been previously cloned; pull the updates
        repo, err := git.PlainOpen(directory)
        if err != nil {
            log.Fatalf("Failed to open the repository: %v", err)
        }
        worktree, err := repo.Worktree()
        if err != nil {
            log.Fatalf("Failed to get worktree: %v", err)
        }
        pullOpts := &git.PullOptions{
            RemoteName: "origin",
            Auth:       cloneOpts.Auth, // Reuse the auth from cloneOpts
        }
        if err := worktree.Pull(pullOpts); err != nil && err != git.NoErrAlreadyUpToDate {
            log.Fatalf("Failed to pull updates: %v", err)
        }
    } else {
        log.Fatalf("Failed to clone repository: %v", err)
    }
}

4. Caching:

Cache the cloned repositories in a dedicated directory:

import (
    "crypto/sha256"
    "encoding/hex"
)
// ...
cacheDir := path.Join(os.Getenv("HOME"), ".go-task/cache/")
hash := sha256.Sum256([]byte(url + ref + file))
cacheKey := hex.EncodeToString(hash[:])
clonedDir := path.Join(cacheDir, cacheKey)
if _, err := os.Stat(clonedDir); os.IsNotExist(err) {
    _, err := git.PlainClone(clonedDir, false, cloneOpts)
    if err != nil {
        log.Fatalf("Failed to clone and cache repository: %v", err)
    }
}

5. Task Integration:

Post cloning, locate the Taskfile and integrate its tasks:

import (
    "os"
    "path/filepath"
    "io"
)
for _, taskFile := range files {
    sourcePath := filepath.Join(clonedDir, taskFile)
    destinationPath := filepath.Join(currentDir, taskFile) // assuming you want the tasks in the current directory
    sourceFile, err := os.Open(sourcePath)
    if err != nil {
        log.Fatalf("Failed to open source file: %v", err)
    }
    destinationFile, err := os.Create(destinationPath)
    if err != nil {
        sourceFile.Close()
        log.Fatalf("Failed to create destination file: %v", err)
    }
    _, err = io.Copy(destinationFile, sourceFile)
    // Close explicitly rather than defer: deferred closes inside a loop
    // would keep every file open until the function returns.
    sourceFile.Close()
    destinationFile.Close()
    if err != nil {
        log.Fatalf("Failed to copy task file: %v", err)
    }
}
// The integration mechanism will depend on `go-task`'s internal structures.

Benefits of Using go-git:

  • Pure Go Implementation: Since go-git is entirely written in Go, there's no need to have Git installed on the system. This makes the deployment and setup process easier and more streamlined.
  • Flexibility: go-git offers a high level of customization. Its modular design allows for specialized usage, custom storage backends, and even alternative protocols.
  • Performance: With native Go implementations, go-git can be optimized for performance and resource usage tailored for specific use cases.
  • Extensibility: The library has a comprehensive set of functionalities, and developers can easily extend it with additional features if required.
  • Cross-platform Compatibility: Being written in Go, go-git can run on multiple platforms without modification, offering consistent behavior.
  • Embedded Use: For applications that want to embed Git functionalities without invoking external processes or relying on the presence of Git, go-git is an ideal choice.


Conclusion:

Incorporating go-git into go-task not only streamlines the task importing feature but also leverages the robust, flexible, and performance-oriented benefits of a pure Go Git implementation. This fosters an environment of modular task sharing across projects and teams, emphasizing simplicity, security, and flexibility. The native Go advantage reduces external dependencies and ensures consistent, reliable behavior across diverse platforms.

@caphrim007

thought I'd add my feedback here because @andreynering was soliciting opinions from users.

My use cases are, first, for remote git repositories that I would be looking to pull content from, and second, for web-server-like http resources. I realize that there is some conflation there because some git offerings speak http. For my use cases in particular, it would be git+ssh:// and (to a lesser extent, but non-zero) git+https:// (lingo I've seen used in helmfile and the hashi tools).

I understand the attraction of go-getter, as I've used it in various situations, but if it's a non-starter and the team is looking for concrete use cases, consider the two that I've mentioned above.

The two niceties that the go-getter URLs have provided, and which I would enjoy seeing in any solution for task, are the ability to pull git-tagged content and the ability to specify subdirectories.

In practice I think it looks like this

# tags/refs
github.com/hashicorp/go-getter?ref=abcd1234

# subdirs; note the //
https://github.com/hashicorp/go-getter.git//testdata
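Both conventions are easy to recognize in a source string. A rough sketch of splitting them out (this is not go-getter's actual parser, just an illustration of the convention):

```go
package main

import (
	"fmt"
	"strings"
)

// splitSource separates a go-getter-style source string into the
// repository URL, an optional //subdir, and an optional ?ref=... value.
func splitSource(src string) (repo, subdir, ref string) {
	if i := strings.Index(src, "?ref="); i >= 0 {
		src, ref = src[:i], src[i+len("?ref="):]
	}
	// The scheme's own "//" (as in https://) must be skipped before
	// looking for the subdirectory separator.
	rest := src
	if i := strings.Index(rest, "://"); i >= 0 {
		rest = rest[i+len("://"):]
	}
	if i := strings.Index(rest, "//"); i >= 0 {
		subdir = rest[i+2:]
		src = src[:len(src)-len(subdir)-2]
	}
	return src, subdir, ref
}

func main() {
	repo, subdir, ref := splitSource("https://github.com/hashicorp/go-getter.git//testdata?ref=v1.7.0")
	fmt.Println(repo)   // https://github.com/hashicorp/go-getter.git
	fmt.Println(subdir) // testdata
	fmt.Println(ref)    // v1.7.0
}
```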

I mention the first because all of my work centers around semantic-versioned git tags. My human colleagues intuit them quicker than they do the SHA equivalent (even considering that my team and I understand that tags are mutable).

I mention the second only because it's a creature comfort when I want to have a monorepo instead of managing the overhead of many repos.

Hope this adds some flavor to the discussion. Thanks for all the effort that has gone into the existing tool.

@pd93
Member Author

pd93 commented Aug 30, 2023

@ryancurrah, @caphrim007 thanks for the feedback. Your thoughts are always appreciated.

To those who want the remote Git/SSH feature, I want to be very clear. Task will support this at some point. However, I think it's reasonable that we initially focus on a single implementation and as such, this PR will only add support for HTTP/HTTPS URLs. Focussing on a single (and simple) implementation means that we can iterate quickly if changes to the interface are needed. Getting things like caching, vendoring and security right are my top priority.

Now that we have marked this as an experiment, we are able to release it without having to worry about changing the design and so we can make changes later if things aren't quite right.

I know that a lot of you have been extremely patient waiting for this feature, but it's a significant change and we want to get it right, so please don't expect this rollout to be fast! As with all experiments, the use of this in production while it's an experiment is strongly discouraged as things may change/break between releases.

I'll be pushing some more changes later today and will update again once I do. Stay tuned :)

@ryancurrah

ryancurrah commented Aug 30, 2023

That's fair. My only real concern is that without caching of the downloaded Taskfiles, developers won't be able to work offline. Having to provide a separate command-line argument to save Taskfiles for offline work may be hard to discover/remember, which makes for a bad developer experience. I really feel that for this feature to be useful for local development, you should consider implementing a caching strategy. Maybe not in this PR, but in a follow-up soon.

Base automatically changed from node-refactor to main September 2, 2023 20:24
Member

@andreynering andreynering left a comment


Awesome work so far! 👏

I did some non-extensive testing here, but I think we're close to shipping a first iteration. 🚀 🙂

Let me know if there's anything specific you want an opinion on, either behavior or code.

Request my review on the PR once you think it's ready to ship, for a final check.

@pd93
Member Author

pd93 commented Sep 4, 2023

Let me know if there's anything specific you want an opinion on, either behavior or code.

@andreynering I think I'm pretty happy with this as a first iteration. The only thing I'm not sure about is what we do when a file has been downloaded. At the moment, we use the local copy by default. This means that there is no way to run the remote copy without clearing out the .task/remote cache (which we might want to add a command for later). It also means that the --offline flag doesn't really do very much. The only time it has an effect is when you try to run a remote file offline and it doesn't exist in your cache (causing an error).

Maybe a nicer solution is to always run the remote file by default and then allow users to prefer the local file when they specify the --offline flag. I can see users being divided on this, so perhaps an environment variable setting or a new field under includes to dictate a preference on local/remote first might help? Any and all ideas welcome.
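One common way to implement the `.task/remote` cache mentioned above is to key each downloaded file by a hash of its source URL, so the same include always maps to the same local path. This is a hedged sketch of that general idea with a hypothetical `cachePath` helper; the exact layout Task uses may differ:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// cachePath returns a deterministic local path for a downloaded remote
// Taskfile, keyed by the SHA-256 of its URL. Illustrative only; not
// necessarily the layout Task's actual cache uses.
func cachePath(baseDir, rawURL string) string {
	sum := sha256.Sum256([]byte(rawURL))
	return filepath.Join(baseDir, ".task", "remote", hex.EncodeToString(sum[:])+".yaml")
}

func main() {
	fmt.Println(cachePath(".", "https://example.com/Taskfile.yml"))
}
```

With a scheme like this, `--offline` can simply fall back to the cached path when the download is skipped or fails.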


I don't think this decision needs to block this PR though. It's getting pretty big, so it would be nice to get it merged and let the community start kicking the tyres!

@pd93 pd93 marked this pull request as ready for review September 4, 2023 11:14
@c-ameron

c-ameron commented Sep 4, 2023

Thank you so much for working on this!!

Maybe a nicer solution is to always run the remote file by default and then allow users to prefer the local file when they specify the --offline flag. I can see users being divided on this, so perhaps an environment variable setting or a new field under includes to dictate a preference on local/remote first might help? Any and all ideas welcome.

I would agree with that. In my use case, I want to share a Taskfile used by developers across multiple teams who all run the same commands. I think by default it should be remote, always sourcing the latest version specified. The --offline flag would be used to only use the local cached copy (or maybe even a --no-download flag?).

Thank you again!

Member

@andreynering andreynering left a comment


Looks great for this iteration! With the exception of one small bug (mentioned below) we're ready to ship.

Maybe a nicer solution is to always run the remote file by default and then allow users to prefer the local file when they specify the --offline flag. I can see users being divided on this, so perhaps an environment variable setting or a new field under includes to dictate a preference on local/remote first might help? Any and all ideas welcome.

We can change this in the next iterations, but I think a setting as a global key in the root Taskfile (and maybe an ENV var as an alternative) will be needed no matter what we decide as the default behavior, as users will want different behaviors.

version: '3'

remote_mode: online/offline

tasks:
  ...

(Ugly key name just as an example. We need to find a better one).

@pd93 pd93 merged commit 22ce67c into main Sep 12, 2023
11 checks passed
@pd93 pd93 deleted the remote-taskfiles branch September 12, 2023 21:42
@pd93
Member Author

pd93 commented Sep 12, 2023

Big thank you to everyone for all the comments. This will be available to try out in the next release!

A casual reminder that this is an experiment and so you will need to set the TASK_X_REMOTE_TASKFILES environment variable to 1 in order to access the functionality. This is very much a first version and there are likely to be changes to the feature in future releases. For this reason, it is not recommended that you use this in production.
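Assuming the variable name above, opting in looks like this for the current shell session:

```shell
# Opt in to the remote Taskfiles experiment for this shell session
export TASK_X_REMOTE_TASKFILES=1
# then run Task as usual, e.g. `task --list`
```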

For more information on experiments, check out the docs.

If you're trying out the feature and want to give feedback, please go to the Remote Taskfiles Experiment Issue and leave a comment there.

Labels
state: wip (a work in progress), type: proposal (a ticket that proposes a new feat/enhancement)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New Feature: A Compromise Solution For Reusing common Taskfiles In Different Projects