feat: remote taskfiles (HTTP) #1152
Conversation
Looks like a good initial proposal.
To be complete, we'll need to address topics like:
- Caching. Fetching the file on every run is probably not a good idea. The right place to store it is probably `.task/something/`. (Storing it there has the benefit of allowing the user to see exactly what was downloaded.)
- Security. On previous proposals, some users suggested that we should allow (require?) the user to provide the SHA-256 checksum of the file, so execution won't happen if it doesn't match.
Perhaps we should take some inspiration from Go itself and have a `task --download` flag (analogous to `go download`) that would download the remote Taskfiles. If they weren't downloaded, Task won't run and will output an error. If the checksum changed, it means the file was changed, and `task --download` will need to be run again. This would prevent anything from being downloaded or run without the user's explicit permission.
As an alternative, we could store checksums in a `.task-sum.yml` file on `--download`, which would download new files; without that flag, execution would error if the checksum doesn't match. (Inspired by `go.sum`.)
I may be overthinking. All this is just me dumping some thoughts so we can discuss how to handle this, but we may end up doing something different...
This extra complexity is part of why this was delayed a couple of times in the past. 🙂
taskfile/read/node_http.go
Outdated
```go
resp, err := http.Get(node.URL.String())
if err != nil {
	return nil, errors.TaskfileNotFoundError{URI: node.URL.String()}
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
	return nil, errors.TaskfileNotFoundError{URI: node.URL.String()}
}
```
Perhaps consider adding more information to the error (`err.Error()` or the status code)? Otherwise, if the request fails, the user will have a hard time debugging it.
@andreynering Thanks for the comments. I've given this some thought and done a bit of investigating.
Caching initially seems like a good idea, but when combined with fingerprinting, it can also cause a ton of desync issues.
I think that making users manually update a hash every time a remote file changes is going to get extremely tedious. Removing the checksum also doesn't solve the issue, because we then have the problem of cache invalidation (i.e. how do we know when to update the cached file?). For this reason, I think the best solution is to download the remote file on every run. This might not be efficient, but I think it's better than having to constantly maintain a checksum. I also think that by using a remote file, a user is accepting that they will need an internet connection to run their tasks.
Security, on the other hand, is super important. However, my suggestion would be to draw inspiration from SSH. When you first SSH into a server, you are prompted to verify the fingerprint of the server. If you accept the fingerprint, it is stored in your `~/.ssh/known_hosts` file and you are not asked again. I have implemented something similar in my last commit(s): the first time a remote file is downloaded, the user is prompted to verify its fingerprint. If the user accepts the fingerprint, it is stored in the cache and the user is not prompted again until the fingerprint changes. This has the advantage that it requires zero maintenance from the user, while still providing a reasonable level of security and some degree of familiarity to command-line users. Very interested to hear your thoughts.
As someone looking into Task for the second time, this is looking very promising. A couple of notes:
Would there be interest in using a library that supports retrieving files from many different types of remote hosts? I've used a number of HashiCorp products over the years, and they've built up a library they all seem to rely on for this kind of thing: hashicorp/go-getter.
@lorengordon this makes a huge amount of sense to me and addresses my third point above.
@sivy This PR is still in a proposal phase. Tests and docs will be written once we have agreed on a design, as there is not much point in writing them up if things are going to potentially change.
This is indeed a first step and by no means the only type of remote file we will support. This PR is more about making the implementation extensible than it is about the actual functionality. The nice thing about the approach in this PR is that new implementations of the `Node` interface are straightforward to add, so more functionality can easily be added later if the design is well thought out.
It is super important that we are not enabling/encouraging our users to run remote scripts on their machines that they have not inspected and/or are unaware of what they do. Making sure that we have a sane approach to security is far more important to me than getting an MVP out. This is why this topic is being discussed now rather than later.
This seems like a great idea for the Git implementation and might mean that we don't need to do checksums for pinned commits. However, we need to be careful about which reference types we allow, as branch names and tags could still change.
Fair point! Really glad someone is looking into this.
I understand this is a "proposal" PR; perhaps this PR should be restricted to the Node refactor, with a follow-up PR (possibly using go-getter).
Agreed, another reason I like go-getter.
@sivy @lorengordon Definitely worth looking into this. It looks like they've done a really good job of making a very flexible getter. It could remove some of the heavy lifting from Task. I'll definitely have a look into whether this is a viable option over the current custom implementation. Thanks for raising it.
I'd +1 go-getter, not because I have much familiarity with how it is used from within code, but because of how useful it's been from the tool-user/operator perspective. Some of the features that are exceptionally useful to me include the remote-source conventions used by terragrunt and helmfile.
I can't stress enough how much of a killer feature this is in terms of allowing me, as an operator/tool-user or a member of a team that provides tooling to other engineering staff, to "just" be able to cherry-pick content from numerous different data sources without putting in a feature request for "can you please add the X data source to your product". In our environments, I'm able to offer remote libraries to my team members without them needing to install anything. It feels, to them, like it's all local, even though "magic" is happening via go-getter.
Looking forward to any progress this PR makes!
Hello @pd93 and @andreynering,
I am writing to propose an enhancement, inspired by the 'include' feature utilised by GitLab (https://docs.gitlab.com/ee/ci/yaml/includes.html). This proposal is a continuation of the discussion initiated by @shadiramadan regarding the use of Kustomize (#770 (comment)).
The feature I am suggesting would allow us to incorporate a versioning system. The primary advantage of this proposal is its ability to facilitate the incremental progression of the project's evolution. For instance, we could initially create a pull request for the 'remote' component, followed by subsequent requests for other components. This approach would allow for a more systematic and manageable development process.
In the final iteration of Project A, we could implement an 'include' with a remote URL, as illustrated below:

```yaml
---
include:
  - remote: https://{github.com,gitlab.com,etc}/project_ref/-/raw/{tag}/.config/taskfiles/main.yml
```

This would enable us to freeze the version via the tag, providing a stable reference point.
Moreover, in the 'main.yml' file, we could incorporate 'includes' with 'project', 'ref', and 'file', as demonstrated below:

```yaml
include:
  - project: project_ref
    ref: {tag}
    file: .config/taskfiles/ansible/Taskfile.yml
  - project: project_ref
    ref: {tag}
    file: .config/taskfiles/ci/Taskfile.yml
  ...
```
Hey folks, this feature is extremely exciting for me, as I think there is a lot of potential in using Task for a platform akin to GitHub Actions, where sharing components is simple and easy. I've got a working prototype of a design pattern that enables this today, utilizing go-getter and a couple of infrastructure tasks, which can be seen here: https://github.com/campbel/task-lib-example. This prototype fetches dependencies as a "pre-flight" task, and then recursively calls back into Task with the desired task. The end result is dynamically fetching dependencies and then executing the tasks on a single call from the end user. I'm curious what folks think about this and whether the pattern could help inform design decisions on the native implementation in Task.
@campbel just chiming in to say that I like your idea -- there is something there, and I would love to see something like a "pre-flight" setup task implemented natively. That way we could avoid chained executions like the one in your prototype.
@pd93 Some thoughts.
Firstly, I found a small bug:

```
open /Users/andrey/Developer/andrey/task/tmp/.task/vRma_iPJM55JsckfKEXuArHXXHF4Gs_5xPGRgceeWao=: no such file or directory
```

We need to call `os.MkdirAll` to ensure the directory exists before writing to it.
I'm assuming this feature will take a bit longer than usual to be ready, still requiring a couple of iterations, considering it has some decisions to be taken here. I'm willing to make it happen, though. 🙂
We should probably list the decisions we need to make here:
No strong opinion, I think. As some said, referencing Git would be really interesting. The fact is, though, that assuming public repositories, that's already possible. I think maybe we should release it with basic HTTP support first and evaluate the decision to use go-getter later. It shouldn't be a breaking change, given that the `https://` prefix wouldn't change.
That's probably the trickiest part to decide. I think having a way to vendor and run stuff offline would be interesting. Some users may want to avoid the scenario of a Taskfile being removed remotely or changed unexpectedly. That would also help with debugging, in case the user wants to do a quick edit on that Taskfile locally. On CI, people may want to avoid downloading and running stuff from remote as well, for example. That's why I think getting some inspiration from Go could be interesting.
I think the option above is good enough, as I wouldn't expect remotes to change that often. We could have a flag to automatically download before running.
I think to solve this, it will help if we assume the URL content to be immutable. We can explain this in the documentation, and even have a warning when it turns out not to be. If we wanted the security of preventing remote files from changing, we could store the checksums in a `.task-sum.yml`-like file.
Again, nothing about this is written in stone or a final plan. It's more like a brain dump, and I want to get some opinions on this (also from users!). Sorry if I'm making things a bit more complex. I'm open to changing opinions or finding a middle ground here if needed.
@andreynering Thanks for taking the time to look at this!
Good catch, I've pushed a fix.
If we want to be able to release something to test it, gather community feedback, and potentially roll it back if it doesn't work, then this sounds like a good candidate for an experiment IMO. This takes the pressure off us to make the perfect implementation the first time.
If we're going to make this an experiment, then we need an issue to track it. I've created #1317 to do this and closed #770. I'll create a decision log over there since this might take multiple PRs.

Go-Getter vs Native implementation
I agree with this. I've come back to this PR a couple of times since my last comment, and both times I have run into issues with go-getter. It makes a bunch of decisions about how to handle certain things (like caching and file IO). This means we have less control over our implementation.

Vendoring
I agree that vendoring is a useful feature.

Security
I don't think this is a good idea. While it is pretty much guaranteed that
This is essentially what this PR currently does. The first time you run a remote Taskfile, it will ask if you trust it. If you say yes, it will store the checksum. The next time, it will run without a prompt unless the file has changed, in which case it will prompt you to say that it has changed; if you accept, the checksum will be updated. I think that this, in combination with the vendoring changes you mentioned, would be a great first implementation. If you agree, then I can see about adding these changes and we can get something out there for people to try.
Hi @pd93,
To manage dependencies and versioning with Git, you can do it like Terraform does with modules; it's easy to use:

```hcl
module "my_module" {
  source = "git@github.com:my-orga/ma-shared-repo.git//my-folder/my-file?ref=mytagOrBranch"
}
```
I feel like a lot of people would be happy with, or are looking for, a solution that uses Git. Here's a rough draft of my thoughts:

Enhanced Task Importing Feature for
Thought I'd add my feedback here because @andreynering was soliciting opinions from users.
My use cases are, first, remote Git repositories that I would be looking to pull content from, and second, web-server-like HTTP resources. I realize that there is some conflation there, because some Git offerings speak HTTP.
I understand the attraction of go-getter, as I've used it in various situations, but if it's a non-starter and the team is looking for concrete use cases, consider the two that I've mentioned above. The two niceties that the go-getter URLs have provided, and which I would enjoy seeing in any solution here, are, first, the ability to pin a Git ref or tag, and second, the ability to point at a subdirectory within a repository.
I mention the first because all of my work centers around semantic-versioned Git tags. My human colleagues intuit them quicker than they do the SHA equivalent (even considering that my team and I understand that tags are mutable). I mention the second only because it's a creature comfort when I want to have a monorepo instead of managing the overhead of many repos.
Hope this adds some flavor to the discussion. Thanks for all the effort that has gone into the existing tool.
@ryancurrah, @caphrim007 thanks for the feedback. Your thoughts are always appreciated.
To those who want the remote Git/SSH feature, I want to be very clear: Task will support this at some point. However, I think it's reasonable that we initially focus on a single implementation, and as such, this PR will only add support for HTTP/HTTPS URLs. Focussing on a single (and simple) implementation means that we can iterate quickly if changes to the interface are needed. Getting things like caching, vendoring and security right is my top priority.
Now that we have marked this as an experiment, we are able to release it without having to worry about changing the design, and so we can make changes later if things aren't quite right. I know that a lot of you have been extremely patient waiting for this feature, but it's a significant change and we want to get it right, so please don't expect this rollout to be fast! As with all experiments, use of this in production while it's an experiment is strongly discouraged, as things may change/break between releases.
I'll be pushing some more changes later today and will update again once I do. Stay tuned :)
That's fair. My only real concern is that with no caching of the downloaded Taskfiles, developers won't be able to work offline. Having to provide a different command-line argument for saving Taskfiles for offline work may be hard to discover/remember, and thus a bad developer experience. I really feel that in order for this feature to be useful for local development, you should consider implementing a caching strategy. Maybe not in this PR, but in a follow-up soon.
Awesome work so far! 👏
I did some non-extensive testing here, but I think we're close to shipping a first iteration. 🚀 🙂
Let me know if there's something specific you want an opinion on, either behavior or code.
Request my review on the PR once you think it's ready to ship, for a final check.
@andreynering I think I'm pretty happy with this as a first iteration. The only thing I'm not sure about is what we do when a file has been downloaded. At the moment, we use the local copy by default. This means that there is no way to run the remote copy without clearing out the cache.
Maybe a nicer solution is to always run the remote file by default and then allow users to prefer the local file when they specify the `--offline` flag.
I don't think this decision needs to block this PR though. It's getting pretty big, so it would be nice to get it merged and let the community start kicking the tyres!
Thank you so much for working on this!!
I would agree with that. In my use case, I'm wanting to share a Taskfile that would be used by multiple developers on different teams, all sharing the same commands. I think by default it should be remote, always sourcing the latest version specified. Thank you again!
Looks great for this iteration! With the exception of one small bug (mentioned below), we're ready to ship.
> Maybe a nicer solution is to always run the remote file by default and then allow users to prefer the local file when they specify the --offline flag. I can see users being divided on this, so perhaps an environment variable setting or a new field under includes to dictate a preference on local/remote first might help? Any and all ideas welcome.
We can change this in the next iterations, but I think that a setting as a global key in the root Taskfile (and maybe an env var as an alternative) will be needed no matter what we decide as the default behavior, as users will want different behaviors.
```yaml
version: '3'

remote_mode: online/offline

tasks:
  ...
```
(Ugly key name just as an example. We need to find a better one).
Big thank you to everyone for all the comments. This will be available to try out in the next release!
A casual reminder that this is an experiment, so you will need to enable it via its environment variable before you can use it. For more information on experiments, check out the docs.
If you're trying out the feature and want to give feedback, please go to the Remote Taskfiles Experiment Issue and leave a comment there.
#770 is now the most requested feature in Task by quite some margin and there have been frequent requests/questions around it in recent weeks. Previous requests have been rejected based on complexity, but I believe that the way this draft is implemented might actually reduce the Task reader complexity as it abstracts the logic for dealing with files away from the recursive reader.
This PR is intended to open a dialogue about how this feature might look if it is added to Task. Please note that this is a first draft and is open to significant changes or rejection entirely.
The PR

The first commit only contains changes to the underlying code to allow it to be more extensible via a new `Node` interface. This interface allows for multiple implementations of the Taskfile reader. This change also reimplements the existing filesystem-based reader as a `FileNode`.
The second commit adds a very crude example of how a new `HTTPNode` might look. I've made a (temporary) change in this commit to the root Taskfile which changes the docs `include` to use `https://` (it just points to the same file on GitHub). You should be able to run `task --list` and see no difference, as it will now download the remote file into memory and run exactly the same way as if the file were local!

Discussion points

- A scheme is specified by adding `myscheme://` to the start of the Taskfile URI when including a file. Is this flexible enough for everyone's needs? It should be easy enough to add `ssh://` or `git://` schemes etc. in the future.