Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support tailscale configuration via Caddy configuration #30

Merged
merged 10 commits into from
May 2, 2024

Conversation

clly
Copy link
Contributor

@clly clly commented Feb 12, 2024

As described in #24, this PR creates a new Caddy module registering a global configuration
option and a module fulfilling the required interface. This is an initial implementation using a
global caddy.UsagePool to share the configuration between the application parsed from the
caddyfile and the standalone listener functions. This will also fallback to the environment if the
configuration wasn't parsed from the Caddyfile.

In the future this can support ephemeral servers by marking the servers as ephemeral based
on the Caddy config and then stopping them in the Stop function.

@mholt
Copy link
Contributor

mholt commented Mar 6, 2024

I haven't had a chance to scrutinize it, but at a glance this looks like a good start 👍

@clly
Copy link
Contributor Author

clly commented Apr 2, 2024

Hey @willnorris @mholt, just curious if you had, had a chance to look at this PR yet.

If preferred, I could start stacking new PR's against this one and incorporate feedback into them as necessary.

@willnorris
Copy link
Member

I did start reviewing this back when you initially sent it, but never got to the point that I sent the comments 😞

I recall that some of my initial thoughts were whether we needed to use a UsagePool for app (since I think it's always a singleton, right?). I'm also a little worried about the complexity of configuring options at two different levels... the default values and then the per-server values. Even though it would be a little more verbose to always require config at the server level, I wonder if that would be easier to reason about what exactly is happening?

(I'm sure I had some other thoughts as well, but these are the ones that come to mind)

And thank you by the way for doing this... I absolutely love this approach, and am honestly really excited to have this in, even if my lack of attention says otherwise!

@mholt
Copy link
Contributor

mholt commented Apr 3, 2024

Oof, sorry, it's still in my notifications inbox and I promise I'll get around to it but I've been hustling for a new Caddy release -- and I'm traveling this week 😬

I'll try to get to this soon! 😃 I am also excited about it. Leaving it unread in my inbox so I don't forget.

@clly
Copy link
Contributor Author

clly commented Apr 6, 2024

@willnorris from what I remember I thought the UsagePool would be helpful for caddy reloads (although there's a lot of assumptions I'm making in that process).

Assuming caddy does partial tear down and setup where a server doesn't get torn down and re-setup on reload, we wouldn't want to tear down a listener if it's used in multiple HTTP servers. The UsagePool would handle all of that for us because it handles the reference tracking and only calls Destruct on the TSServer when the last reference is deleted.

It's not implemented as part of this PR in an effort to keep it small, but I think we'd wrap the Close function in the listeners we currently return to Caddy with a call to Delete the reference from the UsagePool. On the last Delete call, the UsagePool should call Destruct on TSServer and shut it down.

Even though it would be a little more verbose to always require config at the server level, I wonder if that would be easier to reason about what exactly is happening?

Potentially. Personally, I prefer more flexible configuration, less typing in configuration, but I'm not against making it all explicit. The environment variables are still supported in this PR which already has the two tier configuration functionality. My suggestion would be to remove the environment variable support and bump to the next major version to indicate the breaking change.

How would you like to proceed?

module.go Outdated
@@ -136,6 +132,26 @@ func getServer(_, addr string) (*tsnetServerDestructor, error) {
return s.(*tsnetServerDestructor), nil
}

func getAuthKey(host string) string {
hostAuthKey, loaded := app.LoadOrStore(host, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you wouldn't just pass the auth key in here? You have 2 LordOrStore operations which my intuition tells me breaks the atomicity...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the global key is semi-static I should be able to fetch it outside this function and use it if the host specific key doesn't exist.

My assumption was the Caddy App setup happens semi-serially and that these would all be already loaded which would mean we should still get back the same values? Or am I misunderstanding the atomicity issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, so the storedAuthKey and hostAuthKey are different values? In that case, it's probably fine. Just make sure to call Delete() in your module's Cleanup() method so that it can free up unused memory.

@clly
Copy link
Contributor Author

clly commented Apr 27, 2024

I recall that some of my initial thoughts were whether we needed to use a UsagePool for app (since I think it's always a singleton, right?). I'm also a little worried about the complexity of configuring options at two different levels... the default values and then the per-server values. Even though it would be a little more verbose to always require config at the server level, I wonder if that would be easier to reason about what exactly is happening?

@willnorris looking at this some more you're right. The apps themselves have no state, just the actual tailscale servers so this should switch to a singleton of some kind. I'm not a fan but it's probably for the best.

clly added 8 commits April 30, 2024 21:38
Register a parseGlobal function to parse the caddy configuration file.
The parse function creates a new TSApp which will hold our configuration
and create the CaddyModule.

Signed-off-by: Connor Kelly <connor.r.kelly@gmail.com>
Signed-off-by: Connor Kelly <connor.r.kelly@gmail.com>
Signed-off-by: Connor Kelly <connor.r.kelly@gmail.com>
Signed-off-by: Connor Kelly <connor.r.kelly@gmail.com>
Initially see if the provisioned server or global key exist in the app
usage pool. If not then fallback to the already existing environment
variables

Signed-off-by: Connor Kelly <connor.r.kelly@gmail.com>
Signed-off-by: Connor Kelly <connor.r.kelly@gmail.com>
Signed-off-by: Connor Kelly <connor.r.kelly@gmail.com>
Everytime we get a listener we increment the refernce on a caddy
UsagePool. With this change, wrap the listener in a custom type that
will decrement the UsagePool when we close the listener. When all
listeners are closed we will call Destruct on the tsnet Server which
will Close the server and should cleanup potential references in the
tailscale control plane

Signed-off-by: Connor Kelly <connor.r.kelly@gmail.com>
@clly clly force-pushed the clly/caddy-config branch from b35ddf8 to 9fc8b14 Compare May 1, 2024 02:38
@clly
Copy link
Contributor Author

clly commented May 1, 2024

@mholt @willnorris I've switched from a UsagePool to just a global which made the references much simpler.

I've also wrapped the Listener's with a small type to ensure we call Delete on the servers UsagePool so that we automatically Close the tsnet server when all Listeners are closed.

I'm happy with the state of this PR and think it's a good jumping off point for more changes in the future. Let me know if you have strong opinions and I'm happy to make them. @willnorris I know that you thought that more verbose but explicit config would be better and if it would make you more comfortable with this change I'm happy to take care of it.

Thanks!

module.go Outdated
}

func (t *tsnetServerListener) Close() error {
fmt.Println("Delete", t.hostname)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this and close the actual listener as well

Copy link
Member

@willnorris willnorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks so much for hanging in there for so long! I've got a few minor comments below, but I've already made them in my local client, so I'll just push them to your branch, then squash and merge. I agree, this is a great new foundation that we can keep building on.

app.go Outdated

func (t *TSApp) Validate() error {
if t.DefaultAuthKey == "" {
return errors.New("auth_key must be set")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to require that a default auth key is set. A user could instead choose to specify only per-server key. That was actually how I initially tested this, and was surprised by the error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

}

type TSServer struct {
AuthKey string `json:"auth_key,omitempty" caddy:"namespace=auth_key"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've honestly never used these caddy struct tags before. I guess they're used in doc generation? This seems to be the only one that doesn't have "tailscale." in the namespace. Is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was, I think this is the translation for marshaling the caddy file format or going to/from JSON. It was modeled on the dynamicdns caddy app.

return err
}

_, err := servers.Delete(t.hostname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely confused me at first. It looks like it will destruct the server on the first listener to close. I didn't realize this is really just a counter decrement on the usage pool. Maybe add a comment here that this decrements the usage pool for the server, and will close it only when it reaches zero?

app.go Outdated
}
}

app = t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the global app value shouldn't be overwritten as part of Provisioning, since the new apps are provisioned and started before the old one is stopped. So this should probably be inside of Start(). We should probaby also make app an atomic.Pointer since we'll have multiple TSApps potentially trying to write to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I didn't realize these apps were started and stopped at the same time.

app.go Outdated
}

func (t *TSApp) Stop() error {
app = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make app an atomic.Pointer, then this should probably be app.CompareAndSwap(t, nil) so that it only gets set to nil if it is still pointing to the current t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the apps are started and stopped at the same time would we do this at all?

Could we have a race condition where stop on an old app is called after start on a new one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same thought. For updating configs this isn't really necessary because the new TSApp's Start will overwrite the value before Stop is called. But if the new config removes the Tailscale configuration, we still want to clean this up. The use of atomic and CompareAndSwap handle race conditions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong but atomic will solve concurrent access but not an ordering error where the calls are not guaranteed to always be in the same order?

Copy link
Member

@willnorris willnorris May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're absolutely right that atomic is focused on handling concurrent access. It's the CompareAndSwap that helps us here. On Start, we always update the value of our singleton to the new value. But on Stop, we only set it to nil if the value is still the old value. So depending on the order they run...

old.Stop() // this will update tsapp to nil
new.Start() // this will update tsapp to new

new.Start() // this will update tsapp to new
old.Stop() // this won't do anything because tsapp no longer points to old, which is what CompareAndSwap checks for

In the case of the new caddy config not having a tailscale config at all, then only old.Stop() gets called, which will set tsapp to nil and free up the memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks for the explanation!

app.go Outdated
}

func (t *TSApp) Provision(ctx caddy.Context) error {
for _, svr := range t.Servers {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is doing anything. Ranging over the t.Servers map is returning copies of the servers, so the changes aren't being stored back. You would need to assign the updated svr value back to the t.Servers map. But you don't really need this at all, since you check the default values in the getAuthKey and getEphemeral funcs.

So I think you can just remove the for loop entirely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this to try to make the usage pool for the app easier before I threw it out and never came back to update this here

Thanks!

Signed-off-by: Will Norris <will@tailscale.com>
@willnorris willnorris merged commit 25b816a into tailscale:main May 2, 2024
1 check failed
@willnorris
Copy link
Member

Thanks for your review help @mholt !

@mholt
Copy link
Contributor

mholt commented May 3, 2024

I feel like I dropped the ball on this one, been very busy with the 2.8 release and such. But you both did a great job!

@unixfox
Copy link

unixfox commented May 5, 2024

I'm late to the party but I guess now we can more easily add the functionality requested in #22 and #18?

Giving the ability to use a hosted login server for tailscale --login-server=<url>: https://tailscale.com/kb/1080/cli#login

@willnorris
Copy link
Member

I'm late to the party but I guess now we can more easily add the functionality requested in #22 and #18?

Yes, that was exactly my thought as well! We now have an obvious place to set these things. We won't add support for any new Tailscale-specific environment variables, since you can just use caddy's built-in support for env vars.

@unixfox
Copy link

unixfox commented May 6, 2024

since you can just use caddy's built-in support for env vars.

Interesting. How does it works? I didn't really understood how to do that.

I can help to improve the documentation if you can direct me into the right direction :)

@mholt
Copy link
Contributor

mholt commented May 6, 2024

Like placeholders? {env.*} where * is the name of the env var.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants