Modal deployment does not maintain stable name #151

Open
dmvinson opened this issue Nov 26, 2024 · 7 comments

@dmvinson

Hi, I tested out the Modal deployment for the golinks server and noticed that it does not seem to maintain a stable hostname. After a while, I think because Modal restarts the container, the machine re-registers itself as go-1 instead of go, for example. This keeps happening.

I know the Modal deployment may not be officially supported, but is there a good way to fix this so I can use Modal hosting?

@willnorris
Member

cc @pawalt who added the Modal instructions.

Typically, you would resolve this by storing the Tailscale state directory on some kind of persistent volume, so it doesn't register a new device. I'm not sure if that's possible with Modal? Otherwise, registering as an ephemeral node might help clean up the old registered device more quickly, but I'm not sure how quickly Modal is re-registering nodes, to know if that would work.

@pawalt
Contributor

pawalt commented Nov 27, 2024

@willnorris where does this data live? The example uses a persistent volume to persist the data in sqlite, so I'm surprised to hear about this:

@app.cls(
  image=image,
  secrets=[modal.Secret.from_name("golinks")],
  volumes={"/root/.config": vol},
  keep_warm=1,
  concurrency_limit=1,
)
class Golinks:
  @modal.enter()
  def start_golinks(self):
      subprocess.Popen(
          [
              "golink",
              "-verbose",
              "--sqlitedb",
              "/root/.config/golink.db",
          ]
      )

We'll only spin up a new container if the old one crashes, so there shouldn't be a case where two are alive simultaneously.

@dmvinson In your Modal dashboard, you can go to the golinks app and see the containers. Do you see two containers which were alive at the same time? That would explain this behavior. Otherwise I'm not sure what's going on.
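
One way to check the "two containers alive at the same time" theory is to write down the start and stop times shown for each container and test them for overlap. The sketch below is a minimal illustration only: the container IDs, the timestamps, and the `overlapping_containers` helper are hypothetical, and nothing here calls a Modal API.

```python
from datetime import datetime
from itertools import combinations


def overlapping_containers(lifetimes):
    """Return sorted pairs of container IDs whose lifetimes overlap.

    lifetimes maps a container ID to a (start, end) tuple of datetimes.
    end may be None for a container that is still running.
    """
    def overlaps(a, b):
        a_start, a_end = a
        b_start, b_end = b
        # Treat a missing end time as an open-ended interval.
        a_end = a_end or datetime.max
        b_end = b_end or datetime.max
        return a_start < b_end and b_start < a_end

    return [
        (id_a, id_b)
        for id_a, id_b in combinations(sorted(lifetimes), 2)
        if overlaps(lifetimes[id_a], lifetimes[id_b])
    ]


# Hypothetical values copied by hand from a dashboard.
observed = {
    "ta-0": (datetime(2024, 11, 26, 8, 0), datetime(2024, 11, 26, 9, 0)),
    "ta-1": (datetime(2024, 11, 26, 10, 0), datetime(2024, 11, 26, 12, 0)),
    "ta-2": (datetime(2024, 11, 26, 11, 30), None),
}
print(overlapping_containers(observed))
```

Any pair in the result was alive simultaneously, which would be consistent with a second node registering under a suffixed name while the first still held the original one.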

@tiesmaster

Doesn't golink store its sqlite database and state in the directory the container starts up in, /home/nonroot? I have a deployment running in a Kubernetes cluster with a PVC mounted at /home/nonroot, and that survives pod kills (I just tested that).

@pawalt /root/.config works for you because you override the location of the sqlite database in your command line args; otherwise golink would lose its content there too.

You might have filesystem issues, but that depends on how Modal works and how it starts the container. I believe the container starts as a non-root user for me, and I had to override the securityContext for the Pod:

      securityContext:
        # This is needed as its a "non-root" image, and otherwise /home/nonroot cannot be written to
        # https://github.com/tailscale/golink#running-in-production
        # https://github.com/tailscale/golink/issues/6
        # https://github.com/tailscale/golink/pull/12
        runAsUser: 65532
        runAsGroup: 65532
        fsGroup: 65532

Good luck!

@pawalt
Contributor

pawalt commented Nov 27, 2024

@tiesmaster This is because my example runs as root, and [os.UserConfigDir](https://pkg.go.dev/os#UserConfigDir) is /root/.config for the root user. I think it's more likely that this is a result of two containers being alive simultaneously. Strangely, though, we haven't hit this issue internally.

I also am pretty confident the tailscale auth data is being persisted properly. The key we minted for our golinks deployment isn't reusable, so restarts/reschedules would make the auth fail if we didn't have proper persistence.
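
To make the path reasoning concrete, here is a minimal Python approximation of how Go's os.UserConfigDir resolves on Linux: it returns $XDG_CONFIG_HOME when that is non-empty, and $HOME/.config otherwise. The helper names are hypothetical, the `tsnet-golink` directory name comes from the golink readme, and the sketch omits any further validation Go may perform.

```python
import posixpath


def user_config_dir(env):
    """Approximate Go's os.UserConfigDir on Linux for a given environment."""
    xdg = env.get("XDG_CONFIG_HOME", "")
    if xdg:
        return xdg
    home = env.get("HOME", "")
    if not home:
        raise ValueError("neither XDG_CONFIG_HOME nor HOME is set")
    return posixpath.join(home, ".config")


def tsnet_state_dir(env):
    """Where golink's Tailscale state would land, per the readme."""
    return posixpath.join(user_config_dir(env), "tsnet-golink")


print(tsnet_state_dir({"HOME": "/root"}))
print(tsnet_state_dir({"HOME": "/home/nonroot"}))
```

Under these assumptions the state directory follows $HOME, so a volume mounted at /root/.config only captures it when the process really runs with HOME=/root and no XDG_CONFIG_HOME override.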

@Miladiir

As of today, the Modal setup instructions in the readme do not work at all. First, you need at least Go 1.23.1 to build golinks. Then, when starting, the container crash-loops with:

Terminating task due to error: cannot mount volume on non-empty path: "/root/.config"
Runner failed with exception: task exited with failure, status = exit status: 1

Upon inspection, /root/.config contains go telemetry.
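
A quick way to see what is blocking the mount is to list whatever already exists at the target path inside the image. This is a minimal sketch; `entries_blocking_mount` is a hypothetical helper, not part of Modal or golink.

```python
import os


def entries_blocking_mount(path):
    """List whatever already exists at a prospective mount path.

    Returns an empty list when the path is missing or empty, which is
    the state the error message above asks for.
    """
    if not os.path.isdir(path):
        return []
    return sorted(os.listdir(path))


if __name__ == "__main__":
    print(entries_blocking_mount("/root/.config"))
```

Any entries it reports would need to be removed during the image build, or the volume moved to a path that nothing else writes to.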

I tried changing the mount and the sqlite path to /data, which works, but then I hit the same issue as the OP, or at least a similar one. When I stop a machine and deploy a new app, the Tailscale state is apparently not persisted. I guess only the sqlite database carries over.
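
If the volume is mounted at /data, one untested option is to point the golink process at a config directory on that volume, since the readme says the Tailscale files live in the tsnet-golink directory inside os.UserConfigDir, and Go honors $XDG_CONFIG_HOME on Linux. The sketch below is an assumption-laden illustration: the /data/config layout and the `golink_launch` and `start_golink` helpers are hypothetical, and the thread does not confirm that this resolves the problem on Modal.

```python
import os
import posixpath
import subprocess


def golink_launch(data_dir, base_env=None):
    """Build the golink command and environment for a mounted data_dir."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: golink's tsnet state follows os.UserConfigDir, so this
    # should place it under <data_dir>/config/tsnet-golink.
    env["XDG_CONFIG_HOME"] = posixpath.join(data_dir, "config")
    cmd = [
        "golink",
        "-verbose",
        "--sqlitedb",
        posixpath.join(data_dir, "golink.db"),
    ]
    return cmd, env


def start_golink(data_dir="/data"):
    """Start golink with both its database and config dir on the volume."""
    cmd, env = golink_launch(data_dir)
    os.makedirs(env["XDG_CONFIG_HOME"], exist_ok=True)
    return subprocess.Popen(cmd, env=env)
```

In the Modal example, `start_golink()` would replace the `subprocess.Popen` call inside the `@modal.enter()` method, with the volume mounted at /data instead of /root/.config.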

@pawalt
Contributor

pawalt commented Jan 6, 2025

@Miladiir should we just lock the golinks version when we go install in these instructions? Since these installation instructions aren't maintained the same way the Dockerfile will be maintained, I'd rather not break the guide when breaking changes are released in the go.mod.
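
As a rough sketch of what locking the version could look like, the install command can be generated from an explicit ref instead of `latest`. The module path below is assumed from the repository layout, and the ref shown is a placeholder rather than a real golink tag or commit.

```python
def go_install_command(ref):
    """Build a go install command pinned to an explicit tag or commit."""
    if not ref or ref == "latest":
        raise ValueError("pin an explicit tag or commit instead of 'latest'")
    # Assumed module path for the golink binary.
    return f"go install github.com/tailscale/golink/cmd/golink@{ref}"


# "0123abc" is a placeholder; substitute a ref known to build.
print(go_install_command("0123abc"))
```

The resulting string could be passed to whatever image build step currently runs the unpinned install.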

I'm still quite confused as to why the Tailscale data isn't persisting. From the readme:

tailscale data files in the tsnet-golink directory inside os.UserConfigDir

For the root user, os.UserConfigDir should be $HOME/.config, unless $XDG_CONFIG_HOME was somehow populated in the new version of the bookworm Docker image.

@Miladiir

Miladiir commented Jan 7, 2025

¯\_(ツ)_/¯


5 participants