Persistent user storage across containers and container-recreates #35

Open
TimoRoth opened this issue Aug 7, 2020 · 8 comments

Comments

@TimoRoth
Contributor

TimoRoth commented Aug 7, 2020

Right now, a user's /home/jovyan is stored inside their respective container.
This means that if a user wants to update their base image, or use an entirely different one, they will lose all their data.

The most obvious solution for this to me is:
For every user, create a "storage-$USERNAME" container from a configured, very basic, nearly empty image that has /home/jovyan as a volume.
Then create every user's actual containers with "--volumes-from storage-$USERNAME".
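
A rough sketch of what this proposal would amount to in Docker CLI terms, written as a small Python helper that only builds the two commands. The helper name, the `busybox` default image, and declaring the volume with `--volume /home/jovyan` at create time (rather than in the image itself) are assumptions for illustration. It also assumes the username is already a valid container name.

```python
def storage_container_commands(username, user_image, storage_image="busybox"):
    """Build the docker commands for the storage-container idea.

    `storage_image` is a placeholder for the "very basic, nearly empty image".
    """
    storage_name = f"storage-{username}"
    # The storage container is never started; it only owns the volume.
    create_storage = [
        "docker", "create",
        "--name", storage_name,
        "--volume", "/home/jovyan",
        storage_image,
    ]
    # The user's real container reuses the volumes of the storage container.
    run_user = [
        "docker", "run", "--detach",
        "--volumes-from", storage_name,
        user_image,
    ]
    return create_storage, run_user
```

Because the home directory lives in the storage container's volume, the user container could be removed and recreated from a different image without losing the data.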

TimoRoth added a commit to TimoRoth/tljh-repo2docker that referenced this issue Aug 7, 2020
@jtpio
Member

jtpio commented Sep 1, 2020

Thanks @TimoRoth.

Yes, the tljh-repo2docker plugin doesn't give users any storage, and sessions are ephemeral.

Other plugins can enable storage, which is for example the case of the plasma plugin:

https://github.com/plasmabio/plasma/blob/7883a6a1266b69ab49f353bdf9974be408bf0709/tljh-plasma/tljh_plasma/__init__.py#L70-L76

This could also be done by adding a jupyterhub_config.py snippet to TLJH.

Do you think there would be value in having a default storage as part of the tljh-repo2docker plugin?

@TimoRoth
Contributor Author

TimoRoth commented Sep 1, 2020

I have come up with this in my custom config to achieve pretty much what I wanted by now:

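# Give every user a named Docker volume as their home directory.
# DockerSpawner expands {username} per user.
c.DockerSpawner.mounts.append({
    'source': 'jupyter-storage-{username}',
    'target': '/home/jovyan',
    'type': 'volume',
    # Per-volume size limit (needs the Docker PR mentioned below)
    'driver_config': {
        'Options': {
            'size': '16G'
        }
    }
})
# Also cap the container's own filesystem at 16G.
c.DockerSpawner.extra_host_config.update({
    'storage_opt': {
        'size': '16G'
    }
})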

The mounts option depends on a PR to dockerspawner: jupyterhub/dockerspawner#373
The volume size option needs a PR to docker: moby/moby#41330

@jtpio
Member

jtpio commented Sep 3, 2020

Thanks @TimoRoth for sharing your solution 👍

Do you think we should keep #36 open?

@TimoRoth
Contributor Author

TimoRoth commented Sep 3, 2020

Since this seems like a case a lot of people would want, it might be worth it to document that option somewhere, so someone else does not need to go on a long search for it like I did.
For a simpler setup without a size quota, the existing volume code is enough: all one needs to do is use a named volume with {username} in its name and mount it at /home or /home/jovyan.
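
A minimal sketch of that simpler setup, assuming DockerSpawner's `volumes` option (a mapping from volume name to mount path). The volume name reuses `jupyter-storage-{username}` from the earlier snippet. The `expanded` lines only illustrate the naming for a hypothetical user `alice`; DockerSpawner does its own expansion and may escape unusual characters in usernames.

```python
# One named volume per user, mounted as the home directory.
HOME_VOLUMES = {
    "jupyter-storage-{username}": "/home/jovyan",
}

# Illustration only: roughly what the template becomes for a user "alice".
expanded = {
    name.format(username="alice"): path
    for name, path in HOME_VOLUMES.items()
}

# In a jupyterhub_config.py snippet, where JupyterHub provides `c`:
#     c.DockerSpawner.volumes = HOME_VOLUMES
```

On TLJH such a snippet would typically go in a `.py` file under `/opt/tljh/config/jupyterhub_config.d/`, followed by a hub reload.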

But generally it's possible already, so the issue can probably be closed.

Edit: Just noticed that's my other PR, not this issue. No, that PR is entirely unneeded. I found out after that PR that you can use templates in volumes and mounts, making everything the PR does unnecessary.

@jtpio
Member

jtpio commented Oct 9, 2020

Thanks @TimoRoth.

Since this seems like a case a lot of people would want, it might be worth it to document that option somewhere, so someone else does not need to go on a long search for it like I did.

Would you like to open a PR to add that to the README?

@guiwitz

guiwitz commented Jun 7, 2021

I used tljh-repo2docker for a course (mainly on git) and first I'd like to say that this plugin for TLJH is terrific! It makes it so easy to set up TLJH with multiple environments and in particular it's the simplest way I found to provide access to RStudio for multiple people. So thanks for your work!

Now to my point on this issue: my course is in two parts over two weeks, and I was surprised to see that the changes made last week were ephemeral, hence my stumbling across this issue. In this case it's not a problem and it's OK to start from scratch, but it might be an issue for others, and I think it should be stated clearly in the README. I'd also appreciate some explanation in the README of how to provide permanent storage. I don't think it makes sense to make it the default, because it can also be useful to start with a clean slate every time, but having directions would help! Unfortunately I don't have the required competence to add such information, but maybe by reviving this issue, one of you will provide an update. I could just try some of the info provided above, but it would be nice to understand a bit more. Thanks!

@quy-ng

quy-ng commented Dec 16, 2021

I have come up with this in my custom config to achieve pretty much what I wanted by now:

c.DockerSpawner.mounts.append({
    'source': 'jupyter-storage-{username}',
    'target': '/home/jovyan',
    'type': 'volume',
    'driver_config': {
        'Options': {
            'size': '16G'
        }
    }
})
c.DockerSpawner.extra_host_config.update({
    'storage_opt': {
        'size': '16G'
    }
})

The mounts option depends on a PR to dockerspawner: jupyterhub/dockerspawner#373. The volume size option needs a PR to docker: moby/moby#41330

Can I know where jupyter-storage-{username} is stored on the machine?

@TimoRoth
Contributor Author

TimoRoth commented Dec 17, 2021

It's a docker volume. So normally, docker will store it somewhere in its data dir, which is normally at /var/lib/docker.
But you're not supposed to access it directly like that, and doing so easily breaks things.
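
If you only want to know where a given volume lives, the Docker CLI can report it without you touching the files. A hedged sketch using `docker volume inspect` from Python; the function names and the example volume name `jupyter-storage-alice` are hypothetical, and the second function needs the docker CLI and access to the daemon.

```python
import subprocess


def inspect_mountpoint_command(volume_name):
    """Build the `docker volume inspect` call that prints a volume's host path."""
    return [
        "docker", "volume", "inspect",
        "--format", "{{ .Mountpoint }}",
        volume_name,
    ]


def volume_mountpoint(volume_name):
    """Run the command and return the reported path (read-only lookup)."""
    result = subprocess.run(
        inspect_mountpoint_command(volume_name),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

For example, `volume_mountpoint("jupyter-storage-alice")` would return the host path Docker reports for that user's volume, if it exists.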


4 participants