@huikang

@huikang huikang commented Aug 19, 2015

Reuse the endpoint of the checkpointed container on restore.
Pass the veth pair name to criu when restoring a checkpointed container.

Signed-off-by: Hui Kang <hkang.sunysb@gmail.com>

Saied Kazemi and others added 13 commits August 17, 2015 08:46
Methods for checkpointing and restoring containers were added to the
native driver.  The LXC driver returns an error message that these
methods are not implemented yet.

Signed-off-by: Saied Kazemi <saied@google.com>

Conflicts:
	daemon/execdriver/native/create.go
	daemon/execdriver/native/driver.go
	daemon/execdriver/native/init.go
Docker-DCO-1.1-Signed-off-by: Ross Boucher <rboucher@gmail.com> (github: boucher)
Support was added to the daemon to use the Checkpoint and Restore methods
of the native exec driver for checkpointing and restoring containers.

Signed-off-by: Saied Kazemi <saied@google.com>

Conflicts:
	api/server/server.go
	daemon/container.go
	daemon/daemon.go
	daemon/networkdriver/bridge/driver.go
	daemon/state.go
	vendor/src/github.com/docker/libnetwork/ipallocator/allocator.go
Restore failed if network resource not released during checkpoint,
e.g., a container with port open with -p

Signed-off-by: Hui Kang <hkang.sunysb@gmail.com>

Conflicts:
	daemon/container.go
Docker-DCO-1.1-Signed-off-by: Ross Boucher <rboucher@gmail.com> (github: boucher)
Add a basic test for checkpoint/restore to the integration tests

Docker-DCO-1.1-Signed-off-by: Ross Boucher <rboucher@gmail.com> (github: boucher)
Docker-DCO-1.1-Signed-off-by: Ross Boucher <rboucher@gmail.com> (github: boucher)
Docker-DCO-1.1-Signed-off-by: Ross Boucher <rboucher@gmail.com> (github: boucher)
Docker-DCO-1.1-Signed-off-by: Ross Boucher <rboucher@gmail.com> (github: boucher)
Docker-DCO-1.1-Signed-off-by: Ross Boucher <rboucher@gmail.com> (github: boucher)
Docker-DCO-1.1-Signed-off-by: Mark Oates <fl0yd@me.com> (github: fl0yd)
Docker-DCO-1.1-Signed-off-by: Ross Boucher <rboucher@gmail.com> (github: boucher)
Reuse the endpoint of the checkpointed container on restore.
Pass the veth pair name to criu when restoring a checkpointed container.

Signed-off-by: Hui Kang <hkang.sunysb@gmail.com>
@huikang
Author

huikang commented Aug 19, 2015

@boucher @SaiedKazemi The solution is not perfect, but the restored container can ping 8.8.8.8 over the network.

@boucher
Owner

boucher commented Aug 19, 2015

Awesome, I'll try it out in a couple of hours and report back.

@boucher
Owner

boucher commented Aug 19, 2015

The good news is that I'm able to use the network after checkpointing and restoring with this patch! The bad news is that we're going to need to split it into 3 separate patches: one for runc, one for libnetwork, and one for docker (plus, we need to wait for the first two patches to be merged, and then upstreamed in docker, in order to use the third).

@huikang
Author

huikang commented Aug 19, 2015

@boucher Good to know it works for you. I will split the patch and try pushing them to runc and libnetwork.

@boucher
Owner

boucher commented Aug 19, 2015

Excellent. Let me know if I can help.

@klesgidisold

Hi guys! I also tested @huikang's version on a couple of my containers. I was able to C/R a tomcat container successfully on the same host and could then connect to it. I also noticed two problems:

  1. After C/R of a mysql container, I again get this error after restore:
     ERROR 2003 (HY000): Can't connect to MySQL server on '172.17.0.2' (111)
  2. I checkpointed tomcat and then restored it into a new container. The restore reported no errors, but I couldn't connect to tomcat.

As far as I can tell, when you restore into the same container it keeps its old IP and the restore works, but when you restore into a "fresh" container it gets a new IP. Maybe this is the problem?

Thanks

@fl0yd

fl0yd commented Aug 20, 2015

@huikang did you forget to include your patch for writing state to disk from #14?

@huikang
Author

huikang commented Aug 20, 2015

@fl0yd I think @boucher merged #14. So I did not duplicate it in this patch.

@fl0yd

fl0yd commented Aug 20, 2015

What's interesting is that this patch calls ReleaseNetwork(true). If you do that, then with your fixed 1.9 for libnetwork, TCP connectivity is lost on restore. However, if you leave the call as ReleaseNetwork(), you get a compilation error.

Mark

@huikang
Author

huikang commented Aug 20, 2015

@fl0yd
I added a parameter to the function, which I think causes the compilation error.

@fl0yd

fl0yd commented Aug 20, 2015

Understood. I'm pointing out that I think the two patches will need to be modified slightly so that they don't conflict over the required parameters.

Mark

@huikang
Author

huikang commented Aug 20, 2015

@fl0yd Do you think rebasing this patch on #14 would resolve the conflict, since #14 is already merged? Thanks.

@fl0yd

fl0yd commented Aug 20, 2015

@klesgidis
I was able to checkpoint a container running sshd with FileLocks and TCPConnections set to true, and restore it into a surrogate container on the same host while keeping TCP connectivity in place. Maybe you should retry?

@fl0yd

fl0yd commented Aug 20, 2015

No, I don't think it will. You modified daemon/container_unix.go to add an "is_checkpoint bool" parameter to func (container *Container) ReleaseNetwork(is_checkpoint bool), along with an if inside the function that returns early when the value is true.

Presumably you did that for a reason and want to keep that change.
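
The change described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the `Container` type here is a hypothetical stand-in with a single made-up field, the parameter is spelled `isCheckpoint` in Go style rather than `is_checkpoint`, and the real network teardown is replaced by a placeholder assignment.

```go
package main

import "fmt"

// Container is a hypothetical stand-in for the daemon's container type.
// Only the state needed for this illustration is modeled.
type Container struct {
	networkReleased bool // made-up field, not part of the real struct
}

// ReleaseNetwork follows the signature quoted in this thread. When
// isCheckpoint is true it returns early, so the network endpoint stays
// allocated and can be reused on restore.
func (c *Container) ReleaseNetwork(isCheckpoint bool) {
	if isCheckpoint {
		return
	}
	// Placeholder for the real teardown of the container's network resources.
	c.networkReleased = true
}

func main() {
	checkpointed := &Container{}
	checkpointed.ReleaseNetwork(true) // checkpoint path: network is kept

	stopped := &Container{}
	stopped.ReleaseNetwork(false) // regular stop path: network is released

	fmt.Println(checkpointed.networkReleased, stopped.networkReleased)
}
```

Because the parameter is required, any existing call written as `ReleaseNetwork()` no longer compiles, which is consistent with the compilation error mentioned earlier in the thread.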

@huikang
Author

huikang commented Aug 20, 2015

@boucher Not at all. Please go ahead and submit them. Thanks.

@klesgidisold

@fl0yd After I added the --allow-tcp flag, the mysql container C/R succeeded, but I still have a problem restoring to a surrogate container. The commands I use are:

$ docker run -d --name tomcat1 tomcat:8.0
$ docker checkpoint --allow-tcp --image-dir ~/Desktop/cr_migrations/tomcat/memory --work-dir ~/Desktop/logs/tomcat tomcat1
$ docker export tomcat1 > ~/Desktop/cr_migrations/tomcat/fs.tar

$ docker import --change "ENV PATH /usr/local/tomcat/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" --change "ENV LANG C.UTF-8" --change "ENV JAVA_VERSION 7u79" --change "ENV JAVA_DEBIAN_VERSION 7u79-2.5.5-1~deb8u1" --change "ENV CATALINA_HOME /usr/local/tomcat" --change "ENV TOMCAT_MAJOR 8" --change "ENV TOMCAT_VERSION 8.0.24" --change "ENV TOMCAT_TGZ_URL https://www.apache.org/dist/tomcat/tomcat-8/v8.0.24/bin/apache-tomcat-8.0.24.tar.gz" --change "CMD [\"catalina.sh\", \"run\"]" --change "EXPOSE 8080" --change "WORKDIR /usr/local/tomcat" - tomcat:kyriakos < ~/Desktop/cr_migrations/tomcat/fs.tar
$ docker create --name tomcat2 tomcat:kyriakos
$ docker restore --force --allow-tcp --image-dir ~/Desktop/cr_migrations/tomcat/memory --work-dir ~/Desktop/logs/tomcat tomcat2

The restore completes, but I cannot connect from the browser. Did you run similar commands, or can you see any stupid mistake? :P Thanks.

@boucher
Owner

boucher commented Aug 20, 2015

@klesgidis how are you trying to connect? One (unfortunate) thing to note is that the restore will force the new container's IP address to be the same as the old container. You may need to either go in and change its IP, or do some other magic, to make things actually connectable.

@klesgidisold

I am trying to connect with mysql -h container_ip -u root -p. The thing is that if I restore to the same container I get the same IP, but if I restore to a surrogate, the restored container has a new IP.

@boucher
Owner

boucher commented Aug 20, 2015

Yes. It's worse though: docker thinks the IP is the new IP, and inside the container it will be set to the old IP. So, what I do is use the new IP and docker exec into the container to set its IP to the new IP.

@klesgidisold

Yes, I tried this solution with the previous version of docker (without the network fix), but it didn't work. I'll give it a try with the new one in a couple of hours (I am not home right now :p) and let you know. Thanks, I appreciate your help. :)

@klesgidisold

@boucher, I was able to restore into a surrogate tomcat container and a surrogate mysql container and reconnect to them using the old IP. I can confirm what you said: Docker thinks the new container has a new IP, but I was still able to connect using the old one.
For example:

$ docker run -d --name tomcat1 my-tomcat
$ docker inspect -f {{.NetworkSettings.IPAddress}} tomcat1
172.17.0.1

Connecting from the browser to 172.17.0.1:8080 works.

$ docker checkpoint --allow-tcp --image-dir ~/Desktop/cr_migrations/tomcat/memory --work-dir ~/Desktop/logs/tomcat tomcat1
$ docker export tomcat1 > ~/Desktop/cr_migrations/tomcat/fs.tar
$ docker import --change "ENV PATH /usr/local/tomcat/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" --change "ENV LANG C.UTF-8" --change "ENV JAVA_VERSION 7u79" --change "ENV JAVA_DEBIAN_VERSION 7u79-2.5.5-1~deb8u1" --change "ENV CATALINA_HOME /usr/local/tomcat" --change "ENV TOMCAT_MAJOR 8" --change "ENV TOMCAT_VERSION 8.0.24" --change "ENV TOMCAT_TGZ_URL https://www.apache.org/dist/tomcat/tomcat-8/v8.0.24/bin/apache-tomcat-8.0.24.tar.gz" --change "CMD [\"catalina.sh\", \"run\"]" --change "EXPOSE 8080" --change "WORKDIR /usr/local/tomcat" - tomcat:kyriakos < ~/Desktop/cr_migrations/tomcat/fs.tar
$ docker create --name tomcat2 tomcat:kyriakos
$ docker restore --force --allow-tcp --image-dir ~/Desktop/cr_migrations/tomcat/memory --work-dir ~/Desktop/logs/tomcat tomcat2
$ docker inspect -f {{.NetworkSettings.IPAddress}} tomcat2
172.17.0.2

I cannot connect from the browser to 172.17.0.2:8080, but I can connect to 172.17.0.1:8080 successfully, and the old session ID persists!

@huikang
Author

huikang commented Aug 21, 2015

@klesgidis Semantically, checkpoint/restore is different from stop/start. A regular stop/start will change the IP, while checkpoint/restore should not. However, the current C/R implementation has a limitation: it adds new NAT for a new IP, which should not actually happen.

@huikang
Author

huikang commented Sep 7, 2015

Closed this PR because I rebased the changes on the recent cr-combined branch #17.

@huikang huikang closed this Sep 7, 2015