Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transition away from Java serialization for storing state on disk or in Zookeeper #625

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

mrflip
Copy link
Collaborator

@mrflip mrflip commented Jul 13, 2013

This is all @hausdorff's work -- it addresses #419 and #525, and supersedes #497

I've rebased his commits into the current master, and validated that the tests pass.

Hopefully this helps grease the path for the PR to go in...

hausdorff and others added 13 commits July 20, 2013 23:07
config and nimbus currently use the core Java serialization to store
StormTopology instances. This commit will change this to use Thrift
serialization instead. StormTopology is a Thrift struct so this
basically involves finding all the places we call `Utils/serialize` and
`Utils/deserialize` and replace them with a call to a method that
serializes with Thrift instead.
config.clj and nimbus serialize Storm configuration using the default
Java implementation. This commit will phase out Java default
serialization in favor of Clojure default serialization. There are
obviously many reasons to phase out Java serialization, but the main
rationale is that Java serialization will complain about version
mismatch even if the change is semantically backward compatible.
TSerializer is not threadsafe. In `Utils` we instantiate a static final
Tserializer, but this can (and will) cause odd bugs if we start calling
`serialize()` in different threads. Thus, every time we call
`Utils/serializeTopology`, we create a new TSerializer.

Another way to do this would be to lock it, which performance may or may
not merit.
Serialization of configuration is handled in config, but it is not
different from a generic method for Clojure form serialization. This
commit will move this method to utils so that we can use it for other
things, like serialization in cluster.
cluster.clj uses the stock Java serialization implementation. There are
obviously many reason to not use standard Java serialization, but our
main motivation is that Java will complain about serialized state
when the versions don't match even if they're semantically backwards
compatible.
Supervisor currently just uses stock jvm serialization to communicate
LocalState. This is undesirable for many reasons, so this commit
will introduce a serialization interface which makes code cleaner,
letting us specify which type of serializer to use without populating
Supervisor with unnecessary boxing/unboxing behavior, or LocalState
with too much knowledge about what's happening in the Supervisor.

Also this commit will introduce a basic implementation sketch for a
serializer for LocalState (though it will just use the jvm serialization
at this point).
To build the serialization interface in Java, we need to put these
constants in a Java file. This commit will put them in Constants.java
@ptgoetz
Copy link
Collaborator

ptgoetz commented Sep 25, 2013

@mrflip @hausdorff
Have you smoke tested this in a clustered environment?

If so I'm +1.

@@ -230,7 +230,7 @@
(assignment-info [this storm-id callback]
(when callback
(swap! assignment-info-callback assoc storm-id callback))
(maybe-deserialize (get-data cluster-state (assignment-path storm-id) (not-nil? callback)))
(maybe-deserialize-clj-bytes (get-data cluster-state (assignment-path storm-id) (not-nil? callback)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we renaming these functions? If we stick the original func name, we will have less code to worry about. No?

@hausdorff
Copy link

Hey, cool -- good to see this finally getting resolved.

I'll be happy to take a look at this in a couple hours, after I get off work.

@ptgoetz
Copy link
Collaborator

ptgoetz commented Oct 25, 2013

@hausdorff @mrflip
Any update on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants