How do I make fat jars with scala_binary? #1673

Open
grepwood opened this issue Dec 19, 2024 · 6 comments

@grepwood
Contributor

grepwood commented Dec 19, 2024

Hi everyone,

I'm in the process of migrating an in-house setup that integrates rules_scala, rules_docker, and rules_k8s into a cohesive set of rules for producing Scala-based microservices. As part of this migration, I've updated from rules_docker to rules_oci and moved to a fork of rules_k8s that works with rules_oci. However, I'm now facing a significant challenge with the final piece: creating functional Docker images for the microservices.

The documentation for rules_oci and rules_scala doesn't mention fat jars, which I believe could simplify the process. If I could produce a fat jar, it would remove much of the complexity (like managing dependencies for 83 separate jars) and let me avoid reimplementing scala_image just to make this work.

Currently, I’ve written a reimplementation of scala_image that generates two tar files:

  1. One with the anorexic jar from scala_binary.
  2. Another with a reduced set of 27 jars: the ones explicitly declared in the deps attribute of the scala_binary that produces the anorexic jar.

However, I suspect the remaining 66 jars are transitive dependencies of those 27, which means I'll need to recursively trace dependencies until I’ve accounted for everything. This feels like a daunting task and makes me wonder if there's a more efficient approach.
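Tracing dependencies by hand amounts to computing a transitive closure over the dependency graph. The following is a minimal Python model of that walk. It assumes the graph is available as two hypothetical dicts: one mapping each target to its direct deps, and one mapping each target to the jar it produces. In Bazel itself, `JavaInfo.transitive_runtime_jars` already holds this closure as a depset.

```python
from typing import Dict, List


def transitive_runtime_jars(
    roots: List[str],
    deps: Dict[str, List[str]],
    jar_of: Dict[str, str],
) -> List[str]:
    """Return every jar reachable from `roots`, each listed once.

    `deps` maps a target to its direct dependencies and `jar_of` maps a
    target to the jar it produces; both are hypothetical inputs.
    """
    seen = set()
    order: List[str] = []
    stack = list(reversed(roots))
    while stack:
        target = stack.pop()
        if target in seen:
            continue  # already collected via another path (or a cycle)
        seen.add(target)
        order.append(jar_of[target])
        # Push direct deps in reverse so they are visited in declared order.
        stack.extend(reversed(deps.get(target, [])))
    return order
```

For example, with `app -> [a, b]`, `a -> [c]` and `b -> [c]`, the walk yields `app.jar, a.jar, c.jar, b.jar`, and `c.jar` appears only once even though two paths reach it.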

I’d really appreciate any guidance, suggestions, or examples for addressing this issue. Is there a better way to handle fat jar creation or dependency resolution in this context? Has anyone tackled something similar when using rules_oci with Scala?

Thanks in advance for any help!

@srdo

srdo commented Dec 19, 2024

If you just want a fat jar: Bazel tends to call those "deploy jars", and I believe scala_binary provides one as an implicit output:

```starlark
"deploy_jar": "%{name}_deploy.jar",
```

If you have a target like

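```starlark
scala_binary(
    name = "example",
    srcs = ["Main.scala"],  # hypothetical source file
    main_class = "com.example.Main",  # hypothetical main class
)
```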

you should be able to get the deploy jar by doing

```
bazel build example_deploy.jar
```

If you don't want to use a fat jar (they have some minor drawbacks, mainly relating to deduplication and JPMS modules), an alternative is to write an aspect to walk through the deps recursively. This can be used to collect all the jars so you can dump them in a directory in the container.
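The deduplication drawback is easy to check for before building a fat jar. The following is a minimal Python sketch; the function name is hypothetical. It opens a set of jars with the standard `zipfile` module and reports every file path that occurs in more than one jar. Those are the entries a fat-jar builder must either merge or reduce to a single copy.

```python
import zipfile
from collections import defaultdict
from typing import Dict, List


def duplicate_entries(jar_paths: List[str]) -> Dict[str, List[str]]:
    """Map each file entry found in 2+ jars to the jars that contain it."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as zf:
            # set() guards against a jar listing the same name twice.
            for name in sorted(set(zf.namelist())):
                if not name.endswith("/"):  # skip directory entries
                    owners[name].append(jar)
    return {name: jars for name, jars in owners.items() if len(jars) > 1}
```

Expect `META-INF/MANIFEST.MF` to show up almost everywhere. The entries worth a closer look are resource files such as `META-INF/services/*` or plugin caches, where keeping only one copy silently drops data.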

@grepwood
Contributor Author

Thanks a ton @srdo! I've bridged the gap left by rules_oci having no scala_image equivalent, but one problem remains: the fat jar includes the JSON layout plugin for log4j, but log4j can't initialize it:

```
ERROR Unable to locate plugin type for JsonTemplateLayout
ERROR Unable to locate plugin for JsonTemplateLayout
ERROR Could not create plugin of type class org.apache.logging.log4j.core.appender.ConsoleAppender for element Console: java.lang.NullPointerException java.lang.NullPointerException
```

This doesn't happen with the slim jar. However, in that setup the java invocation receives absolutely everything via -cp, including files that aren't jars, because that's how overzealous scala_image is.
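If you stay with the slim-jar layout, one way to avoid an overzealous classpath is to build it from jar files only. The following is a minimal Python sketch of that filtering. It assumes a hypothetical list of paths as they would appear inside the image and a hypothetical main class `com.example.Main`; the resulting argv could serve as a container entrypoint.

```python
from typing import List


def java_command(paths: List[str], main_class: str, sep: str = ":") -> List[str]:
    """Build a `java -cp <jars> <main_class>` argv from jar files only.

    Non-jar paths are dropped and repeated paths are kept once, in order.
    """
    seen = set()
    jars: List[str] = []
    for p in paths:
        if p.endswith(".jar") and p not in seen:
            seen.add(p)
            jars.append(p)
    return ["java", "-cp", sep.join(jars), main_class]
```

For example, `java_command(["/libs/a.jar", "/libs/app.runfiles", "/libs/b.jar"], "com.example.Main")` gives `["java", "-cp", "/libs/a.jar:/libs/b.jar", "com.example.Main"]`. The `:` separator is what the JVM expects on Linux; on Windows it is `;`.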

Is there something I'm missing? I've tried asking around, and there are as many opinions as people I've asked:

  • I'm missing the META-INF/org.apache.logging.log4j.plugins file in the jar, but that file doesn't exist in any of the slim jars
  • I'm missing -Dlog4j2.pluginPackages=org.apache.logging.log4j.layout.template.json in the java command
  • I'm missing -Dlog4j2.pluginLocation=... in the java command

Another solution I saw flying around was to force initialization of this class:

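```scala
// JsonTemplateLayout lives in the log4j-layout-template-json artifact,
// under the org.apache.logging.log4j.layout.template.json package.
import org.apache.logging.log4j.layout.template.json.JsonTemplateLayout

// Reference the class explicitly so it must be present on the classpath
val dummy = classOf[JsonTemplateLayout]
```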

I'm sorry but I'm not a Scala developer and I'm just utterly lost and confused at this point due to choice fatigue :(

@srdo

srdo commented Dec 30, 2024

I can't say for sure, but my guess would be that the fat jar is messing up the Log4j2Plugins.dat file. Each log4j-related jar contains one of those files, and they need to be merged in order to work. Likely the fat jar is just keeping one and discarding the others (this is the "deduplication" drawback to fat jars I mentioned above).

I see someone else ran into this in bazelbuild/bazel#22581, so maybe a rules_java version that includes that change will fix it.

Fat jars have enough downsides that I don't use them. If upgrading rules_java doesn't fix it for you, I'll see if we can share the aspect for collecting jars transitively we use internally. It's a fairly simple bit of code.

@grepwood
Contributor Author

grepwood commented Dec 30, 2024

Oh dear... you've got it. There are 2 different META-INF/org/apache/logging/log4j/core/config/plugins/Log4j2Plugins.dat:

  1. 3247 bytes, from JsonTemplateLayout
  2. 21343 bytes, from log4j-core (this is the one the fat jar uses)
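The comparison above can be automated. The following is a minimal Python sketch; the function name is hypothetical. It reads one entry from the deploy jar and reports which of the original jars carry a byte-identical copy, which tells you which copy the fat jar kept. The default entry is the plugin cache path quoted above.

```python
import zipfile
from typing import List

PLUGIN_CACHE = "META-INF/org/apache/logging/log4j/core/config/plugins/Log4j2Plugins.dat"


def kept_from(deploy_jar: str, source_jars: List[str], entry: str = PLUGIN_CACHE) -> List[str]:
    """Return the source jars whose copy of `entry` equals the deploy jar's copy."""
    with zipfile.ZipFile(deploy_jar) as zf:
        kept = zf.read(entry)
    matches: List[str] = []
    for jar in source_jars:
        with zipfile.ZipFile(jar) as zf:
            # Jars without the entry simply cannot be the source.
            if entry in zf.namelist() and zf.read(entry) == kept:
                matches.append(jar)
    return matches
```

If only the log4j-core jar comes back, the plugin entries from the JSON template layout jar were dropped when the fat jar was assembled.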

I shall try upgrading rules_java as per your recommendation. It's currently sitting at 5.0.0.

@grepwood
Contributor Author

The upgrade has not succeeded thus far: bazelbuild/rules_java#256

@srdo-humio
Contributor

srdo-humio commented Jan 3, 2025

Here's the rule we use for collecting jars (I misremembered, it's not an aspect):

```starlark
def _collect_transitive_java_runtime_deps(ctx):
    # Merge the transitive runtime classpath of every dep into one depset.
    all_jars_depset = depset(
        transitive = [dep[JavaInfo].transitive_runtime_jars for dep in ctx.attr.deps],
    )

    # The jars usually end up side by side in one directory, so two jars
    # with the same basename would clobber each other. Fail early instead.
    jars_seen = {}  # basename -> File
    for jar in all_jars_depset.to_list():
        if jar.basename in jars_seen:
            fail("Duplicate file name " + jars_seen[jar.basename].short_path + " and " + jar.short_path)
        jars_seen[jar.basename] = jar
    return [DefaultInfo(files = all_jars_depset)]

collect_transitive_java_runtime_deps = rule(
    implementation = _collect_transitive_java_runtime_deps,
    attrs = {
        "deps": attr.label_list(providers = [JavaInfo]),
    },
)
```

The way you'd use it is something like this:

```starlark
collect_transitive_java_runtime_deps(
    name = "collect-deps",
    deps = [":example"],
)
```

where example is the scala_binary/scala_library that contains your main class. You can then pass collect-deps to whatever target you have that puts files into your Docker image. As an example, you could create a tar by doing

```starlark
load("@rules_pkg//pkg:tar.bzl", "pkg_tar")

pkg_tar(
    name = "all-jars",
    srcs = [":collect-deps"],
    package_dir = "/libs",
)
```

Note that this is a bit quick and dirty, and it has one slightly awkward constraint: You can't have two jars with the same name in your dependencies. This is something you could probably solve if you need to.
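One possible way around the same-name constraint is to give every jar a destination name that stays unique. The following sketch is in Python rather than Starlark, and it works by prefixing clashing basenames with a short hash of their full path (for instance their `short_path`). The naming scheme is an assumption of this sketch, not something the rule above does; in a real build you would still need to apply the mapping when staging files into the image.

```python
import hashlib
import posixpath
from collections import Counter
from typing import Dict, List


def unique_names(paths: List[str]) -> Dict[str, str]:
    """Map each jar path to a flat file name that is unique across `paths`.

    Basenames that occur once are kept as-is; clashing ones get an
    8-character SHA-256 prefix derived from the full path.
    """
    counts = Counter(posixpath.basename(p) for p in paths)
    names: Dict[str, str] = {}
    for p in paths:
        base = posixpath.basename(p)
        if counts[base] == 1:
            names[p] = base
        else:
            digest = hashlib.sha256(p.encode("utf-8")).hexdigest()[:8]
            names[p] = f"{digest}-{base}"
    return names
```

Keeping non-clashing names unchanged means the common case still produces readable file names in `/libs`. The hash is deterministic, so rebuilds produce the same names.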
