
[BUG] Use in Scala 2.12 Spark project #347

Closed
borgoat opened this issue Oct 24, 2023 · 3 comments
Labels
bug Something isn't working

Comments


borgoat commented Oct 24, 2023

What is the bug?

I have a project configured with sbt, Scala 2.12, and Spark 3.3 (marked as Provided, to run on AWS Glue). I need this connector to write into an AOS cluster. However, when configuring it as a dependency, I get a conflict with the Spark libraries.

This is my build.sbt:

ThisBuild / organization := "com.yeekatee"
ThisBuild / scalaVersion := "2.12.18"

name := "shared-feature-glue"
version := "0.1.0"
idePackagePrefix := Some("com.yeekatee.analytics.spark")

val sparkMinor = "3.3"
val sparkVersion = "3.3.2"
val opensearchHadoopVersion = "1.0.1"
val glueVersion = "4.0.0"
val icebergVersion = "1.3.0"
val sparkTestingBaseVersion = "1.4.3"

resolvers += "AWS Glue ETL" at "https://aws-glue-etl-artifacts.s3.amazonaws.com/release"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-mllib" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-streaming" % sparkVersion % Provided,
  "com.amazonaws" % "AWSGlueETL" % glueVersion % Provided,
  "org.apache.iceberg" %% s"iceberg-spark-runtime-$sparkMinor" % icebergVersion,
  "org.scalatest" %% "scalatest" % "3.2.16" % Test,
  "com.holdenkarau" %% "spark-testing-base" % s"${sparkVersion}_${sparkTestingBaseVersion}" % Test,


  "org.opensearch.client" % "opensearch-hadoop" % opensearchHadoopVersion
)

And I'm getting the following error:

[error] Modules were resolved with conflicting cross-version suffixes in ProjectRef(uri("file:/Users/borgoat/Workspace/gitlab.com/yeekatee/yeekatee/"), "sharedFeatureGlue"):
[error]    org.apache.spark:spark-streaming _2.12, _2.11
[error]    org.apache.spark:spark-sql _2.12, _2.11
[error]    org.apache.spark:spark-core _2.12, _2.11
[error]    org.apache.spark:spark-yarn _2.11, _2.12
[error] stack trace is suppressed; run 'last sharedFeatureGlue / update' for the full output
[error] stack trace is suppressed; run 'last sharedFeatureGlue / ssExtractDependencies' for the full output
[error] (sharedFeatureGlue / update) Conflicting cross-version suffixes in: org.apache.spark:spark-streaming, org.apache.spark:spark-sql, org.apache.spark:spark-core, org.apache.spark:spark-yarn
[error] (sharedFeatureGlue / ssExtractDependencies) Conflicting cross-version suffixes in: org.apache.spark:spark-streaming, org.apache.spark:spark-sql, org.apache.spark:spark-core, org.apache.spark:spark-yarn
[error] Total time: 4 s, completed 24 Oct 2023, 11:54:15
[info] shutting down sbt server

I changed the import for the connector as follows to avoid bringing in the conflicting transitive dependency:

libraryDependencies ++= Seq(
   // [...]
  "org.opensearch.client" % "opensearch-hadoop" % opensearchHadoopVersion excludeAll (
    ExclusionRule("org.apache.spark")
  )
)
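If the exclusion route is taken, its effect on resolution can be checked from the sbt shell. This is a sketch, not output from the report: the module name `sharedFeatureGlue` is taken from the error log above, and the dependency-tree commands assume sbt 1.4+ with the bundled plugin enabled.

```shell
# Built-in task: list evicted/conflicting dependency versions after resolution
sbt "sharedFeatureGlue / evicted"

# Optional: enable the bundled dependency-tree plugin by adding the line
#   addDependencyTreePlugin
# to project/plugins.sbt, then inspect the full resolved graph:
sbt "sharedFeatureGlue / dependencyTree"
```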

How can one reproduce the bug?

  • Configure an sbt project with Scala 2.12 and at least one org.apache.spark dependency
  • Add opensearch-hadoop as dependency
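A minimal build.sbt reproducing the conflict might look like the following sketch (versions taken from the report above; the project name is illustrative, and any single Spark module is enough to trigger it):

```scala
// Minimal reproduction sketch (sbt build definition), assuming sbt 1.x.
ThisBuild / scalaVersion := "2.12.18"

name := "opensearch-hadoop-repro"

libraryDependencies ++= Seq(
  // Cross-built for Scala 2.12 via %% -> resolves the _2.12 artifact
  "org.apache.spark" %% "spark-sql" % "3.3.2" % Provided,
  // Plain % artifact: its POM pulls in _2.11 Spark modules transitively,
  // producing the "conflicting cross-version suffixes" error on update
  "org.opensearch.client" % "opensearch-hadoop" % "1.0.1"
)
```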

What is the expected behavior?

I believe I should be able to use the library in a Scala 2.12 project, since it is marked as compatible in the docs1.

What is your host/environment?

macOS 14.0, Amazon Corretto 17.0.7, sbt 1.9.0, Scala 2.12.18

Do you have any additional context?

Maybe I'm doing something completely wrong... not sure.

Footnotes

  1. https://github.com/opensearch-project/opensearch-hadoop/blob/main/COMPATIBILITY.md

@borgoat borgoat added bug Something isn't working untriaged labels Oct 24, 2023
@Xtansia Xtansia removed the untriaged label Oct 30, 2023
Collaborator

Xtansia commented Oct 30, 2023

Hi @borgoat, I'm new to the Hadoop & Spark space, so forgive me if I'm misunderstanding, but I believe you want the specific "org.opensearch.client" % "opensearch-spark-30_2.12" % "1.0.1" dependency rather than the general -hadoop one for your use case.

Author

borgoat commented Oct 30, 2023

Thanks for the heads up @Xtansia! You're right, this configuration works properly:

libraryDependencies ++= Seq(
  "org.opensearch.client" %% "opensearch-spark-30" % "1.0.1"
)

Somehow I completely missed the existence of that package. I'll maybe submit a PR with the recommended configuration for Spark. Thanks!

@borgoat borgoat closed this as completed Oct 30, 2023
Collaborator

Xtansia commented Oct 31, 2023

@borgoat That'd be awesome if you could PR a change to the docs!
