
[BUG] Use in Scala 2.12 Spark project #347

Closed
borgoat opened this issue Oct 24, 2023 · 3 comments
Labels
bug Something isn't working

Comments


borgoat commented Oct 24, 2023

What is the bug?

I have a project configured with sbt, Scala 2.12, and Spark 3.3 (marked as Provided, to run on AWS Glue). I need this connector to write into an AOS cluster. However, when configuring it as a dependency, I get a conflict with the Spark libraries.

This is my build.sbt:

ThisBuild / organization := "com.yeekatee"
ThisBuild / scalaVersion := "2.12.18"

name := "shared-feature-glue"
version := "0.1.0"
idePackagePrefix := Some("com.yeekatee.analytics.spark")

val sparkMinor = "3.3"
val sparkVersion = "3.3.2"
val opensearchHadoopVersion = "1.0.1"
val glueVersion = "4.0.0"
val icebergVersion = "1.3.0"
val sparkTestingBaseVersion = "1.4.3"

resolvers += "AWS Glue ETL" at "https://aws-glue-etl-artifacts.s3.amazonaws.com/release"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-mllib" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-streaming" % sparkVersion % Provided,
  "com.amazonaws" % "AWSGlueETL" % glueVersion % Provided,
  "org.apache.iceberg" %% s"iceberg-spark-runtime-$sparkMinor" % icebergVersion,
  "org.scalatest" %% "scalatest" % "3.2.16" % Test,
  "com.holdenkarau" %% "spark-testing-base" % s"${sparkVersion}_${sparkTestingBaseVersion}" % Test,


  "org.opensearch.client" % "opensearch-hadoop" % opensearchHadoopVersion
)

And I'm getting the following error:

[error] Modules were resolved with conflicting cross-version suffixes in ProjectRef(uri("file:/Users/borgoat/Workspace/gitlab.com/yeekatee/yeekatee/"), "sharedFeatureGlue"):
[error]    org.apache.spark:spark-streaming _2.12, _2.11
[error]    org.apache.spark:spark-sql _2.12, _2.11
[error]    org.apache.spark:spark-core _2.12, _2.11
[error]    org.apache.spark:spark-yarn _2.11, _2.12
[error] stack trace is suppressed; run 'last sharedFeatureGlue / update' for the full output
[error] stack trace is suppressed; run 'last sharedFeatureGlue / ssExtractDependencies' for the full output
[error] (sharedFeatureGlue / update) Conflicting cross-version suffixes in: org.apache.spark:spark-streaming, org.apache.spark:spark-sql, org.apache.spark:spark-core, org.apache.spark:spark-yarn
[error] (sharedFeatureGlue / ssExtractDependencies) Conflicting cross-version suffixes in: org.apache.spark:spark-streaming, org.apache.spark:spark-sql, org.apache.spark:spark-core, org.apache.spark:spark-yarn
[error] Total time: 4 s, completed 24 Oct 2023, 11:54:15
[info] shutting down sbt server

I changed the import for the connector as follows to avoid bringing in the conflicting transitive dependency:

libraryDependencies ++= Seq(
   // [...]
  "org.opensearch.client" % "opensearch-hadoop" % opensearchHadoopVersion excludeAll (
    ExclusionRule("org.apache.spark")
  )
)
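If the exclusion route is taken, its effect on resolution can be checked from the sbt shell. This is a sketch, not output from the report: the module name `sharedFeatureGlue` is taken from the error log above, and the dependency-tree commands assume sbt 1.4+ with the bundled plugin enabled.

```shell
# Built-in task: list evicted/conflicting dependency versions after resolution
sbt "sharedFeatureGlue / evicted"

# Optional: enable the bundled dependency-tree plugin by adding the line
#   addDependencyTreePlugin
# to project/plugins.sbt, then inspect the full resolved graph:
sbt "sharedFeatureGlue / dependencyTree"
```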

How can one reproduce the bug?

  • Configure an sbt project with Scala 2.12 and at least one org.apache.spark dependency
  • Add opensearch-hadoop as dependency
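A minimal build.sbt reproducing the conflict might look like the following sketch (versions taken from the report above; the project name is illustrative, and any single Spark module is enough to trigger it):

```scala
// Minimal reproduction sketch (sbt build definition), assuming sbt 1.x.
ThisBuild / scalaVersion := "2.12.18"

name := "opensearch-hadoop-repro"

libraryDependencies ++= Seq(
  // Cross-built for Scala 2.12 via %% -> resolves the _2.12 artifact
  "org.apache.spark" %% "spark-sql" % "3.3.2" % Provided,
  // Plain % artifact: its POM pulls in _2.11 Spark modules transitively,
  // producing the "conflicting cross-version suffixes" error on update
  "org.opensearch.client" % "opensearch-hadoop" % "1.0.1"
)
```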

What is the expected behavior?

I believe I should be able to use the library in a Scala 2.12 project, since it is marked as compatible in the docs1.

What is your host/environment?

macOS 14.0, Amazon Corretto 17.0.7, sbt 1.9.0, Scala 2.12.18

Do you have any additional context?

Maybe I'm doing something completely wrong... not sure.

Footnotes

  1. https://github.com/opensearch-project/opensearch-hadoop/blob/main/COMPATIBILITY.md

@borgoat borgoat added bug Something isn't working untriaged labels Oct 24, 2023
@Xtansia Xtansia removed the untriaged label Oct 30, 2023
Collaborator

Xtansia commented Oct 30, 2023

Hi @borgoat, I'm new to the Hadoop & Spark space, so forgive me if I'm misunderstanding, but I believe you want the specific "org.opensearch.client" % "opensearch-spark-30_2.12" % "1.0.1" dependency rather than the general -hadoop one for your use case.

Author

borgoat commented Oct 30, 2023

Thanks for the heads up @Xtansia! You're right, this configuration works properly:

libraryDependencies ++= Seq(
  "org.opensearch.client" %% "opensearch-spark-30" % "1.0.1"
)

Somehow I completely missed the existence of that package. I'll maybe submit a PR with the recommended configuration for Spark. Thanks!

@borgoat borgoat closed this as completed Oct 30, 2023
Collaborator

Xtansia commented Oct 31, 2023

@borgoat That'd be awesome if you could PR a change to the docs!
