Spark Connector InfluxDB Data Source

A library for reading data from and writing data to InfluxDB using Spark Structured Streaming.

Linking

Install the package with Maven:

mvn clean install -Dmaven.test.skip=true

If you need to deploy to a private Nexus repository:

mvn clean deploy -Dmaven.test.skip=true

Add the dependency to your project's POM:

<dependency>
    <groupId>tech.odes</groupId>
    <artifactId>spark-connector-influxdb</artifactId>
    <version>{{site.SPARK_VERSION}}</version>
</dependency>

Unlike using --jars, using --packages ensures that this library and its dependencies will be added to the classpath.

The --packages argument can also be used with bin/spark-submit.

This library is built for Scala 2.12.x only, so make sure the Scala version used in your build and commands matches.

Examples

Read

A streaming DataFrame can be created from data read from InfluxDB using:

spark.readStream
  .format(classOf[FluxSourceProvider].getName)
  .option("host", "localhost")
  .option("port", "8086")
  .option("user", "influxdb")
  .option("password", "influxdb")
  .option("org", "org")
  .option("bucket", "test_bucket")
  .option("measurement", "sensor")
  .load()

You can refer to FluxStreamSourceApplication.scala.

[TIPS]

(1) The source uses the Minimum Time Slice Algorithm to extract data from InfluxDB through incremental reads.

(2) Reads currently return only raw records, each represented as a Line Protocol string (StringType).
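
Because the source hands back raw Line Protocol strings, downstream code usually has to split them into measurement, tags, fields and timestamp. The sketch below shows one way to do that in plain Scala. The names Point and LineProtocol are hypothetical and not part of this connector, and the parser is deliberately simplified: it assumes no escaped characters and no spaces or commas inside quoted string field values.

```scala
// Simplified Line Protocol parser (illustrative only, not part of the connector).
// Assumption: no escaped characters, and no spaces or commas inside quoted strings.
final case class Point(
    measurement: String,
    tags: Map[String, String],
    fields: Map[String, String], // field values are kept as raw text, e.g. "40i"
    timestamp: Option[Long])

object LineProtocol {
  // Turn "k1=v1,k2=v2" into a Map; pairs without '=' are dropped.
  private def keyValues(s: String): Map[String, String] =
    s.split(",").iterator
      .map(_.split("=", 2))
      .collect { case Array(k, v) => k -> v }
      .toMap

  // Layout: <measurement>[,<tag_set>] <field_set> [<timestamp>]
  def parse(line: String): Option[Point] =
    line.trim.split(" ") match {
      case Array(head, fieldSet, rest @ _*) if rest.size <= 1 =>
        val headParts = head.split(",", 2)
        val tags =
          if (headParts.length == 2) keyValues(headParts(1))
          else Map.empty[String, String]
        val ts = rest.headOption.flatMap(t => scala.util.Try(t.toLong).toOption)
        Some(Point(headParts(0), tags, keyValues(fieldSet), ts))
      case _ => None // not enough (or too many) space-separated sections
    }
}
```

A function like this could be applied record by record to the strings produced by the source. How the raw string is exposed (for example, the column name) is not stated here, so check FluxStreamSourceApplication.scala before wiring it in.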

Write

A streaming DataFrame can also be written to InfluxDB using:

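// df is assumed to be a streaming DataFrame of Line Protocol strings;
// writeStream is a method of DataFrame/Dataset, not of SparkSession.
df.writeStream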
  .format(classOf[FluxSinkProvider].getName)
  .outputMode(OutputMode.Append())
  .option("host", "localhost")
  .option("port", "8086")
  .option("user", "influxdb")
  .option("password", "influxdb")
  .option("org", "org")
  .option("bucket", "test_bucket")
  .option("checkpointLocation", "/tmp")
  .start()
  .awaitTermination()

You can refer to FluxStreamSinkApplication.scala.

[TIPS]

(1) The sink only accepts input data in Line Protocol format.

(2) Only the Append output mode is supported.
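
Since the sink expects Line Protocol input, records typically have to be rendered as Line Protocol strings before they are written. The following is a minimal sketch of such a formatter in plain Scala. The object name LineProtocolWriter and its format method are hypothetical helpers, not part of this connector, and the escaping follows the general Line Protocol rules rather than anything specific to this library.

```scala
// Illustrative Line Protocol formatter (hypothetical helper, not part of the connector).
object LineProtocolWriter {
  // Tag keys, tag values and field keys: escape commas, equals signs and spaces.
  private def escKey(s: String): String =
    s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")

  // Measurement names: escape commas and spaces.
  private def escMeasurement(s: String): String =
    s.replace(",", "\\,").replace(" ", "\\ ")

  // Strings are quoted, integers get an "i" suffix, floats and booleans are written as-is.
  private def fieldValue(v: Any): String = v match {
    case s: String  => "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
    case b: Boolean => b.toString
    case i: Int     => s"${i}i"
    case l: Long    => s"${l}i"
    case d: Double  => d.toString
    case f: Float   => f.toString
    case other =>
      throw new IllegalArgumentException(s"Unsupported field type: $other")
  }

  def format(
      measurement: String,
      tags: Map[String, String],
      fields: Map[String, Any],
      timestampNs: Long): String = {
    require(fields.nonEmpty, "Line Protocol requires at least one field")
    // Keys are sorted so the output is deterministic.
    val tagSet = tags.toSeq.sortBy(_._1)
      .map { case (k, v) => s",${escKey(k)}=${escKey(v)}" }
      .mkString
    val fieldSet = fields.toSeq.sortBy(_._1)
      .map { case (k, v) => s"${escKey(k)}=${fieldValue(v)}" }
      .mkString(",")
    s"${escMeasurement(measurement)}$tagSet $fieldSet $timestampNs"
  }
}
```

The timestamp is assumed to be in nanoseconds, the Line Protocol default. Whether the sink expects a different precision is not documented here.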

Configuration

| Parameter Name | Description | Default Value | Read | Write |
|---|---|---|---|---|
| host | [Required] InfluxDB server host | localhost | ✅ | ✅ |
| port | [Required] InfluxDB server port | 8086 | ✅ | ✅ |
| user | [Required] InfluxDB server user | | ✅ | ✅ |
| password | [Required] InfluxDB server password | | ✅ | ✅ |
| token | InfluxDB server access token | | ✅ | ✅ |
| org | [Required] InfluxDB organization | | ✅ | ✅ |
| bucket | [Required] InfluxDB bucket | | ✅ | ✅ |
| measurement | [Required] InfluxDB measurement | | ✅ | |
| delta-time | Minimum time slice used for incremental reads from InfluxDB (unit: ms) | 1000 | ✅ | |
| time-zone | A time-zone ID, such as Europe/Paris | Asia/Shanghai | ✅ | |
| batchSize | Number of records to write per batch | 1000 | | ✅ |
| numPartitions | Number of partitions to write | None | | ✅ |
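
To keep read and write settings consistent, the options can be collected in plain maps and passed to Spark in one call. This is only a sketch: the object name InfluxOptions is hypothetical, the connection values are the placeholders used in the examples above, and the numPartitions value is made up for illustration.

```scala
// Hypothetical helper that groups the options from the table; not part of the connector.
object InfluxOptions {
  // Options the table marks as applying to both reads and writes.
  val common: Map[String, String] = Map(
    "host"     -> "localhost",
    "port"     -> "8086",
    "user"     -> "influxdb",
    "password" -> "influxdb",
    "org"      -> "org",
    "bucket"   -> "test_bucket"
  )

  // Read-only options, set explicitly to their documented defaults.
  val read: Map[String, String] = common ++ Map(
    "measurement" -> "sensor",
    "delta-time"  -> "1000",          // minimum time slice in ms
    "time-zone"   -> "Asia/Shanghai"
  )

  // Write-only options.
  val write: Map[String, String] = common ++ Map(
    "batchSize"     -> "1000",
    "numPartitions" -> "4"            // illustrative value; the documented default is None
  )
}
```

Both DataStreamReader and DataStreamWriter accept a map through their options method, so InfluxOptions.read and InfluxOptions.write can replace the individual option calls in the Read and Write examples. The token option is left out because the table does not say how it interacts with user and password.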
