These instructions will show you how to run a .NET for Apache Spark app using .NET Core on Ubuntu 18.04.
- Download and install the following: .NET Core 3.1 SDK | OpenJDK 8 | Apache Spark 2.4.1
- Download and install a Microsoft.Spark.Worker release:
  - Select a Microsoft.Spark.Worker release from the .NET for Apache Spark GitHub Releases page and download it to your local machine (e.g., `~/bin/Microsoft.Spark.Worker`).
  - **IMPORTANT** Create a new environment variable `DOTNET_WORKER_DIR` and set it to the directory where you downloaded and extracted the Microsoft.Spark.Worker (e.g., `~/bin/Microsoft.Spark.Worker`).
For detailed instructions, you can see Building .NET for Apache Spark from Source on Ubuntu.
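As a quick sanity check of the prerequisites above, the following is a minimal sketch that sets `DOTNET_WORKER_DIR` for the current shell session and reports which of the required commands are on your `PATH`. It assumes a Bash-compatible shell and that the worker was extracted to the example location `~/bin/Microsoft.Spark.Worker`; adjust the path if you chose a different directory.

```shell
# Assumed install location (the example path used above); adjust as needed.
export DOTNET_WORKER_DIR="$HOME/bin/Microsoft.Spark.Worker"

# Report which prerequisite commands are reachable on PATH.
for tool in dotnet java spark-submit; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done

# Warn, without failing, if the worker directory has not been created yet.
if [ -d "$DOTNET_WORKER_DIR" ]; then
  echo "worker dir: $DOTNET_WORKER_DIR"
else
  echo "warning: $DOTNET_WORKER_DIR does not exist yet" >&2
fi
```

An `export` only lasts for the current session. One common way to make it persistent in Bash is to add the same `export` line to `~/.bashrc`.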
- Use the `dotnet` CLI to create a console application:

      dotnet new console -o HelloSpark
- Install the `Microsoft.Spark` NuGet package into the project from the spark nuget.org feed (see Ways to install NuGet Package):

      cd HelloSpark
      dotnet add package Microsoft.Spark
- Replace the contents of the `Program.cs` file with the following code:

      using Microsoft.Spark.Sql;

      namespace HelloSpark
      {
          class Program
          {
              static void Main(string[] args)
              {
                  var spark = SparkSession.Builder().GetOrCreate();
                  var df = spark.Read().Json("people.json");
                  df.Show();
              }
          }
      }
- Use the `dotnet` CLI to build the application:

      dotnet build
- Open your terminal and navigate into your app folder:

      cd <your-app-output-directory>
- Create `people.json` with the following content:

      {"name":"Michael"}
      {"name":"Andy", "age":30}
      {"name":"Justin", "age":19}
- Run your app:

      spark-submit \
      --class org.apache.spark.deploy.dotnet.DotnetRunner \
      --master local \
      microsoft-spark-2.4.x-<version>.jar \
      dotnet HelloSpark.dll

  Note: This command assumes you have downloaded Apache Spark and added it to your PATH environment variable so that you can use `spark-submit`. Otherwise, you would have to use the full path (e.g., `~/spark/bin/spark-submit`). For detailed instructions, you can see Building .NET for Apache Spark from Source on Ubuntu.
- The output of the application should look similar to the output below:

      +----+-------+
      | age|   name|
      +----+-------+
      |null|Michael|
      |  30|   Andy|
      |  19| Justin|
      +----+-------+
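The input file in the steps above is line-delimited JSON: each line is a complete JSON object, which is the layout Spark's JSON reader expects by default. A record that omits a field, such as the first one with no `age`, shows up as `null` in that column. As a minimal sketch, assuming a Bash-compatible shell and that you run it from your app output directory, the file can be created with a heredoc so that the line breaks are preserved exactly:

```shell
# Write the sample input with exactly one JSON object per line.
# The quoted 'EOF' delimiter stops the shell from expanding anything in the body.
cat > people.json <<'EOF'
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
EOF

# Confirm the record count; tr strips padding that some wc versions emit.
echo "people.json has $(wc -l < people.json | tr -d ' ') records"
```

Pasting all three objects onto a single line would not be read as three records, so it is worth confirming the count before running `spark-submit`.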