This project implements a raster rendering process via GeoTrellis. The input data is a Kafka stream; the output is a set of rasters along with JSON metadata.
The project contains two subprojects: `streaming` (the actual streaming application) and `producer` (a subproject used for test purposes that generates test Kafka messages).
To run this app in any Spark mode, make sure you have a properly installed Spark client on your machine. To bring up a Kafka instance, it is enough to run the `make kafka` command (more detailed information about Kafka in Docker is provided in the Kafka in Docker section). Make sure that all necessary changes have been introduced into the `application.conf` file.
A Makefile is provided to simplify launching and integration testing of the application.
| Command | Description |
|---|---|
| local-spark-demo | Run the Spark streaming assembly on a local Spark server |
| local-spark-shell | Run a Spark shell locally with the fat jar included |
| build | Build a fat jar to run on Spark |
| clean | Clean up targets |
| kafka | Run a dockerized Kafka; see README.md to know more about it |
| kafka-send-messages | Produce demo Kafka messages |
| sbt-spark-demo | Run the Spark streaming application from the SBT shell |
Application settings are provided via a configuration file in the `streaming` subproject's resources folder:
```
ingest.stream {
  # kafka settings
  kafka {
    threads           = 10
    topic             = "geotrellis-streaming"
    otopic            = "geotrellis-streaming-output"
    application-id    = "geotrellis-streaming"
    bootstrap-servers = "localhost:9092"
  }
  # spark streaming settings
  spark {
    batch-duration    = 10 // in seconds
    partitions        = 10
    auto-offset-reset = "latest"
    auto-commit       = true
    publish-to-kafka  = true
    group-id          = "spark-streaming-data"
    checkpoint-dir    = ""
  }
}

# geotrellis gdal VLM settings
vlm {
  geotiff.s3 {
    allow-global-read: false
    region: "us-west-2"
  }
  gdal.options {
    GDAL_DISABLE_READDIR_ON_OPEN     = "YES"
    CPL_VSIL_CURL_ALLOWED_EXTENSIONS = ".tif"
  }
  # if true, uses GDALRasterSources; if false, GeoTiffRasterSources
  source.gdal.enabled = true
}
```
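These values are presumably read through Typesafe Config; the sketch below shows how the keys above map onto typed values. The object and field names are illustrative assumptions, not necessarily the project's actual settings classes.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Minimal sketch of reading the settings above with Typesafe Config.
// Object and field names here are illustrative, not the project's actual API.
object StreamingSettings {
  private val conf: Config = ConfigFactory.load()

  val kafkaTopic: String       = conf.getString("ingest.stream.kafka.topic")
  val kafkaOutputTopic: String = conf.getString("ingest.stream.kafka.otopic")
  val bootstrapServers: String = conf.getString("ingest.stream.kafka.bootstrap-servers")

  val batchDurationSec: Int    = conf.getInt("ingest.stream.spark.batch-duration")
  val partitions: Int          = conf.getInt("ingest.stream.spark.partitions")
  val publishToKafka: Boolean  = conf.getBoolean("ingest.stream.spark.publish-to-kafka")

  // switches between GDALRasterSources and GeoTiffRasterSources
  val gdalEnabled: Boolean     = conf.getBoolean("vlm.source.gdal.enabled")
}
```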
Producer settings are provided via a configuration file in the `producer` subproject's resources folder:
```
lc8 {
  scenes = [
    {
      name        = "LC08_L1TP_139044_20170304_20170316_01_T1" # name of the LC8 scene
      band        = "1"                                        # band number
      count       = 2                                          # number of generated polygons
      crs         = "EPSG:4326"                                # desired CRS of the generated polygons
      output-path = "../data/img"                              # path where the result should be placed after processing
    },
    {
      name        = "LC08_L1TP_139045_20170304_20170316_01_T1"
      band        = "2"
      count       = 2
      crs         = "EPSG:4326"
      output-path = "../data/img"
    },
    {
      name        = "LC08_L1TP_139046_20170304_20170316_01_T1"
      band        = "2"
      count       = 2
      crs         = "EPSG:4326"
      output-path = "../data/img"
    }
  ]
}
```
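For illustration, the scene list above can be mapped onto a small case class with Typesafe Config. The class and field names below are hypothetical, not the producer's actual code, and the `JavaConverters` import assumes Scala 2.12 (on 2.13 use `scala.jdk.CollectionConverters._`).

```scala
import com.typesafe.config.ConfigFactory
import scala.collection.JavaConverters._

// Hypothetical representation of a single lc8 scene entry;
// the producer's actual model may differ.
final case class SceneConfig(name: String, band: String, count: Int, crs: String, outputPath: String)

object LC8Scenes {
  // Reads lc8.scenes from application.conf into a list of SceneConfig values.
  val scenes: List[SceneConfig] =
    ConfigFactory.load().getConfigList("lc8.scenes").asScala.toList.map { c =>
      SceneConfig(
        name       = c.getString("name"),
        band       = c.getString("band"),
        count      = c.getInt("count"),
        crs        = c.getString("crs"),
        outputPath = c.getString("output-path")
      )
    }
}
```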
- Assume Kafka is already running locally on port 9092. If not, it is possible to launch Kafka in Docker (see the Kafka in Docker section).
- Open the `producer` and `streaming` projects in two separate terminal windows:
  - run the streaming application (project `streaming`, `run`)
  - run the producer application (project `producer`, `run --generate-and-send`)
To summarise:

Terminal №1:
```bash
$ make kafka
```

Terminal №2:
```bash
$ cd app; ./sbt
$ project streaming
$ run
```
or
```bash
$ make sbt-spark-demo
```

Terminal №3:
```bash
$ ./sbt
$ project producer
$ run --generate-and-send
```
or
```bash
$ make kafka-send-messages
```
Extra summary:

```bash
# terminal 1
make kafka

# terminal 2
make sbt-spark-demo

# terminal 3
make kafka-send-messages
```
- Assume Kafka is already running locally on port 9092. If not, it is possible to launch Kafka in Docker (see the Kafka in Docker section).
- Build a fat assembly jar: `make build`
- Launch the Spark app: `make local-spark-processing`
- Post a test Kafka message: `make kafka-send-messages`
To summarise:

```bash
$ make build && make local-spark-processing
$ make kafka-send-messages
```
- Add the following alias to the `/etc/hosts` file: `127.0.0.1 localhost kafka` (a variant known to work on a macOS setup: `127.0.0.1 localhost.localdomain localhost kafka`)
- Run `make kafka`
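To double-check that the alias works before launching the demo, a small throwaway check (not part of the project) can resolve the `kafka` host and probe port 9092 from any Scala REPL or scratch file:

```scala
import java.net.{InetAddress, Socket}
import scala.util.Try

// Throwaway check: resolve the `kafka` alias from /etc/hosts and probe the broker port.
object KafkaAliasCheck extends App {
  val resolved = Try(InetAddress.getByName("kafka").getHostAddress)
  println(s"kafka resolves to: ${resolved.getOrElse("<unresolved>")}")

  val reachable = Try { val s = new Socket("kafka", 9092); s.close(); true }.getOrElse(false)
  println(if (reachable) "kafka:9092 is reachable" else "kafka:9092 is NOT reachable")
}
```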