
Distributed Event Processing with Raspberry Pis

This repository holds the source code for one of the layers in a 2-tier architecture for distributed event processing in an IoT system. The infrastructure of the project is intended for capturing and analyzing sensor data in real time between smartwatches, on the one hand, and Raspberry Pis, on the other. The current repository focuses on the latter.

1.png

Design

The conceptual design for the Raspberry Pi layer comprises three services:

  • a native event generator that sends out events detected by the sensors on a Raspberry Pi
  • an HTTP server that listens for events generated both locally and elsewhere
  • an event processing application that enables SQL-like querying of saved events

2.jpeg

Event Distribution

A few words have to be said about the idea behind event data processing in a distributed system and how it was handled in our use case. Although our system does not comprise dozens of devices, we still strived to minimize the processing load on each individual device and to prevent network flooding. The scenario to avoid is one where every device sends out every event occurring locally, which would inevitably lead to a lot of noise, data redundancy and memory shortage.

For this reason, every event type in our system is categorized as either "rare" or "frequent", and based on this classification, the "rare" events are always sent to the other devices while the "frequent" events are processed locally. This binary distinction between "rare" and "frequent" events is a rough estimation and is based on whether the data is generated by environmentally oriented sensors (e.g. orientation and angular velocity, humidity, pressure, etc.) or by a user (e.g. pressing a button, switching a toggle, producing a sound, etc.).

For example, in the current system, the "frequently" generated events such as pressure, humidity and accelerometer_<id> are never sent from the device of origin. Instead, they are stored and processed locally. The "rarely" occurring events, such as joystick and sound, on the other hand, are always exchanged between the devices.
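As a rough illustration of this dispatch logic, the following Python sketch forwards "rare" events to the peers and keeps "frequent" ones on the device (the event names, peer addresses, port and /events endpoint are assumptions made for the example, not the actual configuration):

import requests

RARE_EVENTS = {"joystick", "sound"}              # illustrative; the real set comes from config.yaml
PEER_DEVICES = ["192.168.0.12", "192.168.0.13"]  # assumed peer IP addresses

def dispatch(event, store_locally):
    # "Rare" events are broadcast to every other device; "frequent" ones stay local.
    if event["type"] in RARE_EVENTS:
        for ip in PEER_DEVICES:
            requests.post(f"http://{ip}:8000/events", json=event, timeout=2)
    else:
        store_locally(event)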

Events coming in on the device pi_3: 6.jpeg

Events coming in on the device pi_4: 7.jpeg

Such a design also presupposes the concept of distributed query placement: the target query is broken down into subqueries that are evaluated "at the edge" as much as possible. This allows a scenario, for example, where rare events are matched against a subquery and, when a match occurs, the events that led to this match are sent to other devices for further evaluation.
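For illustration, such an edge-evaluated subquery could look roughly like the following SiddhiQL sketch (the stream and attribute names are assumptions made for the example); when the pattern matches, the participating events would be forwarded to the other devices:

define stream RareEvents (device string, event string, value double);

@info(name = 'joystick-then-sound')
from every e1=RareEvents[event == 'joystick'] -> e2=RareEvents[event == 'sound']
select e1.device as origin, e1.value as first_value, e2.value as second_value
insert into MatchedRareEvents;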

Implementation

The obvious solution is a containerized application (see the Problems section below), which ensures stable and consistent performance across devices with different system environments.

The backbone of the architecture is, therefore, a multi-service Docker Compose stack. In this case, it is two services: (1) a service that reads the sensors and sends out events and (2) a service with an HTTP server that streams incoming events to the event processing application. The HTTP server listens for events generated locally as well as those generated on other devices.

The Docker images for the sensor reader and the http-event-processor are pre-built and readily available in a public repository on Docker Hub. The images essentially provide the required environments and dependencies, while the application itself is mounted into the containers at launch as a volume. This setup offers substantial advantages for development and quick reconfiguration: changes to the application code and/or config files do not require rebuilding the images, which takes an unacceptably long time given the Raspberry Pi computing resources and the armv7 architecture.

The system can be configured in config.yaml by specifying the IP addresses of the devices to enable communication between them, time intervals for sending data, event types for each device, etc.
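An illustrative layout of such a configuration could look like the sketch below; the actual keys and values in config.yaml may differ:

devices:
  pi_3: 192.168.0.12      # assumed IP addresses
  pi_4: 192.168.0.13
send_interval_seconds: 5
event_types:
  pi_3: [pressure, humidity, joystick]
  pi_4: [accelerometer_1, sound]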

Another important aspect of the Docker Compose file is that it uses privileged mode for the services in order to grant the launched containers access to all devices on the host system. This is required for allowing the containers to access the sensors.
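A condensed sketch of what such a Compose file might look like (the service names, image tags, paths and port are placeholders, not the actual file):

services:
  sensor-reader:
    image: <dockerhub_user>/sensor-reader:armv7        # pre-built environment image
    privileged: true                                   # access to Sense HAT and GPIO devices
    volumes:
      - ./app:/app                                     # application code mounted at launch
      - ./config.yaml:/app/config.yaml
    environment:
      - DEVICE_ID=${DEVICE_ID}
  http-event-processor:
    image: <dockerhub_user>/http-event-processor:armv7
    privileged: true
    ports:
      - "8000:8000"                                    # assumed listener port
    volumes:
      - ./app:/app
      - ./config.yaml:/app/config.yaml
    environment:
      - DEVICE_ID=${DEVICE_ID}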

Lastly, both Docker images are built specifically for armv7 as the target architecture, as that is what our Raspberry Pi 4 devices use. This is crucial, as most of the required dependencies are not shipped as pre-built binaries for this architecture and thus have to be compiled from scratch. This results in extremely long build times and was the main reason behind the decision to upload pre-built Docker images limited to setting up the environment for the application, instead of baking the application source code into the images as well.
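For reference, cross-building such an image for armv7 from a more powerful machine can be done with Docker Buildx along these lines (the image name and Dockerfile path are placeholders):

docker buildx build --platform linux/arm/v7 -t <dockerhub_user>/http-event-processor:armv7 -f Dockerfile.processor --push .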

Sensor Reading Service

Uses nothing more than the sense-hat Python library for controlling the Raspberry Pi Sense HAT, as well as the GPIO library for reading data from the General-Purpose Input/Output pins. The events are then sent to specified HTTP endpoints.
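A minimal sketch of such a reader (the GPIO pin, polling interval and target endpoint are assumptions made for illustration):

from sense_hat import SenseHat
import RPi.GPIO as GPIO
import requests
import time

sense = SenseHat()
GPIO.setmode(GPIO.BCM)
GPIO.setup(17, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)  # pin 17 chosen arbitrarily for the example

def read_events():
    # "Frequent" environmental readings from the Sense HAT
    yield {"type": "pressure", "value": sense.get_pressure()}
    yield {"type": "humidity", "value": sense.get_humidity()}
    # "Rare" user-driven readings
    for e in sense.stick.get_events():
        yield {"type": "joystick", "value": e.direction}
    if GPIO.input(17):
        yield {"type": "button", "value": 1}

while True:
    for event in read_events():
        # hypothetical endpoint of the local HTTP listener
        requests.post("http://localhost:8000/events", json=event, timeout=2)
    time.sleep(5)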

HTTP Server with Event Processing App

This service comprises the logic for hosting an HTTP server that passes data directly to the event processing application. The FastAPI framework for Python serves as the basis for the HTTP server, whereas Siddhi was the choice for event processing. To integrate the FastAPI component written in Python with Siddhi, implemented in Java, we opted for PySiddhi, the Python wrapper for Siddhi.

While Siddhi can be integrated with external data stores such as RDBMS, MongoDB, etc., the default behavior is in-memory data storage, which fits our use case. The matching query has to be specified before launching the application; that said, it is possible to list multiple matching queries (and subqueries) to select from for each individual device.
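A minimal sketch of how a matching query could be registered through PySiddhi (the stream definition, query and import paths are illustrative and depend on the PySiddhi version in use):

from PySiddhi.core.SiddhiManager import SiddhiManager
from PySiddhi.core.query.output.callback.QueryCallback import QueryCallback
from PySiddhi.core.util.EventPrinter import PrintEvent

siddhi_app = (
    "define stream EventStream (device string, event string, value double); "
    "@info(name = 'rare-match') "
    "from EventStream[event == 'joystick'] "
    "select device, event, value "
    "insert into MatchedEvents;"
)

manager = SiddhiManager()
runtime = manager.createSiddhiAppRuntime(siddhi_app)

class MatchCallback(QueryCallback):
    def receive(self, timestamp, inEvents, outEvents):
        PrintEvent(timestamp, inEvents, outEvents)  # replace with forwarding/storage logic

runtime.addCallback("rare-match", MatchCallback())
input_handler = runtime.getInputHandler("EventStream")
runtime.start()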

Another feature of Siddhi is the variety of sources that can be used as producers of events fed to a stream, among them HTTP, Kafka, TCP, etc. While Siddhi's functionality allows for integrating our HTTP listener with the stream directly, we chose to decouple them for more control over the integration and over event pre-processing before an event enters the stream.
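A rough sketch of that decoupling, reusing the input_handler from the previous snippet (the endpoint path and payload shape are assumptions):

from fastapi import FastAPI

app = FastAPI()

@app.post("/events")
async def receive_event(event: dict):
    # pre-process/validate the event before handing it to the Siddhi stream
    payload = [event["device"], event["type"], float(event["value"])]
    input_handler.send(payload)
    return {"status": "accepted"}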

Installation Guide

  1. Before you start, make sure all the devices are connected to one network; otherwise they will not be able to communicate. In order to connect a Raspberry Pi to a local network, modify its network configuration file /etc/wpa_supplicant/wpa_supplicant.conf by appending a new block to it:
network={
    ssid="network_name"
    psk="network_password"
}

The system might require you to use privileged mode (sudo) to modify this file. Reboot the device afterwards.

  2. Clone the repository with the source code:
git clone https://github.com/Kristina-Pianykh/esp32_event_processing.git
  3. Run the installation script for Docker:
./install_docker.sh
  4. Specify the IP addresses for the connected devices in the project configuration file config.yaml. You can use Angry IP Scanner, a free and open-source network scanner, to look up the addresses of all the devices connected to your local network.

  5. Launch Docker Compose and specify the device ID pi_id as an argument:

sudo DEVICE_ID=<pi_id> docker compose up

The services are set up with logging. It is essential that the applications are launched on all Raspberry Pi devices for the service to start sending data. Otherwise, it will be stuck in an infinite loop, waiting for the devices specified in the config to start first.

5.jpeg

Problems

  1. The most obvious issue is the armv7 architecture on the Raspberry Pi 4 devices that were available to us. Most (if not all) libraries and packages needed to set up the environment for our application are not available as pre-built binaries for this architecture and, for this reason, all of the subdependencies required by a given library have to be built and/or compiled from scratch. This not only led to enormous image build times, but also called for the additional installation of the required compilers.
  2. The choice of an event processing engine was made too lightly. While it was initially agreed to test out Siddhi, and its original Java implementation might have played out nicer, its Python wrapper is extremely outdated (last updated in 2019) and poorly packaged (no dependency management in the project, no version tracking). Our current implementation features the FastAPI HTTP server (in Python) tightly coupled with the Siddhi application, and given the existence of the Python wrapper for Siddhi, it was decided to go with Python. To get PySiddhi to work four years after its last update, with no version tracking of the library's subdependencies, a lot of work had to be done to set up the environment and resolve multiple version conflicts (see the Dockerfile for the respective service). An alternative (and perhaps better) solution would have been to decouple the HTTP server and the Siddhi application and implement both in Java. Another option would be pivoting to an alternative engine (e.g. Kafka or Arroyo).