Home
Welcome to the NiFi-flows wiki!
In this tutorial, you will get to know Apache NiFi, its concepts, and its architecture in detail. You'll discover how easy and adaptable it is to build and manage real-time data pipelines.
Apache NiFi is a dataflow system based on the concepts of flow-based programming. It was developed by the National Security Agency (NSA) and became a top-level Apache project in 2015.
Apache NiFi releases a new update every 6-8 weeks to meet user requirements.
This Apache NiFi guide is for beginners and professionals who want to learn the basics of Apache NiFi. It includes several sections that provide basic knowledge on how to work with NiFi.
Apache NiFi is a robust, scalable and reliable system that is used to process and distribute data. It was created to automate the transfer of data between systems.
NiFi offers a web-based user interface for creating, monitoring, and managing dataflows. The name NiFi comes from Niagara Files, software that was developed by the National Security Agency (NSA) and is now maintained by the Apache Software Foundation. In the web UI, we define the source, the processor, and the destination used to collect, transmit, and store data, respectively. Each processor in NiFi has relationships that are used when connecting one processor to another.
Why do we use Apache NiFi?
Apache NiFi is open source and therefore freely available. It supports many kinds of data, such as social feeds, geographic location data, and logs.
Apache NiFi supports a wide range of protocols and systems such as SFTP, Kafka, and HDFS, which makes the platform popular in the IT industry. There are many reasons to choose Apache NiFi:
- Apache NiFi can be integrated into an organization's existing infrastructure.
- It allows users to take advantage of Java ecosystem features and existing libraries.
- It provides real-time control that lets the user manage the movement of data between any source, processor, and destination.
- It helps to visualize the dataflow at the enterprise level.
- It helps to listen, fetch, split, aggregate, route, and transform data, and to build the dataflow by drag and drop.
- It allows users to start and stop components at the individual and group levels.
- It allows users to pull data from various sources into NiFi and to create FlowFiles.
- It is designed to scale out in clusters that provide guaranteed data delivery.
- It lets users visualize and monitor performance and behavior through flow bulletins, and it offers built-in documentation.
Features of Apache NiFi
- Apache NiFi offers a web-based user interface that provides a seamless experience of design, monitoring, control, and feedback.
- It provides a data provenance module that helps track and monitor data from the start to the end of the flow.
- Developers can create their own custom processors and reporting tasks as required.
- It supports troubleshooting and flow optimization.
- It provides fast development and efficient testing.
- It provides content encryption and communication over a secure protocol.
- It buffers all queued data and applies backpressure when queues reach their specified limits.
- Apache NiFi provides system-to-system and user-to-system security, along with multi-tenant authorization.
The Apache NiFi architecture includes a web server, a flow controller, and extensions such as processors, all running inside a Java virtual machine (JVM). It also has three repositories: the FlowFile repository, the content repository, and the provenance repository.
Web server: The web server hosts NiFi's HTTP-based command and control API.
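As a rough illustration of talking to that HTTP API, the sketch below builds the URL of the controller status resource and fetches it. The base address is an assumption (an unsecured local instance), and the `flow/status` path is assumed to be available on your NiFi version; secured installations additionally need HTTPS and authentication, which this sketch does not cover.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: an unsecured NiFi instance at this address.
# Secured installations use HTTPS and require authentication.
DEFAULT_BASE_URL = "http://localhost:8080/nifi-api/"


def status_url(base_url: str = DEFAULT_BASE_URL) -> str:
    """Build the URL of the controller status resource."""
    return urljoin(base_url, "flow/status")


def fetch_status(base_url: str = DEFAULT_BASE_URL, timeout: float = 5.0) -> dict:
    """Request the controller status and decode the JSON response body."""
    with urlopen(status_url(base_url), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_status()` only succeeds when a reachable NiFi instance is listening at the assumed address.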
Flow controller: The flow controller is the brain of the operation. It provides threads for extensions to run on and manages the schedule of when extensions receive resources to execute.
Extensions: There are several types of NiFi extensions, which are described in other documents. The key point is that extensions operate and execute within the JVM.
FlowFile repository: The FlowFile repository holds the current state and attributes of each FlowFile that passes through the NiFi dataflow.
It keeps track of the state of every FlowFile that is currently active in the flow. The default approach is a persistent write-ahead log located on a specified disk partition.
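The idea behind a write-ahead log can be sketched in a few lines: every state change is appended to a journal on disk before it counts as applied, and the current state is rebuilt after a restart by replaying the journal. The class name, file name, and record layout below are hypothetical and far simpler than NiFi's actual implementation.

```python
import json
import os
import tempfile


class ToyWriteAheadLog:
    """Append-only journal of FlowFile state changes (illustrative only)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, flowfile_id: str, attributes: dict) -> None:
        # Persist the change to disk before it is considered applied.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"id": flowfile_id, "attributes": attributes}) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def recover(self) -> dict:
        # Replay every record in order; the last write per FlowFile wins.
        state: dict = {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                state[record["id"]] = record["attributes"]
        return state


def demo_recovery() -> dict:
    with tempfile.TemporaryDirectory() as directory:
        wal = ToyWriteAheadLog(os.path.join(directory, "flowfile.journal"))
        wal.append("ff-1", {"filename": "a.csv", "queue": "fetch"})
        wal.append("ff-1", {"filename": "a.csv", "queue": "convert"})
        wal.append("ff-2", {"filename": "b.csv", "queue": "fetch"})
        # A fresh instance stands in for a restart after a crash.
        return ToyWriteAheadLog(wal.path).recover()
```

After replay, `ff-1` is found in the `convert` queue because its later record supersedes the earlier one.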
Content repository: The content repository stores the actual content bytes of the FlowFiles. The default approach is a fairly simple mechanism that stores blocks of data in the file system.
To reduce contention on any single volume, more than one file system storage location can be specified so that different physical partitions are used.
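To make the benefit of several storage locations concrete, here is a minimal sketch that spreads writes across directories in round-robin order. The class, the file naming, and the round-robin policy are assumptions made for illustration; they do not describe how NiFi itself chooses a location.

```python
import itertools
import os
import tempfile


class ToyContentRepository:
    """Spreads content files across several directories in round-robin order."""

    def __init__(self, locations: list[str]) -> None:
        if not locations:
            raise ValueError("at least one storage location is required")
        self._locations = itertools.cycle(locations)
        self._counter = itertools.count(1)

    def write(self, content: bytes) -> str:
        # Pick the next location so that no single volume takes every write.
        directory = next(self._locations)
        path = os.path.join(directory, f"claim-{next(self._counter)}.bin")
        with open(path, "wb") as handle:
            handle.write(content)
        return path

    @staticmethod
    def read(path: str) -> bytes:
        with open(path, "rb") as handle:
            return handle.read()


def demo_spread() -> list[str]:
    with tempfile.TemporaryDirectory() as root:
        locations = [os.path.join(root, name) for name in ("disk1", "disk2")]
        for location in locations:
            os.makedirs(location)
        repository = ToyContentRepository(locations)
        paths = [repository.write(f"payload {i}".encode()) for i in range(4)]
        assert ToyContentRepository.read(paths[0]) == b"payload 0"
        # Report only the directory name each write landed in.
        return [os.path.basename(os.path.dirname(p)) for p in paths]
```

In a real deployment the directories would sit on different physical disks; here they are temporary folders standing in for separate partitions.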
Provenance repository: The provenance repository stores all provenance event data. The repository construct is pluggable, and the default implementation uses one or more physical disk volumes.
Within each location, event data is indexed and searchable.
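What "indexed and searchable" means can be sketched with a small in-memory structure: events are stored in arrival order, and a secondary index maps each FlowFile identifier to the positions of its events. All names here are hypothetical, and the real repository persists and indexes events on disk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str   # e.g. "CREATE", "ROUTE", "SEND"
    flowfile_id: str
    component: str    # name of the processor that emitted the event


class ToyProvenanceIndex:
    """Stores events in order and indexes them by FlowFile id."""

    def __init__(self) -> None:
        self._events: list[ProvenanceEvent] = []
        self._by_flowfile: dict[str, list[int]] = defaultdict(list)

    def record(self, event: ProvenanceEvent) -> None:
        self._by_flowfile[event.flowfile_id].append(len(self._events))
        self._events.append(event)

    def lineage(self, flowfile_id: str) -> list[ProvenanceEvent]:
        # Look up positions through the index rather than scanning every event.
        return [self._events[i] for i in self._by_flowfile.get(flowfile_id, [])]


index = ToyProvenanceIndex()
index.record(ProvenanceEvent("CREATE", "ff-1", "GetFile"))
index.record(ProvenanceEvent("CREATE", "ff-2", "GetFile"))
index.record(ProvenanceEvent("SEND", "ff-1", "PutSFTP"))
history = [event.event_type for event in index.lineage("ff-1")]
```

Here `history` lists the event types recorded for `ff-1` in the order they occurred.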
As of NiFi 1.0, a zero-leader clustering paradigm is used. Each node in the cluster performs the same tasks on the data, but each operates on a different set of data.
Apache ZooKeeper elects one node as the cluster coordinator, which is responsible for connecting and disconnecting nodes. In addition, each cluster has one primary node.
The key concepts of Apache NiFi are as follows:
Flow : A flow is created by connecting different processors to transfer and, if required, modify data on its way from a data source to a destination.
Connection : A connection links processors and acts as a queue that holds data until the next processor is ready for it. It is also known as a bounded buffer in flow-based programming (FBP) terms. This allows processors to interact at different rates.
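The bounded-buffer behaviour can be sketched with Python's standard `queue.Queue`: a fast producer and a slow consumer share a queue with a fixed capacity, and the producer is held back whenever the queue is full. This is only an analogy under assumed names and sizes; NiFi applies backpressure by no longer scheduling the upstream component rather than by blocking a thread.

```python
import queue
import threading
import time


def run_pipeline(item_count: int = 20, capacity: int = 5) -> tuple[list[int], int]:
    """Move items from a fast producer to a slow consumer through a bounded queue."""
    connection: queue.Queue = queue.Queue(maxsize=capacity)
    received: list[int] = []
    max_depth = 0

    def producer() -> None:
        for item in range(item_count):
            # put() blocks while the queue is full: this is the backpressure.
            connection.put(item)
        connection.put(None)  # sentinel: no more data

    def consumer() -> None:
        nonlocal max_depth
        while True:
            max_depth = max(max_depth, connection.qsize())
            item = connection.get()
            if item is None:
                break
            time.sleep(0.001)  # simulate a slower downstream processor
            received.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received, max_depth
```

All items arrive in order, and the observed queue depth never exceeds the configured capacity, even though the producer is faster than the consumer.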
Processor : A processor is a Java module that is used either to retrieve data from a source system or to store it in a target system. Other processors can be used to add attributes to a FlowFile or to change its content. Processors are responsible for sending, merging, routing, transforming, processing, creating, splitting, and receiving FlowFiles.
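Real processors are written in Java against the NiFi API, but the way a processor routes FlowFiles to named relationships can be sketched language-neutrally. In the Python sketch below a FlowFile is reduced to a dict of attributes, and the processor, the `size` attribute, and the relationship names `small`, `large`, and `failure` are all hypothetical.

```python
from collections import defaultdict
from typing import Callable

# A FlowFile is reduced to a plain dict of attributes for this sketch.
FlowFile = dict
Processor = Callable[[FlowFile], str]


def route_on_size(flowfile: FlowFile) -> str:
    """Toy processor: return the name of the relationship to route to."""
    try:
        size = int(flowfile["size"])
    except (KeyError, TypeError, ValueError):
        return "failure"
    return "large" if size > 1000 else "small"


def run(processor: Processor, incoming: list[FlowFile]) -> dict[str, list[FlowFile]]:
    """Apply the processor and collect FlowFiles per outgoing relationship."""
    outgoing: dict[str, list[FlowFile]] = defaultdict(list)
    for flowfile in incoming:
        outgoing[processor(flowfile)].append(flowfile)
    return dict(outgoing)


routed = run(route_on_size, [{"size": "10"}, {"size": "5000"}, {"name": "x"}])
```

Each key of `routed` plays the role of a relationship that would be connected to a downstream processor.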
FlowFile : The FlowFile is the core concept of NiFi: a single piece of data picked up from a source system. NiFi lets users modify a FlowFile as it moves from the source processor to the destination. Various events, such as create, receive, and clone, are performed on a FlowFile by the different processors in a flow.
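A FlowFile can be pictured as content plus a map of attributes. The sketch below models it as an immutable record whose modifications return new copies; the class and method names are hypothetical, while `filename` and `mime.type` are used as example attribute keys.

```python
import uuid
from dataclasses import dataclass, field, replace
from typing import Mapping


@dataclass(frozen=True)
class ToyFlowFile:
    content: bytes
    attributes: Mapping[str, str] = field(default_factory=dict)
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))

    def with_attribute(self, key: str, value: str) -> "ToyFlowFile":
        # Return a modified copy instead of mutating in place.
        return replace(self, attributes={**self.attributes, key: value})

    def clone(self) -> "ToyFlowFile":
        # A clone shares content and attributes but gets its own identifier.
        return replace(self, identifier=str(uuid.uuid4()))


original = ToyFlowFile(b"hello", {"filename": "a.txt"})
tagged = original.with_attribute("mime.type", "text/plain")
copy = original.clone()
```

The original record is left untouched by `with_attribute`, which makes it easy to reason about what each step in a flow changed.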
Event : An event represents a change to a FlowFile as it traverses the NiFi flow. Such events are tracked in data provenance.
Data provenance : Data provenance is a repository that lets users inspect information about a FlowFile and helps in troubleshooting any issues that arise while the FlowFile is processed.
Process group : A process group is a set of processors and their connections, which can receive data through input ports and send it out through output ports.