Hello! Please refer to the MDA (Modern Data Architectures) PDF.
This is a lab for understanding batch data ingestion using the OSBDET environment. OSBDET is a virtual machine project that packages many open source big data projects such as Hadoop, Spark, Kafka and NiFi. https://github.com/raulmarinperez/osbdet
In this case we will use the .xml configuration file as a NiFi template to gather data from the free UK Police crime REST API.
- Import the .xml configuration template file into NiFi
- Create an HDFS directory to save the files
- Run the NiFi flow to gather the data
- Run the .ipynb notebook to see the data being analysed.
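
If you want a feel for what the steps above do without opening NiFi, here is a minimal Python sketch. It is not the flow from the template, just an illustration under some assumptions: the street-level crimes endpoint of the UK Police API (`crimes-street/all-crime` with `lat`, `lng` and `date` parameters) is my guess at what the template queries, and the helper names and the HDFS path in the comment are hypothetical.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the street-level crimes endpoint; the NiFi template may use another one.
API_BASE = "https://data.police.uk/api/crimes-street/all-crime"


def hdfs_mkdir_command(path):
    """Build the `hdfs dfs -mkdir -p` command for the landing directory.

    The path is up to you (e.g. a hypothetical /user/osbdet/crimes);
    run the result with subprocess.run(...) inside the OSBDET VM.
    """
    return ["hdfs", "dfs", "-mkdir", "-p", path]


def crimes_url(lat, lng, month):
    """Build the request URL for one location and one month (YYYY-MM)."""
    query = urlencode({"lat": lat, "lng": lng, "date": month})
    return f"{API_BASE}?{query}"


def fetch_crimes(lat, lng, month):
    """Download one month of crime records as a list of dicts (needs network)."""
    with urlopen(crimes_url(lat, lng, month), timeout=30) as response:
        return json.load(response)


def count_by_category(records):
    """Count records per crime category, similar to a first notebook check."""
    return Counter(record.get("category", "unknown") for record in records)
```

Calling `count_by_category(fetch_crimes(51.5074, -0.1278, "2023-01"))` would give a quick per-category tally for one location and month, roughly the kind of summary the notebook builds on the files NiFi lands in HDFS.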
BAM! That's it! I recommend you check the PDF: it is an easy way to look at the analysis without having to replicate all the steps.