blog : Using Node-RED as an ETL tool #2175

Open
wants to merge 13 commits into
base: main

sumitshinde-84
Collaborator

Description

Related Issue(s)

Checklist

  • I have read the contribution guidelines
  • I have considered the performance impact of these changes
  • Suitable unit/system level tests have been added and they pass
  • Documentation has been updated

Contributor

Images automagically compressed by Calibre's image-actions

Compression reduced images by 8%, saving 7.25 KB.

| Filename | Before | After | Improvement | Visual comparison |
|---|---|---|---|---|
| src/blog/2024/06/images/etl-with-node-red-chart-based-on-age.png | 24.77 KB | 23.42 KB | -5.5% | View diff |
| src/blog/2024/06/images/etl-with-node-red-chart-based-on-job-profile.png | 65.89 KB | 59.99 KB | -9.0% | View diff |

1056 images did not require optimisation.

Contributor

Images automagically compressed by Calibre's image-actions

Compression reduced images by 48.4%, saving 14.81 KB.

| Filename | Before | After | Improvement | Visual comparison |
|---|---|---|---|---|
| src/blog/2024/06/images/etl-with-node-red-chart-customer-data-on-debug-panel.png | 30.60 KB | 15.78 KB | -48.4% | View diff |

1058 images did not require optimisation.

@sumitshinde-84 sumitshinde-84 linked an issue Jun 13, 2024 that may be closed by this pull request
- business intelligence
---

ETL (Extract, Transform, Load) is essential for integrating and analyzing data, helping businesses unlock detailed insights. You already know Node-RED for its user-friendly approach to creating IoT applications, but did you know it can also be a powerful tool for ETL tasks? When IBM published a blog about using Node-RED for ETL, it caught a lot of attention and got people talking about its potential in this space. In this guide, we'll walk you through how to use Node-RED for ETL, sharing its strengths and weaknesses along the way.
Member

@sumitshinde-84 This feels very ChatGPT, can you tone down the bravado?

src/blog/2024/06/using-node-red-as-an-etl-tool.md Outdated Show resolved Hide resolved

### Extracting

Node-RED can extract data from various sources, including APIs, databases, local filesystems, and IoT devices using built-in nodes and community-contributed nodes. For example, the HTTP request node can be used to pull data from web services, while nodes for MySQL, MongoDB, and PostgreSQL can extract data from databases. Nodes for MQTT and Kafka can fetch data from message brokers. File nodes enable extraction of data from local filesystems, while different cloud nodes for platforms like AWS, GCP, and IBM Watson allow extraction of data from cloud storage services. Moreover, Node-RED running on edge devices can extract data from sensors connected directly to them.
Member

Modbus, OPC-UA, etc etc aren't mentioned. Please lean into the industrial applicability more

Comment on lines +49 to +55
## Benefits of using Node-RED as an ETL tool

1. **Ease of Use:** Its visual programming interface makes it accessible to non-developers.
2. **Flexibility:** A wide range of nodes and the ability to write custom JavaScript allow for flexible data processing.
3. **Integration:** Node-RED excels in integrating IoT devices and handling real-time data, making it well-suited for combining diverse data sources into unified workflows.
4. **Cost-Effective:** Being open-source, Node-RED can be a cost-effective alternative to expensive ETL tools.
5. **Community Support:** A large community provides a wealth of nodes, examples, and support.
Member

ChatGPT? Feels uninspired and machine generated.

### Customer Data Deduplication

1. Drag a Change node onto the canvas, and set `msg.data` to `msg.payload` and `msg.payload` to `[]`.
2. Drag a MongoDB4 node onto the canvas and configure it with your connection details. If you haven't used MongoDB with Node-RED before, refer to [Using MongoDB with Node-RED](/blog/2024/04/using-mongodb-with-node-red/). Enter `find` into the Operation field.
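
The two steps above only stash the incoming records in `msg.data` and query the database; the comparison itself would still have to happen somewhere, for example in a Function node. A minimal sketch in plain JavaScript, assuming each customer record carries an `email` field that acts as the uniqueness key (the field name and the `dedupeCustomers` helper are hypothetical and not taken from the post):

```javascript
// Hypothetical helper: returns the incoming records that are not already stored
// and that do not repeat within the incoming batch itself.
// Assumption: `email` is the uniqueness key; swap in whatever key your data uses.
function dedupeCustomers(incoming, existing, key = "email") {
  // Compare keys case-insensitively and ignore surrounding whitespace.
  const normalize = (value) => String(value ?? "").trim().toLowerCase();
  const seen = new Set(existing.map((record) => normalize(record[key])));
  const fresh = [];
  for (const record of incoming) {
    const id = normalize(record[key]);
    // Skip records without a key and records whose key was already seen.
    if (id === "" || seen.has(id)) continue;
    seen.add(id);
    fresh.push(record);
  }
  return fresh;
}

// Inside a Function node this could roughly become:
//   msg.payload = dedupeCustomers(msg.data, msg.payload);
//   return msg;
```

A `Set` keeps each lookup constant-time, so the pass stays linear in the number of records rather than comparing every incoming record against every stored one.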
Member

I don't think MongoDB is used very often in an ETL context, precisely as it's structureless. Do we have a better example? An event based sensor perhaps?

Comment on lines +220 to +226
## What are the limitations of using Node-RED as an ETL tool

While Node-RED is versatile, there are some limitations to consider while using it as an ETL tool:

- Advanced Features: Some advanced ETL features, like automated schema detection and sophisticated error handling, might require additional customization or external modules.
- Data Governance: Node-RED does not inherently provide robust data governance and lineage tracking, which are often essential in enterprise ETL tools.
- Scalability: While Node-RED can handle many heavy tasks, it may not offer the same level of optimization for processing extremely large datasets as dedicated ETL tools.
Member

ChatGPT?

sumitshinde-84 and others added 2 commits June 17, 2024 11:16
Co-authored-by: Zeger-Jan van de Weg <ZJvandeWeg@users.noreply.github.com>

## Building a Simple ETL Project Using Node-RED

Let's walk through a simple project that uses Node-RED as an ETL tool: we extract sample customer data from an API, transform it, load the cleaned data into a MongoDB database, and write the processed data to a local file.
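
As a rough mental model of that project, the three stages can be sketched as plain JavaScript functions. Everything here is illustrative: the `cleanRecords`, `loadToFile`, and `runEtl` names are hypothetical, the cleaning rule (trim strings, drop empty records) is an assumption rather than the transformation the post performs, and the database load is left out so the sketch only needs Node.js built-ins.

```javascript
const fs = require("fs");

// Transform stage (illustrative rule only): trim string fields and drop empty records.
function cleanRecords(records) {
  return records
    .map((record) =>
      Object.fromEntries(
        Object.entries(record).map(([field, value]) => [
          field,
          typeof value === "string" ? value.trim() : value,
        ])
      )
    )
    .filter((record) => Object.keys(record).length > 0);
}

// Load stage: write one JSON document per line to a local file.
function loadToFile(records, filePath) {
  const lines = records.map((record) => JSON.stringify(record)).join("\n");
  fs.writeFileSync(filePath, lines === "" ? "" : lines + "\n", "utf8");
  return records.length; // number of records written
}

// Wire the stages together; `extract` is any function returning an array of records.
function runEtl(extract, filePath) {
  return loadToFile(cleanRecords(extract()), filePath);
}
```

In a flow, each function would correspond to one or more nodes, with `msg.payload` carrying the records from stage to stage.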
Member

I don't think we should advise using MongoDB? PG, Snowflake, and others are more often a target for the data.

*Note: The goal of this project is to understand how to utilize Node-RED as an ETL tool. We assume that the reader has a basic knowledge of Node-RED.*
Member

Suggested change
*Note: The goal of this project is to understand how to utilize Node-RED as an ETL tool. We assume that the reader has a basic knowledge of Node-RED.*

## Extracting Data

1. Drag an Inject node onto the canvas.
2. Drag an HTTP Request node onto the canvas, double-click on it, and set the URL to `https://api.slingacademy.com/v1/sample-data/files/customers.json`.
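
For readers who think in code, the Inject and HTTP Request pair above amounts to a single GET request whose body is parsed as JSON. A minimal sketch, assuming Node.js 18+ (for the global `fetch`) and assuming the endpoint returns a JSON array of customer objects; the `extractCustomers` helper and its injectable `fetchImpl` parameter are hypothetical conveniences for testing, not part of Node-RED:

```javascript
// URL taken from step 2 above.
const CUSTOMERS_URL =
  "https://api.slingacademy.com/v1/sample-data/files/customers.json";

// Sketch of the Inject -> HTTP Request pair as a plain function.
// `fetchImpl` is injectable so the function can be exercised without network access.
async function extractCustomers(url = CUSTOMERS_URL, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Extract failed: HTTP ${response.status} for ${url}`);
  }
  const body = await response.json();
  // Assumption: the endpoint returns a JSON array of customer objects.
  if (!Array.isArray(body)) {
    throw new Error("Extract failed: expected a JSON array");
  }
  return body;
}

// Usage (requires network access):
//   extractCustomers().then((customers) => console.log(customers.length));
```

Failing loudly on a non-2xx status or an unexpected body shape keeps bad input from silently flowing into the transform stage.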
Member

This feels like cheating. Not a real world scenario

@Yndira-E Yndira-E linked an issue Jul 4, 2024 that may be closed by this pull request
@ZJvandeWeg
Member

@sumitshinde-84 This has gone stale? Any updates pending?

@sumitshinde-84
Collaborator Author

Not sure which example I should use. Using the temperature example every time is not good, though.

@ZJvandeWeg
Member

@sumitshinde-84 What about machine logs?

Successfully merging this pull request may close these issues.

- Blog Art: Using Node-RED as an ETL tool
- Blog : How to Use Node-RED as an ETL Tool