BigData

Playground space for BigData projects

Projects were generally run on Debian under WSL (Windows Subsystem for Linux).

If a project depends on systemd, see: https://devblogs.microsoft.com/commandline/systemd-support-is-now-available-in-wsl/
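A quick way to see whether systemd is active in a WSL distribution is to look at the process running as PID 1. The sketch below is a minimal check, assuming a Linux `/proc` filesystem; the helper names `pid1_name` and `systemd_is_init` are hypothetical and not part of any project in this repo.

```python
from pathlib import Path


def pid1_name(proc_comm="/proc/1/comm"):
    """Return the name of the process running as PID 1, or None if it cannot be read."""
    try:
        return Path(proc_comm).read_text().strip()
    except OSError:
        # /proc is missing or unreadable (for example, not running on Linux).
        return None


def systemd_is_init():
    """True when PID 1 reports itself as systemd (assumption: that means systemd is enabled)."""
    return pid1_name() == "systemd"


if __name__ == "__main__":
    print("PID 1:", pid1_name())
    print("systemd is init:", systemd_is_init())
```

If this reports something other than systemd, the linked post describes how to enable it.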

List of the technologies and tools for big data projects:

NOTE: For some tools, the number of job postings (<num_jobs>) is shown next to the tool name as a rough metric for ordering (prioritizing) tools by job market demand.
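As a small illustration of that ordering, the sketch below sorts tools by their job count. The counts are copied from the Data Ingestion list in this README; the function name `rank_by_demand` is a hypothetical helper used only for this example.

```python
# Job counts copied from the Data Ingestion and Collection list in this README.
job_counts = {
    "Apache Flume": 17,
    "Apache Kafka": 778,
    "Apache NiFi": 34,
}


def rank_by_demand(counts):
    """Return tool names sorted from the most to the fewest job postings."""
    return sorted(counts, key=counts.get, reverse=True)


ranked = rank_by_demand(job_counts)
for position, tool in enumerate(ranked, start=1):
    print(f"{position}. {tool} ({job_counts[tool]})")
```

The same function works for any other category once its counts are added to the dictionary.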

Related programming languages: Python, Scala, Java

  1. Data Ingestion and Collection:

    • Apache Kafka (778) : A real-time data streaming platform for event sourcing.
    • Apache NiFi (34) : An integrated data logistics platform for automating data movement.
    • Apache Flume (17): Collects, aggregates, and moves large volumes of data.
  2. Data Storage:

  3. Data Processing and Batch Processing:

  4. Data Processing and Stream Processing:

    • Apache Kafka Streams (782) : Real-time stream processing and event-driven applications.
    • Apache Flink (75) : Stream processing framework with low-latency and high-throughput capabilities.
  5. Data Orchestration and Workflow:

    • Apache Airflow (432): Workflow automation and scheduling.
    • Kubeflow Pipelines (32): Management of machine learning workflows on Kubernetes.
    • Apache Oozie (5): Job scheduling, workflow management, and automation.
    • Dagster: An orchestrator designed for developing and maintaining data assets such as tables, data sets, machine learning models, and reports.
  6. Machine Learning and AI:

  7. Data Visualization and Reporting:

    • Power BI (2199): Data analytics and visualization by Microsoft.
    • Tableau (829): Data visualization and business intelligence tool.
    • Looker (674): Data exploration and business intelligence platform.
    • Dash (328): Python web application framework for building interactive data dashboards.
    • Kibana (93): Data visualization and exploration for the Elastic Stack.
    • Apache Superset (1): Data exploration and visualization platform.
  8. Data Security and Governance:

  9. Containers and Orchestration:

  10. Monitoring and Logging:

  11. Resource Management:

  12. Database and Querying Tools:

These descriptions provide an overview of each technology's purpose and capabilities. You can follow the links to access their official documentation for more detailed information.