Apache

Airflow

Ambari

  • 3 GREAT REASONS TO TRY APACHE HIVE VIEW 2.0
    • Introduces new features in Apache Ambari for interacting with Apache Hive 2.5
    • Lets you view and compute the table and column statistics used by the optimizer; includes visualization of the Explain plan
  • WHY SHOULD YOU CARE ABOUT AMBARI 2.5?
    • Apache Ambari 2.5 released. Includes automatic service restart, log rotation and log search, improved configuration management, and new monitoring features
  • How to upgrade Apache Ambari 2.6.2 to Apache Ambari 2.7.3

Apex

Arrow

Atlas

Beam

BookKeeper

Brooklyn

Camel

Commons

Cordova

Crunch

Doris

Drill

Druid

Eagle

Falcon

Flink

Flume

  • Scaling a flume agent to handle 120K events/sec
    • Describes the "Round-Robin Channel Selector", a new channel selector for Apache Flume
    • With this selector, an agent scales to roughly 10 times the throughput of the default deployment

Geode

Goblin

HAWQ - advanced enterprise SQL-on-Hadoop query engine and analytic database

Hivemall

Hop

Hudi

Iceberg

Ignite

Impala

Jena

Kafka

Kafka Library

Kafka Client library

Kafka Installation & Management

Kafka Monitoring

  • Burrow: Kafka Consumer Lag Checking
  • Kafka Dashboard | Datadog
    • For those who use Datadog for monitoring, it provides an excellent Kafka dashboard that helps integrate a Kafka cluster into the monitoring stack
    • Designed to show the state of a Kafka cluster at a glance by simplifying the many available metrics
  • kafka-monitor: Xinfra Monitor monitors the availability of Kafka clusters by producing synthetic workloads using end-to-end pipelines to obtain derived vital statistics - E2E latency, service produce/consume availability, offsets commit availability & latency, message loss rate and more
    • Xinfra Monitor (formerly Kafka Monitor) was developed at LinkedIn to monitor the availability of Kafka clusters and brokers
    • Works by producing synthetic data into a set of topics on the cluster and then measuring latency, availability, message loss, and so on
    • A valuable tool for measuring the health of a Kafka cluster without having to interact with the clients directly
    • URP? Excuse You! The Three Metrics You Have to Know (Todd Palino, Linkedin) Kafka Summit 2018
  • kcat: Generic command line non-JVM Apache Kafka producer and consumer
    • kcat (formerly kafkacat) is a popular alternative to the console producer and consumer included in the core Apache Kafka project
    • Small, fast, and written in C, so it has no JVM overhead
    • Offers a limited view of cluster status by showing metadata about the cluster
    • Introduction to kcat (kafkacat): explains how to use the kafkacat CLI tool, now renamed kcat. kcat is a non-JVM tool that provides producer and consumer functionality; covers checking Kafka metadata and sending and fetching messages
  • streams-explorer: Explore Data Pipelines in Apache Kafka
    • Streams Explorer is a tool that visualizes the flow of data between applications and connectors running in Kubernetes
    • The whole system has to be built with Kafka Streams or Faust through bakdata's tools, but in return it presents the applications and their metrics in an easily understood form

Kafka Platform

  • Aiven - Data infrastructure made simple
    • Provides managed solutions for many data platforms, including Apache Kafka
    • Developed Karapace, which serves as both a schema registry and a REST proxy
      • API-compatible with the two Confluent components, but licensed under Apache 2.0, so no use cases are restricted
    • In addition to the three major cloud providers, supports DigitalOcean and UpCloud
  • Amazon MSK Fully managed Apache Kafka – Amazon MSK – Amazon Web Services
    • A REST proxy is not directly supported, but schema support is provided through integration with AWS Glue
    • Recommends community tools such as Cruise Control, Burrow, and the Confluent REST proxy
      • Because these are not directly supported, MSK is somewhat less integrated than other offerings, but it still provides a core Kafka cluster
  • Azure HDInsight - Hadoop, Spark, and Kafka | Microsoft Azure
    • Provides a managed Kafka platform inside HDInsight, alongside Hadoop, Spark, and other big data components
    • Like MSK, HDInsight focuses on the core Kafka cluster
      • Other components, including a schema registry and a REST proxy, are left to the user
    • Some third parties provide templates for installing these systems, but Microsoft does not support them
  • Cloudera Apache Kafka supported by Cloudera Enterprise
    • Cloudera has been part of the Kafka community since the early days of Apache Kafka
    • Provides managed Kafka as the stream data component of the Cloudera Data Platform (CDP) product
    • CDP focuses on more than just Kafka, and it runs in public cloud environments as well as in private deployments
  • CloudKarafka - Apache Kafka Message streaming as a Service
    • Focused on providing a managed Kafka solution, together with integrations for widely used infrastructure services such as Datadog and Splunk
    • Also supports Confluent's Schema Registry and REST proxy
      • Only up to version 5.0, because of Confluent's license change
    • Available on AWS and Google Cloud Platform
  • Confluent Cloud: Fully Managed Kafka as a Cloud-Native Service
    • Comes with a number of essential tools (schema management, clients, a REST interface, monitoring)
    • Available on all three major cloud platforms (AWS, Microsoft Azure, Google Cloud Platform)
    • Support is provided by the many core Apache Kafka developers who work at Confluent
    • Many components included in the platform, such as Schema Registry and the REST proxy, can be used standalone under the Confluent Community License, which restricts some use cases

Kafka Stream

Kudu

  • Kudu

  • Kudu

  • getkudu.io

  • Kudu: New Hadoop Storage for Fast Analytics on Fast Data

  • Apache Kudu as a More Flexible And Reliable Kafka-style Queue

  • Big Data: current trends & next big thing 'Apache Kudu' - my takeaways from Strata + Hadoop 2016 @San Jose

  • #bbuzz 2016: Todd Lipcon - Apache Kudu (incubating): Fast Analytics on Fast Data

  • Build a Prediction Engine Using Spark, Kudu, and Impala

  • Creating a Post-Lambda World with Apache Kudu

  • Up and running with Apache Spark on Apache Kudu

  • Apache Kudu 1.3.0 was released

    • Apache Kudu 1.3.0 release
    • Adds new features such as Kerberos authentication, encrypted transport using TLS, and coarse-grained authorization
    • Includes several optimizations, such as switching to LZ4 compression
  • Apache Kudu Read & Write Paths

  • kudu-master clustering

    kudu-master \
      --master_addresses=172.23.30.101,172.23.30.102,172.23.30.103 \
      --fs_data_dirs=/data1/kudu/master/data \
      --fs_wal_dir=/data1/kudu/master/wal \
      --log_dir=/opt/log/kudu \
      --raft_get_node_instance_timeout_ms=60000
    
    • When started this way on three machines, the masters reach consensus and elect a leader, after which a separate 000000000000000000 file is created under /data1/kudu/master/data
    • Once the cluster has come up successfully, restarting a master no longer fails even if a cluster node has gone down
    • If an error does occur, delete the /data1/kudu/master/data and /data1/kudu/master/wal directories, then make sure the process is running on every IP in the cluster within raft_get_node_instance_timeout_ms
  • Low latency high throughput streaming using Apache Apex and Apache Kudu

    • Explains an approach to high-performance stream processing using Apache Kudu and Apache Apex
  • A brave new world in mutable big data relational storage (Strata NYC 2017)

  • Developing a big data multidimensional analysis system using Kudu

  • Guide to Using Apache Kudu and Performance Comparison with HDFS

  • Transparent Hierarchical Storage Management with Apache Kudu and Impala

    • Hierarchical storage management using Apache Kudu and Impala
    • A sliding window pattern that uses Apache Impala with data stored in both Apache Kudu and Apache HDFS
    • With this pattern, the benefits of multiple storage layers are all available in a way that is transparent to users
    • Apache Kudu is designed for fast analytics on rapidly changing data. It combines fast inserts/updates with efficient columnar scans, so a single storage layer can serve multiple real-time analytic workloads. This makes it a good fit in a data pipeline as the place to store real-time data that must be queryable at any time. It also supports row updates and row deletes in real time, which allows for late-arriving data and data corrections
    • Apache HDFS is designed for limitless scalability at low cost, so it is optimized for batch-oriented use cases where data is immutable. Paired with the Apache Parquet file format, it gives access to structured data with very high throughput and efficiency
    • When data is small and constantly changing, as with dimension tables, all of it is usually kept in Kudu. Even large tables are kept in Kudu when the data stays within Kudu's scaling limits, so that they can use Kudu's unique features. When data is large, batch-oriented, and immutable, storing it in HDFS in Parquet format is the better choice. When the benefits of both storage layers are needed, the sliding window pattern is an effective solution
  • Testing Apache Kudu Applications on the JVM

  • Kudu as Storage Layer to Digitize Credit Processes

  • kuduraft: A Raft Library in C++ based on the Raft implementation in Apache Kudu

    • Building and deploying MySQL Raft at Meta - Engineering at Meta
      • Facebook had been using a semisynchronous replication protocol to maintain replicas in other regions, but found that the setup was complex, hard to manage, and the cause of many problems, so it adopted the Raft consensus algorithm
      • Open-sourced kuduraft, a Raft implementation for MySQL based on Apache Kudu
      • The primary writes to the binlog through Raft, and Raft ships the binlog to the followers/replicas
      • With MySQL Raft, the MySQL server itself handles promotion and membership, which greatly reduced operational pain
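
For the kudu-master clustering entry earlier in this section, the recovery step (delete the data and WAL directories, then restart within the timeout) can be sketched as a small shell script. This is only a sketch under assumptions: the paths, addresses, and flags are copied from the command shown in that entry, while the `DRY_RUN` switch and the function names `run` and `reset_and_start_master` are hypothetical helpers introduced here. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: reset a kudu-master that failed its first start, then relaunch it.
# Paths, addresses, and flags come from the kudu-master command in this section.
# DRY_RUN, run, and reset_and_start_master are assumptions made for this example.
set -eu

MASTERS="172.23.30.101,172.23.30.102,172.23.30.103"
DATA_DIR="/data1/kudu/master/data"
WAL_DIR="/data1/kudu/master/wal"
TIMEOUT_MS=60000
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_and_start_master() {
  # Remove the state left behind by the failed start.
  run rm -rf "$DATA_DIR" "$WAL_DIR"
  # Relaunch with the same flags as the original command.
  run kudu-master \
    --master_addresses="$MASTERS" \
    --fs_data_dirs="$DATA_DIR" \
    --fs_wal_dir="$WAL_DIR" \
    --log_dir=/opt/log/kudu \
    --raft_get_node_instance_timeout_ms="$TIMEOUT_MS"
}

reset_and_start_master
```

Running it with `DRY_RUN=0` executes the commands for real. In that case it would have to be started on each of the three hosts within the 60000 ms window, since the masters need to find one another before `raft_get_node_instance_timeout_ms` expires.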

Kylin

Kyuubi

Mesos

Metron

  • Metron A security-focused analytics system

Nifi

Nutch

Oozie

Ozone

  • Introducing Apache Hadoop Ozone: An Object Store for Apache Hadoop
    • Introduces Apache Hadoop Ozone, which sits on top of the Hadoop storage layer. An alpha version was released recently
    • Core concepts
      • SCALABLE
        • Ozone is designed to scale to tens of billions of files and blocks and, in the future, even more
        • Small files or huge number of datanodes are no longer a limitation
      • CONSISTENT; Storage Layer uses RAFT protocol for consistency
      • CLOUD-NATIVE; Hadoop Ozone is designed to work well in containerized environments like YARN and Kubernetes
  • Apache Hadoop Ozone – Object Store Architecture
  • One billion files in Ozone

Parquet

Phoenix

Pig

Pinot

PredictionIO

Pulsar

Ranger

  • Ranger
  • IT’S MORPHING TIME: APACHE RANGER GRADUATES TO A TOP LEVEL PROJECT – PART 2
    • Introduces the key features of Apache Ranger, which has graduated to an Apache top-level project
    • Includes attribute-based access control, a policy engine, and a key management service that can integrate with hardware security modules
  • INTRODUCING ROW/ COLUMN LEVEL ACCESS CONTROL FOR APACHE SPARK
    • Hortonworks explains, with a simple demo, how Apache Ranger supports row- and column-level data access control and data masking in Hive or Spark SQL
  • Apache Ranger Vs Sentry Compares Apache Ranger and Apache Sentry, which provide authorization and various other security features for the Hadoop ecosystem

River

Samza

  • Samza
    • A stream processing framework designed for Kafka
    • It predates Kafka Streams, but the two share many concepts because much of the development team overlaps
    • Unlike Kafka Streams, however, Samza runs on YARN and provides a full framework for applications to run in
  • REAL-TIME FULL-TEXT SEARCH WITH LUWAK AND SAMZA
  • Apache Kafka, Samza, and the Unix Philosophy of Distributed Data
  • Concourse: Generating Personalized Content Notifications in Near-Real-Time
    • Introduces the design of Concourse, LinkedIn's personalized notification system
    • Runs on a system built on Apache Kafka and Apache Samza
    • To improve throughput, data processing is designed to happen within each data center

SeaTunnel

ShardingSphere

SINGA

  • SINGA a general distributed deep learning platform for training big deep learning models over large datasets

Slider

Solr

Spot

  • Spot Used to detect infosec threats by analyzing network data
  • Apache Spot (incubating) and Cloudera on AWS in 60 Minutes
    • Introduces the architecture of Apache Spot, which is built on Apache Kafka (for processing), Apache Spark (for processing and ML analysis), and Apache Hadoop (for processing and storage)
    • Spot uses the Python Watchdog library, which detects file system changes and fires events

Sqoop

Storm

Superset

SystemML

Tajo

Thrift

Tika

Toree

Traffic Server

UIMA

WEEX

  • WEEX A framework for building Mobile cross-platform UIs

Zookeeper