dataflow processing primitives
Sources have a continuous (?) flag: keep reading at end of stream, or finish? (Or maybe they always keep reading, and EOS is an event that might or might not trigger a shutdown.)
- stdin / stdout / stderr
- file (filesystem, s3, hdfs) - tail/read; write
  - source: poll for file pattern
  - sink: roll output filename
- hanuman log (hierarchical)
- database
  - source: must specify query
  - sink: must specify [ key / identifying fields ; update|create ; payload fields ]
- http request
- http stream
- socket / rpc
- jabber / amqp / twitter
- syslog / syslog-ng
- exec file
- load, source
- store, sink
  - dump is a console sink
- push
- pull
  - query
- buffer
- throttle
- split
  - 1-to-all: each record goes to all outputs
  - 1-to-n: each record goes to exactly one output, based on an ordered set of filters
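
The two split modes can be sketched in plain Ruby. This is only an illustration, not Wukong's API: `split_all`, `split_first_match` and the lambda-based filters are hypothetical names, and dropping records that match no filter is an assumption (the notes do not say what should happen to them).

```ruby
# 1-to-all: every record is copied to every output.
def split_all(records, outputs)
  records.each { |rec| outputs.each { |out| out << rec } }
  outputs
end

# 1-to-n: each record goes to the first output whose filter accepts it.
# `routes` is an ordered list of [filter, output] pairs.
def split_first_match(records, routes)
  records.each do |rec|
    _filter, out = routes.find { |filter, _| filter.call(rec) }
    out << rec if out   # assumption: unmatched records are dropped
  end
  routes.map { |_, out| out }
end

evens, small, rest = split_first_match(
  [1, 2, 3, 4, 10, 11],
  [
    [->(n) { n.even? }, []],
    [->(n) { n < 5 },   []],
    [->(_) { true },    []],   # catch-all, so nothing is dropped here
  ]
)
```

Because the filters are tried in order, `4` lands in `evens` even though it would also pass the `n < 5` filter.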
- union
  - in a stream context, produces a merged stream containing all incoming records, with no guarantees of order or locality.
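
A minimal sketch of a streaming union, assuming each input is any Ruby enumerable and using the standard library's `Queue` as the merged stream. The `union` helper is a hypothetical name, not part of Wukong. Each input is drained by its own thread, which is why the output order is unspecified.

```ruby
# Merge several enumerable inputs into one Queue. Each input gets its own
# reader thread, so records from different inputs interleave arbitrarily.
def union(*inputs)
  merged  = Queue.new
  readers = inputs.map do |input|
    Thread.new { input.each { |rec| merged << rec } }
  end
  readers.each(&:join)
  merged.close          # lets consumers see end-of-stream (pop returns nil)
  merged
end

merged  = union([1, 2, 3], %w[a b], [:x])
records = []
while (rec = merged.pop)   # note: a nil or false record would end this loop early
  records << rec
end
```

All six records arrive, but a consumer should not rely on their relative order.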
- process -- pass a set of inputs through a flow, receiving at a set of outputs. Note that this can be an external process, another flow, or anything else
- foreach
- flatten
- subgroup, subsort
- decorate (hash or fragment-replicate join) -- hang data from a hash on each record
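
As a rough illustration of decorate, the sketch below hangs a value from an in-memory hash on each record as it passes. The `decorate` helper, its `key:` and `as:` parameters, and the sample data are all hypothetical, and the lookup table is assumed to be small enough to hold in memory on every worker.

```ruby
# Small lookup table, assumed to fit in memory on every worker.
CITY_NAMES = { "aus" => "Austin", "sfo" => "San Francisco" }.freeze

# Attach the matching hash entry to each record as it streams past.
def decorate(records, lookup, key:, as:)
  records.map { |rec| rec.merge(as => lookup[rec[key]]) }
end

visits = [
  { user: "amy", city: "aus" },
  { user: "bob", city: "sfo" },
  { user: "cat", city: "nyc" },   # no match: decorated with nil
]
decorated = decorate(visits, CITY_NAMES, key: :city, as: :city_name)
```

The stream side is never sorted or grouped; only the small side is replicated, which is what makes this cheaper than a general join.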
- delay
- filter
  - sample
- limit, window
- partition
- sort
  - distinct
- group, cogroup
- join is sugar for cogroup-and-flatten
  - cross
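
To make "join is sugar for cogroup-and-flatten" concrete, here is a small in-memory sketch. The `cogroup` and `join` helpers are illustrative names rather than Wukong operators, and the result is assumed to be an inner join (keys present on only one side produce no rows).

```ruby
# cogroup: for each key, collect the matching records from both inputs.
def cogroup(left, right, &key_of)
  groups = Hash.new { |h, k| h[k] = [[], []] }
  left.each  { |rec| groups[key_of.call(rec)][0] << rec }
  right.each { |rec| groups[key_of.call(rec)][1] << rec }
  groups
end

# join: flatten each cogrouped pair of bags into one row per (left, right) pair.
def join(left, right, &key_of)
  cogroup(left, right, &key_of).flat_map do |_key, (ls, rs)|
    ls.product(rs)   # empty if either side has no records for this key
  end
end

users  = [[1, "amy"], [2, "bob"], [3, "cat"]]
orders = [[1, "book"], [1, "pen"], [3, "ink"], [4, "cap"]]
rows   = join(users, orders) { |rec| rec.first }
```

Here `rows` holds three pairs: amy with each of her two orders, and cat with one. Bob (no orders) and order 4 (no user) drop out.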
- counters
- locks
- registers
- barrier
- announce/discover
- depend/invoke
- schedule
- idempotency control
- stop/start/sleep
- trigger/close
- hooks
- can hook in post-hoc, inline or teeing
- mini-mode
- monitoring/logging
- dark launch
- middleware
- can hook in post-hoc, inline or teeing
- simulate, explain, describe
- audit
- register -- sign something up to be a block
- compose -- register a composite graph as a component
- locality -- declarative advice: influence where your data lives. See Mesos' resource offers mechanism
- bind -- ties machine/resource groups to flows
- priority -- prioritized execution
- guarantees
  - end-to-end -- ensures acknowledgement by destination
  - next-hop -- guarantees successful handoff to next stage, but nothing more
  - best effort -- fire and forget
- invoke uses the same rules as normal Rake execution: If the task has run already, it won’t run again.
- execute runs the task no matter what.
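
The difference between invoke and execute can be seen directly with Rake's own `Rake::Task` API. This is a small sketch that assumes the `rake` gem is available; the task name `:greet` and the `runs` array are made up for the example.

```ruby
require "rake"

runs = []
task = Rake::Task.define_task(:greet) { runs << :greet }

task.invoke    # runs the action
task.invoke    # already invoked: does nothing
task.execute   # runs the action again, unconditionally
task.reenable  # clears the "already invoked" flag
task.invoke    # runs once more
```

After these five calls the action has run three times: only the second `invoke` is skipped.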
- directory
- file
- link
- Thor defines: say, ask, yes?, no?, add_file, remove_file, copy_file, template, directory, inside, run, inject_into_file
  - we instead treat e.g. a file as a model, and perform actions on it (create, delete, etc.)
- rule. In rake, you can say

      rule ".html" => ".markdown" do |t|
        sh "markdown #{t.source} > #{t.name}"
      end

  The rule only runs if the target is missing or the source is newer.
- remote_file -- Fetch remote file
- template -- Generate file from an .erb template
- Execute command
- script
- execute runs a command
- script runs its code block
- schedule task
- spawn graph to run in parallel with executor
  - supervised?
  - run as sub-process (so, dies when parent does) or independently (keeps running)
- Broadcast
- Listen
- Notify/ping
- Request/Response
- Stream:
  - unix pipes
  - named pipes
  - socket
  - http streaming
  - websockets
  - syslog-ng
  - Flume
  - file read/tail/append/write
- Dispatch:
  - UDP broadcast
  - jabber
  - http post/put
- Pull:
  - http get
  - db query
- Listen:
  - http listen
  - poll remote resource
invoke executes a job only once:
    class InvokeExample < Wukong::Task::Base
      desc "one", "Prints 1 2 3"
      def one
        puts 1
        invoke :two
        invoke :three   # skipped: :three already ran via :two
      end

      desc "two", "Prints 2 3"
      def two
        puts 2
        invoke :three
      end

      desc "three", "Prints 3"
      def three
        puts 3
      end
    end
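
One way the once-only behaviour could be implemented is sketched below in plain Ruby. `OnceOnlyTask` is a hypothetical stand-in for `Wukong::Task::Base`, not its real implementation, and the tasks append to a `log` array instead of printing so the result is easy to inspect.

```ruby
# Hypothetical stand-in for Wukong::Task::Base: remembers which tasks ran.
class OnceOnlyTask
  def initialize
    @invoked = {}
  end

  # Run the named method unless it has already been invoked on this object.
  def invoke(name)
    return if @invoked[name]
    @invoked[name] = true
    public_send(name)
  end
end

class InvokeSketch < OnceOnlyTask
  attr_reader :log

  def initialize
    super
    @log = []
  end

  def one
    log << 1
    invoke :two
    invoke :three   # no-op: :three already ran via :two
  end

  def two
    log << 2
    invoke :three
  end

  def three
    log << 3
  end
end

sketch = InvokeSketch.new
sketch.invoke(:one)
sketch.log   # => [1, 2, 3]
```

The bookkeeping lives on the instance, so a fresh `InvokeSketch` starts with nothing marked as run.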