Design overview
- a small set of scripts and tools in the Kafka repo
- basic reporting and log gathering
- a few MVP tests implemented
- documentation on the wiki
- concrete outlines of other tests we plan to implement
One barrier: Vagrant is very useful, but also slow and brittle, in the sense that it's easy for Vagrant tooling to lose track of state when various errors occur, which slows down iteration.
Options to help with this: Packer to reduce initial provisioning time? A Docker cluster?
Overall goal here is to provide a fairly lightweight set of tools to make writing and running Kafka system tests (and Confluent system tests) reasonably convenient.
Command | Effect
---|---
`ducktape --version` | display the current version
`ducktape <test_location>` | discover and run tests. `<test_location>` can be a directory, a Python module, a test class, or a method in a class
`ducktape ./muckrake/tests` | discover and run all tests in `muckrake/tests`
`ducktape muckrake/tests/everything_runs_test.py` | run all tests in `everything_runs_test.py`
`ducktape muckrake/tests/everything_runs_test.py::TestClass` | run the tests in `TestClass`
`ducktape muckrake/tests/everything_runs_test.py::TestClass.test_method` | run `test_method`
`ducktape muckrake/tests/everything_runs_test.py::TestClass.test_method --count <N>` | run `test_method` N times. Should be able to parallelize this and run many copies of the same test simultaneously
ducktape outputs results to a top-level directory `<test_id>_results`, and aliases the directory `latest_results` to it:

```
<session_id>/
    summary                    # top-level report
    html_report                # HTML view of top-level report
    <test_class>/<test_method>/
        log.info               # info-level logs from test driver
        log.debug
        service_logs/
            <service_name>/<instance_id>/
                <various logs from this service>
                ...
```
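As an illustration, this layout could be mapped to concrete paths by a small helper like the following (a sketch only; the function name and the keys are hypothetical, not part of ducktape):

```python
import os

def results_layout(session_id, test_class, test_method, base="."):
    # Hypothetical helper mapping the proposed results layout to concrete paths.
    root = os.path.join(base, "%s_results" % session_id)
    test_dir = os.path.join(root, test_class, test_method)
    return {
        "summary": os.path.join(root, "summary"),          # top-level report
        "html_report": os.path.join(root, "html_report"),  # HTML view of report
        "log.info": os.path.join(test_dir, "log.info"),    # info-level driver logs
        "log.debug": os.path.join(test_dir, "log.debug"),
        "service_logs": os.path.join(test_dir, "service_logs"),
    }
```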
Ctrl-C should be caught and handled gracefully: end tests when possible, do cleanup, etc.
What do performance stats look like in this output scheme?
Can a service be expanded dynamically? Using a logical designator for a node in a service is tricky if nodes can be added and removed
When do we grab full logs and when not?
What about gathering CPU, heap size, etc. on machines? naarad provides a way to snapshot this
First milestone - when is the code self-testable? ;)
Developer stuff/design
Broadly there are two tasks - cluster management and test running. Test discovery and test running will be modeled roughly on JUnit and JUnit-style Python testing frameworks (think unittest, nose, pytest)
- TestId - each run of ducktape is assigned a unique id which is used in part as a way to group test results together
- TestSuite or TestGroup - a logical grouping of tests - this concept isn’t formalized as a class, but tests that are grouped together in the same class will be grouped together in reports
- TestCase or SingleTest - this is a concept, not a class - refers to a ‘single test thing’ (need a better word for smallest atomic test unit)
- TestLoader - logic for test discovery
- TestRunner - runs discovered tests. First pass will be not much more than a for loop
- TestResult - shared results object that will store aggregate test results in memory
- Cluster
- Service
- run one, run a subset of tests, discover tests
- easy automation
- parallelization across many machines - allocate machines to tests when available
- parametrizable tests
what is a test?
- a test is a test method in a test class
- a test class is a leaf subclass of `Test` in a module that looks like a test. Test classes provide a logical grouping for tests.
- a test method is a method in a test class that looks like a test
- a test class can have multiple test methods - these will be grouped together
Probably with decorators? We want an easy way to run a given test multiple times with different configurations (from files, dynamically generated, etc.)
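One possible decorator shape (purely illustrative; `parametrize` is a hypothetical name here, not a committed API): attach each configuration to the test method and let the runner expand them into separate runs.

```python
def parametrize(**params):
    """Hypothetical decorator: record one configuration for a test method."""
    def wrap(f):
        # Stacked decorators accumulate configurations on the function object.
        f.configs = getattr(f, "configs", []) + [params]
        return f
    return wrap

class ThroughputTest(object):
    @parametrize(num_nodes=1)
    @parametrize(num_nodes=3)
    def test_produce(self, num_nodes):
        return num_nodes
```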
- summary/reporting, performance stats
- machine stats during run (naarad style - cpu usage, heap, etc)
- log collection
- regression checks, particularly for performance tests
- kill, bounce, individual nodes
- multiple service instances (e.g. two kafka clusters)
- individual nodes in the same logical service can run different binary versions, and versions can change
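A rough sketch of what kill/bounce on individual nodes and per-node versions might look like (class and method names are made up for illustration):

```python
class Node(object):
    def __init__(self, name, version="trunk"):
        self.name = name
        self.version = version  # nodes in one logical service may run different versions
        self.running = False

class Service(object):
    """Hypothetical logical service spanning several nodes (e.g. one Kafka
    cluster). Two Service instances would model two Kafka clusters."""
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def start(self):
        for n in self.nodes.values():
            n.running = True

    def kill_node(self, name):
        self.nodes[name].running = False

    def bounce_node(self, name):
        # Kill then restart a single node, leaving the rest of the service up.
        self.kill_node(name)
        self.nodes[name].running = True
```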
services
- Basic templates are easy: http://jinja.pocoo.org/docs/dev/intro/#basic-api-usage. If you use something like template.render(self.__dict__) from within a test, you can use any member fields on the Test when rendering the template.
- We could use Jinja2's Loader infrastructure to standardize test asset layouts: http://jinja.pocoo.org/docs/dev/api/#loaders. It can probably pull from multiple sources so that, e.g., templates can be located with services and not just tests, which further encourages reuse without having to embed templates directly in tests/services. Jinja2 is a very commonly used templating engine in the Python community and is more than flexible enough for our needs. We could consider other options, but this is a pretty minor decision.
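The render-with-member-fields idea above can be sketched as follows (assumes Jinja2 is installed; the class and field names are invented for the example):

```python
from jinja2 import Template

class KafkaConfigTest(object):
    """Hypothetical test: any member fields set here are visible in templates."""
    def __init__(self):
        self.broker_id = 1
        self.port = 9092

    def render_config(self, template_text):
        # Template.render accepts a dict, so the instance's fields can be
        # passed straight through.
        return Template(template_text).render(self.__dict__)
```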
- chaos-monkey-like functionality
- network control (simulate partitions, latency, dropped packets)