This project contains examples of the right way to test Cascalog workflows using midje-cascalog. The tests and code mirror the discussion at this blog post. All tests can be found here.
I've been working on a Cascalog testing suite these past few weeks, an extension to Brian Marick's Midje, that eases much of the pain of testing MapReduce workflows. I think a lot of the dull work we see in the Hadoop community is a direct result of fear. Without proper tests, Hadoop developers can't help but be scared of making changes to production code. When creativity might bring down a workflow, it's easiest to get it working once and leave it alone.
The antidote to all of this fear is a functional testing suite. As I discussed in Getting Creative with MapReduce, Hadoop workflows are difficult to test at all; testing application logic in isolation of data storage is impossible.
Cascalog is free of this weakness. midje-cascalog allows you to test Cascalog queries as pure functions, both in isolation and as components of more complicated workflows. the resulting tests are truly beautiful.