New feature: when the input directory contains subdirectories that each hold subject data, these subdirectories are processed in parallel.
Fixed: output handling is faster, and stderr is now handled in real time (instead of being buffered and printed at the end).
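
The two changes above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the function names (`find_subject_dirs`, `run_streaming`, `process_all`), the worker count, and the choice of threads are all assumptions made for illustration. It shows one common way to run a command per subdirectory in parallel while forwarding each stderr line as soon as it arrives.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def find_subject_dirs(input_dir):
    """Return the immediate subdirectories of input_dir, sorted by name."""
    return sorted(p for p in Path(input_dir).iterdir() if p.is_dir())


def run_streaming(cmd, on_stderr_line):
    """Run cmd and hand each stderr line to on_stderr_line as it arrives."""
    with subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,  # assumption: stdout is discarded in this sketch
        stderr=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    ) as proc:
        # Iterating the pipe yields lines while the child is still running,
        # rather than collecting everything and printing it at exit.
        for line in proc.stderr:
            on_stderr_line(line.rstrip("\n"))
    # Leaving the "with" block waits for the child, so returncode is set.
    return proc.returncode


def process_all(input_dir, make_cmd, max_workers=4):
    """Run one command per subject directory in parallel.

    make_cmd is a hypothetical callback that maps a subject directory
    to the argument list of the command to run for it.
    Returns a dict of {subdirectory name: exit code}.
    """
    def worker(subject_dir):
        def report(line):
            # Prefix each line so interleaved output stays attributable.
            print(f"[{subject_dir.name}] {line}", file=sys.stderr, flush=True)

        return subject_dir.name, run_streaming(make_cmd(subject_dir), report)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(worker, find_subject_dirs(input_dir)))
```

A call might look like `process_all("input", lambda d: ["./process_subject.sh", str(d)])`, where `process_subject.sh` is a placeholder for whatever per-subject command applies. Threads are sufficient here because each worker spends its time waiting on a child process.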