Python script to build consecutive document streams from a collection of PDF documents.
python document_stream_builder.py \
--input <Input Dir> \
--output <Output Dir> \
--random <True/False> \
--limit <Number>
- input: Input directory (Default: "./input/")
- output: Output directory (Default: "./output/")
- random: Place documents in random order in the page stream (Default: True)
- limit: Maximum number of input documents to process
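
The options above could be parsed roughly as follows. This is a minimal sketch with argparse, not the script's actual code: the names build_parser and str_to_bool are hypothetical, and treating an omitted --limit as "no limit" is an assumption, since no default is documented for it.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: turn 'True'/'False' style strings into a bool."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Build consecutive document streams from PDF documents."
    )
    parser.add_argument("--input", default="./input/",
                        help="Input directory")
    parser.add_argument("--output", default="./output/",
                        help="Output directory")
    # --random takes an explicit True/False value, so it needs a converter
    # rather than a store_true flag.
    parser.add_argument("--random", type=str_to_bool, default=True,
                        help="Random document order in the page stream")
    # Assumption: None means "process all input documents".
    parser.add_argument("--limit", type=int, default=None,
                        help="Maximum number of input documents to process")
    return parser
```

Calling build_parser().parse_args(["--random", "False", "--limit", "5"]) yields a namespace with random set to False and limit set to 5, while input and output keep their defaults.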