Investigate content extraction, normalization, canonical reproducibility, metadata, embedded files/attachments, annotations, etc.