From 9f1d31d771f1dc4be3f8b93e9cc2ac7a0482b431 Mon Sep 17 00:00:00 2001
From: Boris Doubrov
Date: Tue, 10 Oct 2023 13:46:37 +0300
Subject: [PATCH] Update feature extraction in CLI

---
 cli/feature-extraction/index.md | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/cli/feature-extraction/index.md b/cli/feature-extraction/index.md
index ce89078..3451366 100644
--- a/cli/feature-extraction/index.md
+++ b/cli/feature-extraction/index.md
@@ -48,7 +48,7 @@ demonstration clear we suggest you edit your [config/features.xml](../config#fea
 ensuring that only the information dictionary fields are extracted. Then issue the following command:
 
 
-verapdf --off --extract adobe_supplement_iso32000.pdf
+verapdf --off --config adobe_supplement_iso32000.pdf
 
 you should see the following output:
 
@@ -86,6 +86,21 @@ you should see the following output:
 ```
 
 
+Alternatively, the features to be extracted from the PDF document can be explicitly specified in CLI option `--extract` or `-x` as follows:
+
+verapdf --off --extract informationDict adobe_supplement_iso32000.pdf
+
+Multiple features can be specified via comma-separated list:
+
+verapdf --off --extract informationDict,metadata adobe_supplement_iso32000.pdf
+
+The complete set of features for the option `--extract` is:
+
+actions, annotations, colorSpace, ds, embeddedFile, exGSt, font, formXobject, iccProfile, imageXobject, informationDict, interactiveFormField, lowLevelInfo, metadata, outlines, outputIntent, page, pattern, postscriptXobject, properties, shading, signature
+
+It matches the features specified in the `features.xml` configuration file: https://docs.verapdf.org/cli/config/#configuring-feature-extraction
+
+
 ### XMP Metadata
 We'll use the same [adobe supplement file](https://web.archive.org/web/20200621050243/https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/adobe_supplement_iso32000.pdf) to demonstrate the extraction of XMP metadata.
 First you'll need to use a text editor to change the contents of your