Firstly install the Python libraries specified in `requirements.txt`.

To run:
`python main.py --config config/config.yml`

# Configuration
The ingestion process properties are configured in `config.yml`.

Entries under the `sink` key specify the ElasticSearch sink.

#### Security and SSL certificates
This only applies when the ElasticSearch cluster uses X-Pack / Open Distro and requires secure connections using SSL certificates. An optional `security` entry containing the necessary configuration can be specified under the `source` and/or `sink` key. Its options are:
- `ca-file-path` -- path to a CA certificate file (PEM) used on its own for SSL; do NOT combine it with the other CA/client parameters listed below,
- `ca-certs-path` -- the path to the CA certificates file (PEM),
- `client-cert-path` -- the path to the client certificate file (PEM),
- `client-key-path` -- the path to the client key (PEM).
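
The rule that `ca-file-path` must be used on its own can be checked before connecting. The sketch below is a hypothetical helper (`validate_security` is not part of this project), assuming the `security` entry has been loaded from `config.yml` into a plain Python dict with the key names listed above, and that empty or missing values mean "not set".

```python
from typing import Mapping, Optional

# The CA/client keys that must not be combined with `ca-file-path`.
CLIENT_SSL_KEYS = ("ca-certs-path", "client-cert-path", "client-key-path")


def validate_security(security: Optional[Mapping[str, str]]) -> None:
    """Raise ValueError if `ca-file-path` is combined with the other CA/client keys.

    Hypothetical helper: assumes `security` is a dict of key -> file path.
    """
    if not security:
        return  # no security entry: nothing to check
    if security.get("ca-file-path"):
        conflicting = [key for key in CLIENT_SSL_KEYS if security.get(key)]
        if conflicting:
            raise ValueError(
                "`ca-file-path` must be used on its own, but these keys are also set: "
                + ", ".join(conflicting)
            )


# Valid: only the single CA file is given.
validate_security({"ca-file-path": "certs/root-ca.pem"})

# Valid: the CA bundle plus a client certificate and key.
validate_security({
    "ca-certs-path": "certs/root-ca.pem",
    "client-cert-path": "certs/client.pem",
    "client-key-path": "certs/client-key.pem",
})
```

Failing early like this gives a clearer error than a TLS handshake failure at connection time.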

### Extra ES sink/source connection options
Entries under the key `extra_params` are optional, useful for test cases or deployments that only use internal resources:
- `use_ssl` -- use an SSL connection,
- `verify_certs` -- verify SSL certificates.

- `credentials`
  - `username` and `password` can be used to provide connection credentials,
  - `use-api-key` -- if enabled, the `username` and `password` fields are used as `api_id` and `api_key` respectively.
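
A minimal sketch of how the `extra_params` and `credentials` entries could translate into client keyword arguments. The output keys assume the elasticsearch-py 7.x constructor names (`use_ssl`, `verify_certs`, `http_auth`, `api_key`); the helper `build_client_kwargs` and the dict layout of the loaded config are hypothetical, not taken from this project.

```python
from typing import Any, Dict, Mapping, Optional


def build_client_kwargs(
    extra_params: Optional[Mapping[str, Any]] = None,
    credentials: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Turn the `extra_params` and `credentials` entries into client keyword args.

    Hypothetical helper; key names assume the elasticsearch-py 7.x client.
    """
    kwargs: Dict[str, Any] = {}
    extra_params = extra_params or {}
    for key in ("use_ssl", "verify_certs"):
        if key in extra_params:
            kwargs[key] = extra_params[key]  # passed through unchanged

    credentials = credentials or {}
    username = credentials.get("username")
    password = credentials.get("password")
    if username and password:
        if credentials.get("use-api-key"):
            # `username` and `password` are reinterpreted as `api_id` and `api_key`.
            kwargs["api_key"] = (username, password)
        else:
            kwargs["http_auth"] = (username, password)
    return kwargs


print(build_client_kwargs(
    {"use_ssl": True, "verify_certs": False},
    {"username": "my-key-id", "password": "my-key-secret", "use-api-key": True},
))
# {'use_ssl': True, 'verify_certs': False, 'api_key': ('my-key-id', 'my-key-secret')}
```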

### NLP service
- `endpoint-url` -- the URL of the NLP service endpoint,
- `endpoint-request-mode` -- either leave it empty or, when using the GATE NLP ANNIE annotation service, set it to `gate-nlp`,
- `use-bulk-indexing` -- ingest in bulk mode (1000 docs per bulk chunk),

- `credentials`
  - `username` and `password` can be used to provide connection credentials.
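
With `use-bulk-indexing` enabled, documents are ingested in chunks of 1000. The sketch below shows only the chunking step under that assumption; `chunked` is a hypothetical helper, and the actual bulk request to ElasticSearch is left out.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

BULK_CHUNK_SIZE = 1000  # chunk size stated for `use-bulk-indexing`


def chunked(items: Iterable[T], size: int = BULK_CHUNK_SIZE) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


docs = ({"id": i} for i in range(2500))
print([len(chunk) for chunk in chunked(docs)])  # [1000, 1000, 500]
```

Working from an iterator means the full document set never has to sit in memory at once; only one chunk is materialised at a time.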

### Fields mapping
Entries under the `mapping` key define how the document fields are mapped for ingestion.

The sub-entry `index-ingest-mode` defines:
- `same-index`: `False`; set it to `True` if you wish to ingest the annotations into the same index,
- `use-nested-objects`: `False`; set it to `True` if you wish to ingest into the same index using the nested object type. This is useful, but beware of the impact on search query speed,
- `es-nested-object-schema-mapping`: for MedCAT annotations use `medcat-separate-index` or `medcat-nested-object`; for GATE NLP use `gate-nlp-separate-index` or `gate-nlp-nested-object`.
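
These options interact. One plausible reading, which is an assumption rather than documented behaviour, is that the `*-nested-object` schema mappings go with `use-nested-objects: True` and the `*-separate-index` mappings with `use-nested-objects: False`. Under that assumption, a hypothetical sanity check (`check_ingest_mode` is not part of the project) could look like this:

```python
from typing import Mapping

# The four schema mappings listed for `es-nested-object-schema-mapping`.
KNOWN_SCHEMA_MAPPINGS = {
    "medcat-separate-index",
    "medcat-nested-object",
    "gate-nlp-separate-index",
    "gate-nlp-nested-object",
}


def check_ingest_mode(mode: Mapping[str, object]) -> None:
    """Sanity-check an `index-ingest-mode` entry (hypothetical helper).

    Assumes `*-nested-object` mappings are meant for `use-nested-objects: True`
    and `*-separate-index` mappings for `use-nested-objects: False`.
    """
    schema = mode.get("es-nested-object-schema-mapping")
    if schema not in KNOWN_SCHEMA_MAPPINGS:
        raise ValueError(f"unknown schema mapping: {schema!r}")
    nested = bool(mode.get("use-nested-objects", False))
    if nested != str(schema).endswith("-nested-object"):
        raise ValueError(
            f"`use-nested-objects: {nested}` does not match schema mapping {schema!r}"
        )


check_ingest_mode({
    "same-index": True,
    "use-nested-objects": True,
    "es-nested-object-schema-mapping": "medcat-nested-object",
})
```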

The sub-entry `source` defines the field names that contain:
- `text-field` - the free text to be processed,
- `docid-field` - the unique identifier of the document,
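
To illustrate how the `source` field names are used, the sketch below pulls the text and document ID out of one search hit. It assumes hits have the usual ElasticSearch shape (`_id` plus a `_source` dict) and that `docid-field` names either a field inside `_source` or the metadata field `_id`; `extract_document` and the sample field names `body` and `doc_id` are hypothetical.

```python
from typing import Any, Dict, Mapping, Optional, Tuple


def extract_document(
    hit: Mapping[str, Any], source_fields: Mapping[str, str]
) -> Optional[Tuple[str, str]]:
    """Return (doc_id, text) for one search hit, or None if there is no text.

    Hypothetical helper: `source_fields` mirrors the `mapping.source` entry, e.g.
    {"text-field": "body", "docid-field": "doc_id"}.
    """
    doc: Dict[str, Any] = dict(hit.get("_source", {}))
    text = doc.get(source_fields["text-field"])
    if not text:
        return None  # nothing for the NLP service to process
    id_field = source_fields["docid-field"]
    # Allow the ElasticSearch metadata `_id` to serve as the document ID.
    doc_id = hit["_id"] if id_field == "_id" else doc[id_field]
    return str(doc_id), str(text)


hit = {"_id": "abc", "_source": {"doc_id": 42, "body": "Patient reports chest pain."}}
print(extract_document(hit, {"text-field": "body", "docid-field": "doc_id"}))
# ('42', 'Patient reports chest pain.')
```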

The sub-entry `batch` defines the possible portion of documents to be processed …
- `threads` - the number of processing threads used to speed up the ingestion.
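
A minimal sketch of what the `threads` option could look like with the standard library's `ThreadPoolExecutor`. It assumes each batch is independent and that the work is dominated by network I/O (HTTP calls to the NLP service and ElasticSearch), which is where threads help in Python; `process_batches` and `count_docs` are hypothetical stand-ins for the real per-batch work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def process_batches(
    batches: Sequence[List[str]],
    process_batch: Callable[[List[str]], int],
    threads: int = 4,
) -> int:
    """Run `process_batch` on every batch using `threads` worker threads.

    Returns the sum of the per-batch results (e.g. documents processed).
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in batch order and re-raises worker exceptions.
        return sum(pool.map(process_batch, batches))


def count_docs(batch: List[str]) -> int:
    # Hypothetical placeholder: a real implementation would call the NLP
    # service and send the annotations to the sink here.
    return len(batch)


print(process_batches([["d1", "d2"], ["d3"], ["d4", "d5", "d6"]], count_docs, threads=2))  # 6
```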

The sub-entry `sink` specifies additional options for sending the processed annotations:
- `split-index-by-field` - the name of the field in the returned annotations whose value will be used as a prefix for the index name (e.g., to send annotations of different types to separate indices). To disable this, leave the field empty; to split by annotation type, use `type`.
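
The index-name prefixing can be sketched as follows. The base index name and the `_` separator are assumptions, since how the prefix is joined is not stated here; the prefix is lowercased because ElasticSearch index names must be lowercase. `target_index` is a hypothetical helper.

```python
from typing import Any, Mapping


def target_index(annotation: Mapping[str, Any], base_index: str, split_field: str) -> str:
    """Pick the index an annotation is sent to.

    With an empty `split_field` (feature disabled), every annotation goes to
    `base_index`; otherwise the field's value is used as a prefix.
    """
    if not split_field:
        return base_index
    prefix = str(annotation.get(split_field, "")).strip().lower()
    if not prefix:
        return base_index  # field missing on this annotation: fall back
    # Assumed naming scheme: "<prefix>_<base_index>".
    return f"{prefix}_{base_index}"


ann = {"type": "Disease", "text": "asthma"}
print(target_index(ann, "annotations", "type"))  # disease_annotations
print(target_index(ann, "annotations", ""))      # annotations
```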

The sub-entry `nlp` specifies additional options for processing the documents with NLP:
- `skip-processed-doc-check` - whether to skip checking for already processed documents in ElasticSearch,
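
When the check is not skipped, already processed documents have to be filtered out before they reach the NLP service. A sketch, assuming the IDs of processed documents have already been fetched from ElasticSearch into a set (how that lookup is done is not described here); `filter_unprocessed` is hypothetical.

```python
from typing import Any, Iterable, Iterator, Mapping, Set


def filter_unprocessed(
    docs: Iterable[Mapping[str, Any]],
    processed_ids: Set[str],
    docid_field: str,
    skip_processed_doc_check: bool = False,
) -> Iterator[Mapping[str, Any]]:
    """Yield only documents whose ID is not in `processed_ids`.

    If `skip_processed_doc_check` is True, every document is passed through,
    which saves the lookup but may re-process documents.
    """
    for doc in docs:
        if skip_processed_doc_check or str(doc[docid_field]) not in processed_ids:
            yield doc


docs = [{"doc_id": 1}, {"doc_id": 2}, {"doc_id": 3}]
print([d["doc_id"] for d in filter_unprocessed(docs, {"2"}, "doc_id")])  # [1, 3]
```

Skipping the check trades correctness on re-runs for speed, which can make sense on a first ingestion into an empty index.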