If you intend to provide development support, install the development dependencies:
```sh
pipenv lock --dev && pipenv sync
```
### Setting up Neo4j
First, follow the [desktop setup instructions](https://neo4j.com/developer/neo4j-desktop/) to download and install Neo4j Desktop.

Once you have opened Neo4j desktop, use the "New" button in the upper-left region of the window to create a new project. Within that project, click the "Add" button in the upper-right region of the window and select "Local DBMS". The name of the DBMS doesn't matter, but the password will be used later to connect the database to MetaKB (we have been using "admin" by default). Click "Create". Then, click the row within the project screen corresponding to your newly-created DBMS, and click the green "Start" button to start the database service.

The graph will initially be empty, but once you have successfully loaded data, Neo4j Desktop provides an interface for exploring and visualizing relationships within the graph. To access it, click the blue "Open" button. The prompt at the top of this window processes [Cypher queries](https://neo4j.com/docs/cypher-refcard/current/); to start, try `MATCH (n:Statement {id:"civic.eid:1409"}) RETURN n`. Buttons on the left-hand edge of the results pane let you select graph, tabular, or textual output.
### Setting up normalizers
The MetaKB calls a number of normalizer libraries to transform resource data and resolve incoming search queries. These will be installed as part of the package requirements, but require additional setup.

First, [follow these instructions for deploying DynamoDB locally on your computer](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.DownloadingAndRunning.html). Once it is set up, open a separate terminal instance, navigate to its source directory, and start the database instance.
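
A typical start command, per the AWS documentation linked above, is shown below; adjust the paths if your download is laid out differently.

```sh
# run DynamoDB Local with a shared database file on the default port (8000)
java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb
```
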
Next, navigate to the `site-packages` directory of your virtual environment. Assuming Pipenv created the environment under your user directory, the path should look something like this:
```sh
cd ~/.local/share/virtualenvs/metakb-<various characters>/lib/python<python-version>/site-packages/ # replace <various characters> and <python-version>
```
Next, initialize the [Variation Normalizer](https://github.com/cancervariants/variation-normalization) by following the instructions in the [README](https://github.com/cancervariants/variation-normalization#installation). When setting up the UTA database, [these docs](https://github.com/ga4gh/vrs-python/tree/main/docs/setup_help) may be helpful.

The MetaKB can acquire all other needed normalizer data, except for that of [OMIM](https://www.omim.org/downloads), which must be manually placed:
```sh
mkdir -p data/omim
cp ~/YOUR/PATH/TO/mimTitles.txt data/omim/omim_<date>.tsv # replace <date> with date of data acquisition formatted as YYYYMMDD
```
### Environment Variables
MetaKB relies on several environment variables being set in order to work.

* Always Required:
  * `UTA_DB_URL`
    * Used in the Variation Normalizer, which relies on UTA Tools
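
    For illustration only (the exact connection string depends on your own UTA installation; see the UTA setup docs referenced above), the variable might be set like so:

    ```shell script
    # hypothetical local UTA instance; substitute your own host, credentials, and schema version
    export UTA_DB_URL=postgresql://uta_admin:password@localhost:5433/uta/uta_20210129
    ```
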
* Required when using the `--load_normalizers_db` or `--force_load_normalizers_db` arguments in CLI commands
  * `RXNORM_API_KEY`
    * Used in Therapy Normalizer to retrieve RxNorm data
    * RxNorm requires a UMLS license, which you can register for [here](https://www.nlm.nih.gov/research/umls/index.html). You must set the `RXNORM_API_KEY` environment variable to your API key. This can be found in the [UTS 'My Profile' area](https://uts.nlm.nih.gov/uts/profile) after signing in.

    Example:

    ```shell script
    export RXNORM_API_KEY={rxnorm_api_key}
    ```

  * `DATAVERSE_API_KEY`
    * Used in Therapy Normalizer to retrieve HemOnc data
    * HemOnc.org data requires a Harvard Dataverse API key. Create an account or log in at the [Harvard Dataverse site](https://dataverse.harvard.edu/), then follow [these instructions](https://guides.dataverse.org/en/latest/user/account.html) to generate a key. You must set the `DATAVERSE_API_KEY` environment variable to your API key.

    Example:

    ```shell script
    export DATAVERSE_API_KEY={dataverse_api_key}
    ```

### Loading data
Once Neo4j and DynamoDB instances are both running, and necessary normalizer data has been placed, run the MetaKB CLI with the `--load_normalizers_db` flag to acquire all other necessary normalizer source data, and execute harvest, transform, and load operations into the graph datastore.

The following CLI arguments are available:

* `--db_url`
  * URL endpoint for the application Neo4j database. Can also be provided via environment variable `METAKB_DB_URL`.
* `--db_username`
  * Username to provide to application Neo4j database. Can also be provided via environment variable `METAKB_DB_USERNAME`.
* `--db_password`
  * Password to provide to application Neo4j database. Can also be provided via environment variable `METAKB_DB_PASSWORD`.
* `--load_normalizers_db`
  * Check the normalizers' (therapy, disease, and gene) DynamoDB database and load data if source data is not present.
* `--force_load_normalizers_db`
  * Load all normalizers' (therapy, disease, and gene) data into the DynamoDB database. Overrides `--load_normalizers_db` if both are selected.
* `--normalizers_db_url`
  * URL endpoint of the normalizers' (therapy, disease, and gene) DynamoDB database. Set to `http://localhost:8000` by default.
* `--load_latest_cdms`
  * Deletes all nodes from the MetaKB Neo4j database and loads it with the latest source transformed CDM files stored locally in the `metakb/data` directory. This bypasses having to run the source harvest and transform steps. Exclusive with `--load_target_cdm` and `--load_latest_s3_cdms`.
* `--load_target_cdm`
  * Loads a source's transformed CDM file at the specified path. This bypasses having to run the source harvest and transform steps. Exclusive with `--load_latest_cdms` and `--load_latest_s3_cdms`.
* `--load_latest_s3_cdms`
  * Deletes all nodes from the MetaKB Neo4j database, retrieves the latest source transformed CDM files from the public S3 bucket, and loads the Neo4j database with the retrieved data. This bypasses having to run the source harvest and transform steps. Exclusive with `--load_latest_cdms` and `--load_target_cdm`.
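
For example, an initial load against a local setup might look like the following. This is a sketch only: the `metakb.cli` module path, Bolt URL, and credentials are assumed defaults for a local installation, so adjust them to match your environment.

```sh
# harvest, transform, and load all sources into the local Neo4j instance,
# loading normalizer data into DynamoDB first if it is not already present
pipenv run python3 -m metakb.cli --db_url=bolt://localhost:7687 --db_username=neo4j --db_password=admin --load_normalizers_db
```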