Given a list of terms and a set of UMLS files, the CTB generates a subset of the UMLS containing the supplied terms and their word-based variants.
The following files should be placed in the data/input directory:
- MRCONSO.RRF concepts file
- MRSTY.RRF concept -> semantic types file
Supplied to Web Interface
- list of supplied terms
- Custom version of mrconso.rrf
- Custom version of mrsty.rrf
To use CTB you must first create indexes of your UMLS files and then start the tool.
Copy MRCONSO.RRF and MRSTY.RRF to ctb/data/input/your data set name/.
In the ctb directory run:
bin/prepumls.sh 'your data set name'
For example:
bin/prepumls.sh 2016AA
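Indexing fails if either input file is missing, so it can help to check the input directory first. The following is a minimal sketch, not part of CTB: `check_inputs` is a hypothetical helper name, and it assumes you run it from the top-level ctb directory with the layout described above.

```shell
# Hypothetical helper (not part of CTB): confirm that the UMLS input files
# are in data/input/<data set name>/ before running bin/prepumls.sh.
# Assumes the current directory is the top-level ctb directory.
check_inputs() {
    dataset="$1"
    dir="data/input/$dataset"
    missing=0
    for f in MRCONSO.RRF MRSTY.RRF; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Returns 0 when both files are present, 1 otherwise.
    return "$missing"
}
```

It could then be combined with the indexing step as: check_inputs 2016AA && bin/prepumls.sh 2016AA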
Note: When using the GITHUB release, the name and path of the standalone jar will vary based on the version in the project.clj file and the version of Leiningen used. The CLASSPATH variable in the script bin/prepumls.sh must be modified to match the current location of the standalone jar (or uberjar).
There should be a file called ctb.properties in the config directory. In ctb.properties change:
ctb.ivf.dataroot: ...
to:
ctb.ivf.dataroot: data/ivf/<your data set name>
If you want to use the Lexical Tools Lexical Variant Generator (LVG) to supply term combinations not found in the UMLS, download LVG from the Lexical Systems Group website (https://lsg3.nlm.nih.gov/LexSysGroup/Projects/lvg/current/web/index.html) and install it according to its directions. After installing the Lexical Tools, add the following to the ctb.properties file:
ctb.lvg.directory: {LVGDIR}
Where LVGDIR is the location of your LVG installation.
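The property changes above can also be scripted. This is a minimal sketch, not part of CTB: `set_prop` is a hypothetical helper, and it assumes ctb.properties uses the "key: value" form shown above. Note that it moves the affected key to the end of the file.

```shell
# Hypothetical helper (not part of CTB): set "key: value" in a properties
# file, replacing an existing line for that key or adding one if absent.
# Assumes the "key: value" format shown in this README.
set_prop() {
    key="$1"
    value="$2"
    file="$3"
    tmp="$file.tmp.$$"
    # Keep every line that does not start with "key:" (literal prefix match).
    awk -v k="$key:" 'index($0, k) != 1' "$file" > "$tmp"
    # Append the new setting, then replace the original file.
    printf '%s: %s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$file"
}
```

For example, for the 2016AA data set: set_prop ctb.ivf.dataroot data/ivf/2016AA config/ctb.properties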
If you are using the GITHUB release of CTB then you will need a directory for the output:
mkdir -p resources/public/output
In the top-level ctb directory run:
java -jar target/ctb-0.1.3-SNAPSHOT-standalone.jar [port]
Note: When using the GITHUB release, the name and path of the standalone jar will vary based on the version in the project.clj file and the version of Leiningen used.
or if you have Leiningen:
lein ring server [port]
Then point your web browser to localhost:3000 (or, if you supplied a port number, to that port).
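Because the standalone jar's name and path vary between builds, it may be easier to look it up than to type it. The sketch below is not part of CTB: `find_standalone_jar` is a hypothetical helper, and it assumes the jar is written to target/ or to a directory one level below it.

```shell
# Hypothetical helper (not part of CTB): print the path of the first
# *-standalone.jar found in target/ or one directory level below it.
# Assumes the current directory is the top-level ctb directory.
find_standalone_jar() {
    for jar in target/*-standalone.jar target/*/*-standalone.jar; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -f "$jar" ]; then
            echo "$jar"
            return 0
        fi
    done
    echo "no standalone jar found under target/" >&2
    return 1
}
```

The server could then be started with: java -jar "$(find_standalone_jar)" [port]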
Paste your term list into the "Input Terms" (first) page and press "Submit".
Select or de-select terms in Synonym Set View to filter the synonyms generated by the tool and press "Submit".
The generated dataset will be placed in the directory resources/public/output/user//.
The directory should contain the following files:
filtered-synset
filtered-termlist.edn
mrconso.rrf
mrsty.rrf
params
synonyms.checksum
termlist
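To confirm that a run produced the complete dataset, the file list above can be checked with a short script. This is a sketch only: `check_output` is a hypothetical helper, not part of CTB, and the output directory is passed in as an argument.

```shell
# Hypothetical helper (not part of CTB): report any of the expected output
# files that are missing from a generated dataset directory.
check_output() {
    outdir="$1"
    rc=0
    for f in filtered-synset filtered-termlist.edn mrconso.rrf mrsty.rrf \
             params synonyms.checksum termlist; do
        if [ ! -f "$outdir/$f" ]; then
            echo "missing: $outdir/$f"
            rc=1
        fi
    done
    # Returns 0 when all seven files are present, 1 otherwise.
    return "$rc"
}
```

Call it with the path of the generated dataset directory; it prints one line per missing file and returns a non-zero status if any are absent.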
You will need both Leiningen and Maven to be installed.
The Irutils 2.1 inverted file library is necessary to use the latest version of CTB. In a separate directory, clone, compile, and install irutils version 2.1 into your local Maven (and Leiningen) repository:
$ git clone https://github.com/willjrogers/irutils.git
$ cd irutils/java
$ git branch rel2.1 rel-2.1
$ git checkout rel2.1
$ mkdir -p src/main
$ (cd src/main && ln -s ../../sources java)
$ mvn install
Go to the "ctb" directory and compile and package CTB:
$ cd ctb
$ lein uberjar
If the uberjar builds successfully, the steps in the usage section above should work normally.
If you have Tomcat you can use the file target/ctb-0.1.0-SNAPSHOT-standalone.war to deploy the system to Tomcat.
The application now expects the config directory (containing ctb.properties) and the data directory (containing the indexes) to be in the sub-directory war-resources before deployment using the command:
lein ring uberwar
Note: CTB has not been extensively tested in Tomcat and may require modification to work properly.
CTB is a product of the U.S. Government and is not subject to copyright.
For more information see: http://www.usa.gov/government-works