Merge pull request #39 from ITM-Kitware/dev/api-updates-for-metrics-eval

Dev/api updates for metrics eval
ITM-Kitware · Mar 12, 2024 · b4b5431
2 parents 74aae20 + a16cb4f
commit b4b5431
Showing 52 changed files with 1,735 additions and 380 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,10 +3,33 @@
 This changelog follows the specifications detailed in: [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html), although we have not yet reached a `1.0.0` release.
 
-## Unreleased
+## 0.3.0
 
 ### Added
+
+* Added new driver script for TA3 interactions that uses a new YAML config format for ADMs
+* Added several ADM config files for new driver script
+* Added a new ADM HybridKaleidoADM which defers to a Llama2SingleKDMAADM instance to fill out action parameters
+* Added new abstract class for action based ADMs (called ActionBasedADM), requires a `choose_action` method
+* Implemented ActionBasedADM `choose_action` method on the KaleidoADM, Llama2SingleKDMAADM, and a new ADM HybridKaleidoADM
 * Added alignment accuracy metric in self-evaluation framework
+* Added re-usable methods for filling out action parameters to Llama2SingleKDMAADM
+* Added short KDMA descriptions for moral deservingness and maximization for Kaleido
+* Added new prompt template for selecting the target character of an action
+* Added high and low alignment system prompts for SoarTech's maximization KDMA
+
+### Changed
+
+* Replaced instances of "casualties" with "characters" as per the new TA3 scenario data format
+* Changed TA3 interface component over to using TA3 client module (rather than raw HTTP requests)
+* Moved the previous `run_align_system.py` script to `run_simplified_align_system.py`, replacing it with the new primary CLI script
+* Updated README with respect to new CLI script
+* Changed some prompts to not display vitals with a value of None
+
+### Fixed
+
+* Fixed issue with logging of choice scores after multiple-sampling with voting
+* Fixed issue where per-sample LLM outputs weren't being logged correctly
 
 ## 0.2.6
 
@@ -24,6 +47,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 
 * Fixed issue with configurable KDMA Estimator and Distance functions for Kaleido ADM
 
+### Changed
+
+* Better error message on TA3 API action taken failure
+
 
 ## Version 0.2.5
 

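Note on the `ActionBasedADM` changelog entry: the 0.3.0 notes say the new abstract class requires a `choose_action` method. The sketch below shows one way such a contract could be expressed with Python's `abc` module. The argument names (`scenario_state`, `available_actions`, `alignment_target`), the toy `FirstActionADM` subclass, and the placeholder action strings are assumptions made for illustration; they are not taken from this diff and may differ from the repository's actual code.

```python
from abc import ABC, abstractmethod


class ActionBasedADM(ABC):
    """Illustrative stand-in for the abstract class named in the changelog."""

    @abstractmethod
    def choose_action(self, scenario_state, available_actions, alignment_target, **kwargs):
        """Return the action to take. The signature here is an assumption."""


class FirstActionADM(ActionBasedADM):
    """Toy ADM (hypothetical, not in the repository) that picks the first action."""

    def choose_action(self, scenario_state, available_actions, alignment_target, **kwargs):
        return available_actions[0]


adm = FirstActionADM()
# Placeholder inputs: an empty state, two made-up action names, no alignment target.
chosen = adm.choose_action({}, ["action_a", "action_b"], None)
print(chosen)
```

Because `choose_action` is marked abstract, a subclass that omits it cannot be instantiated, which is presumably the point of having KaleidoADM, Llama2SingleKDMAADM, and HybridKaleidoADM all implement it.
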
diff --git a/README.md b/README.md
@@ -20,8 +20,6 @@ Repository](https://github.com/NextCenturyCorporation/itm-evaluation-server).
 
 There's a corresponding client module: [TA3 Evaluation Client](https://github.com/NextCenturyCorporation/itm-evaluation-client)
 
-Note that this client module isn't a required dependency for the ALIGN system code.
-
 #### Soartech's TA1 API
 
 Soartech's TA1 service code can be found at: [Soartech's TA1
@@ -43,13 +41,14 @@ install git+https://github.com/ITM-Kitware/align-system.git`.
 ## Running the system against the TA3 action-based API
 
 ```
-$ run_action_based_align_system --help
-usage: run_action_based_align_system [-h] {TA3ActionBased} ...
+$ run_align_system --help
+usage: run_align_system [-h] {TA3ActionBased} ...
 
-ALIGN Action Based System CLI
+ALIGN System CLI
 
 positional arguments:
-  {TA3ActionBased}  Select interface. Adding --help after interface selection will print interface and system specified arguments
+  {TA3ActionBased}  Select interface. Adding --help after interface selection will print interface and
+                    system specified arguments
     TA3ActionBased  Interface with CACI's TA3 web-based service
 
 options:
@@ -59,9 +58,13 @@ options:
 Running `--help` after the selected interface prints the full set of options for the interface and system.  E.g.:
 
 ```
-$ run_action_based_align_system TA3ActionBased --help
-usage: run_action_based_align_system TA3ActionBased [-h] [-u USERNAME] [-s SESSION_TYPE] [-e API_ENDPOINT] [--training-session] [-m MODEL] [-t] [-a ALGORITHM] [-A ALGORITHM_KWARGS]
-                                                    [--similarity-measure SIMILARITY_MEASURE]
+$ run_align_system TA3ActionBased --help
+usage: run_align_system TA3ActionBased [-h] [-u USERNAME] [-s SESSION_TYPE]
+                                       [-e API_ENDPOINT] [--training-session]
+                                       [--scenario-id SCENARIO_ID] -c ADM_CONFIG [-t]
+                                       [-l LOGLEVEL] [--logfile-path LOGFILE_PATH]
+                                       [--save-input-output-to-path SAVE_INPUT_OUTPUT_TO_PATH]
+                                       [--save-alignment-score-to-path SAVE_ALIGNMENT_SCORE_TO_PATH]
 
 options:
   -h, --help            show this help message and exit
@@ -70,28 +73,30 @@ options:
   -s SESSION_TYPE, --session-type SESSION_TYPE
                         TA3 API Session Type (default: "eval")
   -e API_ENDPOINT, --api_endpoint API_ENDPOINT
-                        Restful API endpoint for scenarios / probes (default: "http://127.0.0.1:8080")
+                        Restful API endpoint for scenarios / probes (default:
+                        "http://127.0.0.1:8080")
   --training-session    Return training related information from API requests
-  -m MODEL, --model MODEL
-                        LLM Baseline model to use
+  --scenario-id SCENARIO_ID
+                        Specific scenario to run
+  -c ADM_CONFIG, --adm-config ADM_CONFIG
+                        Path to ADM config YAML
   -t, --align-to-target
                         Align algorithm to target KDMAs
-  -a ALGORITHM, --algorithm ALGORITHM
-                        Algorithm to use
-  -A ALGORITHM_KWARGS, --algorithm-kwargs ALGORITHM_KWARGS
-                        JSON encoded dictionary of kwargs for algorithm initialization
-  --similarity-measure SIMILARITY_MEASURE
-                        Similarity measure to use (default: 'bert')
+  -l LOGLEVEL, --loglevel LOGLEVEL
+  --logfile-path LOGFILE_PATH
+                        Also write log output to the specified file
+  --save-input-output-to-path SAVE_INPUT_OUTPUT_TO_PATH
+                        Save system inputs and outputs to a file
+  --save-alignment-score-to-path SAVE_ALIGNMENT_SCORE_TO_PATH
+                        Save alignment score output to a file
 ```
 
 Here's an example invocation of the system using the TA3 Action-based interface (assuming it's running locally on port `8080`):
 ```
 $ run_action_based_align_system TA3ActionBased \
-           -e "http://127.0.0.1:8080" \
-           --algorithm "llama_index" \
-           --model falcon \
-           -s soartech \
-           --algorithm-kwargs '{"domain_docs_dir": "/data/shared/MVPData/DomainDocumentsPDF"}'
+           --adm-config adm_configs/metrics-evaluation/single_kdma_adm_adept_baseline.yml \
+           --api_endpoint "http://127.0.0.1:8080" \
+           --session-type adept
 ```
 
 *NOTE* - The first time you run the system it can take upwards of a
@@ -102,11 +107,11 @@ model is cached.
 
 ## Running the system against TA1 services or local files
 
-In the Python environment you have set up, a CLI application called `run_align_system` should now be available.  This single entrypoint supports interfacing with both local files on disk, and the TA3 web-based API.  Running the script with `--help` shows which interfaces are available:
+In the Python environment you have set up, a CLI application called `run_simplified_align_system` should now be available.  This single entrypoint supports interfacing with both local files on disk, and the TA3 web-based API.  Running the script with `--help` shows which interfaces are available:
 
 ```
-$ run_align_system --help
-usage: run_align_system [-h] {TA1Soartech,LocalFiles,TA1Adept} ...
+$ run_simplified_align_system --help
+usage: run_simplified_align_system [-h] {TA1Soartech,LocalFiles,TA1Adept} ...
 
 ALIGN System CLI
 
@@ -124,8 +129,8 @@ options:
 Running `--help` after the selected interface prints the full set of options for the interface and system.  E.g.:
 
 ```
-$ run_align_system TA1Soartech --help
-usage: run_align_system TA1Soartech [-h] [-s [SCENARIOS ...]] [--alignment-targets [ALIGNMENT_TARGETS ...]] [-e API_ENDPOINT] [-m MODEL] [-t] [-a ALGORITHM] [-A ALGORITHM_KWARGS] [--similarity-measure SIMILARITY_MEASURE]
+$ run_simplified_align_system TA1Soartech --help
+usage: run_simplified_align_system TA1Soartech [-h] [-s [SCENARIOS ...]] [--alignment-targets [ALIGNMENT_TARGETS ...]] [-e API_ENDPOINT] [-m MODEL] [-t] [-a ALGORITHM] [-A ALGORITHM_KWARGS] [--similarity-measure SIMILARITY_MEASURE]
 
 options:
   -h, --help            show this help message and exit
@@ -153,7 +158,7 @@ options:
 We've included some example scenario, probe, and alignment target data for testing.  These files can be found in the `example_data` directory.  Here's an example system invocation with the provided example files:
 
 ```
-run_align_system LocalFiles \
+run_simplified_align_system LocalFiles \
     -s example_data/scenario_1/scenario.json \
     --alignment-target-filepath example_data/scenario_1/alignment_target.json \
     -p example_data/scenario_1/probe{1,2,3,4}.json \
@@ -163,56 +168,64 @@ run_align_system LocalFiles \
     --align-to-target
 ```
 
-## ADM Invocations
+## Metrics Evaluation ADM Invocations
 
-### Simple Action-based Baseline ADM
+### Aligned ADM for ADEPT scenarios
 
-Simple baseline (unaligned) system using the `falcon` model:
 ```
 run_action_based_align_system TA3ActionBased \
-           --algorithm "llama_index" \
-           --model falcon \
-           -s soartech \
-           --algorithm-kwargs '{"retrieval_enabled": false}' \
-           --algorithm "llama_index" \
-           --model falcon
+           --adm-config adm_configs/metrics-evaluation/delivered/single_kdma_adm_adept.yml \
+           --username single_kdma_aligned_adm_adept \
+           --align-to-target \
+           --session-type adept
 ```
 
-### Simple Action-based Aligned ADM
+### Aligned Hybrid Kaleido ADM for ADEPT scenarios
 
-Simple aligned system using the `falcon` model (requires domain document PDFs):
 ```
 run_action_based_align_system TA3ActionBased \
-           --algorithm "llama_index" \
-           --model falcon \
-           -s soartech \
-           --algorithm-kwargs '{"domain_docs_dir": "/path/to/DomainDocumentsPDF"}' \
-           --algorithm-kwargs '{"retrieval_enabled": false}' \
-           --algorithm "llama_index" \
-           --model falcon \
-           --align-to-target
+           --adm-config adm_configs/metrics-evaluation/delivered/hybrid_kaleido.yml \
+           --username hybrid_kaleido_aligned_adm_adept \
+           --align-to-target \
+           --session-type adept
 ```
 
-### Action-based Chat Baseline ADM
+### Baseline ADM for ADEPT scenarios
+
+```
+run_action_based_align_system TA3ActionBased \
+           --adm-config adm_configs/metrics-evaluation/delivered/single_kdma_adm_baseline.yml \
+           --username single_kdma_baseline_adm_adept \
+           --session-type adept
+```
 
-Unaligned system using a Llama 2 chat model:
+### Aligned ADM for SoarTech scenarios
 
 ```
-run_action_based_chat_baseline TA3ActionBased \
-           -s adept \
-           --model meta-llama/Llama-2-13b-chat-hf
+run_action_based_align_system TA3ActionBased \
+           --adm-config adm_configs/metrics-evaluation/delivered/single_kdma_adm_soartech.yml \
+           --username single_kdma_aligned_adm_soartech \
+           --align-to-target \
+           --session-type soartech
 ```
 
-### Action-based Chat Aligned ADM
+### Aligned Hybrid Kaleido ADM for SoarTech scenarios
 
-Aligned system using a Llama 2 chat model:
+```
+run_action_based_align_system TA3ActionBased \
+           --adm-config adm_configs/metrics-evaluation/delivered/hybrid_kaleido.yml \
+           --username hybrid_kaleido_aligned_adm_soartech \
+           --align-to-target \
+           --session-type soartech
+```
+
+### Baseline ADM for SoarTech scenarios
 
 ```
-run_action_based_chat_baseline TA3ActionBased \
-           -s adept \
-           --model meta-llama/Llama-2-13b-chat-hf \
-           --precision half \
-           --align-to-target
+run_action_based_align_system TA3ActionBased \
+           --adm-config adm_configs/metrics-evaluation/delivered/single_kdma_adm_baseline.yml \
+           --username single_kdma_baseline_adm_soartech \
+           --session-type soartech
 ```
 
 

diff --git a/adm_configs/kaleido_config.yml b/adm_configs/kaleido_config.yml
@@ -0,0 +1,9 @@
+adm:
+  name: 'KaleidoADM'
+  init_kwargs:
+    model_name: 'allenai/kaleido-large'
+    use_tqdm: False
+
+  inference_kwargs:
+    distance_fn: 'RelevanceWeightedDistance'
+    kdma_descriptions_map: 'align_system/algorithms/lib/templates/kdma_descriptions_short_metrics_eval.yml'
diff --git a/adm_configs/metrics-evaluation/delivered/hybrid_kaleido.yml b/adm_configs/metrics-evaluation/delivered/hybrid_kaleido.yml
@@ -0,0 +1,17 @@
+adm:
+  name: 'HybridKaleidoADM'
+  init_kwargs:
+    kaleido_init_kwargs:
+      model_name: 'allenai/kaleido-large'
+      use_tqdm: False
+
+    llm_init_kwargs:
+      hf_model: 'meta-llama/Llama-2-7b-chat-hf'
+      precision: 'half'
+
+  inference_kwargs:
+    # Kaleido kwargs
+    distance_fn: 'RelevanceWeightedDistance'
+    kdma_descriptions_map: 'align_system/algorithms/lib/templates/kdma_descriptions_short_metrics_eval.yml'
+    # LLM kwargs
+    answer_attempts: 5
diff --git a/adm_configs/metrics-evaluation/delivered/single_kdma_adm_adept.yml b/adm_configs/metrics-evaluation/delivered/single_kdma_adm_adept.yml
@@ -0,0 +1,12 @@
+adm:
+  name: 'SingleKDMAADM'
+  init_kwargs:
+    hf_model: meta-llama/Llama-2-13b-chat-hf
+    precision: half
+    temperature: 0.7
+
+  inference_kwargs:
+    baseline: false
+    n_negative_samples: 5
+    n_positive_samples: 5
+    shuffle: true
diff --git a/adm_configs/metrics-evaluation/delivered/single_kdma_adm_baseline.yml b/adm_configs/metrics-evaluation/delivered/single_kdma_adm_baseline.yml
@@ -0,0 +1,12 @@
+adm:
+  name: 'SingleKDMAADM'
+  init_kwargs:
+    hf_model: meta-llama/Llama-2-13b-chat-hf
+    precision: half
+    temperature: 0.7
+
+  inference_kwargs:
+    baseline: true
+    n_negative_samples: 0
+    n_positive_samples: 5
+    shuffle: true
diff --git a/adm_configs/metrics-evaluation/delivered/single_kdma_adm_soartech.yml b/adm_configs/metrics-evaluation/delivered/single_kdma_adm_soartech.yml
@@ -0,0 +1,12 @@
+adm:
+  name: 'SingleKDMAADM'
+  init_kwargs:
+    hf_model: meta-llama/Llama-2-13b-chat-hf
+    precision: half
+    temperature: 0.7
+
+  inference_kwargs:
+    baseline: false
+    n_negative_samples: 0
+    n_positive_samples: 5
+    shuffle: true
diff --git a/adm_configs/metrics-evaluation/hybrid_kaleido_adept_high.yml b/adm_configs/metrics-evaluation/hybrid_kaleido_adept_high.yml
@@ -0,0 +1,22 @@
+adm:
+  name: 'HybridKaleidoADM'
+  init_kwargs:
+    kaleido_init_kwargs:
+      model_name: 'allenai/kaleido-large'
+      use_tqdm: False
+
+    llm_init_kwargs:
+      hf_model: 'meta-llama/Llama-2-7b-chat-hf'
+      precision: 'half'
+
+  inference_kwargs:
+    # Kaleido kwargs
+    distance_fn: 'RelevanceWeightedDistance'
+    kdma_descriptions_map: 'align_system/algorithms/lib/templates/kdma_descriptions_short_metrics_eval.yml'
+    # LLM kwargs
+    answer_attempts: 5
+
+alignment_target_override:
+  id: ADEPT-metrics_eval-alignment-target-train-HIGH
+  kdma_values:
+    - {kdma: MoralDesert, value: 1}
diff --git a/adm_configs/metrics-evaluation/hybrid_kaleido_adept_low.yml b/adm_configs/metrics-evaluation/hybrid_kaleido_adept_low.yml
@@ -0,0 +1,22 @@
+adm:
+  name: 'HybridKaleidoADM'
+  init_kwargs:
+    kaleido_init_kwargs:
+      model_name: 'allenai/kaleido-large'
+      use_tqdm: False
+
+    llm_init_kwargs:
+      hf_model: 'meta-llama/Llama-2-7b-chat-hf'
+      precision: 'half'
+
+  inference_kwargs:
+    # Kaleido kwargs
+    distance_fn: 'RelevanceWeightedDistance'
+    kdma_descriptions_map: 'align_system/algorithms/lib/templates/kdma_descriptions_short_metrics_eval.yml'
+    # LLM kwargs
+    answer_attempts: 5
+
+alignment_target_override:
+  id: ADEPT-metrics_eval-alignment-target-train-LOW
+  kdma_values:
+    - {kdma: MoralDesert, value: 0}
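
Note on the ADM config layout: the YAML files above share an `adm` section (`name`, `init_kwargs`, `inference_kwargs`) and, in the high and low variants, an `alignment_target_override` section. The sketch below illustrates how a driver might consume a config of that shape. It uses a plain dict that mirrors `hybrid_kaleido_adept_high.yml` (with `kdma_descriptions_map` left out for brevity) instead of parsing the file, so it runs without a YAML library. `StubADM`, `REGISTRY`, and `build_adm` are hypothetical names introduced here, not the driver script's actual implementation.

```python
# Plain-dict mirror of hybrid_kaleido_adept_high.yml (a YAML loader would
# normally produce an equivalent structure from the file).
config = {
    "adm": {
        "name": "HybridKaleidoADM",
        "init_kwargs": {
            "kaleido_init_kwargs": {"model_name": "allenai/kaleido-large", "use_tqdm": False},
            "llm_init_kwargs": {"hf_model": "meta-llama/Llama-2-7b-chat-hf", "precision": "half"},
        },
        "inference_kwargs": {
            "distance_fn": "RelevanceWeightedDistance",
            "answer_attempts": 5,
        },
    },
    "alignment_target_override": {
        "id": "ADEPT-metrics_eval-alignment-target-train-HIGH",
        "kdma_values": [{"kdma": "MoralDesert", "value": 1}],
    },
}


class StubADM:
    """Hypothetical placeholder that only records its init kwargs (loads no model)."""

    def __init__(self, **init_kwargs):
        self.init_kwargs = init_kwargs


# Hypothetical name-to-class lookup; the real driver may resolve names differently.
REGISTRY = {"HybridKaleidoADM": StubADM}


def build_adm(config, registry):
    """Instantiate the named ADM and return it with its per-call settings."""
    adm_section = config["adm"]
    adm_class = registry[adm_section["name"]]
    adm = adm_class(**adm_section.get("init_kwargs", {}))
    inference_kwargs = adm_section.get("inference_kwargs", {})
    # The override is optional: the "delivered" configs above do not define it.
    target_override = config.get("alignment_target_override")
    return adm, inference_kwargs, target_override


adm, inference_kwargs, target_override = build_adm(config, REGISTRY)
print(type(adm).__name__, inference_kwargs["answer_attempts"], target_override["id"])
```

Keeping `init_kwargs` (construction time) separate from `inference_kwargs` (per-decision settings) matches the split visible in every config file in this diff. How the real driver applies `alignment_target_override` is not shown here.
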