Merge pull request #39 from ITM-Kitware/dev/api-updates-for-metrics-eval
Dev/api updates for metrics eval
dmjoy authored Mar 12, 2024
2 parents 74aae20 + a16cb4f commit b4b5431
Showing 52 changed files with 1,735 additions and 380 deletions.
29 changes: 28 additions & 1 deletion CHANGELOG.md
@@ -3,10 +3,33 @@
This changelog follows the specifications detailed in: [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html), although we have not yet reached a `1.0.0` release.

## Unreleased
## 0.3.0

### Added

* Added new driver script for TA3 interactions that uses a new YAML config format for ADMs
* Added several ADM config files for new driver script
* Added a new ADM HybridKaleidoADM which defers to a Llama2SingleKDMAADM instance to fill out action parameters
* Added new abstract class for action based ADMs (called ActionBasedADM), requires a `choose_action` method
* Implemented ActionBasedADM `choose_action` method on the KaleidoADM, Llama2SingleKDMAADM, and a new ADM HybridKaleidoADM
* Added alignment accuracy metric in self-evaluation framework
* Added re-usable methods for filling out action parameters to Llama2SingleKDMAADM
* Added short KDMA descriptions for moral deservingness and maximization for Kaleido
* Added new prompt template for selecting the target character of an action
* Added high and low alignment system prompts for SoarTech's maximization KDMA
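
The `ActionBasedADM` entry above describes an abstract class whose subclasses must provide a `choose_action` method. A minimal sketch of that pattern follows. The argument names (`scenario_state`, `available_actions`, `alignment_target`) and the toy `FirstActionADM` subclass are assumptions made for illustration and are not taken from this commit.

```python
from abc import ABC, abstractmethod


class ActionBasedADM(ABC):
    """Illustrative interface: subclasses must implement choose_action."""

    @abstractmethod
    def choose_action(self, scenario_state, available_actions,
                      alignment_target=None, **kwargs):
        """Return one entry of available_actions (signature is assumed)."""


class FirstActionADM(ActionBasedADM):
    """Hypothetical toy ADM that always picks the first available action."""

    def choose_action(self, scenario_state, available_actions,
                      alignment_target=None, **kwargs):
        if not available_actions:
            raise ValueError("no actions available")
        return available_actions[0]
```

Because `choose_action` is declared with `abstractmethod`, instantiating `ActionBasedADM` directly raises `TypeError`, so a driver script can rely on every concrete ADM exposing the method.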

### Changed

* Replaced instances of "casualties" with "characters" as per the new TA3 scenario data format
* Changed TA3 interface component over to using TA3 client module (rather than raw HTTP requests)
* Moved the previous `run_align_system.py` script to `run_simplified_align_system.py`, replacing it with the new primary CLI script
* Updated README with respect to new CLI script
* Changed some prompts to not display vitals with a value of None

### Fixed

* Fixed issue with logging of choice scores after multiple-sampling with voting
* Fixed issue where per-sample LLM outputs weren't being logged correctly

## 0.2.6

@@ -24,6 +47,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm

* Fixed issue with configurable KDMA Estimator and Distance functions for Kaleido ADM

### Changed

* Better error message on TA3 API action taken failure


## Version 0.2.5

133 changes: 73 additions & 60 deletions README.md
@@ -20,8 +20,6 @@ Repository](https://github.com/NextCenturyCorporation/itm-evaluation-server).

There's a corresponding client module: [TA3 Evaluation Client](https://github.com/NextCenturyCorporation/itm-evaluation-client)

Note that this client module isn't a required dependency for the ALIGN system code.

#### Soartech's TA1 API

Soartech's TA1 service code can be found at: [Soartech's TA1
@@ -43,13 +41,14 @@ install git+https://github.com/ITM-Kitware/align-system.git`.
## Running the system against the TA3 action-based API

```
$ run_action_based_align_system --help
usage: run_action_based_align_system [-h] {TA3ActionBased} ...
$ run_align_system --help
usage: run_align_system [-h] {TA3ActionBased} ...
ALIGN Action Based System CLI
ALIGN System CLI
positional arguments:
{TA3ActionBased} Select interface. Adding --help after interface selection will print interface and system specified arguments
{TA3ActionBased} Select interface. Adding --help after interface selection will print interface and
system specified arguments
TA3ActionBased Interface with CACI's TA3 web-based service
options:
@@ -59,9 +58,13 @@ options:
Running `--help` after the selected interface prints the full set of options for the interface and system. E.g.:

```
$ run_action_based_align_system TA3ActionBased --help
usage: run_action_based_align_system TA3ActionBased [-h] [-u USERNAME] [-s SESSION_TYPE] [-e API_ENDPOINT] [--training-session] [-m MODEL] [-t] [-a ALGORITHM] [-A ALGORITHM_KWARGS]
[--similarity-measure SIMILARITY_MEASURE]
$ run_align_system TA3ActionBased --help
usage: run_align_system TA3ActionBased [-h] [-u USERNAME] [-s SESSION_TYPE]
[-e API_ENDPOINT] [--training-session]
[--scenario-id SCENARIO_ID] -c ADM_CONFIG [-t]
[-l LOGLEVEL] [--logfile-path LOGFILE_PATH]
[--save-input-output-to-path SAVE_INPUT_OUTPUT_TO_PATH]
[--save-alignment-score-to-path SAVE_ALIGNMENT_SCORE_TO_PATH]
options:
-h, --help show this help message and exit
@@ -70,28 +73,30 @@ options:
-s SESSION_TYPE, --session-type SESSION_TYPE
TA3 API Session Type (default: "eval")
-e API_ENDPOINT, --api_endpoint API_ENDPOINT
Restful API endpoint for scenarios / probes (default: "http://127.0.0.1:8080")
Restful API endpoint for scenarios / probes (default:
"http://127.0.0.1:8080")
--training-session Return training related information from API requests
-m MODEL, --model MODEL
LLM Baseline model to use
--scenario-id SCENARIO_ID
Specific scenario to run
-c ADM_CONFIG, --adm-config ADM_CONFIG
Path to ADM config YAML
-t, --align-to-target
Align algorithm to target KDMAs
-a ALGORITHM, --algorithm ALGORITHM
Algorithm to use
-A ALGORITHM_KWARGS, --algorithm-kwargs ALGORITHM_KWARGS
JSON encoded dictionary of kwargs for algorithm initialization
--similarity-measure SIMILARITY_MEASURE
Similarity measure to use (default: 'bert')
-l LOGLEVEL, --loglevel LOGLEVEL
--logfile-path LOGFILE_PATH
Also write log output to the specified file
--save-input-output-to-path SAVE_INPUT_OUTPUT_TO_PATH
Save system inputs and outputs to a file
--save-alignment-score-to-path SAVE_ALIGNMENT_SCORE_TO_PATH
Save alignment score output to a file
```
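
The help output above looks like a standard `argparse` sub-command layout. As a rough sketch of how such a parser could be wired, the snippet below reproduces a subset of the listed options. The `build_parser` function is hypothetical, and the defaults are taken only from the help text shown; how the actual script constructs its parser is not shown in this diff.

```python
import argparse


def build_parser():
    """Build a parser mirroring part of the TA3ActionBased help output."""
    parser = argparse.ArgumentParser(
        prog="run_align_system", description="ALIGN System CLI")
    subparsers = parser.add_subparsers(dest="interface", required=True)

    ta3 = subparsers.add_parser(
        "TA3ActionBased", help="Interface with CACI's TA3 web-based service")
    ta3.add_argument("-u", "--username")
    ta3.add_argument("-s", "--session-type", default="eval",
                     help="TA3 API Session Type")
    ta3.add_argument("-e", "--api_endpoint", default="http://127.0.0.1:8080",
                     help="Restful API endpoint for scenarios / probes")
    ta3.add_argument("--training-session", action="store_true")
    ta3.add_argument("--scenario-id", help="Specific scenario to run")
    # -c is shown without brackets in the usage line, so it is required here
    ta3.add_argument("-c", "--adm-config", required=True,
                     help="Path to ADM config YAML")
    ta3.add_argument("-t", "--align-to-target", action="store_true",
                     help="Align algorithm to target KDMAs")
    return parser
```

With this sketch, `build_parser().parse_args(["TA3ActionBased", "-c", "config.yml"])` yields a namespace whose `adm_config` is `"config.yml"` and whose `session_type` falls back to `"eval"`.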

Here's an example invocation of the system using the TA3 Action-based interface (assuming it's running locally on port `8080`):
```
$ run_action_based_align_system TA3ActionBased \
-e "http://127.0.0.1:8080" \
--algorithm "llama_index" \
--model falcon \
-s soartech \
--algorithm-kwargs '{"domain_docs_dir": "/data/shared/MVPData/DomainDocumentsPDF"}'
--adm-config adm_configs/metrics-evaluation/single_kdma_adm_adept_baseline.yml \
--api_endpoint "http://127.0.0.1:8080" \
--session-type adept
```

*NOTE* - The first time you run the system it can take upwards of a
@@ -102,11 +107,11 @@ model is cached.

## Running the system against TA1 services or local files

In the Python environment you have set up, a CLI application called `run_align_system` should now be available. This single entrypoint supports interfacing with both local files on disk, and the TA3 web-based API. Running the script with `--help` shows which interfaces are available:
In the Python environment you have set up, a CLI application called `run_simplified_align_system` should now be available. This single entrypoint supports interfacing with both local files on disk, and the TA3 web-based API. Running the script with `--help` shows which interfaces are available:

```
$ run_align_system --help
usage: run_align_system [-h] {TA1Soartech,LocalFiles,TA1Adept} ...
$ run_simplified_align_system --help
usage: run_simplified_align_system [-h] {TA1Soartech,LocalFiles,TA1Adept} ...
ALIGN System CLI
@@ -124,8 +129,8 @@ options:
Running `--help` after the selected interface prints the full set of options for the interface and system. E.g.:

```
$ run_align_system TA1Soartech --help
usage: run_align_system TA1Soartech [-h] [-s [SCENARIOS ...]] [--alignment-targets [ALIGNMENT_TARGETS ...]] [-e API_ENDPOINT] [-m MODEL] [-t] [-a ALGORITHM] [-A ALGORITHM_KWARGS] [--similarity-measure SIMILARITY_MEASURE]
$ run_simplified_align_system TA1Soartech --help
usage: run_simplified_align_system TA1Soartech [-h] [-s [SCENARIOS ...]] [--alignment-targets [ALIGNMENT_TARGETS ...]] [-e API_ENDPOINT] [-m MODEL] [-t] [-a ALGORITHM] [-A ALGORITHM_KWARGS] [--similarity-measure SIMILARITY_MEASURE]
options:
-h, --help show this help message and exit
@@ -153,7 +158,7 @@ options:
We've included some example scenario, probe, and alignment target data for testing. These files can be found in the `example_data` directory. Here's an example system invocation with the provided example files:

```
run_align_system LocalFiles \
run_simplified_align_system LocalFiles \
-s example_data/scenario_1/scenario.json \
--alignment-target-filepath example_data/scenario_1/alignment_target.json \
-p example_data/scenario_1/probe{1,2,3,4}.json \
@@ -163,56 +168,64 @@ run_align_system LocalFiles \
--align-to-target
```

## ADM Invocations
## Metrics Evaluation ADM Invocations

### Simple Action-based Baseline ADM
### Aligned ADM for ADEPT scenarios

Simple baseline (unaligned) system using the `falcon` model:
```
run_action_based_align_system TA3ActionBased \
--algorithm "llama_index" \
--model falcon \
-s soartech \
--algorithm-kwargs '{"retrieval_enabled": false}' \
--algorithm "llama_index" \
--model falcon
--adm-config adm_configs/metrics-evaluation/delivered/single_kdma_adm_adept.yml \
--username single_kdma_aligned_adm_adept \
--align-to-target \
--session-type adept
```

### Simple Action-based Aligned ADM
### Aligned Hybrid Kaleido ADM for ADEPT scenarios

Simple aligned system using the `falcon` model (requires domain document PDFs):
```
run_action_based_align_system TA3ActionBased \
--algorithm "llama_index" \
--model falcon \
-s soartech \
--algorithm-kwargs '{"domain_docs_dir": "/path/to/DomainDocumentsPDF"}' \
--algorithm-kwargs '{"retrieval_enabled": false}' \
--algorithm "llama_index" \
--model falcon \
--align-to-target
--adm-config adm_configs/metrics-evaluation/delivered/hybrid_kaleido.yml \
--username hybrid_kaleido_aligned_adm_adept \
--align-to-target \
--session-type adept
```

### Action-based Chat Baseline ADM
### Baseline ADM for ADEPT scenarios

```
run_action_based_align_system TA3ActionBased \
--adm-config adm_configs/metrics-evaluation/delivered/single_kdma_adm_baseline.yml \
--username single_kdma_baseline_adm_adept \
--session-type adept
```

Unaligned system using a Llama 2 chat model:
### Aligned ADM for SoarTech scenarios

```
run_action_based_chat_baseline TA3ActionBased \
-s adept \
--model meta-llama/Llama-2-13b-chat-hf
run_action_based_align_system TA3ActionBased \
--adm-config adm_configs/metrics-evaluation/delivered/single_kdma_adm_soartech.yml \
--username single_kdma_aligned_adm_soartech \
--align-to-target \
--session-type soartech
```

### Action-based Chat Aligned ADM
### Aligned Hybrid Kaleido ADM for SoarTech scenarios

Aligned system using a Llama 2 chat model:
```
run_action_based_align_system TA3ActionBased \
--adm-config adm_configs/metrics-evaluation/delivered/hybrid_kaleido.yml \
--username hybrid_kaleido_aligned_adm_soartech \
--align-to-target \
--session-type soartech
```

### Baseline ADM for SoarTech scenarios

```
run_action_based_chat_baseline TA3ActionBased \
-s adept \
--model meta-llama/Llama-2-13b-chat-hf \
--precision half \
--align-to-target
run_action_based_align_system TA3ActionBased \
--adm-config adm_configs/metrics-evaluation/delivered/single_kdma_adm_baseline.yml \
--username single_kdma_baseline_adm_soartech \
--session-type soartech
```


9 changes: 9 additions & 0 deletions adm_configs/kaleido_config.yml
@@ -0,0 +1,9 @@
adm:
  name: 'KaleidoADM'
  init_kwargs:
    model_name: 'allenai/kaleido-large'
    use_tqdm: False

  inference_kwargs:
    distance_fn: 'RelevanceWeightedDistance'
    kdma_descriptions_map: 'align_system/algorithms/lib/templates/kdma_descriptions_short_metrics_eval.yml'
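
One way a driver script could consume a config of this shape is to look up the ADM class by `name`, pass `init_kwargs` to its constructor, and keep `inference_kwargs` for later calls. The sketch below assumes that `inference_kwargs` is nested under `adm`, uses an already-parsed dictionary in place of the YAML file, and substitutes a stand-in `KaleidoADM` class. `ADM_REGISTRY` and `build_adm` are hypothetical names, not part of this commit.

```python
class KaleidoADM:
    """Stand-in for the real class, which would load the named model."""

    def __init__(self, model_name, use_tqdm=False):
        self.model_name = model_name
        self.use_tqdm = use_tqdm


# Hypothetical mapping from the config's `name` field to a class
ADM_REGISTRY = {"KaleidoADM": KaleidoADM}


def build_adm(config):
    """Instantiate the configured ADM and return it with its inference kwargs."""
    adm_config = config["adm"]
    adm_class = ADM_REGISTRY[adm_config["name"]]
    adm = adm_class(**adm_config.get("init_kwargs", {}))
    return adm, adm_config.get("inference_kwargs", {})


# Dictionary equivalent of the YAML above (nesting is an assumption)
config = {
    "adm": {
        "name": "KaleidoADM",
        "init_kwargs": {"model_name": "allenai/kaleido-large",
                        "use_tqdm": False},
        "inference_kwargs": {"distance_fn": "RelevanceWeightedDistance"},
    }
}

adm, inference_kwargs = build_adm(config)
```

Keeping constructor arguments and per-call arguments in separate sections lets the same ADM class be reused with different inference settings without reloading the model.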
17 changes: 17 additions & 0 deletions adm_configs/metrics-evaluation/delivered/hybrid_kaleido.yml
@@ -0,0 +1,17 @@
adm:
  name: 'HybridKaleidoADM'
  init_kwargs:
    kaleido_init_kwargs:
      model_name: 'allenai/kaleido-large'
      use_tqdm: False

    llm_init_kwargs:
      hf_model: 'meta-llama/Llama-2-7b-chat-hf'
      precision: 'half'

  inference_kwargs:
    # Kaleido kwargs
    distance_fn: 'RelevanceWeightedDistance'
    kdma_descriptions_map: 'align_system/algorithms/lib/templates/kdma_descriptions_short_metrics_eval.yml'
    # LLM kwargs
    answer_attempts: 5
12 changes: 12 additions & 0 deletions adm_configs/metrics-evaluation/delivered/single_kdma_adm_adept.yml
@@ -0,0 +1,12 @@
adm:
  name: 'SingleKDMAADM'
  init_kwargs:
    hf_model: meta-llama/Llama-2-13b-chat-hf
    precision: half
    temperature: 0.7

  inference_kwargs:
    baseline: false
    n_negative_samples: 5
    n_positive_samples: 5
    shuffle: true
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
adm:
  name: 'SingleKDMAADM'
  init_kwargs:
    hf_model: meta-llama/Llama-2-13b-chat-hf
    precision: half
    temperature: 0.7

  inference_kwargs:
    baseline: true
    n_negative_samples: 0
    n_positive_samples: 5
    shuffle: true
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
adm:
  name: 'SingleKDMAADM'
  init_kwargs:
    hf_model: meta-llama/Llama-2-13b-chat-hf
    precision: half
    temperature: 0.7

  inference_kwargs:
    baseline: false
    n_negative_samples: 0
    n_positive_samples: 5
    shuffle: true
22 changes: 22 additions & 0 deletions adm_configs/metrics-evaluation/hybrid_kaleido_adept_high.yml
@@ -0,0 +1,22 @@
adm:
  name: 'HybridKaleidoADM'
  init_kwargs:
    kaleido_init_kwargs:
      model_name: 'allenai/kaleido-large'
      use_tqdm: False

    llm_init_kwargs:
      hf_model: 'meta-llama/Llama-2-7b-chat-hf'
      precision: 'half'

  inference_kwargs:
    # Kaleido kwargs
    distance_fn: 'RelevanceWeightedDistance'
    kdma_descriptions_map: 'align_system/algorithms/lib/templates/kdma_descriptions_short_metrics_eval.yml'
    # LLM kwargs
    answer_attempts: 5

alignment_target_override:
  id: ADEPT-metrics_eval-alignment-target-train-HIGH
  kdma_values:
    - {kdma: MoralDesert, value: 1}
22 changes: 22 additions & 0 deletions adm_configs/metrics-evaluation/hybrid_kaleido_adept_low.yml
@@ -0,0 +1,22 @@
adm:
  name: 'HybridKaleidoADM'
  init_kwargs:
    kaleido_init_kwargs:
      model_name: 'allenai/kaleido-large'
      use_tqdm: False

    llm_init_kwargs:
      hf_model: 'meta-llama/Llama-2-7b-chat-hf'
      precision: 'half'

  inference_kwargs:
    # Kaleido kwargs
    distance_fn: 'RelevanceWeightedDistance'
    kdma_descriptions_map: 'align_system/algorithms/lib/templates/kdma_descriptions_short_metrics_eval.yml'
    # LLM kwargs
    answer_attempts: 5

alignment_target_override:
  id: ADEPT-metrics_eval-alignment-target-train-LOW
  kdma_values:
    - {kdma: MoralDesert, value: 0}
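
The high and low configs differ only in their `alignment_target_override` section. A plausible reading is that, when the section is present, it takes the place of the alignment target that would otherwise come from the API. The sketch below illustrates that reading with two hypothetical helpers, `resolve_alignment_target` and `kdma_values_as_dict`. Neither name comes from this commit, and the actual override logic may differ.

```python
def resolve_alignment_target(config, api_target=None):
    """Prefer the config's override, else fall back to the API's target."""
    override = config.get("alignment_target_override")
    return override if override is not None else api_target


def kdma_values_as_dict(target):
    """Flatten the kdma_values list into a {kdma_name: value} mapping."""
    return {entry["kdma"]: entry["value"] for entry in target["kdma_values"]}


# Dictionary equivalent of the override section in the LOW config above
low_config = {
    "alignment_target_override": {
        "id": "ADEPT-metrics_eval-alignment-target-train-LOW",
        "kdma_values": [{"kdma": "MoralDesert", "value": 0}],
    }
}

target = resolve_alignment_target(low_config, api_target=None)
kdma_targets = kdma_values_as_dict(target)
```

Here `kdma_targets` maps `MoralDesert` to `0`, matching the LOW config. Swapping in the HIGH config would map it to `1`.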