modifications/additions to llm_qa_direct_only and parse_scenario_womd. also incorporating grade files and corresponding generated graphs #6

dataei · 2025-03-08T06:24:33Z

Closed comment test #1 in order to create this pull request with the newest commits; otherwise I could not open a new pull request while one already existed.

…ting 2,4,6shot under generate_qa_prompt, and modified graph titles within main. parse_scenario_womd: called obtain_andwrite_mcq_data to sort through indices to generate parsed womdr data. committing deepseek grade files and corresponding graphs

…curately distinguish individual words; implemented context word count calculation; categorized scenario ID size based on byte count. in llm_qa_direct: removed faithfulness scores and represented only correctness scores; displayed context word count; updated plt graph to represent correctness scores solely.

…2,53) was returning two files rather than one (being 53) after previously committing changes. moved old batch of experiments run to a different file for organization, after modifying procedure and experiment goals.

dataei and others added 30 commits February 25, 2025 16:22

first

78c94b1

k

e14c62d

merge conflicts

6810c58

merge conflicts

7035d96

comment test

c962628

first

9d54926

k

0792497

merge conflicts

bdd9789

merge conflicts

e6735bb

comment test

cea730b

first

b963219

modifying to deepinfra instead of deepseek and changed file path to training

cea0646

debugging with print statements, planning direct prompting variables 2,4,6shot

97bbf86

added 2,4,6,8 shot direct prompting and moved writing script out of for loop to prevent repeated output in llm_qa_direct_only.py. changed index in run_experiments.py to work with 8 interaction scenario ID. output observed in deepseek_grades.json

9f1cefe

Merge branch 'dev_denise' of https://github.com/AugmentedDesignLab/CarPlanningProblemGen into dev_denise

a60c014

Modifications to model names and corresponding evals can now be completed easily.

f893f7e

added 2,4,6,8 shot direct prompting and moved writing script out of for loop to prevent repeated output in llm_qa_direct_only.py. changed index in run_experiments.py to work with 8 interaction scenario ID. output observed in deepseek_grades.json

fca4c75

syntax conflict

faf4d9b

ran more experiments to find pattern in small, medium, large file sizes with gpt4o-mini in addition to creating draft of script comparing experiment results in bar chart

6cab612

latest

5e5df63

merge dev_denise with dev_Ishaan

e4bc684

Removing planner import

a28cb92

incorporated word count into planner.py, drafting automation for correctness averages in llm_qa_direct_only, and ran 60 experiments (grades and charts) for newly formatted experiment chart under old deepseek model v3

295ad3d

renaming of folders for better organization between examples ran for deepseek-v3 vs v3-0324. halted v3-0324 due to inefficiencies in model.

caca3a9

add grades folder to gitignore

19717bc

merge conflict resolution

1626387

merge conflict resolution

4e1e3ac

ishaan95 and others added 19 commits April 8, 2025 19:16

Calculate the most similar scenarios to a given index

9abc262

Search similar scenarios to a specific index and with respect to a given range

7096db4

reorganizing files and adding boxplots + creation script. fixed sizing parameters in parse scenario womd.

857da20

embedding space analysis, parallel experiments

7f28e5b

Modifications to model names and corresponding evals can now be completed easily.

fb99033

latest

28845b5

Removing planner import

a8986fa

add grades folder to gitignore

442fea4

Calculate the most similar scenarios to a given index

c46ae04

Search similar scenarios to a specific index and with respect to a given range

96c7cf7

embedding space analysis, parallel experiments

d4f00ef

experimenting w/ llm as a judge prompting

92e1495

placed data collection outside of project folder

ccab94d

renamed box plotting script, added negative prompting (concise) to run 60 experiments

b1c953c

modifications to negative prompting

9d8a5de

merge commit

3ae34b1

Reorganized the llm evaluation script

38abeb5

modified parse scenario to my file path

7cd3f39

lecturing prompting has been added. PDDL testing for the latest version of the code has been done.

38e7009

3 participants
