Code Review
- Review the code together
- Merge branches into main
LLM evaluation
- Check that correct/incorrect judgments are evaluated one at a time, not batched as lists
- Each environment returns the list of actions it supports, and these are injected into the prompt, so new environments with new actions can be supported (see the sketch after this list)
- Finalize the LLM prompts for the tutor and the simulated student with DeepSeek models (even small ones if they are faster), then run with the larger models and the paid models
- Tutoring evaluation prompt
- Simulated student evaluation
- Consider adding a "Get hint" action to the tutor.
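A minimal sketch of the two evaluation points above: each step gets its own LLM call, and whichever actions the current environment advertises are injected into the prompt. All names here (`get_supported_actions`, `build_prompt`, `judge_steps`, the template text) are hypothetical, not the repo's actual API.

```python
# Hypothetical sketch -- none of these names are the repo's actual API.

ACTION_PROMPT_TEMPLATE = """You are evaluating one step in a tutoring session.
The environment supports these actions:
{actions}

Student step: {step}
Answer with exactly one word: CORRECT or INCORRECT."""

def build_prompt(env, step) -> str:
    # Each environment advertises its own action set, so a new
    # environment with new actions needs no prompt changes.
    actions = "\n".join(f"- {a}" for a in env.get_supported_actions())
    return ACTION_PROMPT_TEMPLATE.format(actions=actions, step=step)

def judge_steps(env, steps, llm) -> list[bool]:
    # One LLM call per step: each correct/incorrect judgment is made
    # independently rather than over a whole list at once.
    return [llm(build_prompt(env, step)).strip().upper() == "CORRECT"
            for step in steps]
```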
RL Wrapper
- Test to make sure we have something working 🤷 (a minimal smoke test is sketched below)
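One cheap way to check that the wrapper works at all: a random-policy smoke test. This assumes the wrapper follows the modern Gymnasium API (`reset()` returning `(obs, info)`, `step()` returning five values); if it uses the older Gym interface, the signatures differ.

```python
import gymnasium as gym

def smoke_test(env: gym.Env, episodes: int = 3) -> None:
    """Roll a random policy through a few episodes; a crash here
    reveals a broken wrapper before any real RL training is attempted."""
    for _ in range(episodes):
        obs, info = env.reset()
        done = False
        while not done:
            action = env.action_space.sample()  # random policy is enough here
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
    env.close()
```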
CTAT Env
- Need to make as much of the matcher interpreter work as possible
- Involves implementing several subroutines
- There is a global unordered attribute on graphs that isn't directly implemented (see the sketch after this list)
- Double-check the multiple-next-action behavior; make sure the behavior recorder is working as we expect
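For the unordered attribute, a loose sketch of the intended semantics: when a behavior-graph group is unordered, a student action may match any remaining link in the group rather than only the next one in sequence. `matches()` and the link objects are stand-ins for whatever the matcher interpreter actually uses.

```python
def match_unordered(action, remaining_links):
    """Return the first remaining link that `action` matches, in any
    order, and consume it; return None if nothing matches."""
    for link in remaining_links:
        if link.matches(action):  # assumed matcher predicate
            remaining_links.remove(link)
            return link
    return None
```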
OA Env
- Are we properly capturing the hint/substep sequences?
Update documentation
- Add a README that explains how to download and run everything
- Make it easy to run LLM models
- Outline how someone can add a new environment (might require some refactoring; see the sketch after this list)
- Do we need a separate content repository?
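If the refactoring goes toward a pluggable design, one common shape is a small registry that new environments opt into. The decorator, registry dict, and example class below are assumptions, not existing code.

```python
ENV_REGISTRY: dict[str, type] = {}

def register_env(name: str):
    """Class decorator that adds an environment to the registry."""
    def decorator(cls):
        ENV_REGISTRY[name] = cls
        return cls
    return decorator

@register_env("fraction_arithmetic")
class FractionArithmeticEnv:
    # A new environment only needs to register itself and declare its
    # actions; prompt construction can pick the rest up generically.
    def get_supported_actions(self):
        return ["input_value", "press_done", "request_hint"]
```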
Overall Ideas for the future
- Consider using Cohen's kappa as a way to evaluate model performance (see the sketch after this list)
- Consider generating negative examples for each tutor using the LLM student
- Consider: ReAct: Synergizing reasoning and acting in language models (https://arxiv.org/abs/2210.03629)
- Consider: Tree of thoughts: Deliberate problem solving with large language models (https://proceedings.neurips.cc/paper_files/paper/2023/hash/271db9922b8d1f4dd7aaef84ed5ac703-Abstract-Conference.html)
- Consider trying to get CTAT human data
- Consider trying to get OA tutor data
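Cohen's kappa is a chance-corrected agreement measure, so it is a reasonable check of LLM judgments against ground-truth labels. scikit-learn ships it directly; the label lists below are made-up examples.

```python
from sklearn.metrics import cohen_kappa_score

ground_truth  = [1, 0, 1, 1, 0, 1]  # 1 = correct, 0 = incorrect
llm_judgments = [1, 0, 1, 0, 0, 1]

kappa = cohen_kappa_score(ground_truth, llm_judgments)
print(f"Cohen's kappa: {kappa:.3f}")  # 1.0 = perfect, ~0 = chance-level
```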