Changes for Tutor Gym Paper #1

@cmaclell

Code Review

  • Review the code together
  • Merge branches into main

LLM evaluation

  • Check that correct and incorrect actions are evaluated one at a time, not as lists
  • Each environment returns the list of actions it supports, and that list is injected into the prompt, so new environments with new actions can be supported
  • Finalize the LLM prompts for tutoring and for the simulated student using DeepSeek models (even small ones, if they are faster), then run with the larger models and the paid models
    • Tutoring evaluation prompt
    • Simulated student evaluation
  • Consider adding a "Get hint" action to the tutor.
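
The first two items above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `ToyEnv`, `available_actions`, `build_prompt`, and `evaluate_each` are hypothetical names, and the prompt wording is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToyEnv:
    """Hypothetical environment; the real environment classes may look different."""
    name: str
    actions: List[str]

    def available_actions(self) -> List[str]:
        # Each environment reports the actions it supports.
        return list(self.actions)


def build_prompt(env: ToyEnv, state: str) -> str:
    """Inject the environment's own action list into the prompt text."""
    action_lines = "\n".join(f"- {a}" for a in env.available_actions())
    return (
        f"Environment: {env.name}\n"
        f"Current state: {state}\n"
        f"Choose exactly one of these actions:\n{action_lines}\n"
    )


def evaluate_each(
    candidates: List[str], judge: Callable[[str], bool]
) -> List[Tuple[str, bool]]:
    """Call the judge once per candidate instead of passing the whole list."""
    return [(candidate, judge(candidate)) for candidate in candidates]
```

Under this sketch, an environment that adds an action such as "get_hint" to its list would have it appear in the prompt without any change to the prompt-building code.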

RL Wrapper

  • Test to make sure we have something 🤷.
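
One possible shape for such a test is a random-policy smoke test over a reset/step loop. The sketch below assumes a Gym-style interface; `ToyTutorEnv` and `smoke_test` are hypothetical stand-ins, not the actual wrapper, and the reward values are made up for illustration.

```python
import random
from typing import Dict, List, Tuple


class ToyTutorEnv:
    """Hypothetical stand-in for a wrapped tutor environment."""

    def __init__(self, answer: str = "5/6", max_steps: int = 5) -> None:
        self.answer = answer
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> str:
        self.steps = 0
        return "1/2 + 1/3 = ?"

    def step(self, action: str) -> Tuple[str, float, bool, Dict[str, bool]]:
        self.steps += 1
        correct = action == self.answer
        reward = 1.0 if correct else -1.0  # assumed reward scheme
        # The episode ends on a correct answer or when the step budget runs out.
        done = correct or self.steps >= self.max_steps
        return "1/2 + 1/3 = ?", reward, done, {"correct": correct}


def smoke_test(env: ToyTutorEnv, actions: List[str], seed: int = 0) -> float:
    """Run one episode with random actions and return the total reward."""
    rng = random.Random(seed)
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(rng.choice(actions))
        total += reward
    return total
```

A check like this only confirms that an episode runs to completion and produces rewards in the expected range; it says nothing about whether an agent can learn from the environment.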

CTAT Env

  • Need to make as much of the matcher interpreter work as possible
    • Involves implementing several subroutines
  • Graphs have a global "unordered" attribute that isn’t directly implemented
    • Double-check the behavior when multiple next actions are possible; make sure the behavior recorder works as we expect

OA Env

  • Are we properly capturing the hint/substep sequences?

Update documentation

  • Add a README that explains how to download and run everything
  • Make it easy to run LLM models
  • Outline how someone can add a new environment (might require some refactoring)
    • Do we need a separate content repository?
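
As a starting point for that outline, adding an environment could amount to subclassing a small interface and registering it by name. Everything in this sketch is an assumption: `TutorEnvBase`, `ENV_REGISTRY`, `register_env`, and `ToyAdditionEnv` are hypothetical names, and the real interface may need more methods after refactoring.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class TutorEnvBase(ABC):
    """Hypothetical minimal interface a new environment would implement."""

    @abstractmethod
    def available_actions(self) -> List[str]:
        """Return the actions this environment supports."""

    @abstractmethod
    def reset(self) -> str:
        """Start a new problem and return its description."""

    @abstractmethod
    def step(self, action: str) -> bool:
        """Apply an action; return True if the environment accepts it."""


# Maps an environment name to its class so it can be looked up by name.
ENV_REGISTRY: Dict[str, Type[TutorEnvBase]] = {}


def register_env(name: str) -> Callable[[Type[TutorEnvBase]], Type[TutorEnvBase]]:
    """Class decorator that records an environment under the given name."""
    def decorator(cls: Type[TutorEnvBase]) -> Type[TutorEnvBase]:
        ENV_REGISTRY[name] = cls
        return cls
    return decorator


@register_env("toy_addition")
class ToyAdditionEnv(TutorEnvBase):
    """Toy example of a new environment with its own action list."""

    def available_actions(self) -> List[str]:
        return ["input_value", "get_hint"]

    def reset(self) -> str:
        return "2 + 3 = ?"

    def step(self, action: str) -> bool:
        return action in self.available_actions()
```

With a registry like this, the documentation for adding an environment reduces to "implement the three methods and register the class", and problem content could live in a separate repository if that turns out to be needed.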

Overall Ideas for the future
