issue template
slobentanzer committed Feb 12, 2024
1 parent 2b912f1 commit 89a10e2
Showing 1 changed file with 2 additions and 0 deletions.
content/40.methods.md (2 additions, 0 deletions)
@@ -56,6 +56,8 @@ For instance, we test the conversion of numbers (which LLMs are notoriously bad

The Pytest framework is implemented at [https://github.com/biocypher/biochatter/blob/main/benchmark](https://github.com/biocypher/biochatter/blob/main/benchmark), and more information is available at [https://biochatter.org/benchmarking](https://biochatter.org/benchmarking).
The benchmark is updated whenever new models are released or the datasets are extended, and it is continuously available at [https://biochatter.org/benchmark](https://biochatter.org/benchmark).
<!-- TODO link -->
We will run the benchmark on new models and variants (including fine-tuned models) at the community's request; requests can be made on GitHub using our issue template (TODO link).
The living benchmark process is inspired by test-driven development: test cases are created for the specific features or behaviors we want the framework to exhibit.
When a model does not initially produce the desired response, which is often the case, we adjust elements of the framework, such as prompts or functions, to improve the model's performance.
Monitoring the model's performance on these tests over time allows us to assess the framework's reliability and pinpoint areas that need improvement.
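As a rough illustration of this test-driven loop, the Python sketch below defines benchmark cases by the desired behavior, scores a model as the fraction of cases it passes, and compares two versions of the same stand-in model before and after a prompt adjustment. All names (`BenchmarkCase`, `passes`, `score`, `model_v1`, `model_v2`) and the two example cases are hypothetical; the stand-in models replace real LLM calls, and simple substring matching is an assumed scoring rule. This is not the BioChatter benchmark code from the linked repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical benchmark case: a prompt and a string the desired answer must contain.
@dataclass(frozen=True)
class BenchmarkCase:
    name: str
    prompt: str
    expected: str


# Illustrative cases only; the real benchmark datasets are not reproduced here.
CASES: List[BenchmarkCase] = [
    BenchmarkCase("number_conversion", "Write 'one thousand' in digits.", "1000"),
    BenchmarkCase("yes_no", "Answer yes or no: is 7 a prime number?", "yes"),
]

Model = Callable[[str], str]


def passes(model: Model, case: BenchmarkCase) -> bool:
    """Assumed rule: a case passes if the normalised response contains the expected answer."""
    response = model(case.prompt).strip().lower()
    return case.expected.lower() in response


def score(model: Model, cases: List[BenchmarkCase]) -> float:
    """Fraction of cases the model passes (0.0 to 1.0)."""
    return sum(passes(model, c) for c in cases) / len(cases)


# Stand-in "models" (hypothetical); a real run would query an LLM instead.
def model_v1(prompt: str) -> str:
    # Answers correctly in prose, but "1,000" does not match the expected "1000".
    return "One thousand is 1,000." if "thousand" in prompt else "Yes."


def model_v2(prompt: str) -> str:
    # Same model after a hypothetical prompt adjustment asking for plain digits.
    return "1000" if "thousand" in prompt else "Yes."


# Pytest would collect this function; it fails until the desired behavior is met.
def test_number_conversion() -> None:
    case = CASES[0]
    assert passes(model_v2, case), f"{case.name} failed"


if __name__ == "__main__":
    # Track scores across versions to see whether a framework change helped.
    history: Dict[str, float] = {
        "v1": score(model_v1, CASES),
        "v2": score(model_v2, CASES),
    }
    for version, value in history.items():
        print(f"{version}: {value:.2f}")
```

Run as a script, this prints a score of 0.50 for `v1` and 1.00 for `v2`. Keeping each version's score is the simplest way to see, over time, whether a change to prompts or functions actually improved a model on the tests.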
