benchmark figure
slobentanzer committed Feb 8, 2024
1 parent ddcfe62 commit b501943
Showing 2 changed files with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions content/20.results.md
@@ -83,8 +83,12 @@ Consequently, the models without prompt engine show a lower performance in creat
 ![
 **Benchmark results.**
 A) Performance of different LLMs on the BioChatter benchmark datasets.
-B) Performance of different LLMs with and without the use of BioChatter's prompt engine for KG querying.
-](images/benchmark_results.png "Benchmark results"){#fig:benchmark}
+While the closed-source models from OpenAI show consistently highest performance, some open-source models are comparable.
+However, the measured performance does not correlate intuitively with the size and quantisation (bit-precision) of the models.
+Some smaller models perform better than larger ones, even within the same model family; while very low bit-precision (2-bit) expectedly yields worse performance, the same is true for the high end (8-bit).
+*: Of note, many characteristics of OpenAI models are not public, and thus their bit-precision (as well as the exact size of GPT4) is subject to speculation.
+B) Comparison of the two benchmark tasks for KG querying show the superior performance of BioChatter's prompt engine. The BioChatter variant involves a multi-step procedure of constructing the query, while the "naive" version only receives the complete schema definition of the BioCypher KG (which BioChatter also uses as a basis for the prompt engine). The general instructions for both variants are the same, otherwise.
+](images/biochatter-benchmark.png "Benchmark results"){#fig:benchmark}

 ### Knowledge Graphs

Binary file added content/images/biochatter_benchmark.png