discussion: symbols and ontologies
slobentanzer committed Feb 17, 2024
1 parent 0af330f commit be6580e
Showing 28 changed files with 4,281 additions and 41 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -1,7 +1,7 @@
# Output directory containing the formatted manuscript

The [`gh-pages`](https://github.com/biocypher/biochatter-paper/tree/gh-pages) branch hosts the contents of this directory at <https://biocypher.github.io/biochatter-paper/>.
The permalink for this webpage version is <https://biocypher.github.io/biochatter-paper/v/b82b23134b1d959409a914e6eb2b4e222934b57e/>.
The permalink for this webpage version is <https://biocypher.github.io/biochatter-paper/v/336baf0c3f072b3524a425c525ecf0e0b789faa8/>.
To redirect to the permalink for the latest manuscript version at any time, use the link <https://biocypher.github.io/biochatter-paper/v/freeze/>.

## Files
@@ -35,4 +35,4 @@ Verifying timestamps with the `ots verify` command requires running a local bitc
## Source

The manuscripts in this directory were built from
[`b82b23134b1d959409a914e6eb2b4e222934b57e`](https://github.com/biocypher/biochatter-paper/commit/b82b23134b1d959409a914e6eb2b4e222934b57e).
[`336baf0c3f072b3524a425c525ecf0e0b789faa8`](https://github.com/biocypher/biochatter-paper/commit/336baf0c3f072b3524a425c525ecf0e0b789faa8).
40 changes: 22 additions & 18 deletions index.html
@@ -13,7 +13,7 @@
<meta name="author" content="Nils Krehl" />
<meta name="author" content="Qin Ma" />
<meta name="author" content="Julio Saez-Rodriguez" />
<meta name="dcterms.date" content="2024-02-16" />
<meta name="dcterms.date" content="2024-02-17" />
<meta name="keywords" content="biomedicine, large language models, framework, retrieval-augmented generation, knowledge graph" />
<title>A Platform for the Biomedical Application of Large Language Models</title>
<style>
@@ -121,11 +121,11 @@
<meta name="citation_title" content="A Platform for the Biomedical Application of Large Language Models" />
<meta property="og:title" content="A Platform for the Biomedical Application of Large Language Models" />
<meta property="twitter:title" content="A Platform for the Biomedical Application of Large Language Models" />
<meta name="dc.date" content="2024-02-16" />
<meta name="citation_publication_date" content="2024-02-16" />
<meta property="article:published_time" content="2024-02-16" />
<meta name="dc.modified" content="2024-02-16T00:16:54+00:00" />
<meta property="article:modified_time" content="2024-02-16T00:16:54+00:00" />
<meta name="dc.date" content="2024-02-17" />
<meta name="citation_publication_date" content="2024-02-17" />
<meta property="article:published_time" content="2024-02-17" />
<meta name="dc.modified" content="2024-02-17T06:11:42+00:00" />
<meta property="article:modified_time" content="2024-02-17T06:11:42+00:00" />
<meta name="dc.language" content="en-UK" />
<meta name="citation_language" content="en-UK" />
<meta name="dc.relation.ispartof" content="Manubot" />
@@ -169,9 +169,9 @@
<meta name="citation_fulltext_html_url" content="https://biocypher.github.io/biochatter-paper/" />
<meta name="citation_pdf_url" content="https://biocypher.github.io/biochatter-paper/manuscript.pdf" />
<link rel="alternate" type="application/pdf" href="https://biocypher.github.io/biochatter-paper/manuscript.pdf" />
<link rel="alternate" type="text/html" href="https://biocypher.github.io/biochatter-paper/v/b82b23134b1d959409a914e6eb2b4e222934b57e/" />
<meta name="manubot_html_url_versioned" content="https://biocypher.github.io/biochatter-paper/v/b82b23134b1d959409a914e6eb2b4e222934b57e/" />
<meta name="manubot_pdf_url_versioned" content="https://biocypher.github.io/biochatter-paper/v/b82b23134b1d959409a914e6eb2b4e222934b57e/manuscript.pdf" />
<link rel="alternate" type="text/html" href="https://biocypher.github.io/biochatter-paper/v/336baf0c3f072b3524a425c525ecf0e0b789faa8/" />
<meta name="manubot_html_url_versioned" content="https://biocypher.github.io/biochatter-paper/v/336baf0c3f072b3524a425c525ecf0e0b789faa8/" />
<meta name="manubot_pdf_url_versioned" content="https://biocypher.github.io/biochatter-paper/v/336baf0c3f072b3524a425c525ecf0e0b789faa8/manuscript.pdf" />
<meta property="og:type" content="article" />
<meta property="twitter:card" content="summary_large_image" />
<link rel="icon" type="image/png" sizes="192x192" href="https://manubot.org/favicon-192x192.png" />
@@ -188,10 +188,10 @@ <h1 class="title">A Platform for the Biomedical Application of Large Language Mo
</header>
<p><small><em>
This manuscript
(<a href="https://biocypher.github.io/biochatter-paper/v/b82b23134b1d959409a914e6eb2b4e222934b57e/">permalink</a>)
(<a href="https://biocypher.github.io/biochatter-paper/v/336baf0c3f072b3524a425c525ecf0e0b789faa8/">permalink</a>)
was automatically generated
from <a href="https://github.com/biocypher/biochatter-paper/tree/b82b23134b1d959409a914e6eb2b4e222934b57e">biocypher/biochatter-paper@b82b231</a>
on February 16, 2024.
from <a href="https://github.com/biocypher/biochatter-paper/tree/336baf0c3f072b3524a425c525ecf0e0b789faa8">biocypher/biochatter-paper@336baf0</a>
on February 17, 2024.
</em></small></p>
<h2 id="authors">Authors</h2>
<ul>
@@ -438,9 +438,13 @@ <h2 id="discussion">Discussion</h2>
Inspired by the productivity of open source libraries such as LangChain <span class="citation" data-cites="vKMc6EpN">[<a href="#ref-vKMc6EpN" role="doc-biblioref">16</a>]</span>, we propose an open framework that allows biomedical researchers to focus on the application of LLMs as opposed to engineering challenges.
To keep the framework effective and sustainable, we focus on reusing existing open-source libraries and tools, while adapting the advancements from the wider LLM community to the biomedical domain.
The transparency we emphasise at every step of the framework is essential to a sustainable application of LLMs in biomedical research and beyond <span class="citation" data-cites="BXdkfGlr">[<a href="#ref-BXdkfGlr" role="doc-biblioref">35</a>]</span>.</p>
<p>To account for the requirements of biomedical research workflows, we take particular care to guarantee robustness and objective evaluation of LLM behaviour and their performance in interaction with other parts of the framework.
<p>To facilitate efficient human-AI interaction, a “lingua franca” is required; symbolic representations of concepts are required at least at the surface level of the conversation <span class="citation" data-cites="fLS7kvml">[<a href="#ref-fLS7kvml" role="doc-biblioref">7</a>]</span>.
We enable interaction with LLMs on a symbolic level by providing ontological grounding via the synergy of BioChatter with BioCypher KGs.
The configuration of BioCypher KGs allows the user to specify the contextual domain by mapping KG concepts to existing ontologies and custom terminology.
This way, we guarantee an overlap in the contextual understanding of user and LLM despite the generic nature of most pre-trained models.</p>
<p>We take particular care to guarantee robustness and objective evaluation of LLM behaviour and their performance in interaction with other parts of the framework.
We achieve this goal by implementing a living benchmarking framework that allows the automated evaluation of LLMs, prompts, and other components (<a href="https://biochatter.org/benchmark/">https://biochatter.org/benchmark/</a>).
Even the most recent and biomedicine-specific benchmarking efforts are small-scale manual approaches that do not consider the full matrix of possible combinations of components, and many benchmarks are performed by accessing web interfaces of LLMs, which obfuscates important parameters such as model version and temperature <span class="citation" data-cites="uYvzQA7w">[<a href="#ref-uYvzQA7w" role="doc-biblioref">25</a>]</span>.
Even the most recent biomedicine-specific benchmarking efforts are small-scale manual approaches that do not consider the full matrix of possible combinations of components, and many benchmarks are performed by accessing web interfaces of LLMs, which obfuscates important parameters such as model version and temperature <span class="citation" data-cites="uYvzQA7w">[<a href="#ref-uYvzQA7w" role="doc-biblioref">25</a>]</span>.
As such, a framework is a necessary step towards the objective and reproducible evaluation of LLMs.
We prevent data leakage from the benchmark datasets into the training data of new models by encryption, which is essential for the sustainability of the benchmark as new models are released.
The living benchmark will be updated with new questions and tasks as they arise in the community.</p>
@@ -460,7 +464,7 @@ <h3 id="limitations">Limitations</h3>
We see current LLMs, particularly in the scope of the BioCypher ecosystem, as helpful tools to assist human researchers, alleviating menial and repetitive tasks and helping with technical aspects such as query languages.
They are not meant to replace human ingenuity and expertise but to augment it with their complementary strengths.
Despite the user-friendly design of BioChatter, there may be a learning curve for researchers unfamiliar with LLMs or the specific functionalities of the framework.
Encouraging adoption and providing adequate training and support are critical for maximizing its impact in the biomedical research community.</p>
For maximising its benefit to the community, encouraging adoption and providing adequate training and support will be critical.</p>
<h3 id="future-directions">Future directions</h3>
<p>Multitask learners that can synthesise, for instance, language, vision, and molecular measurements are an emerging field of research <span class="citation" data-cites="47ndFyxw JbgxHZwW 43KJj9IS">[<a href="#ref-47ndFyxw" role="doc-biblioref">41</a>,<a href="#ref-JbgxHZwW" role="doc-biblioref">42</a>,<a href="#ref-43KJj9IS" role="doc-biblioref">43</a>]</span>.
Autonomous agents for trivial tasks have already been developed on the basis of LLMs, and we expect this field to mature in the future <span class="citation" data-cites="lmJHElQl">[<a href="#ref-lmJHElQl" role="doc-biblioref">30</a>]</span>.
@@ -631,7 +635,7 @@ <h2 class="page_break_before" id="references">References</h2>
<div class="csl-left-margin">12. </div><div class="csl-right-inline"><strong>🦜️🔗 Langchain</strong> <a href="https://python.langchain.com/">https://python.langchain.com/</a></div>
</div>
<div id="ref-gy4YOpGJ" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">13. </div><div class="csl-right-inline"><strong>AutoGPT Official</strong> <div class="csl-block">AutoGPT Official</div> (2024-01-29) <a href="https://autogpt.net/">https://autogpt.net/</a></div>
<div class="csl-left-margin">13. </div><div class="csl-right-inline"><strong>AutoGPT Official</strong> <div class="csl-block">AutoGPT Official</div> (2024-02-16) <a href="https://autogpt.net/">https://autogpt.net/</a></div>
</div>
<div id="ref-fliCQWwF" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">14. </div><div class="csl-right-inline"><strong>Towards Conversational Diagnostic AI</strong> <div class="csl-block">Tao Tu, Anil Palepu, Mike Schaekermann, Khaled Saab, Jan Freyberg, Ryutaro Tanno, Amy Wang, Brenna Li, Mohamed Amin, Nenad Tomasev, … Vivek Natarajan</div> <em>arXiv</em> (2024) <a href="https://doi.org/gtdmpj">https://doi.org/gtdmpj</a> <div class="csl-block">DOI: <a href="https://doi.org/10.48550/arxiv.2401.05654">10.48550/arxiv.2401.05654</a></div></div>
@@ -649,7 +653,7 @@ <h2 class="page_break_before" id="references">References</h2>
<div class="csl-left-margin">18. </div><div class="csl-right-inline"><strong>Mixtral of Experts</strong> <div class="csl-block">Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, … William El Sayed</div> <em>arXiv</em> (2024) <a href="https://doi.org/gtc2g3">https://doi.org/gtc2g3</a> <div class="csl-block">DOI: <a href="https://doi.org/10.48550/arxiv.2401.04088">10.48550/arxiv.2401.04088</a></div></div>
</div>
<div id="ref-mGEvmJGA" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">19. </div><div class="csl-right-inline"><strong>xorbitsai/inference</strong> <div class="csl-block">Xorbits</div> (2024-02-15) <a href="https://github.com/xorbitsai/inference">https://github.com/xorbitsai/inference</a></div>
<div class="csl-left-margin">19. </div><div class="csl-right-inline"><strong>xorbitsai/inference</strong> <div class="csl-block">Xorbits</div> (2024-02-17) <a href="https://github.com/xorbitsai/inference">https://github.com/xorbitsai/inference</a></div>
</div>
<div id="ref-PDhRVYjU" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">20. </div><div class="csl-right-inline"><a href="https://www.reuters.com/technology/european-data-protection-board-discussing-ai-policy-thursday-meeting-2023-04-13/">https://www.reuters.com/technology/european-data-protection-board-discussing-ai-policy-thursday-meeting-2023-04-13/</a></div>
@@ -685,7 +689,7 @@ <h2 class="page_break_before" id="references">References</h2>
<div class="csl-left-margin">30. </div><div class="csl-right-inline"><strong>A Survey on Large Language Model based Autonomous Agents</strong> <div class="csl-block">Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, … Ji-Rong Wen</div> <em>arXiv</em> (2023) <a href="https://doi.org/gsv93m">https://doi.org/gsv93m</a> <div class="csl-block">DOI: <a href="https://doi.org/10.48550/arxiv.2308.11432">10.48550/arxiv.2308.11432</a></div></div>
</div>
<div id="ref-14upAJPXR" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">31. </div><div class="csl-right-inline"><strong>pytest-dev/pytest</strong> <div class="csl-block">pytest-dev</div> (2024-02-15) <a href="https://github.com/pytest-dev/pytest">https://github.com/pytest-dev/pytest</a></div>
<div class="csl-left-margin">31. </div><div class="csl-right-inline"><strong>pytest-dev/pytest</strong> <div class="csl-block">pytest-dev</div> (2024-02-17) <a href="https://github.com/pytest-dev/pytest">https://github.com/pytest-dev/pytest</a></div>
</div>
<div id="ref-KONKs6Pw" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">32. </div><div class="csl-right-inline"><strong>Large language models encode clinical knowledge</strong> <div class="csl-block">Karan Singhal, Shekoofeh Azizi, Tao Tu, SSara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, … Vivek Natarajan</div> <em>Nature</em> (2023-07-12) <a href="https://doi.org/gsgp8c">https://doi.org/gsgp8c</a> <div class="csl-block">DOI: <a href="https://doi.org/10.1038/s41586-023-06291-2">10.1038/s41586-023-06291-2</a> · PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37438534">37438534</a> · PMCID: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10396962">PMC10396962</a></div></div>
Binary file modified manuscript.pdf
Binary file not shown.
4 changes: 4 additions & 0 deletions v/336baf0c3f072b3524a425c525ecf0e0b789faa8/images/github.svg
4 changes: 4 additions & 0 deletions v/336baf0c3f072b3524a425c525ecf0e0b789faa8/images/orcid.svg
4 changes: 4 additions & 0 deletions v/336baf0c3f072b3524a425c525ecf0e0b789faa8/images/twitter.svg