# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Chat AI
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Ali
    family-names: Doosthosseini
    email: ali.doosthosseini@uni-goettingen.de
    affiliation: University of Göttingen
    orcid: 'https://orcid.org/0000-0002-0654-1268'
  - given-names: Jonathan
    family-names: Decker
    email: jonathan.decker@uni-goettingen.de
    affiliation: University of Göttingen
    orcid: 'https://orcid.org/0000-0002-7384-7304'
  - given-names: Hendrik
    family-names: Nolte
    email: hendrik.nolte@gwdg.de
    affiliation: GWDG
    orcid: 'https://orcid.org/0000-0003-2138-8510'
  - given-names: Julian M.
    family-names: Kunkel
    email: julian.kunkel@gwdg.de
    affiliation: GWDG
    orcid: 'https://orcid.org/0000-0002-6915-1179'
identifiers:
  - type: other
    value: 'arXiv:2407.00110'
    description: The arXiv deposit of the encompassing paper.
repository-code: 'https://github.com/gwdg/chat-ai'
url: 'https://chat-ai.academiccloud.de'
abstract: >-
  The increasing adoption of large language models (LLMs)
  has created a pressing need for an efficient, secure and
  private serving infrastructure, which allows researchers
  to run open-source or custom fine-tuned LLMs and ensures
  users that their data remains private and is not stored
  without their consent. While high-performance computing
  (HPC) systems equipped with state-of-the-art GPUs are
  well-suited for training LLMs, their batch scheduling
  paradigm is not designed to support real-time serving of
  AI applications. Cloud systems, on the other hand, are
  well suited for web services but commonly lack access to
  the computational power of clusters, especially expensive
  and scarce high-end GPUs, which are required for optimal
  inference speed. We propose an architecture with an
  implementation consisting of a web service that runs on a
  cloud VM with secure access to a scalable backend running
  a multitude of AI models on HPC systems. By offering a web
  service using our HPC infrastructure to host LLMs, we
  leverage the trusted environment of local universities and
  research centers to offer a private and secure alternative
  to commercial LLM services. Our solution natively
  integrates with Slurm, enabling seamless deployment on HPC
  clusters and is able to run side by side with regular
  Slurm workloads, while utilizing gaps in the schedule
  created by Slurm. In order to ensure the security of the
  HPC system, we use the SSH ForceCommand directive to
  construct a robust circuit breaker, which prevents
  successful attacks on the web-facing server from affecting
  the cluster. We have successfully deployed our system as a
  production service, and made the source code available at
  https://github.com/gwdg/chat-ai.
keywords:
  - AI
  - HPC
  - Slurm
license: GPL-3.0
version: v0.6
date-released: '2024-02-07'
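
# A minimal sketch of how this file can be checked, assuming the
# cffconvert tool (https://github.com/citation-file-format/cffconvert)
# is installed, e.g. via `pip install cffconvert`, and is run from the
# repository root where CITATION.cff lives:
#   cffconvert --validate    # validate against the CFF 1.2.0 schema
#   cffconvert -f bibtex     # render this metadata as a BibTeX entry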