
Commit 389538f

Merge pull request #405 from garlick/rfc41_job_info
rfc41: add new RFC for job information service
2 parents 5405a3b + 7576e8a commit 389538f

4 files changed

+222
-0
lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -50,6 +50,7 @@ Table of Contents
5050
- [38/Flux Security Key Value Encoding](spec_38.rst)
5151
- [39/Flux Security Signature](spec_39.rst)
5252
- [40/Fluxion Resource Set Extension](spec_40.rst)
53+
- [41/Job Information Service](spec_41.rst)
5354

5455
Build Instructions
5556
------------------

index.rst

Lines changed: 6 additions & 0 deletions
@@ -266,6 +266,11 @@ content secured with a digital signature.
266266
This specification defines the data format used by the Fluxion scheduler
267267
to store resource graph data in RFC 20 *R* version 1 objects.
268268

269+
:doc:`spec_41`
270+
~~~~~~~~~~~~~~
271+
272+
The Flux Job Information Service provides proxy access to KVS job
273+
information for guest users.
269274

270275
.. Each file must appear in a toctree
271276
.. toctree::
@@ -309,3 +314,4 @@ to store resource graph data in RFC 20 *R* version 1 objects.
309314
spec_38
310315
spec_39
311316
spec_40
317+
spec_41

spec_41.rst

Lines changed: 213 additions & 0 deletions
@@ -0,0 +1,213 @@
1+
.. github display
2+
GitHub is NOT the preferred viewer for this file. Please visit
3+
https://flux-framework.rtfd.io/projects/flux-rfc/en/latest/spec_41.html
4+
5+
41/Job Information Service
6+
##########################
7+
8+
The Flux Job Information Service provides convenient, read-only access to
9+
KVS job data for job owners.
10+
11+
.. list-table::
12+
:widths: 25 75
13+
14+
* - **Name**
15+
- github.com/flux-framework/rfc/spec_41.rst
16+
* - **Editor**
17+
- Jim Garlick <garlick@llnl.gov>
18+
* - **State**
19+
- raw
20+
21+
Language
22+
********
23+
24+
.. include:: common/language.rst
25+
26+
Related Standards
27+
*****************
28+
29+
- :doc:`spec_15`
30+
- :doc:`spec_16`
31+
- :doc:`spec_18`
32+
- :doc:`spec_20`
33+
- :doc:`spec_21`
34+
- :doc:`spec_24`
35+
- :doc:`spec_25`
36+
37+
Background
38+
**********
39+
40+
Job info is stored in a job-specific KVS directory as described in
41+
:doc:`RFC 16 <spec_16>`. Among the keys stored for each job are:
42+
43+
.. list-table::
44+
:widths: 25 75
45+
46+
* - **J**
47+
- signed jobspec submitted by the user (:doc:`RFC 15 <spec_15>`)
48+
* - **jobspec**
49+
- unwrapped jobspec from J (:doc:`RFC 25 <spec_25>`), potentially modified
50+
at ingest
51+
* - **eventlog**
52+
- primary job eventlog (:doc:`RFC 21 <spec_21>`)
53+
* - **R**
54+
- resource set allocated to job (:doc:`RFC 20 <spec_20>`)
55+
* - **guest.exec.eventlog**
56+
- exec system eventlog
57+
* - **guest.input**
58+
- job input (:doc:`RFC 24 <spec_24>`)
59+
* - **guest.output**
60+
- job output (:doc:`RFC 24 <spec_24>`)
61+
62+
While the job is running, the ``guest`` key in the job directory is a
63+
symbolic link to a private KVS namespace that may be read or written by
64+
the job owner. All KVS accesses from the job are redirected to the private
65+
namespace. This supports the job shell running as the guest user, and also
66+
allows job applications and user-invoked tools to use the KVS.
67+
68+
Once the job becomes inactive, the private namespace is deleted and the
69+
``guest`` link becomes a directory in the primary KVS namespace that contains
70+
a snapshot of the private namespace's content. At this point only the
71+
instance owner may directly access this information.
72+
73+
The main purpose of the job info service is to give job owners convenient,
74+
read-only access to the data in their KVS job directory.
75+
76+
Goals
77+
*****
78+
79+
- Provide read-only access to all keys in a given KVS job directory.
80+
81+
- Restrict access to the job owner and the Flux instance owner.
82+
83+
- Provide scalability and performance comparable to direct KVS access.
84+
85+
- Hide the complexity of watching eventlogs across the ``guest`` transition
86+
described above.
87+
88+
Implementation
89+
**************
90+
91+
The job info service SHOULD be distributed across all broker ranks to
92+
avoid creating a bottleneck at the leader broker.
93+
94+
Job information requests below are for a single job ID. If a request was
95+
not sent by the instance owner or the job owner, it SHALL fail with error 1,
96+
"Operation not permitted" (EPERM).
97+
98+
Internally, the job information service MAY determine the *job owner* by
99+
fetching the primary job eventlog and reading the userid from the ``submit``
100+
event context.
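The access rule above can be illustrated with a minimal Python sketch. It assumes the eventlog is available as a string of newline-delimited JSON entries, each with a ``name`` and an optional ``context`` object; the function names ``job_owner`` and ``check_access`` are hypothetical and not part of any Flux API.

```python
import errno
import json


def job_owner(eventlog: str) -> int:
    """Return the userid recorded in the submit event of an eventlog string."""
    for line in eventlog.splitlines():
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("name") == "submit":
            return entry["context"]["userid"]
    raise ValueError("eventlog contains no submit event")


def check_access(request_userid: int, instance_owner: int, eventlog: str) -> int:
    """Return 0 if the requester may read the job, else errno.EPERM."""
    if request_userid == instance_owner:
        return 0
    if request_userid == job_owner(eventlog):
        return 0
    return errno.EPERM
```

The instance owner is checked first so that the eventlog only needs to be parsed for guest requests.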
101+
102+
Lookup
103+
======
104+
105+
The :program:`job-info.lookup` RPC fetches one or more job info keys from
106+
the KVS.
107+
108+
If a failure occurs while looking up any of the keys, the entire request
109+
SHALL fail.
110+
111+
The RPC payloads are defined as follows:
112+
113+
.. object:: job-info.lookup request
114+
115+
The request SHALL consist of a JSON object with the following keys:
116+
117+
.. object:: id
118+
119+
(*integer*, REQUIRED) The job id.
120+
121+
.. object:: keys
122+
123+
(*array of string*, REQUIRED) List of keys.
124+
125+
Keys SHALL be specified relative to the job directory, and SHALL use
126+
period ``.`` as the path delimiter.
127+
128+
.. object:: flags
129+
130+
(*integer*, REQUIRED) Flags, reserved for future use. This field SHALL be set to zero.
131+
132+
.. object:: job-info.lookup response
133+
134+
The response SHALL consist of a JSON object with the following keys:
135+
136+
.. object:: id
137+
138+
(*integer*, REQUIRED) The job id from the request.
139+
140+
.. object:: keys...
141+
142+
Additional keys correspond to the keys in the request.
143+
144+
Values are the KVS values associated with the keys, encoded as strings.
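As a rough illustration of the lookup payloads, the Python sketch below builds a request and a matching response. It makes several assumptions: the job directory is modeled as a flat in-memory ``dict`` from key to string value, the helper names are hypothetical, and ``ENOENT`` is an arbitrary choice of error for a missing key, since the text above only requires that the entire request fail.

```python
import errno


def make_lookup_request(jobid: int, keys: list) -> dict:
    """Build a job-info.lookup request payload; flags is reserved, so zero."""
    return {"id": jobid, "keys": list(keys), "flags": 0}


def handle_lookup(request: dict, job_dir: dict) -> dict:
    """Build a response from a dict standing in for the KVS job directory.

    If any key is missing, the whole request fails and no partial
    response is returned. ENOENT is an assumption of this sketch.
    """
    response = {"id": request["id"]}
    for key in request["keys"]:
        if key not in job_dir:
            raise OSError(errno.ENOENT, f"key not found: {key}")
        response[key] = job_dir[key]
    return response
```

For example, ``handle_lookup(make_lookup_request(42, ["R"]), {"R": "..."})`` returns an object with ``id`` and ``R`` keys, while a request that names one valid and one missing key raises an error.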
145+
146+
Eventlog Watch
147+
==============
148+
149+
The :program:`job-info.eventlog-watch` streaming RPC tracks events posted to
150+
an :doc:`RFC 18 <spec_18>` eventlog.
151+
152+
The RPC stream SHALL be terminated with error 61, "No data available"
153+
(ENODATA) when one of the following conditions is met:
154+
155+
- The RPC is canceled with a :program:`job-info.eventlog-watch-cancel` request.
156+
- The job becomes inactive.
157+
- A context-specific terminating event is posted to the eventlog:
158+
159+
.. list-table::
160+
:widths: 25 75
161+
162+
* - **eventlog**
163+
- clean
164+
* - **guest.exec.eventlog**
165+
- done
166+
* - **guest.output**
167+
- data with eof=true on all streams and all ranks
168+
* - **guest.input**
169+
- data with eof=true
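The terminating events in the table above could be checked with a small Python helper like the one sketched here. It assumes each eventlog entry is a JSON object with a ``name`` and an optional ``context``; the function name ``is_terminating`` is hypothetical. The ``guest.output`` case is deliberately left out, because deciding that every stream on every rank has reached EOF requires state that a single entry does not carry.

```python
import json

# Eventlogs that end on a single named event.
TERMINATING_EVENT = {
    "eventlog": "clean",
    "guest.exec.eventlog": "done",
}


def is_terminating(path: str, entry_line: str) -> bool:
    """Return True if this entry ends a watch on the given eventlog path.

    Covers eventlog, guest.exec.eventlog, and guest.input only;
    guest.output needs per-stream, per-rank tracking and is not handled.
    """
    entry = json.loads(entry_line)
    name = entry.get("name")
    if path in TERMINATING_EVENT:
        return name == TERMINATING_EVENT[path]
    if path == "guest.input":
        context = entry.get("context", {})
        return name == "data" and context.get("eof") is True
    return False
```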
170+
171+
172+
The RPC payloads are defined as follows:
173+
174+
.. object:: job-info.eventlog-watch request
175+
176+
The request SHALL consist of a JSON object with the following keys:
177+
178+
.. object:: id
179+
180+
(*integer*, REQUIRED) The job id.
181+
182+
.. object:: path
183+
184+
(*string*, REQUIRED) The eventlog key.
185+
186+
The key SHALL be specified relative to the job directory, and SHALL use
187+
period ``.`` as the path delimiter.
188+
189+
.. object:: flags
190+
191+
(*integer*, REQUIRED) A bitfield composed of zero or more flags:
192+
193+
waitcreate (1)
194+
If key does not exist yet, wait for its creation before responding.
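A request payload with the ``flags`` bitfield might be assembled as in the following Python sketch. The constant name ``WAITCREATE`` and the helper ``make_watch_request`` are hypothetical; only the numeric value 1 and the key names come from the text above.

```python
WAITCREATE = 1  # flag value from the request definition


def make_watch_request(jobid: int, path: str, waitcreate: bool = False) -> dict:
    """Build a job-info.eventlog-watch request payload."""
    flags = 0
    if waitcreate:
        flags |= WAITCREATE
    return {"id": jobid, "path": path, "flags": flags}
```

Setting ``waitcreate=True`` is useful for keys such as ``guest.exec.eventlog`` that may not exist yet when the watch begins.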
195+
196+
.. object:: job-info.eventlog-watch response
197+
198+
Each non-error response SHALL consist of a JSON object with the following
199+
keys:
200+
201+
.. object:: event
202+
203+
(*string*, REQUIRED) Exactly one :doc:`RFC 18 <spec_18>` eventlog entry,
204+
including trailing newline.
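On the client side, consuming the response stream could look like the Python sketch below. It models the stream as an iterable that yields response payloads (``dict``) and, at the end, an ``OSError``; this representation and the name ``decode_events`` are assumptions of the sketch rather than a description of the Flux bindings.

```python
import errno
import json


def decode_events(responses):
    """Yield decoded eventlog entries until the stream ends with ENODATA.

    Each non-error response carries exactly one entry in its "event"
    key, including the trailing newline. Any error other than ENODATA
    is re-raised to the caller.
    """
    for response in responses:
        if isinstance(response, OSError):
            if response.errno == errno.ENODATA:
                return  # normal end of stream
            raise response
        yield json.loads(response["event"])
```

Treating ENODATA as a normal end of stream lets callers iterate with a plain ``for`` loop and reserve exception handling for genuine failures such as EPERM.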
205+
206+
.. object:: job-info.eventlog-watch-cancel request
207+
208+
Cancel a :program:`job-info.eventlog-watch` request, as described in
209+
:doc:`RFC 6 <spec_6>`.
210+
211+
.. object:: matchtag
212+
213+
(*integer*, REQUIRED) The matchtag of the request to be canceled.

spell.en.pws

Lines changed: 2 additions & 0 deletions
@@ -476,3 +476,5 @@ usr
476476
acceptor
477477
Fluxion
478478
mortem
479+
waitcreate
480+
bitfield
