|
| 1 | +.. github display |
| 2 | + GitHub is NOT the preferred viewer for this file. Please visit |
| 3 | + https://flux-framework.rtfd.io/projects/flux-rfc/en/latest/spec_41.html |
| 4 | +
|
| 5 | +41/Job Information Service |
| 6 | +########################## |
| 7 | + |
| 8 | +The Flux Job Information Service provides convenient, read-only access to |
| 9 | +KVS job data for job owners. |
| 10 | + |
| 11 | +.. list-table:: |
| 12 | + :widths: 25 75 |
| 13 | + |
| 14 | + * - **Name** |
| 15 | + - github.com/flux-framework/rfc/spec_41.rst |
| 16 | + * - **Editor** |
| 17 | + - Jim Garlick <garlick@llnl.gov> |
| 18 | + * - **State** |
| 19 | + - raw |
| 20 | + |
| 21 | +Language |
| 22 | +******** |
| 23 | + |
| 24 | +.. include:: common/language.rst |
| 25 | + |
| 26 | +Related Standards |
| 27 | +***************** |
| 28 | + |
| 29 | +- :doc:`spec_15` |
| 30 | +- :doc:`spec_16` |
| 31 | +- :doc:`spec_18` |
| 32 | +- :doc:`spec_20` |
| 33 | +- :doc:`spec_21` |
| 34 | +- :doc:`spec_24` |
| 35 | +- :doc:`spec_25` |
| 36 | + |
| 37 | +Background |
| 38 | +********** |
| 39 | + |
| 40 | +Job info is stored in a job-specific KVS directory as described in |
| 41 | +:doc:`RFC 16 <spec_16>`. Among the keys stored for each job are: |
| 42 | + |
| 43 | +.. list-table:: |
| 44 | + :widths: 25 75 |
| 45 | + |
| 46 | + * - **J** |
| 47 | + - signed jobspec submitted by the user (:doc:`RFC 15 <spec_15>`) |
| 48 | + * - **jobspec** |
| 49 | + - unwrapped jobspec from J (:doc:`RFC 25 <spec_25>`), potentially modified |
| 50 | + at ingest |
| 51 | + * - **eventlog** |
| 52 | + - primary job eventlog (:doc:`RFC 21 <spec_21>`) |
| 53 | + * - **R** |
| 54 | + - resource set allocated to job (:doc:`RFC 20 <spec_20>`) |
| 55 | + * - **guest.exec.eventlog** |
| 56 | + - exec system eventlog |
| 57 | + * - **guest.input** |
| 58 | + - job input (:doc:`RFC 24 <spec_24>`) |
| 59 | + * - **guest.output** |
| 60 | + - job output (:doc:`RFC 24 <spec_24>`) |
| 61 | + |
| 62 | +While the job is running, the ``guest`` key in the job directory is a |
| 63 | +symbolic link to a private KVS namespace that may be read or written by |
| 64 | +the job owner. All KVS accesses from the job are redirected to the private |
| 65 | +namespace. This supports the job shell running as the guest user, and also |
| 66 | +allows job applications and user-invoked tools to use the KVS. |
| 67 | + |
| 68 | +Once the job becomes inactive, the private namespace is deleted and the |
| 69 | +``guest`` link becomes a directory in the primary KVS namespace that contains |
| 70 | +a snapshot of the private namespace's content. At this point only the |
| 71 | +instance owner may directly access this information. |
| 72 | + |
| 73 | +The main purpose of the job info service is to give job owners convenient, |
| 74 | +read-only access to the data in their KVS job directory. |
| 75 | + |
| 76 | +Goals |
| 77 | +***** |
| 78 | + |
| 79 | +- Provide read-only access to all keys in a given KVS job directory. |
| 80 | + |
| 81 | +- Restrict access to the job owner and the Flux instance owner. |
| 82 | + |
| 83 | +- Provide scalability and performance comparable to direct KVS access. |
| 84 | + |
| 85 | +- Hide the complexity of watching eventlogs across the ``guest`` transition |
| 86 | + described above. |
| 87 | + |
| 88 | +Implementation |
| 89 | +************** |
| 90 | + |
| 91 | +The job info service SHOULD be distributed across all broker ranks to |
| 92 | +avoid creating a bottleneck at the leader broker. |
| 93 | + |
| 94 | +Job information requests below are for a single job ID. If a request was |
| 95 | +not sent by the instance owner or the job owner, it SHALL fail with error 1, |
| 96 | +"Operation not permitted" (EPERM). |
| 97 | + |
| 98 | +Internally, the job information service MAY determine the *job owner* by |
| 99 | +fetching the primary job eventlog and reading the userid from the ``submit`` |
| 100 | +event context. |
| 101 | + |
| 102 | +Lookup |
| 103 | +====== |
| 104 | + |
| 105 | +The :program:`job-info.lookup` RPC fetches one or more job info keys from |
| 106 | +the KVS. |
| 107 | + |
| 108 | +If a failure occurs while looking up any of the keys, the entire request |
| 109 | +SHALL fail. |
| 110 | + |
| 111 | +The RPC payloads are defined as follows: |
| 112 | + |
| 113 | +.. object:: job-info.lookup request |
| 114 | + |
| 115 | + The request SHALL consist of a JSON object with the following keys: |
| 116 | + |
| 117 | + .. object:: id |
| 118 | + |
| 119 | + (*integer*, REQUIRED) The job id. |
| 120 | + |
| 121 | + .. object:: keys |
| 122 | + |
| 123 | + (*array of string*, REQUIRED) List of keys. |
| 124 | + |
| 125 | + Keys SHALL be specified relative to the job directory, and SHALL use |
| 126 | + period ``.`` as the path delimiter. |
| 127 | + |
| 128 | + .. object:: flags |
| 129 | + |
| 130 | + (*integer*, REQUIRED) Flags, reserved for future use. Set to zero. |
| 131 | + |
| 132 | +.. object:: job-info.lookup response |
| 133 | + |
| 134 | + The response SHALL consist of a JSON object with the following keys: |
| 135 | + |
| 136 | + .. object:: id |
| 137 | + |
| 138 | + (*integer*, REQUIRED) The job id from the request. |
| 139 | + |
| 140 | + .. object:: keys... |
| 141 | + |
| 142 | + Additional keys correspond to the keys in the request. |
| 143 | + |
| 144 | + Values are the KVS values associated with the keys, encoded as strings. |
| 145 | + |
| 146 | +Eventlog Watch |
| 147 | +============== |
| 148 | + |
| 149 | +The :program:`job-info.eventlog-watch` streaming RPC tracks events posted to |
| 150 | +an RFC 18 eventlog. |
| 151 | + |
| 152 | +The RPC stream SHALL be terminated with error 61, "No data available" |
| 153 | +(ENODATA) when one of the following conditions is met: |
| 154 | + |
| 155 | +- The RPC is canceled with a :program:`job-info.eventlog-watch-cancel` request. |
| 156 | +- The job becomes inactive. |
| 157 | +- A context-specific terminating event is posted to the eventlog: |
| 158 | + |
| 159 | +.. list-table:: |
| 160 | + :widths: 25 75 |
| 161 | + |
| 162 | + * - **eventlog** |
| 163 | + - clean |
| 164 | + * - **guest.exec.eventlog** |
| 165 | + - done |
| 166 | + * - **guest.output** |
| 167 | + - data with eof=true on all streams and all ranks |
| 168 | + * - **guest.input** |
| 169 | + - data with eof=true |
| 170 | + |
| 171 | + |
| 172 | +The RPC payloads are defined as follows: |
| 173 | + |
| 174 | +.. object:: job-info.eventlog-watch request |
| 175 | + |
| 176 | + The request SHALL consist of a JSON object with the following keys: |
| 177 | + |
| 178 | + .. object:: id |
| 179 | + |
| 180 | + (*integer*, REQUIRED) The job id. |
| 181 | + |
| 182 | + .. object:: path |
| 183 | + |
| 184 | + (*string*, REQUIRED) The eventlog key. |
| 185 | + |
| 186 | + The key SHALL be specified relative to the job directory, and SHALL use |
| 187 | + period ``.`` as the path delimiter. |
| 188 | + |
| 189 | + .. object:: flags |
| 190 | + |
| 191 | + (*integer*, REQUIRED) A bitfield composed of zero or more flags: |
| 192 | + |
| 193 | + waitcreate (1) |
| 194 | + If the key does not exist yet, wait for its creation before responding. |
| 195 | + |
| 196 | +.. object:: job-info.eventlog-watch response |
| 197 | + |
| 198 | + Each non-error response SHALL consist of a JSON object with the following |
| 199 | + keys: |
| 200 | + |
| 201 | + .. object:: event |
| 202 | + |
| 203 | + (*string*, REQUIRED) Exactly one :doc:`RFC 18 <spec_18>` eventlog entry, |
| 204 | + including trailing newline. |
| 205 | + |
| 206 | +.. object:: job-info.eventlog-watch-cancel request |
| 207 | + |
| 208 | + Cancel a :program:`job-info.eventlog-watch` request, as described in |
| 209 | + :doc:`RFC 6 <spec_6>`. |
| 210 | + |
| 211 | + .. object:: matchtag |
| 212 | + |
| 213 | + (*integer*, REQUIRED) The matchtag of the request to be canceled. |
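
The payloads defined in the spec above can be illustrated with a minimal Python sketch. It only builds the request objects and inspects a single watch response; it does not contact a Flux broker. The helper names (`lookup_request`, `eventlog_watch_request`, `is_terminating`) and the sample job ID are hypothetical and are not part of the RFC or of any Flux API. The terminating-event map covers only the two eventlogs whose terminating condition is a plain event name (`clean` and `done`); the `guest.input` and `guest.output` conditions depend on `eof` state and are left out.

```python
import json

# Flag value taken from the eventlog-watch request definition above.
WAITCREATE = 1

# Terminating event names from the table in the Eventlog Watch section.
# guest.input / guest.output are omitted: their condition depends on
# eof=true data events, not on a single event name.
TERMINATING_EVENT = {
    "eventlog": "clean",
    "guest.exec.eventlog": "done",
}


def lookup_request(jobid, keys):
    """Build a job-info.lookup request payload (hypothetical helper).

    Keys are relative to the job directory and use '.' as the delimiter.
    Flags are reserved for future use, so they are set to zero.
    """
    return {"id": jobid, "keys": list(keys), "flags": 0}


def eventlog_watch_request(jobid, path, waitcreate=False):
    """Build a job-info.eventlog-watch request payload (hypothetical helper)."""
    return {"id": jobid, "path": path, "flags": WAITCREATE if waitcreate else 0}


def is_terminating(path, response):
    """Return True if a watch response carries the terminating event for path.

    The 'event' value is assumed to be one RFC 18 eventlog entry: a JSON
    object with a 'name' key, followed by a trailing newline.
    """
    entry = json.loads(response["event"])
    return entry["name"] == TERMINATING_EVENT.get(path)


if __name__ == "__main__":
    print(json.dumps(lookup_request(1234, ["R", "jobspec"])))
    print(json.dumps(eventlog_watch_request(1234, "guest.exec.eventlog", True)))
```

A client would send these objects as the RPC payloads and stop reading the `job-info.eventlog-watch` stream when it receives ENODATA; `is_terminating` only shows how the context-specific condition could be recognized on the receiving side.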