
Commit 11308d9

Merge pull request #95 from slaclab/main
Winter shutdown support announcement
2 parents c7f37bc + b86bf64 · commit 11308d9

File tree

2 files changed: +40 -0 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ and the Rubin observatory. The S3DF infrastructure is optimized for
data analytics and is characterized by large, massive throughput, high
concurrency storage systems.

**S3DF will remain operational over the Winter shutdown (Dec 21st 2024 to Jan 5th 2025). Staff will be taking time off as per SLAC guidelines. S3DF resources will continue to be managed remotely if there are interruptions to operations. Response times for issues will vary, depending on the criticality of the issue. [Full details are here](https://s3df.slac.stanford.edu/#/changelog).**

## Quick Reference

changelog.md

Lines changed: 39 additions & 0 deletions
@@ -1,5 +1,44 @@
# Status & Outages

## Support during Winter Shutdown

S3DF will remain operational over the Winter shutdown (Dec 21st 2024 to Jan 5th 2025). Staff will be taking time off as per SLAC guidelines. S3DF resources will continue to be managed remotely if there are interruptions to operations. Response times for issues will vary depending on the criticality of the issue, as detailed below.

**Contacting S3DF staff for issues:**
Users should email s3df-help@slac.stanford.edu for ALL issues (critical and non-critical), providing full details of the problem (including what resources were being used, the impact, and any other information that may be useful in resolving the issue).
We will post status updates to the #comp-sdf Slack channel for critical issues as they are being worked on.
[This S3DF status web page](https://s3df.slac.stanford.edu/#/changelog) will also have updates on any current issues.
If a critical issue has not been responded to within 2 hours of being reported, please contact your [Facility Czar](https://s3df.slac.stanford.edu/#/contact-us) for escalation.

**Critical issues** will be responded to as we become aware of them, except during Dec 24-25 and Dec 31-Jan 1, when they will be handled as soon as possible depending on staff availability.
* Critical issues are defined as full (system-wide) outages that impact:
  * Access to S3DF resources including
    * All SSH logins
    * All IANA interactive resources
    * B50 compute resources(*)
    * Bullet Cluster
  * Access to all of the S3DF storage
    * Home directories
    * Group, Data and Scratch filesystems
    * B50 Lustre, GPFS and NFS storage(*)
  * Batch system access to S3DF Compute resources
  * S3DF Kubernetes vClusters
  * VMware clusters
    * S3DF virtual machines
    * B50 virtual machines(*)
* Critical issues for other SCS-managed systems and services supporting experimental systems will be managed in conjunction with the experiment as appropriate. This includes:
  * LCLS workflows
  * Rubin USDF resources
  * CryoEM workflows
  * Fermi workflows

(*) B50 resources are also dependent on SLAC-IT resources being available.

**Non-critical issues** will be responded to in the order they were received in the ticketing system when normal operations resume after the Winter Shutdown. Non-critical issues include:
* Individual node outages in the compute or interactive pool
* Variable or unexpected performance issues for compute, storage or networking resources
* Batch job errors (that do not impact overall batch system scheduling)
* Tape restores and data transfer issues

## Outages

### Current
