
Configure prod rhods for ope courses (Jan 18 ope course start date) #361

Closed
dystewart opened this issue Jan 5, 2024 · 15 comments

@dystewart

For Orran's course, which will be using rhods, we will need to stand up a new OCP cluster for the Spring semester. Our prod rhods does not provide the user experience that ope is looking for, because of how the rhods install in prod is configured. See this slack thread for more specifics.

For more info on motivations behind doing this see this thread I've opened in the rhods redhat slack: thread

In short, this will greatly simplify the student experience for this course since we can curate the rhods install to the needs of the students, which is something we cannot do in the current prod install on a namespace basis.

To be clear, this will not eliminate the ability for students to see each other's notebooks and log into them. That level of security is possible by creating a data science project per student and granting each student access to their own, but this is not really feasible with the number of students we will be hosting. What this does give us is the ability to let students launch their notebooks through the jupyter tile in rhods, without seeing or clicking through any other notebooks in the process.
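For reference, the per-student project approach mentioned above could be scripted roughly as follows. This is only a sketch under assumptions: the `ope-<username>` naming scheme, the `course` label, and the roster are all hypothetical, and a real data science project may need additional labels that the rhods dashboard expects, which are not shown here. It only builds the manifests (a Namespace plus a RoleBinding to the built-in `edit` ClusterRole) and prints them as JSON.

```python
import json


def student_project_manifests(username: str, course: str = "ope") -> list[dict]:
    """Build a Namespace and a RoleBinding giving one student their own project.

    The naming scheme and the "course" label are hypothetical, for illustration.
    """
    namespace = f"{course}-{username}"
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": namespace, "labels": {"course": course}},
        },
        {
            "apiVersion": "rbac.authorization.k8s.io/v1",
            "kind": "RoleBinding",
            "metadata": {"name": f"{username}-edit", "namespace": namespace},
            # Bind the student to the built-in "edit" ClusterRole,
            # scoped to their own namespace only.
            "roleRef": {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "ClusterRole",
                "name": "edit",
            },
            "subjects": [
                {
                    "apiGroup": "rbac.authorization.k8s.io",
                    "kind": "User",
                    "name": username,
                }
            ],
        },
    ]


if __name__ == "__main__":
    roster = ["student1", "student2"]  # placeholder usernames
    items = [m for user in roster for m in student_project_manifests(user)]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

With 300 or more students this means maintaining 300 or more namespaces and bindings, which is the overhead that makes this option unattractive for the course.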

The first step here is to find some hardware and get OCP installed there ASAP, since this course will be going live Jan. 18.

@dystewart dystewart self-assigned this Jan 5, 2024
@dystewart dystewart changed the title Create New OCP cluster for ope dedicated rhods Create New OCP cluster for ope dedicated rhods (Jan 18 ope course start date) Jan 5, 2024
@msdisme msdisme assigned joachimweyl and unassigned joachimweyl Jan 5, 2024
@msdisme

msdisme commented Jan 5, 2024

From a discussion with @hakasapl I think the hardware is available.

Can you say more about how we plan to configure it and the expected number of nodes?

@dystewart
Author

@msdisme As far as configuration, we will be able to mirror much of what we've done with the test cluster, minus a few unnecessary operators. The big difference is that we will need to set up networking so that the cluster is available externally, much like prod.

We will also need Ceph set up for this cluster, since the notebooks all require a pvc.

As for number of nodes, since this cluster is just for OCP/rhods:

  • 3-4 control plane nodes (minimum required by OCP is 3)
  • 10-20 worker nodes (not exactly sure here; it depends on the hosts)

This course will have 300 or more students, and we have to be prepared for all students logging in and spinning up notebooks concurrently.
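A rough way to turn the concurrency requirement above into a worker count is sketched below. All of the numbers in the example call are placeholders, not measurements from our hardware or from the course notebook image: the per-notebook CPU and memory requests, the per-node capacity, and the 20% headroom reserved for system pods are assumptions to be replaced with real values.

```python
import math


def workers_needed(
    students: int,
    cpu_per_notebook: float,
    mem_gib_per_notebook: float,
    node_cpu: float,
    node_mem_gib: float,
    headroom: float = 0.2,
) -> int:
    """Estimate worker nodes needed if every student runs one notebook at once.

    headroom is the fraction of each node held back for system pods.
    """
    usable_cpu = node_cpu * (1 - headroom)
    usable_mem = node_mem_gib * (1 - headroom)
    # A node fits as many notebooks as its scarcer resource allows.
    per_node = min(
        math.floor(usable_cpu / cpu_per_notebook),
        math.floor(usable_mem / mem_gib_per_notebook),
    )
    if per_node < 1:
        raise ValueError("a single notebook does not fit on one node")
    return math.ceil(students / per_node)


if __name__ == "__main__":
    # Placeholder figures: 300 students, 1 CPU and 4 GiB per notebook,
    # workers with 64 CPUs and 512 GiB each.
    print(workers_needed(300, 1, 4, 64, 512))
```

The estimate is CPU-bound or memory-bound depending on which limit is hit first, so it is worth rerunning once the actual notebook resource requests and host specs are known.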

@dystewart
Author

Dropping the prod rebuild doc here since much of the process will look the same: https://github.com/nerc-project/nerc-runbooks/blob/main/docs/nerc-ocp-prod-rebuild.md

@joachimweyl
Contributor

joachimweyl commented Jan 8, 2024

@hakasapl what hardware would you suggest for this?
@dystewart will you be standing this cluster up? Do you need FC430s or FC830s? Do you want FC430s for the Controllers and FC830s for the Workers?
I'd suggest 3 FC430s for the controllers; since the FC830s have a lot more processors, 5 should probably be enough for workers. Thoughts?

@msdisme

msdisme commented Jan 8, 2024

Some notes from a discussion with Heidi:

  1. authentication via NERC coldfront.
    a. think it needs to be
  2. how do we gather info/etc. for billing purposes
    a. do we treat as su's or bare metal?
    what we decide here is secondary since for this semester they are being charged a flat fee per student, but goal was to begin figuring out what it would be if part of regular cluster
  3. who's monitoring/managing- NERC, Gerard/CS, some group of us?
    a. hook in the same way prod and dev do?
  4. use NERC OpenShift as the base line?
    a. yes please
  5. Separate NESE storage requirement?
    a. can we share the ones being used for OPE dev cluster or do we need a new one?

@joachimweyl
Contributor

@naved001 since racks 1&2 have been added to ESI are we able to provide @dystewart access to the nodes he needs via ESI? If so can we start by offering him 3 FC430s and 5 FC830s?

@naved001

naved001 commented Jan 8, 2024

I have allocated the following nodes to the ESI project orran_cloud_computing:

FC430:
MOC-R4PAC24U31-S1A
MOC-R4PAC24U37-S3C
MOC-R4PAC22U31-S1A

FC830:
MOC-R4PAC24U05-S3
MOC-R4PAC24U03-S1
MOC-R4PAC24U03-S3
MOC-R4PAC22U27-S1
MOC-R4PAC22U25-S1

@joachimweyl
Contributor

@dystewart please confirm you are able to access the allocated hardware.

@msdisme

msdisme commented Jan 8, 2024

From Orran's email:

I just spoke to Erwan, and it appears like there is a straightforward path forward.

So the issue for classes is that the project concept gives all users in a project access to all of the containers of all the other users in that project.

There is a default project, I think he said jupyter something, where if you use the tile (see below) rather than the project interface, users' containers are spun up isolated from each other in that project. What we can do is readily turn off the project functionality for any user in a class, and if they use the tile to spin up containers, everything will work like it always did on AWS.
image

It turns out that for NERC the “launch application” link is missing, as you can see below. We must have done something special to turn that off. So, we also have to re-enable “launch application” on the NERC tile for RHODS.
image


Research users would still use the normal project interface. Classes would use the tile feature. The PODs created will have the user name in them, so we would eventually need to map back from a user to a course; but that is secondary for the short term.

So, we need to do two things:

  1. Re-enable launching a notebook from the tile directly.
  2. Disable access to the projects interface for all course students.

I think we can/should discuss proper multi-tenancy support in RHODS longer term, but, as far as I can tell, this immediately solves our problems with the classes for at least the near term. It would be nice to have per-class resource management controls…. But we can worry about that later.
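As a sketch of what the two changes listed above might look like in configuration: the snippet below builds a JSON merge patch for the rhods dashboard configuration. The field names `spec.notebookController.enabled` and `spec.dashboardConfig.disableProjects` are assumptions based on the `OdhDashboardConfig` resource as I understand it and should be checked against the CRD installed in the cluster. Note also that, if the field behaves as assumed, `disableProjects` would hide the projects interface for every dashboard user, research users included, so it does not by itself give the per-course-student behavior described in the email.

```python
import json


def dashboard_patch(enable_tile_launch: bool = True, hide_projects: bool = True) -> dict:
    """Build a merge patch for the dashboard config resource.

    Field names are assumed, not verified against the installed CRD.
    """
    return {
        "spec": {
            # Assumed switch for the notebook controller behind the jupyter tile.
            "notebookController": {"enabled": enable_tile_launch},
            # Assumed dashboard-wide switch for the projects interface.
            "dashboardConfig": {"disableProjects": hide_projects},
        }
    }


if __name__ == "__main__":
    print(json.dumps(dashboard_patch()))
```

Since the prod cluster configuration lives in a config repository, a change like this would more likely be made there than applied by hand; the patch form is just a compact way to show which settings are involved.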

@joachimweyl
Contributor

@dystewart please review the notes above.

@dystewart
Author

Enables rhods notebook controller: OCP-on-NERC/nerc-ocp-config#338

@dystewart dystewart changed the title Create New OCP cluster for ope dedicated rhods (Jan 18 ope course start date) Configure prod rhods for ope courses (Jan 18 ope course start date) Jan 9, 2024
dystewart added a commit to OCP-on-NERC/nerc-ocp-config that referenced this issue Jan 9, 2024
This will re-enable the default functionality of the rhods notebook controller in the prod cluster.
As a side effect of enabling, the ability to launch a notebook from the jupyter tile will be restored
Addresses: nerc-project/operations#361
@dystewart
Author

Update default jupyter notebook pvc size in rhods-notebooks: OCP-on-NERC/nerc-ocp-config#339

@joachimweyl
Contributor

@hpdempsey I heard you have an idea as to what you think we should do with the 8 nodes that we prepped for the initial incarnation of this issue. What do you think would be a good use for that cluster of nodes?

@dystewart
Author

> @hpdempsey I heard you have an idea as to what you think we should do with the 8 nodes that we prepped for the initial incarnation of this issue. What do you think would be a good use for that cluster of nodes?

I think we should create a new issue for this if we plan to use the nodes for something else. We can close this one since ope is in prod.

@joachimweyl
Contributor

We will leave these nodes available for now; if a new project needs them, we can open a new issue.
