deploy: 4c66b8e
rajarshitiwari committed Feb 9, 2024
1 parent eab5c71 commit 85f6cfb
Showing 3 changed files with 36 additions and 17 deletions.
22 changes: 15 additions & 7 deletions _sources/interim-service.md
@@ -1,7 +1,7 @@
(interim-service)=
# Interim National HPC Service

This page describes the nature of the service being put in place to enable researchers to maintain access to HPC resources, as well as the high-level migration plan in operation to migrate researchers to the new system.
This page describes the nature of the service being put in place to enable researchers to maintain access to HPC resources following the closure of the Kay supercomputer. It also highlights the differences between the two systems and services, as well as the high-level migration plan in operation to move researchers to the new system.

```{dropdown} Background
ICHEC first presented its plan for the provision of compute resources from foreign sites (to be procured on a commercial basis) at its Board meeting of September 2022. These arrangements were deemed essential to ensure continuity of service to the research community.
@@ -59,11 +59,17 @@ Here we provide some quick references for users transitioning to meluxina, pleas
```

```{admonition} Quick Summary
- Similar to Kay, there is a login node for Meluxina, `login.lxp.lu`, which you `ssh` to. One noteworthy difference is that Meluxina has the default ssh port 22 closed and uses port 8822 for ssh, so your ssh command would be `ssh -p 8822 username@login.lxp.lu`.
- Similar to Kay, there is a login node for Meluxina, `login.lxp.lu`, which you `ssh` to. One noteworthy difference is that Meluxina uses port 8822 for ssh instead of the default port 22, so your ssh command would be `ssh -p 8822 username@login.lxp.lu` (a minimal `~/.ssh/config` sketch is given just after this summary). If your ssh session to Meluxina does not connect or hangs, you are most likely not specifying the correct port (8822), or your local firewall is blocking outbound connections to that port.
- You are assigned a user name when your account is created on Meluxina, so it's a good idea to use the `$USER` variable in your scripts rather than hard-coding your username.
- The Meluxina login nodes do not have access to the centrally installed applications or modules, so to see which modules are available, or to load modules and build your software, you will need to launch an interactive Slurm job (see the interactive-job sketch after this summary).
- You might need to rethink your task farming or resource management if your HPC runs were optimised for 40-core nodes. For most people this will not be an issue, but if it is, please do [contact us](./contact-us).
- You are assigned a (different) user name when your account is created on Meluxina, so it's a good idea to use the `$USER` variable in your scripts rather than hard-coding your username.
- Resources are allocated in node hours, and each node has considerably more CPU cores and/or GPUs than a Kay node. As a result, each node hour is considerably more expensive (and more powerful) than on Kay, so it is important that you fully use all cores or GPUs on the node. You may need to rethink domain decomposition, task farming or other hard-coded compute or memory sizing when transferring scripts from Kay (see the example batch script after this summary). Please [contact us](./contact-us) for help with performance optimisation.
- Extensive documentation is available on the [Meluxina Documentation Site](https://docs.lxp.lu/).
- After initial onboarding, all technical support on Meluxina should go via the [ICHEC Helpdesk](./contact-us).
```
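For convenience, the non-standard port can be recorded once in your ssh client configuration. A minimal sketch, assuming a hypothetical Meluxina username `u123456` (substitute your own):

```bash
# Append a Meluxina host entry to your ssh client config (the username u123456 is a placeholder).
cat >> ~/.ssh/config <<'EOF'
Host meluxina
    HostName login.lxp.lu
    Port 8822
    User u123456
    ServerAliveInterval 60
EOF

# After that, a plain alias works for ssh, scp and rsync without remembering -p 8822:
ssh meluxina
```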
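Because the module tree is only visible from compute nodes, you first request an interactive job and run the module commands there. A hedged sketch, assuming placeholder account and QOS values (check your own project details and the Meluxina documentation for the exact names):

```bash
# Request a short interactive session on one node of the "cpu" partition.
# The account pXXXXXX and qos "default" are placeholders for your own allocation.
salloc --account=pXXXXXX --partition=cpu --qos=default --nodes=1 --time=00:30:00

# Once the allocation starts (you should get a shell on the allocated node,
# depending on configuration), module commands work much as they did on Kay:
module avail            # list the centrally installed software
module load intel       # example only: the module name is an assumption
```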
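The batch-script sketch below pulls the previous points together: it fills a whole 128-core node and uses `$USER` rather than a literal username. The project account, directory layout and the executable name `./my_task` are illustrative assumptions, not Meluxina-specific values:

```bash
#!/bin/bash -l
#SBATCH --job-name=taskfarm-demo
#SBATCH --partition=cpu            # Meluxina CPU partition
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=128      # one task per core so the node is fully used
#SBATCH --time=01:00:00
#SBATCH --account=pXXXXXX          # placeholder project account

# $USER expands to your Meluxina username, which differs from your Kay one.
WORKDIR="/project/scratch/$USER/run_${SLURM_JOB_ID}"   # hypothetical directory layout
mkdir -p "$WORKDIR" && cd "$WORKDIR"

# Simple task farm: 128 independent single-core tasks launched as Slurm job steps.
for i in $(seq 1 128); do
    srun --exact --ntasks=1 --cpus-per-task=1 ./my_task "$i" &
done
wait    # do not let the job exit until every task has finished
```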

@@ -75,14 +81,16 @@ Below is a {ref}`Table <kay-lxp-comp>` comparing the Kay and Meluxina machines, highlighting similarities and differences.
:name: kay-lxp-comp
|Specs/features|Kay|Meluxina|
|:---:|:---:|:---:|
|Standard CPU nodes|Intel CPUs, 40 cores per node|AMD CPUs, 128 cores per node|
|Standard CPU nodes|Intel CPUs, 40 cores, 192 GB RAM, 400 GB SSD per node|AMD CPUs, 128 cores, 512 GB RAM, no local disk per node|
|Standard GPU nodes|2 x (NVIDIA V100 cards, 16 GB) per node|4 x (NVIDIA A100 cards, 40 GB) per node|
|Large Memory Nodes|||
|Large Memory Nodes|Intel CPUs, 40 cores, 1.5 TiB RAM|AMD CPUs, 128 cores, 4 TiB RAM|
|Hyperthreading|No|On by default, changeable per job|
|Login access|`ssh user@kay.ichec.ie`|`ssh -p 8822 user@login.lxp.lu`|
|Queue manager|Slurm [See link](https://www.ichec.ie/academic/national-hpc/kay-documentation/slurm-workload-manager)|Slurm [See link](https://docs.lxp.lu/first-steps/handling_jobs/)|
|Resource consumption unit|CPU core hour|CPU/GPU node hour|
|Popular Slurm partitions|DevQ, ProdQ, LongQ, GpuQ, ...|cpu, gpu, fpga, largemem|
|Slurm partitions|DevQ, ProdQ, LongQ, GpuQ, ...|cpu, gpu, fpga, largemem|
|Internet connectivity|No internet connection from compute nodes|Internet accessible from compute nodes|
|Resource usage/allocation command|mybalance / quota|myquota|
```
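To show how the table's differences surface in practice, here is a hedged side-by-side of typical batch-script headers; the partition names are taken from the table, while node counts and the hyperthreading choice are illustrative:

```bash
# Kay (core-hour accounting, 40-core nodes):
#SBATCH --partition=ProdQ
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40

# Meluxina (node-hour accounting, 128-core nodes, hyperthreading on by default):
#SBATCH --partition=cpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=128
#SBATCH --hint=nomultithread   # one task per physical core if your code gains nothing from hyperthreading
```

On Meluxina, `myquota` reports the node hours a project has consumed, playing the role that `mybalance` and `quota` did on Kay.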


29 changes: 20 additions & 9 deletions interim-service.html