layout: page
title: Running LAMMPS on HPC Systems - Lesson Outline

How to use this outline

The following list shows what content should go where in this repository and can serve as a guide for contributors. If a bullet point is prefixed by a file name, that file is the lesson where the listed content belongs. This document is a concept map converted into a flow of learning goals and questions.

Accelerating LAMMPS on an HPC System

  • index.md: Prelude: Why should I take this course?

    • Why should I care about software performance?
    • What can I expect to learn from this course?
  • 01-why-bother-with-performance.md: Brief notes on software performance

    • What is software performance?
    • Why is software performance important?
    • How can performance be measured?
    • What is meant by FLOPS, walltime and CPU hours?
    • What can affect performance?
  • 02-benchmark-and-scaling.md: How do I benchmark software performance on HPC systems?: benchmarking and scaling

    • What is benchmarking?
    • What are the factors that can affect a benchmark?
      • Case study 1: A simple benchmarking example of LAMMPS on an HPC system
      • Hands-on 1: Can you do it on your own?
    • What is scaling?
    • How do I perform scaling analysis?
    • Quantifying speedup: t1/tp (walltime on one worker divided by walltime on p workers)
    • Am I wasting my resources?
      • Case Study 2: Get scaling data for a LAMMPS run
      • Hands-on 2: Do a scaling analysis
  • 03-acceleration.md: Can I accelerate performance?: a brief discussion of various aspects of speeding up software performance

    • Hardware acceleration and software acceleration

      • multi-core CPU
      • GPU
    • Can I use specialised code to get the best out of the available hardware?

      • Multi-threading via OpenMP: parallel processing on shared-memory platforms

        • Thread-based parallelism
        • Important run-time environment variables
        • bottlenecks in OpenMP applications
          • hyperthreading
          • CPU affinity
      • Multi-threading via CUDA: host-device relationship

        • bottlenecks in host-device architectures
    • What if I need more workers than are available on a single node?

      • How can we achieve this using MPI?
      • What is the bottleneck here?
        • communication overhead
        • domain decomposition
    • Is it possible to use optimized libraries/code to get acceleration?

      • Brief mention of various optimized libraries such as MKL and FFTW
  • 04-lammps-bottlenecks.md: Identifying bottlenecks in LAMMPS: learn to analyze timing data in LAMMPS

    • Case study 3: Understand the task timing breakdown of LAMMPS output
    • Hands-on 3: Understand the task timing breakdown of LAMMPS output for a different problem
  • 05-accelerating-lammps.md: How can I accelerate LAMMPS performance?: various options to accelerate LAMMPS

    • Knowing what hardware LAMMPS can run on

    • How can I enable architecture support at runtime?

      • Accelerator packages in LAMMPS
        • What packages for which architecture?
          • OPT
          • USER-OMP
          • USER-INTEL
          • GPU
          • KOKKOS
    • Why KOKKOS?

      • What is Kokkos?
      • Important features of LAMMPS Kokkos package
      • Fixes that support KOKKOS in LAMMPS
      • Package options
  • 06-invoking-kokkos.md: How do I invoke KOKKOS in LAMMPS?: technical aspects of using KOKKOS with LAMMPS

    • Transition from regular LAMMPS call to accelerated call
  • 07-kokkos-openmp.md: Compare KOKKOS/OpenMP performance with regular LAMMPS/OpenMP performance: learn to use OpenMP with KOKKOS

    • Case study 4: using OpenMP+KOKKOS for the Skylake AVX-512 architecture
    • Comparing LAMMPS performance between runs with and without KOKKOS
    • Exercise 4: Similar study with a slightly different problem
  • 08-kokkos-gpu.md: Compare KOKKOS/GPU performance with regular LAMMPS/GPU performance: learn to use GPUs with KOKKOS

    • Case study 5: using GPU+KOKKOS for the NVIDIA Tesla V100 architecture
    • Comparing LAMMPS performance between runs with and without KOKKOS
    • Exercise 5: Similar study with a slightly different problem
  • 09-limitations.md: What are the limitations of different accelerator packages?: discuss the limitations of KOKKOS and other accelerator packages
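The speedup item in 02-benchmark-and-scaling.md (t1/tp) and the question "Am I wasting my resources?" can be made concrete with a small helper for the scaling hands-on. This is a minimal sketch, not part of the lesson material: it assumes walltimes measured on 1 and on p workers, computes speedup S_p = t1/tp, and uses the common definition of parallel efficiency E_p = S_p/p (an addition here, not stated in the outline). The function names and the timings in the usage block are hypothetical.

```python
def speedup(t1: float, tp: float) -> float:
    """Speedup S_p = t1 / tp: walltime on one worker divided by
    walltime on p workers."""
    if t1 <= 0 or tp <= 0:
        raise ValueError("walltimes must be positive")
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Parallel efficiency E_p = S_p / p (assumed definition).
    1.0 means ideal scaling; lower values mean part of the extra
    workers' time is not turned into faster runs."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return speedup(t1, tp) / p


def scaling_table(timings: dict[int, float]) -> list[tuple[int, float, float]]:
    """Given {workers: walltime} (must include the 1-worker run),
    return (p, speedup, efficiency) rows sorted by p."""
    if 1 not in timings:
        raise ValueError("a 1-worker reference timing is required")
    t1 = timings[1]
    return [(p, speedup(t1, tp), efficiency(t1, tp, p))
            for p, tp in sorted(timings.items())]


if __name__ == "__main__":
    # Hypothetical walltimes in seconds, for illustration only.
    timings = {1: 400.0, 2: 210.0, 4: 115.0, 8: 70.0}
    for p, s, e in scaling_table(timings):
        print(f"p={p:2d}  speedup={s:5.2f}  efficiency={e:5.2f}")
```

Efficiency that falls steadily as p grows is one sign that adding cores or nodes is no longer paying off, which is the point where the bottleneck questions in 03-acceleration.md (e.g. communication overhead) become relevant.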