Merge pull request #24 from xLPMG/develop

Develop
xLPMG · Feb 1, 2024 · 7e8d78e
2 parents 033dc9b + 0b134f3
commit 7e8d78e

Showing 21 changed files with 861 additions and 625 deletions.
diff --git a/configs/1d.json b/configs/1d.json
@@ -0,0 +1,16 @@
+{
+    "solver": "fwave",
+    "simulationSizeX": 10,
+    "simulationSizeY": 1,
+    "offsetX": 0,
+    "offsetY": 0,
+    "nx":10,
+    "ny":1,
+    "setup":"DAMBREAK1D",
+    "writingFrequency":10,
+    "endTime":10,
+    "baseHeight":5,
+    "height":100,
+    "outputMethod":"csv",
+    "timeStepScaling":0.2
+}
diff --git a/configs/config.json b/configs/config.json
@@ -8,10 +8,8 @@
     "ny":100,
     "setup":"CIRCULARDAMBREAK2D",
     "writingFrequency":50,
-    "endTime":50,
-    "stations":[
-        { "name":"station_1", "locX":0, "locY":10 },
-        { "name":"station_2", "locX":5, "locY":10 },
-        { "name":"station_3", "locX":10, "locY":10 }
-    ]
+    "endTime":30,
+    "baseHeight":5,
+    "diameter":10,
+    "height":100
 }
diff --git a/docs/Doxyfile b/docs/Doxyfile
@@ -943,7 +943,7 @@ WARN_LOGFILE           =
 # spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
 # Note: If this tag is empty the current directory is searched.
 
-INPUT                  = ../../src
+INPUT                  = ../../src ../../lib
 
 # This tag can be used to specify the character encoding of the source files
 # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses

diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -9,12 +9,11 @@
 # Doxygen
 subprocess.call('doxygen ../Doxyfile', shell=True)
 
-
 # -- Project information -----------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
 
 project = 'tsunami_lab'
-copyright = '2023, Luca-Philipp Grumbach & Richard Hofmann'
+copyright = '2024, Luca-Philipp Grumbach & Richard Hofmann'
 author = 'Luca-Philipp Grumbach & Richard Hofmann'
 
 # -- General configuration ---------------------------------------------------

diff --git a/docs/source/files/assignments/08.rst b/docs/source/files/assignments/08.rst
@@ -1,15 +1,17 @@
+################
 8. Optimization
-*****************
+################
 
+**********
 8.1 ARA
-========
+**********
 
 .. figure:: https://wiki.uni-jena.de/download/attachments/22453005/IMG_7381_0p5.JPG?version=1&modificationDate=1625042348365&api=v2
 
     HPC-Cluster ARA. Source: https://wiki.uni-jena.de/pages/viewpage.action?pageId=22453005
 
 8.1.1 - Uploading and running the code
-----------------------------------------
+========================================
 
 First, we cloned our GitHub repository to ``beegfs`` and transferred the bathymetry and displacement data there with ``wget https://cloud.uni-jena.de/s/CqrDBqiMyKComPc/download/data_in.tar.xz -O tsunami_lab_data_in.tar.xz``.
 
@@ -41,9 +43,10 @@ sbatch file:
 Since we only want to use one node, we set ``nodes`` and ``ntasks`` to 1 and ``cpus-per-task`` to 72.
 
 8.1.2 - Visualizations
---------------------------
+========================
 
-**Tohoku 5000**
+Tohoku 5000
+-----------
 
 .. raw:: html
 
@@ -52,7 +55,8 @@ Since we only want to use one node, we set ``nodes`` and ``ntasks`` to 1 and ``c
     </video> 
 
 
-**Tohoku 1000**
+Tohoku 1000
+-----------
 
 .. raw:: html
 
@@ -62,15 +66,17 @@ Since we only want to use one node, we set ``nodes`` and ``ntasks`` to 1 and ``c
 
 
 
-**Chile 5000**
+Chile 5000
+-----------
 
 .. raw:: html
 
     <video width="100%" height="auto" controls>
       <source src="../../_static/assets/task_8-1-2_chile_5000.mp4" type="video/mp4">
     </video> 
 
-**Chile 1000**
+Chile 1000
+-----------
 
 .. raw:: html
 
@@ -82,15 +88,15 @@ Since we only want to use one node, we set ``nodes`` and ``ntasks`` to 1 and ``c
 Compared to the simulations from assignment 6, all simulations clearly behave the same.
 
 8.1.3 - Private PC vs ARA
----------------------------
+===========================
 
 .. note:: 
 
   The code was compiled using ``scons mode=benchmark opt=-O2``.
   The benchmarking mode disables all file output (and also skips all imports of ``<filesystem>``).
 
 Setups
-^^^^^^^^^^
+-------
 
 If you are interested, you can view the used configurations here:
 
@@ -103,7 +109,7 @@ If you are interested, you can view the used configurations here:
 :download:`tohoku1000.json <../../_static/text/tohoku1000.json>`
 
 Results
-^^^^^^^^^^
+--------
 
 ..  list-table:: execution times on different devices
     :header-rows: 1
@@ -187,17 +193,18 @@ Results
   and stopped after the program has finished and all memory has been freed.
 
 Observations
-^^^^^^^^^^^^^^
+--------------
 
 In every scenario, ARA had a faster setup time but slower computation times.
 We conclude that ARA has faster data/file access (because the setup heavily depends on data reading speed from a file)
 while the private PC seems to have better single core performance.
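
The split between setup time and computation time discussed above can be obtained by timing each phase separately. The following is a minimal sketch using ``std::chrono``, not the project's actual benchmarking code; ``timeSeconds`` is a hypothetical helper introduced only for illustration.

```cpp
#include <chrono>

// Hypothetical helper (not part of tsunami_lab): runs a callable once and
// returns the elapsed wall-clock time in seconds.
template <typename F>
double timeSeconds(F&& phase) {
    // steady_clock is monotonic, so the difference is never negative.
    const auto start = std::chrono::steady_clock::now();
    phase();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Wrapping the setup code in one call and the time-stepping loop in another yields two numbers that can be compared across machines, which is the kind of comparison made in the observations above.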
 
+**************
 8.2 Compilers
-===============
+**************
 
 8.2.1 - Generic compiler support
----------------------------------
+=================================
 
 We enabled generic compiler support by adding the following lines to our ``SConstruct`` file
 
@@ -221,10 +228,10 @@ Now, scons can be invoked with a compiler of choice, for example by running
   CXX=icpc scons
 
 8.2.2 & 8.2.3 - Test runs
---------------------------
+===========================
 
 Time measurements
-^^^^^^^^^^^^^^^^^^^^^^^^^
+------------------
 
 For each run, we used the following configuration:
 
@@ -313,7 +320,7 @@ We therefore ended up using ``compiler/intel/2018-Update1`` and ``gcc (GCC) 4.8.
 This configuration was the only one that worked for us, as we did not manage to fix all the errors that were thrown at us.
 
 Observations from the table
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+----------------------------
 
 As one would intuitively expect, the higher the optimization level is,
 the quicker the process finished.
@@ -328,7 +335,7 @@ We would also need to ensure that there are no other intensive processes running
 Nonetheless, taking the table as a rough estimate, ``g++`` appears to be faster with ``-O0`` and ``-Ofast``, while ``icpc`` is preferable for ``-O2``.
 
 8.2.3 - Optimization flags
----------------------------
+===========================
 
 To allow easy switching between optimization flags, we added the following code to our ``SConstruct``:
 
@@ -355,7 +362,7 @@ and
     env.Append( CXXFLAGS = [ env['opt'] ] ) 
 
 The dangers of -Ofast
-^^^^^^^^^^^^^^^^^^^^^^^
+----------------------
 
 One of the options that ``-Ofast`` enables is ``-ffast-math``.
 With that, a whole lot of other options get activated as well, such as
@@ -386,7 +393,7 @@ and
 `<https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html>`_
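
One concrete consequence: ``-ffast-math`` includes ``-fassociative-math``, which permits the compiler to regroup floating-point operations. The sketch below uses hypothetical function names and assumes compilation without ``-ffast-math`` on an IEEE 754 platform. It shows that regrouping a sum can change its result, which is why such reordering is not safe in general.

```cpp
// Hypothetical illustration (not project code): the same three values summed
// with two different groupings.

// Evaluates (a + b) + c.
double sumLeftToRight(double a, double b, double c) {
    return (a + b) + c;
}

// Evaluates a + (b + c).
double sumRightToLeft(double a, double b, double c) {
    return a + (b + c);
}
```

With ``a = 1e20``, ``b = -1e20`` and ``c = 1.0``, the first grouping gives ``1.0``, while the second gives ``0.0`` because ``-1e20 + 1.0`` rounds back to ``-1e20``. A compiler that is allowed to pick either grouping can therefore change numerical results.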
 
 8.2.4 - Compiler reports
-------------------------
+=========================
 
 We added support for a compiler report flag with the following lines in our ``SConstruct``:
 
@@ -435,7 +442,7 @@ This snippet refers to the loops that provide our solver with data from a setup:
     }  
 
 F-Wave optimization report
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+---------------------------
 
 The full report can be found :download:`here. <../../_static/text/task8-2-4_fwave_optrpt.txt>`
 
@@ -484,7 +491,7 @@ For ``netUpdates``, the report tells us that
 We can conclude that the compiler is able to inline our calls to ``computeEigenvalues`` and ``computeEigencoefficients``.
 
 WavePropagation2d optimization report
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+--------------------------------------
 
 The full report can be found :download:`here. <../../_static/text/task8-2-4_waveprop2d_optrpt.txt>`
 
@@ -514,12 +521,12 @@ could not be vectorized:
   Lines 86 and 88 are the two for-loops for y- and x-axis of the x-sweep and 
   lines 152 and 154 are the two for-loops for y- and x-axis of the y-sweep. 
 
-
+*********************************************
 8.3 Instrumentation and Performance Counters
-==============================================
+*********************************************
 
 8.3.1 to 8.3.4 - VTune
------------------------
+=======================
 
 First, we used the GUI of Intel VTune to specify our reports.
 
@@ -542,7 +549,7 @@ Then the following batch script was used to run the hotspots measurement:
   /cluster/intel/vtune_profiler_2020.2.0.610396/bin64/vtune -collect hotspots -app-working-dir /beegfs/xe63nel/tsunami_lab/build -- /beegfs/xe63nel/tsunami_lab/build/tsunami_lab ../configs/config.json
 
 Hotspots
-^^^^^^^^^^
+---------
 
 ..  image:: ../../_static/assets/task_8-3-1_hotspot_bottomUp.png
 
@@ -564,7 +571,7 @@ It was interesting to see (although it should not come as a surprise) that the `
 of the CPU time. 
 
 Threads
-^^^^^^^^^^
+--------
 
 ..  image:: ../../_static/assets/task_8-3-1_threads.png
 
@@ -573,10 +580,10 @@ Threads
 The poor result for the thread report was also expected, because we only compute sequentially.
 
 8.3.5 - Code optimizations
----------------------------
+===========================
 
 TsunamiEvent2d speedup
-^^^^^^^^^^^^^^^^^^^^^^^
+-----------------------
 
 To speed up this setup, we introduced a variable ``lastnegativeIndex`` for both the X and Y directions of the bathymetry and displacement data.
 The idea is the following: 
@@ -634,7 +641,7 @@ Code snippets of the implementation:
     }
 
 F-Wave solver optimization  
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+----------------------------
 
 In ``computeEigencoefficients``, we changed
 
@@ -677,7 +684,7 @@ Furthermore, we established a constant for :code:`t_real(0.5) * m_g`:
 
 
 Coarse Output optimization
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
+----------------------------
 
 Inside the ``write()`` function in ``NetCdf.cpp`` we calculated
 
@@ -699,8 +706,9 @@ once and then reuse it wherever we need it:
 
 This way, the division only happens once.
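
The general pattern of computing a loop-invariant division once can be sketched as follows. The exact expression from ``NetCdf.cpp`` is not shown in this excerpt, so ``averageGroups`` and its parameters are hypothetical stand-ins rather than the project's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example (not the actual NetCdf.cpp code): averages consecutive
// groups of `groupSize` fine cells into one coarse value each.
std::vector<float> averageGroups(const std::vector<float>& fine,
                                 std::size_t groupSize) {
    std::vector<float> coarse;
    if (groupSize == 0) {
        return coarse;  // nothing sensible to average
    }
    // The reciprocal does not change inside the loop, so the division is
    // performed once here instead of once per coarse cell.
    const float invGroupSize = 1.0f / static_cast<float>(groupSize);
    coarse.reserve(fine.size() / groupSize);
    for (std::size_t start = 0; start + groupSize <= fine.size();
         start += groupSize) {
        float sum = 0.0f;
        for (std::size_t i = 0; i < groupSize; ++i) {
            sum += fine[start + i];
        }
        coarse.push_back(sum * invGroupSize);  // multiply, no division
    }
    return coarse;
}
```

Multiplying by a precomputed reciprocal can differ from a true division in the last bit for some values, which is acceptable for output data but worth keeping in mind.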
 
+************************
 Individual phase ideas
-========================
+************************
 
 For the individual phase, we plan on building a graphical user interface using `ImGui <https://github.com/ocornut/imgui>`_.