Merge pull request #24 from xLPMG/develop
Develop
xLPMG authored Feb 1, 2024
2 parents 033dc9b + 0b134f3 commit 7e8d78e
Showing 21 changed files with 861 additions and 625 deletions.
16 changes: 16 additions & 0 deletions configs/1d.json
@@ -0,0 +1,16 @@
{
"solver": "fwave",
"simulationSizeX": 10,
"simulationSizeY": 1,
"offsetX": 0,
"offsetY": 0,
"nx":10,
"ny":1,
"setup":"DAMBREAK1D",
"writingFrequency":10,
"endTime":10,
"baseHeight":5,
"height":100,
"outputMethod":"csv",
"timeStepScaling":0.2
}
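The new ``configs/1d.json`` sets up a one-dimensional dam break. The Python sketch below loads such a file and derives the cell width and a CFL-style time step estimate. It rests on several assumptions that this commit does not confirm: ``simulationSizeX`` is the domain length in metres, ``height`` is the larger of the two initial water columns, gravity is 9.81 m/s², and ``timeStepScaling`` multiplies ``dx / max_wave_speed``. ``estimate_time_step`` is a hypothetical helper and is not part of tsunami_lab.

```python
import json
import math

# Contents of configs/1d.json as added in this commit.
CONFIG_1D = """{
  "solver": "fwave",
  "simulationSizeX": 10, "simulationSizeY": 1,
  "offsetX": 0, "offsetY": 0,
  "nx": 10, "ny": 1,
  "setup": "DAMBREAK1D",
  "writingFrequency": 10,
  "endTime": 10,
  "baseHeight": 5,
  "height": 100,
  "outputMethod": "csv",
  "timeStepScaling": 0.2
}"""

GRAVITY = 9.81  # assumption: gravitational acceleration in m/s^2


def estimate_time_step(cfg: dict) -> float:
    """Hypothetical helper: CFL-style time step for a dam break at rest."""
    dx = cfg["simulationSizeX"] / cfg["nx"]  # assumed cell width
    # With zero initial momentum, sqrt(g * h_max) is a conservative upper
    # bound for the fastest wave speed.
    h_max = max(cfg["height"], cfg["baseHeight"])
    max_speed = math.sqrt(GRAVITY * h_max)
    return cfg["timeStepScaling"] * dx / max_speed


cfg = json.loads(CONFIG_1D)
dt = estimate_time_step(cfg)
print(f"dx = {cfg['simulationSizeX'] / cfg['nx']}, dt ~ {dt:.5f}")
```

Using ``h_max`` rather than the Roe-averaged height overestimates the wave speed slightly, so the resulting time step is on the safe side.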
10 changes: 4 additions & 6 deletions configs/config.json
@@ -8,10 +8,8 @@
"ny":100,
"setup":"CIRCULARDAMBREAK2D",
"writingFrequency":50,
"endTime":50,
"stations":[
{ "name":"station_1", "locX":0, "locY":10 },
{ "name":"station_2", "locX":5, "locY":10 },
{ "name":"station_3", "locX":10, "locY":10 }
]
"endTime":30,
"baseHeight":5,
"diameter":10,
"height":100
}
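``configs/config.json`` now describes a ``CIRCULARDAMBREAK2D`` setup through ``baseHeight``, ``diameter`` and ``height``. The sketch below shows one plausible reading of these keys. It assumes the circle is centred at a given point, since the centre is not part of the shown configuration, and that cells strictly inside the circle start at ``height``. ``initial_height`` is a hypothetical function and not the project's implementation.

```python
import math


def initial_height(x: float, y: float, center_x: float, center_y: float,
                   diameter: float, height: float, base_height: float) -> float:
    """Hypothetical initial water height for a circular dam break.

    Cells whose centre lies strictly inside the circle start with the raised
    water column; all other cells start at the base height.
    """
    distance = math.hypot(x - center_x, y - center_y)
    return height if distance < diameter / 2 else base_height


# Values from configs/config.json; the centre (50, 50) is an assumption.
params = dict(center_x=50.0, center_y=50.0, diameter=10.0,
              height=100.0, base_height=5.0)

print(initial_height(50.0, 50.0, **params))  # inside the circle
print(initial_height(50.0, 60.0, **params))  # outside the circle
```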
2 changes: 1 addition & 1 deletion docs/Doxyfile
@@ -943,7 +943,7 @@ WARN_LOGFILE =
# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
# Note: If this tag is empty the current directory is searched.

INPUT = ../../src
INPUT = ../../src ../../lib

# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
3 changes: 1 addition & 2 deletions docs/source/conf.py
@@ -9,12 +9,11 @@
# Doxygen
subprocess.call('doxygen ../Doxyfile', shell=True)


# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'tsunami_lab'
copyright = '2023, Luca-Philipp Grumbach & Richard Hofmann'
copyright = '2024, Luca-Philipp Grumbach & Richard Hofmann'
author = 'Luca-Philipp Grumbach & Richard Hofmann'

# -- General configuration ---------------------------------------------------
72 changes: 40 additions & 32 deletions docs/source/files/assignments/08.rst
@@ -1,15 +1,17 @@
################
8. Optimization
*****************
################

**********
8.1 ARA
========
**********

.. figure:: https://wiki.uni-jena.de/download/attachments/22453005/IMG_7381_0p5.JPG?version=1&modificationDate=1625042348365&api=v2

HPC-Cluster ARA. Source: https://wiki.uni-jena.de/pages/viewpage.action?pageId=22453005

8.1.1 - Uploading and running the code
----------------------------------------
========================================

First, we cloned our GitHub repository to "beegfs" and transferred the bathymetry and displacement data there with ``wget https://cloud.uni-jena.de/s/CqrDBqiMyKComPc/download/data_in.tar.xz -O tsunami_lab_data_in.tar.xz``.

@@ -41,9 +43,10 @@ sbatch file:
Since we only want to use one node, we set ``nodes`` and ``ntasks`` to 1 and ``cpus-per-task`` to 72.

8.1.2 - Visualizations
--------------------------
========================

**Tohoku 5000**
Tohoku 5000
-----------

.. raw:: html

@@ -52,7 +55,8 @@ Since we only want to use one node, we set ``nodes`` and ``ntasks`` to 1 and ``c
</video>


**Tohoku 1000**
Tohoku 1000
-----------

.. raw:: html

@@ -62,15 +66,17 @@ Since we only want to use one node, we set ``nodes`` and ``ntasks`` to 1 and ``c



**Chile 5000**
Chile 5000
-----------

.. raw:: html

<video width="100%" height="auto" controls>
<source src="../../_static/assets/task_8-1-2_chile_5000.mp4" type="video/mp4">
</video>

**Chile 1000**
Chile 1000
-----------

.. raw:: html

@@ -82,15 +88,15 @@ Since we only want to use one node, we set ``nodes`` and ``ntasks`` to 1 and ``c
Compared with the simulations from assignment 6, all simulations behave the same.

8.1.3 - Private PC vs ARA
---------------------------
===========================

.. note::

The code was compiled using ``scons mode=benchmark opt=-O2``.
The benchmarking mode disables all file output (and also skips all imports of ``<filesystem>``).

Setups
^^^^^^^^^^
-------

If you are interested, you can view the configurations we used here:

@@ -103,7 +109,7 @@ If you are interested, you can view the used configurations here:
:download:`tohoku1000.json <../../_static/text/tohoku1000.json>`

Results
^^^^^^^^^^
--------

.. list-table:: Execution times on different devices
:header-rows: 1
@@ -187,17 +193,18 @@ Results
and stopped after the program has finished and all memory has been freed.
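The results distinguish between setup time and computation time. The Python sketch below shows such a split measurement. It assumes two callables, ``setup`` and ``compute``, as stand-ins for the simulator's phases. The project itself is written in C++, where ``std::chrono::steady_clock`` would play the role of ``time.perf_counter``. ``timed_run`` is a hypothetical helper.

```python
import time
from typing import Any, Callable, Tuple


def timed_run(setup: Callable[[], Any],
              compute: Callable[[Any], None]) -> Tuple[float, float]:
    """Hypothetical helper: return (setup_seconds, compute_seconds)."""
    start = time.perf_counter()        # monotonic, high-resolution clock
    state = setup()                    # e.g. reading bathymetry/displacement
    after_setup = time.perf_counter()
    compute(state)                     # e.g. the time-stepping loop
    end = time.perf_counter()
    return after_setup - start, end - after_setup


# Toy stand-ins for the two phases.
setup_s, compute_s = timed_run(lambda: list(range(100_000)),
                               lambda data: sum(x * x for x in data))
print(f"setup: {setup_s:.4f} s, compute: {compute_s:.4f} s")
```

A monotonic clock is used because, unlike wall-clock time, it cannot jump backwards during a long run.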

Observations
^^^^^^^^^^^^^^
--------------

In every scenario, ARA had a shorter setup time but longer computation times.
This suggests that ARA has faster data/file access (the setup phase depends heavily on how quickly data is read from files),
while the private PC appears to have better single-core performance.

**************
8.2 Compilers
===============
**************

8.2.1 - Generic compiler support
---------------------------------
=================================

We enabled generic compiler support by adding the following lines to our ``SConstruct`` file:

@@ -221,10 +228,10 @@ Now, scons can be invoked with a compiler of choice, for example by running
CXX=icpc scons
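The exact ``SConstruct`` lines are collapsed in this diff. The Python sketch below illustrates the idea under one assumption: the compiler is taken from a ``CXX`` environment variable, with ``g++`` as the fallback. ``resolve_cxx`` is a hypothetical helper. In an ``SConstruct``, its result could be passed on as ``Environment(CXX=...)``.

```python
import os
from typing import Mapping


def resolve_cxx(environ: Mapping[str, str], default: str = "g++") -> str:
    """Hypothetical helper: pick the C++ compiler from the environment.

    A missing or empty CXX variable falls back to the default compiler.
    """
    return environ.get("CXX") or default


# Running `CXX=icpc scons` puts CXX=icpc into os.environ.
print(resolve_cxx(os.environ))
```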
8.2.2 & 8.2.3 - Test runs
--------------------------
===========================

Time measurements
^^^^^^^^^^^^^^^^^^^^^^^^^
------------------

For each run, we used the following configuration:

@@ -313,7 +320,7 @@ We therefore ended up using ``compiler/intel/2018-Update1`` and ``gcc (GCC) 4.8.
This was the only combination that worked for us; with the others, we did not manage to fix all of the errors we encountered.

Observations from the table
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------

As one would expect, the higher the optimization level,
the faster the process finished.
@@ -328,7 +335,7 @@ We would also need to ensure that there are no other intensive processes running
Nonetheless, using the table as a rough estimate, ``g++`` seems to be faster with ``-O0`` and ``-Ofast``, while ``icpc`` is preferable with ``-O2``.

8.2.3 - Optimization flags
---------------------------
===========================

To make switching between optimization flags easy, we added the following code to our ``SConstruct``:

@@ -355,7 +362,7 @@ and
env.Append( CXXFLAGS = [ env['opt'] ] )
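A small safeguard for the ``opt`` option, written as a hypothetical Python helper. It rejects values outside an assumed list of optimization levels before they are appended to ``CXXFLAGS``. SCons also offers ``EnumVariable`` for restricting option values. The list of accepted flags below is our assumption, not the project's.

```python
from typing import List

# Assumed set of accepted optimization levels.
ALLOWED_OPT = ("-O0", "-O1", "-O2", "-O3", "-Ofast")


def opt_cxxflags(opt: str) -> List[str]:
    """Hypothetical helper: validate `opt` and return the flags to append."""
    if opt not in ALLOWED_OPT:
        raise ValueError(f"unsupported optimization flag {opt!r}; "
                         f"expected one of {', '.join(ALLOWED_OPT)}")
    return [opt]


print(opt_cxxflags("-O2"))
```

Failing early here gives a clear error message instead of an obscure compiler error for a mistyped flag.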
The dangers of -Ofast
^^^^^^^^^^^^^^^^^^^^^^^
----------------------
One of the options that ``-Ofast`` enables is ``-ffast-math``.
This in turn activates a number of further options, such as
@@ -386,7 +393,7 @@ and
`<https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html>`_
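Two of the assumptions behind ``-ffast-math`` can be illustrated with ordinary IEEE 754 doubles (Python's ``float``). First, floating-point addition is not associative, so letting the compiler reorder sums (``-fassociative-math``, enabled through ``-funsafe-math-optimizations``) can change results. Second, ``-ffinite-math-only`` lets the compiler assume that NaN never occurs, so a self-comparison check such as ``x != x`` may be optimized away in C/C++. The snippet only shows the underlying arithmetic; it does not reproduce any compiler transformation.

```python
import math

# Floating-point addition is not associative: grouping changes rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)

# A large magnitude can swallow a small addend entirely.
big = 1e16
print((big + 1.0) - big, (big - big) + 1.0)

# NaN is the only value that compares unequal to itself; under
# -ffinite-math-only a C/C++ compiler may assume this never happens.
nan = float("nan")
print(nan != nan, math.isnan(nan))
```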
8.2.4 - Compiler reports
------------------------
=========================
We added support for a compiler report flag with the following lines in our ``SConstruct``:
@@ -435,7 +442,7 @@ This snippet refers to the loops that provide our solver with data from a setup:
}
F-Wave optimization report
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
---------------------------
The full report can be found :download:`here <../../_static/text/task8-2-4_fwave_optrpt.txt>`.
@@ -484,7 +491,7 @@ For ``netUpdates``, the report tells us that
We can conclude that the compiler is able to inline our calls to ``computeEigenvalues`` and ``computeEigencoefficients``.
WavePropagation2d optimization report
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
--------------------------------------
The full report can be found :download:`here <../../_static/text/task8-2-4_waveprop2d_optrpt.txt>`.
@@ -514,12 +521,12 @@ could not be vectorized:
Lines 86 and 88 are the two for-loops over the y- and x-axis of the x-sweep, and
lines 152 and 154 are the two for-loops over the y- and x-axis of the y-sweep.
*********************************************
8.3 Instrumentation and Performance Counters
==============================================
*********************************************
8.3.1 to 8.3.4 - VTune
-----------------------
=======================
First, we used the Intel VTune GUI to configure our reports.
@@ -542,7 +549,7 @@ Then the following batch script was used to run the hotspots measurement:
/cluster/intel/vtune_profiler_2020.2.0.610396/bin64/vtune -collect hotspots -app-working-dir /beegfs/xe63nel/tsunami_lab/build -- /beegfs/xe63nel/tsunami_lab/build/tsunami_lab ../configs/config.json
Hotspots
^^^^^^^^^^
---------
.. image:: ../../_static/assets/task_8-3-1_hotspot_bottomUp.png
@@ -564,7 +571,7 @@ It was interesting to see (although it should not come as a surprise) that the `
of the CPU time.
Threads
^^^^^^^^^^
--------
.. image:: ../../_static/assets/task_8-3-1_threads.png
@@ -573,10 +580,10 @@ Threads
The poor result of the thread report was also expected, because our code runs sequentially.
8.3.5 - Code optimizations
---------------------------
===========================
TsunamiEvent2d speedup
^^^^^^^^^^^^^^^^^^^^^^^
-----------------------
To speed up this setup, we introduced a variable ``lastnegativeIndex`` for the x- and y-directions of both the bathymetry and the displacement.
The idea is the following:
@@ -634,7 +641,7 @@ Code snippets of the implementation:
}
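The details of ``lastnegativeIndex`` are collapsed in this diff. The general technique of resuming a search from the previously found index can be sketched as follows. The sketch assumes sorted, non-empty coordinate arrays and queries that mostly increase along a row. ``CursorLookup`` is a hypothetical name, and this is not the project's implementation. For a row of n increasing queries over m coordinates, the scan costs O(n + m) in total, compared with O(n log m) for a binary search per query and O(n m) for a linear search that always starts at index 0.

```python
from bisect import bisect_right
from typing import List, Sequence


class CursorLookup:
    """Hypothetical helper: find the grid index for a coordinate x.

    Returns the largest i with coords[i] <= x (or 0 if x lies before the
    first coordinate). The search resumes at the last result, so a sweep of
    increasing queries scans the (sorted, non-empty) array only once.
    """

    def __init__(self, coords: Sequence[float]) -> None:
        self.coords = coords
        self.last_index = 0

    def index_of(self, x: float) -> int:
        i = self.last_index
        if self.coords[i] > x:
            # Query moved backwards (e.g. start of a new row): restart.
            i = 0
        while i + 1 < len(self.coords) and self.coords[i + 1] <= x:
            i += 1
        self.last_index = i
        return i


coords: List[float] = [0.0, 1.0, 2.0, 3.0, 4.0]
queries = [0.5, 1.5, 3.9, 0.2, 10.0]
lookup = CursorLookup(coords)
found = [lookup.index_of(x) for x in queries]

# Cross-check against a binary search over the same coordinates.
expected = [max(bisect_right(coords, x) - 1, 0) for x in queries]
print(found, found == expected)
```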
F-Wave solver optimization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------
In ``computeEigencoefficients``, we changed
@@ -677,7 +684,7 @@ Furthermore, we established a constant for :code:`t_real(0.5) * m_g`:
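The changed expressions are collapsed in this diff. The Python sketch below illustrates the same kind of optimization on the f-wave flux jump and eigencoefficients: it hoists constants such as ``0.5 * g`` and replaces repeated divisions with one reciprocal. It assumes the standard f-wave formulation with Roe eigenvalues ``lambda_1 < lambda_2``, eigenvectors ``(1, lambda_1)`` and ``(1, lambda_2)``, and g = 9.81. The function names only mirror the ones mentioned in this document and are not the project's code.

```python
import math
from typing import Tuple

G = 9.81          # assumed gravitational acceleration
HALF_G = 0.5 * G  # hoisted constant, analogous to t_real(0.5) * m_g


def compute_eigenvalues(h_l: float, hu_l: float,
                        h_r: float, hu_r: float) -> Tuple[float, float]:
    """Roe eigenvalues lambda_1 < lambda_2 of the shallow water equations."""
    u_l, u_r = hu_l / h_l, hu_r / h_r
    sqrt_h_l, sqrt_h_r = math.sqrt(h_l), math.sqrt(h_r)
    h_roe = 0.5 * (h_l + h_r)
    u_roe = (u_l * sqrt_h_l + u_r * sqrt_h_r) / (sqrt_h_l + sqrt_h_r)
    c_roe = math.sqrt(G * h_roe)
    return u_roe - c_roe, u_roe + c_roe


def flux_jump(h_l: float, hu_l: float,
              h_r: float, hu_r: float) -> Tuple[float, float]:
    """Delta f = f(q_r) - f(q_l) with f(q) = (hu, hu^2/h + g/2 h^2)."""
    df1 = hu_r - hu_l
    df2 = (hu_r * hu_r / h_r + HALF_G * h_r * h_r) \
        - (hu_l * hu_l / h_l + HALF_G * h_l * h_l)
    return df1, df2


def compute_eigencoefficients(l1: float, l2: float,
                              df1: float, df2: float) -> Tuple[float, float]:
    """Solve df = a1 * (1, l1) + a2 * (1, l2) using a single division."""
    inv_det = 1.0 / (l2 - l1)  # computed once, reused for both coefficients
    a1 = (l2 * df1 - df2) * inv_det
    a2 = (df2 - l1 * df1) * inv_det
    return a1, a2


l1, l2 = compute_eigenvalues(10.0, 0.0, 8.0, 0.0)
df1, df2 = flux_jump(10.0, 0.0, 8.0, 0.0)
a1, a2 = compute_eigencoefficients(l1, l2, df1, df2)
print(l1, l2, a1, a2)
```

Multiplying by a precomputed reciprocal can differ from a true division in the last bit, which is usually acceptable for a solver like this.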
Coarse Output optimization
^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------
Inside the ``write()`` function in ``NetCdf.cpp``, we calculated
@@ -699,8 +706,9 @@ once and then reuse it wherever we need it:
This way, the division only happens once.
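The quantity that is now computed once is collapsed in this diff. The sketch below illustrates the same idea for a coarse output. It averages k × k blocks of cells and multiplies by a precomputed reciprocal ``1 / (k * k)`` instead of dividing for every coarse cell. ``coarsen`` and the block-averaging scheme are assumptions made for illustration; they do not reproduce the ``write()`` function in ``NetCdf.cpp``.

```python
from typing import List


def coarsen(grid: List[List[float]], k: int) -> List[List[float]]:
    """Hypothetical helper: average k x k blocks of a row-major grid.

    Cells that do not fill a complete block at the right/bottom edge are
    dropped.
    """
    ny, nx = len(grid) // k, len(grid[0]) // k
    inv_cells = 1.0 / (k * k)  # one division, reused for every coarse cell
    coarse = []
    for cy in range(ny):
        row = []
        for cx in range(nx):
            total = 0.0
            for y in range(cy * k, cy * k + k):
                for x in range(cx * k, cx * k + k):
                    total += grid[y][x]
            row.append(total * inv_cells)
        coarse.append(row)
    return coarse


grid = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0]]
print(coarsen(grid, 2))
```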
************************
Individual phase ideas
========================
************************
For the individual phase, we plan to build a graphical user interface using `ImGui <https://github.com/ocornut/imgui>`_.