Add SSH connection as option to the database credentials file (#169)

tornede · Feb 20, 2024 · 8eca12b
1 parent acf09a6
commit 8eca12b
Showing 21 changed files with 1,708 additions and 1,185 deletions.
diff --git a/.gitignore b/.gitignore
@@ -137,14 +137,18 @@ dmypy.json
 
 # todo
 todo.md
-config/database_credentials.cfg
-config/example*.cfg
+
+# Configs
+config/database_credentials.yml
+config/example_conditional_grid.yml
+config/example_general_usage.yml
+config/example_logtables.yml
+config/example_pause_and_continue.yml
 output/
 
 # codecarbon
 .codecarbon.config
 emissions.csv
-config/*.yml
 
 # development folder
-development/
+development/
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -7,7 +7,9 @@ v1.4.0 (??.??.2024)
 
 Feature
 -------
-- Changed supported experiment configuration file type to YAML. 
+- Change the supported database configuration file type to YAML.
+- Change the supported credentials file type to YAML.
+- Add support for ssh jump hosts in the database connection.
 
 
 v1.3.2 (23.01.2024)

diff --git a/config/database_credentials.cfg b/config/database_credentials.cfg
@@ -0,0 +1,4 @@
+[CREDENTIALS]
+host=apollo.ai.uni-hannover.de 
+user=testuser_pyexperimenter
+password=c2ncKK3siSBkCuGE
diff --git a/config/example_database_credentials.cfg b/config/example_database_credentials.cfg
diff --git a/config/example_database_credentials.yml b/config/example_database_credentials.yml
@@ -0,0 +1,16 @@
+CREDENTIALS:
+  Database:
+    user: example_user
+    password: example_password
+  Connection:
+    Standard: 
+      server: example.mysqlserver.com
+    Ssh:
+      server: example.sshmysqlserver.com (address from ssh server)
+      address: example.sslserver.com
+      port: optional_ssh_port
+      remote_address: optional_mysql_server_address
+      remote_port: optional_mysql_server_port
+      local_address: optional_local_address
+      local_port: optional_local_port
+      passphrase: optional_ssh_passphrase
diff --git a/docs/source/usage/database_credential_file.rst b/docs/source/usage/database_credential_file.rst
@@ -4,13 +4,42 @@
 Database Credential File
 ------------------------
 
-When working with ``MySQL`` as a database provider, an additional database credential file is needed, containing the credentials for accessing the database:
+When working with ``MySQL`` as a database provider, an additional database credential file is needed, containing the credentials for accessing the database.
+By default, this file is located at ``config/database_credentials.yml``. If this is not the case, the corresponding path has to be explicitly given when :ref:`executing <execution>` ``PyExperimenter``.
+Below is an example of a database credential file that connects to a server with the address ``example.mysqlserver.com`` using the user ``example_user`` and the password ``example_password``.
 
-.. code-block:: 
+.. code-block:: yaml
 
-    [CREDENTIALS]
-    host = <host>
-    user = <user>
-    password = <password>
+    CREDENTIALS:
+      Database:
+        user: example_user
+        password: example_password
+      Connection:
+        Standard: 
+          server: example.mysqlserver.com
 
-By default, this file is located at ``config/database_credentials.cfg``. If this is not the case, the corresponding path has to be explicitly given when :ref:`executing <execution>` ``PyExperimenter``.
+However, for security reasons, databases might only be accessible from a specific IP address. In these cases, one can use an SSH jump host. This means that ``PyExperimenter`` will first connect to the SSH server
+that has access to the database and then connect to the database server from there. This is done by adding an additional ``Ssh`` section to the database credential file.
+The following example shows how to connect to a database server using an SSH server with the address ``ssh_hostname`` and the port ``optional_ssh_port``.
+
+.. code-block:: yaml
+
+    CREDENTIALS:
+      Database:
+        user: example_user
+        password: example_password
+      Connection:
+        Standard: 
+          server: example.sshmysqlserver.com
+        Ssh:
+          server: example.mysqlserver.com (address from ssh server)
+          address: ssh_hostname (either name/ip address of the ssh server or a name from your local ssh config file)
+          port: optional_ssh_port (default: 22)
+          passphrase: passphrase
+          remote_address: optional_mysql_server_address (default: 127.0.0.1)
+          remote_port: optional_mysql_server_port (default: 3306)
+          local_address: optional_local_address (default: 127.0.0.1)
+          local_port: optional_local_port (default: 3306)
+
+.. note::
+  Further parameters for the SSH connection, such as explicitly setting the private key file, are not supported. To use them, adapt your local SSH config file.
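
Review note on the defaults documented in the ``Ssh`` section above: the following is a minimal sketch of how those defaults could be filled in, not ``PyExperimenter``'s actual implementation. The names ``SSH_DEFAULTS`` and ``resolve_ssh_settings`` are hypothetical, the credentials are assumed to be already parsed from YAML into a plain dict, and treating ``address`` as mandatory is an assumption based on it being the only entry without an ``optional_`` placeholder or a stated default.

```python
from typing import Any, Dict

# Defaults as listed in the documented example; how the library applies
# them internally is not shown in this diff, so this is an assumption.
SSH_DEFAULTS: Dict[str, Any] = {
    "port": 22,
    "remote_address": "127.0.0.1",
    "remote_port": 3306,
    "local_address": "127.0.0.1",
    "local_port": 3306,
}


def resolve_ssh_settings(credentials: Dict[str, Any]) -> Dict[str, Any]:
    """Return the Ssh section with the documented defaults filled in (hypothetical helper)."""
    try:
        ssh_section = credentials["CREDENTIALS"]["Connection"]["Ssh"]
    except KeyError as error:
        raise ValueError("no CREDENTIALS/Connection/Ssh section found") from error
    # Assumption: the SSH server address has no default and must be given.
    if "address" not in ssh_section:
        raise ValueError("the Ssh section needs an 'address' entry")
    # Explicitly configured values take precedence over the defaults.
    return {**SSH_DEFAULTS, **ssh_section}


# Parsed form of a credential file that overrides only the local port.
example = {
    "CREDENTIALS": {
        "Database": {"user": "example_user", "password": "example_password"},
        "Connection": {
            "Standard": {"server": "example.sshmysqlserver.com"},
            "Ssh": {
                "server": "example.mysqlserver.com",
                "address": "ssh_hostname",
                "local_port": 3307,
            },
        },
    }
}
settings = resolve_ssh_settings(example)
```

Merging the defaults first and the configured section second keeps the precedence rule in one place, so adding a further optional key only requires a new entry in ``SSH_DEFAULTS``.
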
diff --git a/docs/source/usage/execution.rst b/docs/source/usage/execution.rst
@@ -6,7 +6,7 @@ Executing PyExperimenter
 
 The actual execution of ``PyExperimenter`` only needs a few lines of code. Please make sure that you have created the :ref:`experiment configuration file <experiment_configuration_file>` and defined the :ref:`experiment function <experiment_function>` beforehand. 
 
-.. code-block:: 
+.. code-block:: python
 
     from py_experimenter.experimenter import PyExperimenter
 
@@ -24,22 +24,22 @@ Creating a PyExperimenter
 
 A ``PyExperimenter`` can be created without any further information, assuming the :ref:`experiment configuration file <experiment_configuration_file>` can be accessed at its default location.
 
-.. code-block:: 
+.. code-block:: python
 
     experimenter = PyExperimenter()
 
 Additionally, further information can be given to ``PyExperimenter``:
 
 - ``experiment_configuration_file_path``: The path of the :ref:`experiment configuration file <experiment_configuration_file>`. Default: ``config/experiment_configuration.cfg``.
 - ``database_credential_file_path``: The path of the :ref:`database credential file <database_credential_file>`. Default: ``config/database_credentials.cfg``
+- ``use_ssh_tunnel``: Specifies if an SSH tunnel will be used to connect to the database. Default: ``False``. If ``use_ssh_tunnel`` is set to ``True``, creating a ``PyExperimenter`` will also open an SSH tunnel, which should be :ref:`closed manually <close_ssh_tunnel>`. The details of the SSH connection have to be specified in the :ref:`database credential file <database_credential_file>`.
 - ``database_name``: The name of the database to manage the experiments. If given, it will overwrite the database name given in the `experiment_configuration_file_path`.
 - ``table_name``: The name of the database table to manage the experiments. If given, it will overwrite the table name given in the `experiment_configuration_file_path`.
 - ``use_codecarbon``: Specifies if :ref:`CodeCarbon <experiment_configuration_file_codecarbon>` will be used to track experiment emissions. Default: ``True``. 
 - ``name``: The name of the experimenter, which will be added to the database table of each executed experiment. If using the PyExperimenter on an HPC system, this can be used for the job ID, so that the according log file can easily be found. Default: ``PyExperimenter``.
 - ``logger_name``: The name of the logger, which will be used to log information about the execution of the PyExperimenter. If there already exists a logger with the given ``logger_name``, it will be used instead. However, the ``log_file`` will be ignored in this case. The logger will then be passed to every component of ``PyExperimenter``, so that all information is logged to the same file. Default: ``py-experimenter``.
 - ``log_level``: The log level of the logger. Default: ``INFO``.
-- ``log_file``: The path of the log file. Default: ``py-experimenter.log``.	 
-
+- ``log_file``: The path of the log file. Default: ``py-experimenter.log``.     
 
 -------------------
 Fill Database Table
@@ -59,7 +59,7 @@ Fill Table From Experiment Configuration File
 
 The database table can be filled with the cartesian product of the keyfields defined in the :ref:`experiment configuration file <experiment_configuration_file>`.
 
-.. code-block:: 
+.. code-block:: python
 
     experimenter.fill_table_from_config()
 
@@ -72,7 +72,7 @@ Fill Table With Specific Rows
 
 Alternatively, or additionally, specific rows can be added to the table. Note that ``rows`` is a list of dicts, where each dict has to contain a value for each keyfield. A more complex example featuring a conditional experiment grid can be found in the :ref:`examples section <examples>`.
 
-.. code-block:: 
+.. code-block:: python
 
     experimenter.fill_table_with_rows(rows=[
         {
@@ -97,7 +97,7 @@ Execute Experiments
 
 An experiment can be executed easily with the following call:
 
-.. code-block:: 
+.. code-block:: python
 
     experimenter.execute(
         experiment_function = run_experiment, 
@@ -117,7 +117,7 @@ Reset Experiments
 
 Each database table contains a ``status`` column, summarizing the current state of an experiment. Experiments can be reset based on these states. If this is done, the table rows having a given status will be deleted, and corresponding new rows without results will be created. A comma separated list of ``status`` has to be provided.
 
-.. code-block:: 
+.. code-block:: python
     
     experimenter.reset_experiments(<status>, <status>, ...)
 
@@ -138,7 +138,7 @@ Obtain Results
 
 The current content of the database table can be obtained as a ``pandas.DataFrame``. This can, for example, be used to generate a result table and export it to LaTeX.
 
-.. code-block:: 
+.. code-block:: python
 
     result_table = experimenter.get_table()
     result_table = result_table.groupby(['dataset']).mean()[['seed']]
@@ -164,11 +164,11 @@ Tracking information about the carbon footprint of experiments is supported via
 Pausing and Unpausing Experiments
 ---------------------------------
 
-For convenience, we support pausing and unpausing experiments. This means that you can use one ``PyExperimenter`` to start an experiment, which will be paused after certain operations. Therefore, it can be resumed later on. Afterwards, depending on the parametrization of ``execute()`` of the ``PyExperimenter`` instance (see :ref:`asdf <execute_experiments:>`), the experimenter terminates or another experiment will be started. 
+For convenience, we support pausing and unpausing experiments. This means that you can use one ``PyExperimenter`` to start an experiment, which will be paused after certain operations. Therefore, it can be resumed later on. Afterwards, depending on the parametrization of ``execute()`` of the ``PyExperimenter`` instance (see :ref:`Execute Experiments <execute_experiments>`), the experimenter terminates or another experiment will be started.
 
 To pause an experiment, the experiment function has to return the state ``ExperimentStatus.PAUSED``:
 
-.. code-block:: 
+.. code-block:: python
 
     def run_experiment_until_pause(keyfields: dict, result_processor: ResultProcessor, custom_fields: dict):
         # do something
@@ -187,7 +187,7 @@ To pause an experiment, the experiment function has to return the state ``Experi
 
 At a later point in time, the experiment can be unpaused and continued. This can be done by calling ``unpause_experiment()`` on ``PyExperimenter`` instance given the specific ``experiment_id`` of the experiment to continue, together with a separate experiment function, which only contains experiment code to be executed after the pause. Note that only a single ``experiment_id`` can be executed at the same time, i.e. there is no parallelization of unpausing multiple ``experiment_id`` supported.
 
-.. code-block:: 
+.. code-block:: python
 
     def run_experiment_after_pause(keyfields: dict, result_processor: ResultProcessor, custom_fields: dict):
         # do something
@@ -201,3 +201,17 @@ At a later point in time, the experiment can be unpaused and continued. This can
 
 A complete example on how to pause and continue an experiment can be found in the :ref:`examples section <examples>`.
 
+
+
+.. _close_ssh_tunnel:
+
+----------------
+Close SSH Tunnel
+----------------
+
+If an SSH tunnel was opened during the creation of the ``PyExperimenter``, it has to be closed manually by calling the following method:
+
+.. code-block:: python
+
+    experimenter.execute(...)
+    experimenter.close_ssh_tunnel()
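
Review note on the manual close above: if ``execute(...)`` raises, the call to ``close_ssh_tunnel()`` is never reached and the tunnel stays open. One way to guard against this is a small context manager, sketched below. ``closing_ssh_tunnel`` and ``FakeExperimenter`` are hypothetical names used only for illustration; the sketch relies on nothing but the ``close_ssh_tunnel()`` method documented in this diff.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def closing_ssh_tunnel(experimenter: Any) -> Iterator[Any]:
    """Yield the experimenter and always call close_ssh_tunnel() afterwards (hypothetical helper)."""
    try:
        yield experimenter
    finally:
        # Runs on normal exit and also when the body raises.
        experimenter.close_ssh_tunnel()


class FakeExperimenter:
    """Stand-in for a PyExperimenter created with use_ssh_tunnel=True (illustration only)."""

    def __init__(self) -> None:
        self.tunnel_open = True

    def execute(self) -> None:
        # Simulate an experiment run that fails.
        raise RuntimeError("experiment failed")

    def close_ssh_tunnel(self) -> None:
        self.tunnel_open = False


fake = FakeExperimenter()
try:
    with closing_ssh_tunnel(fake) as experimenter:
        experimenter.execute()
except RuntimeError:
    pass  # The tunnel has been closed even though execute() raised.
```

With a real instance, the same helper would presumably be used as ``with closing_ssh_tunnel(PyExperimenter(use_ssh_tunnel=True)) as experimenter:``, which keeps the cleanup next to the code that opened the tunnel.
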
diff --git a/docs/source/usage/experiment_function.rst b/docs/source/usage/experiment_function.rst
@@ -6,7 +6,7 @@ Experiment Function
 
 The execution of a single experiment has to be defined within a function. The function is called with the ``keyfields`` values of a database entry. The results are meant to be processed to be written into the database, i.e. as ``resultfields``. During the experiment different information can be logged into ``logtables``.
 
-.. code-block:: 
+.. code-block:: python
 
     import os
     from py_experimenter.result_processor import ResultProcessor
@@ -58,7 +58,7 @@ Push Data To Resultfields
 
 ``Resultfields`` can be filled any time during the execution process by calling the following code within your experiment function, e.g. ``run_ml``. Note that a resultfield is meant to be written once, if you re-write a resultfield, the old value will be overwritten. Furthermore note that you do not have to write all resultfields at once, but can also only write a subset as demonstrated in the example above. Multiple in-depth examples showcasing the usage of resultfields can be found within the :ref:`examples section <examples>`.
 
-.. code-block:: 
+.. code-block:: python
 
     result_processor.process_results({
             '<resultfield_name>': <resultfield_value>, 
@@ -75,7 +75,7 @@ Push Data To Logtables
 
 ``Logtables`` can be filled any time during the execution process by calling the following code within your experiment function, e.g. ``run_ml``. An in-depth example showcasing the usage of logtables can be found within the :ref:`examples section <examples>`.
 
-.. code-block:: 
+.. code-block:: python
 
     result_processor.process_logs({
         '<logtable_name>': {