Merge pull request #25 from ecmwf/feature/direct_config
Add config and user_config options to pyfdb
simondsmart authored Oct 8, 2024
2 parents 0527eb8 + a41a611 commit 8a70b73
Showing 7 changed files with 301 additions and 27 deletions.
56 changes: 52 additions & 4 deletions README.md
@@ -37,6 +37,27 @@ import shutil
fdb = pyfdb.FDB()
```

A `config` and `user_config` can also be passed directly to the initialization function:
```python
config = dict(
type="local",
engine="toc",
schema="/path/to/fdb_schema",
spaces=[
dict(
handler="Default",
roots=[
{"path": "/path/to/root"},
],
)
],
)

fdb = pyfdb.FDB(config=config, user_config={})
# Now use fdb.list, fdb.archive, fdb.retrieve etc.
```
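The module-level functions `pyfdb.list`, `pyfdb.archive`, etc. use the default `pyfdb.FDB()` initialization with the default config search path, so a config passed directly only takes effect through the methods of the FDB instance it was passed to.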
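
Judging from the `prepare_config` helper that this commit adds to `pyfdb/pyfdb.py`, `config` and `user_config` can each be a dictionary (serialized with `json.dumps`), a string (passed through unchanged), or `None` (replaced by an empty string). The sketch below copies that normalization into a standalone function so it can be tried without an FDB installation. The standalone function is for illustration only and is not part of the pyfdb API.

```python
import json


def prepare_config(c):
    """Standalone copy of the helper defined inside FDB.__init__ (illustration only)."""
    if c is None:
        return ""  # a missing config becomes an empty string
    if not isinstance(c, str):
        return json.dumps(c)  # dictionaries and lists are serialized to JSON text
    return c  # strings are passed through unchanged


config = dict(type="local", engine="toc", schema="/path/to/fdb_schema")

as_text = prepare_config(config)
print(as_text)
```

Because strings are passed through untouched, a configuration that already exists as text can presumably be handed to `pyfdb.FDB` without first being parsed into a dictionary.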

### Archive
```python
key = {
@@ -100,7 +121,7 @@ for el in pyfdb.list(request, True, True):
# {'class': 'rd', 'date': '20191110', 'domain': 'g', 'expver': 'xxxx', 'stream': 'oper', 'time': '0000', 'levtype': 'pl', 'type': 'an', 'levelist': '400', 'param': '138', 'step': '0'}
```

#### fdb object, request as dicitonary
#### Using the fdb object with the request as a dictionary
As an alternative, use the created FDB instance and start queries from there
```python
request['levelist'] = ['400', '500', '700', '850', '1000']
@@ -111,7 +132,7 @@ for el in fdb.list(request, True, True):

### Retrieve

#### save to file
#### To a file
```python
import tempfile
import os
@@ -147,7 +168,7 @@ with open(filename, 'wb') as o, pyfdb.retrieve(request) as i:
shutil.copyfileobj(i, o)
```

#### read into python object
#### Read into memory
```python
datareader = pyfdb.retrieve(request)

@@ -174,7 +195,7 @@ print(chunk)
datareader.seek(0)
```
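
The reader returned by `pyfdb.retrieve` is used above like a binary file object. As a rough stand-in that runs without an FDB, the sketch below applies the same `read` and `seek` pattern to an `io.BytesIO` buffer. The buffer contents are invented, and this excerpt does not show whether the real reader supports every method of the `io` interfaces.

```python
import io

# Stand-in for the object returned by pyfdb.retrieve(request); the bytes are invented.
datareader = io.BytesIO(b"GRIB" + bytes(12) + b"7777")

chunk = datareader.read(4)  # read the first four bytes
print(chunk)

datareader.seek(0)  # rewind to the start of the stream
everything = datareader.read()  # read the whole stream in one call
print(len(everything))
```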

#### decode GRIB
#### Decode GRIB
```python
from pyeccodes import Reader
reader = Reader(datareader)
@@ -185,9 +206,36 @@ grib.dump()


## 3. Development

### Pre-Commit Hooks

Pre-commit hooks are supplied in `.pre-commit-config.yaml` to lint and format the code before committing. To activate this:
```bash
pip install pre-commit
pre-commit install # Install the hooks so that they run before `git commit`
```
At the moment this runs isort, black and flake8. If any of these encounters an error that it cannot fix automatically, the commit is blocked.

### Run Unit Tests

To run the unit tests, make sure that the `pytest` module is installed first:

```sh
python -m pytest
```
To test against a source build of fdb5, use:
```sh
FDB_HOME=/path/to/build/fdb5 python -m pytest
```

### Run Unit Tests across multiple python versions with Tox

Tox runs pytest across multiple Python versions by managing a set of Python environments for you. A `tox.ini` file is provided that targets Python 3.8 to 3.12. Note that this also installs older versions of libraries such as numpy, which helps to catch incompatibilities with those versions too.

To run tox, [install it](https://tox.wiki/) and modify the `FDB5_HOME = ../build` line in `tox.ini` to point to a build of fdb5; this build is reused for all the tests. If your fdb5 is built as part of a bundle and `FDB5_HOME` points to the bundle build root, you may need to copy `build/fdb5/etc/fdb` to `build/etc/fdb`, because by default fdb looks for a schema in `build/etc/fdb`.

Then run:
```sh
tox
```
The first run takes a while because tox has to install all the environments; later runs are much faster.
24 changes: 24 additions & 0 deletions docs/source/index.rst
@@ -40,6 +40,30 @@ An example of archival, listing and retrieval via pyfdb is shown next. For the e
fdb = pyfdb.FDB()
A config and user_config can also be passed directly to the initialization function.

.. code:: python
config = dict(
type="local",
engine="toc",
schema="/path/to/fdb_schema",
spaces=[
dict(
handler="Default",
roots=[
{"path": "/path/to/root"},
],
)
],
)
fdb = pyfdb.FDB(config=config, user_config={})
# Now use fdb.list, fdb.archive, fdb.retrieve etc.
The module-level functions `pyfdb.list, pyfdb.archive` etc. use the default `pyfdb.FDB()` initialization with the default config search path, so when passing a config directly you must use the FDB instance methods.

**Archive**

.. code:: python
Expand Down
38 changes: 38 additions & 0 deletions pyfdb/__init__.py
@@ -1 +1,39 @@
"""PyFDB is python client to the FDB5 library. It provides list, retrieve,
and archive functions to interact with FDB5 databases.
See the docstrings of pyfdb.list, pyfdb.archive, pyfdb.retrieve for more information.
Example:
import pyfdb
fdb = pyfdb.FDB()
fdb.archive(open('x138-300.grib', "rb").read(), key)
fdb.flush()
for entry in fdb.list(request = {...}, duplicates=False, keys=False):
print(entry)
Locating the FDB5 library:
PyFDB uses findlibs to locate the fdb5 shared library. findlibs will attempt to
locate the FDB5 library by looking in the following locations:
* sys.prefix
* $CONDA_PREFIX
* $FDB5_HOME or $FDB5_DIR
* $LD_LIBRARY_PATH and $DYLD_LIBRARY_PATH
* "/", "/usr/", "/usr/local/", "/opt/", and "/opt/homebrew/".
You can set "$FDB5_HOME" or load a conda environment to direct pyFDB to a particular FDB5 library.
You can check which library is being used by printing `pyfdb.lib`:
>>> print(pyfdb.lib)
<pyfdb.pyfdb.PatchedLib FDB5 version 5.12.1 from /path/to/lib/libfdb5.dylib>
See https://github.com/ecmwf/findlibs for more info on library resolution.
See https://github.com/ecmwf/pyfdb for more information on pyfdb.
"""

from .pyfdb import *
1 change: 1 addition & 0 deletions pyfdb/processed_fdb.h
@@ -51,6 +51,7 @@ int fdb_delete_datareader(fdb_datareader_t* dr);
struct fdb_handle_t;
typedef struct fdb_handle_t fdb_handle_t;
int fdb_new_handle(fdb_handle_t** fdb);
int fdb_new_handle_from_yaml(fdb_handle_t** fdb, const char* config, const char* user_config);
int fdb_archive(fdb_handle_t* fdb, fdb_key_t* key, const char* data, size_t length);
int fdb_archive_multiple(fdb_handle_t* fdb, fdb_request_t* req, const char* data, size_t length);
int fdb_list(fdb_handle_t* fdb, const fdb_request_t* req, fdb_listiterator_t** it, bool duplicates);
Expand Down
109 changes: 86 additions & 23 deletions pyfdb/pyfdb.py
@@ -13,15 +13,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import io
import json
import os
from functools import wraps

import cffi
import findlibs
from pkg_resources import parse_version
from packaging import version

__version__ = "0.0.4"

__fdb_version__ = "5.11.0"
__fdb_version__ = "5.12.1"

ffi = cffi.FFI()

@@ -38,19 +40,14 @@ class PatchedLib:
and patches the accessors with automatic python-C error handling.
"""

__type_names = {}

def __init__(self):
self.path = findlibs.find("fdb5")

libName = findlibs.find("fdb5")

if libName is None:
if self.path is None:
raise RuntimeError("FDB5 library not found")

ffi.cdef(self.__read_header())
self.__lib = ffi.dlopen(libName)

# Todo: Version check against __version__
self.__lib = ffi.dlopen(self.path)

# All of the executable members of the CFFI-loaded library are functions in the FDB
# C API. These should be wrapped with the correct error handling. Otherwise forward
@@ -72,10 +69,13 @@ def __init__(self):

tmp_str = ffi.new("char**")
self.fdb_version(tmp_str)
versionstr = ffi.string(tmp_str[0]).decode("utf-8")
self.version = ffi.string(tmp_str[0]).decode("utf-8")

if parse_version(versionstr) < parse_version(__fdb_version__):
raise RuntimeError("Version of libfdb found is too old. {} < {}".format(versionstr, __fdb_version__))
if version.parse(self.version) < version.parse(__fdb_version__):
raise RuntimeError(
f"This version of pyfdb ({__version__}) requires fdb version {__fdb_version__} or greater. "
f"You have fdb version {self.version} loaded from {self.path}"
)
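
The check above parses both version strings before comparing them. A plain string comparison would order some versions incorrectly, as the stdlib-only sketch below illustrates. `as_tuple` is a simplified, hypothetical stand-in for `packaging.version.parse` and only handles dotted integers; pyfdb itself uses `packaging`.

```python
def as_tuple(v):
    """Split a dotted numeric version such as '5.12.1' into a tuple of ints."""
    return tuple(int(part) for part in v.split("."))


required = "5.12.1"
found = "5.9.0"

# Character-by-character string comparison ranks "9" above "1", so 5.9.0 looks newer.
print(found < required)

# Comparing integer tuples orders the versions numerically.
print(as_tuple(found) < as_tuple(required))
```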

def __read_header(self):
with open(os.path.join(os.path.dirname(__file__), "processed_fdb.h"), "r") as f:
@@ -98,6 +98,9 @@ def wrapped_fn(*args, **kwargs):

return wrapped_fn

def __repr__(self):
return f"<pyfdb.pyfdb.PatchedLib FDB5 version {self.version} from {self.path}>"


# Bootstrap the library

@@ -118,7 +121,9 @@ def __init__(self, keys):

def set(self, param, value):
lib.fdb_key_add(
self.__key, ffi.new("const char[]", param.encode("ascii")), ffi.new("const char[]", value.encode("ascii"))
self.__key,
ffi.new("const char[]", param.encode("ascii")),
ffi.new("const char[]", value.encode("ascii")),
)

@property
@@ -279,30 +284,83 @@ def __exit__(self, exc_type, exc_val, exc_tb):


class FDB:
"""This is the main container class for accessing FDB"""
"""This is the main container class for accessing FDB
Usage:
fdb = pyfdb.FDB()
# call fdb.archive, fdb.list, fdb.retrieve, fdb.flush as needed.
See the module level pyfdb.list, pyfdb.retrieve, and pyfdb.archive
docstrings for more information on these functions.
"""

__fdb = None

def __init__(self):
def __init__(self, config=None, user_config=None):
fdb = ffi.new("fdb_handle_t**")
lib.fdb_new_handle(fdb)

if config is not None or user_config is not None:

def prepare_config(c):
if c is None:
return ""
if not isinstance(c, str):
return json.dumps(c)
return c

config = prepare_config(config)
user_config = prepare_config(user_config)

lib.fdb_new_handle_from_yaml(
fdb,
ffi.new("const char[]", config.encode("utf-8")),
ffi.new("const char[]", user_config.encode("utf-8")),
)
else:
lib.fdb_new_handle(fdb)

# Set free function
self.__fdb = ffi.gc(fdb[0], lib.fdb_delete_handle)

def archive(self, data, request=None):
def archive(self, data, request=None) -> None:
"""Archive data into the FDB5 database
Args:
data (bytes): bytes data to be archived
request (dict | None): dictionary representing the request to be associated with the data,
if not provided the key will be constructed from the data.
"""
if request is None:
lib.fdb_archive_multiple(self.ctype, ffi.NULL, ffi.from_buffer(data), len(data))
else:
lib.fdb_archive_multiple(self.ctype, Request(request).ctype, ffi.from_buffer(data), len(data))

def flush(self):
def flush(self) -> None:
"""Flush any archived data to disk"""
lib.fdb_flush(self.ctype)

def list(self, request=None, duplicates=False, keys=False):
def list(self, request=None, duplicates=False, keys=False) -> ListIterator:
"""List entries in the FDB5 database
Args:
request (dict): dictionary representing the request.
duplicates (bool): whether to include duplicate entries. Defaults to False.
keys (bool): whether to include the keys for each entry in the output. Defaults to False.
Returns:
ListIterator: an iterator over the entries.
"""
return ListIterator(self, request, duplicates, keys)

def retrieve(self, request) -> DataRetriever:
"""Retrieve data as a stream.
Args:
request (dict): dictionary representing the request.
Returns:
DataRetriever: An object implementing a file-like interface to the data stream.
"""
return DataRetriever(self, request)

@property
@@ -313,27 +371,32 @@ def ctype(self):
fdb = None


def archive(data):
# Use functools.wraps to copy over the docstring from FDB.xxx to the module level functions
@wraps(FDB.archive)
def archive(data) -> None:
global fdb
if not fdb:
fdb = FDB()
fdb.archive(data)


def list(request, duplicates=False, keys=False):
@wraps(FDB.list)
def list(request, duplicates=False, keys=False) -> ListIterator:
global fdb
if not fdb:
fdb = FDB()
return ListIterator(fdb, request, duplicates, keys)


def retrieve(request):
@wraps(FDB.retrieve)
def retrieve(request) -> DataRetriever:
global fdb
if not fdb:
fdb = FDB()
return DataRetriever(fdb, request)


@wraps(FDB.flush)
def flush():
global fdb
if not fdb:
Expand Down