Merge main into v1.6.0 branch #463

Merged
merged 145 commits into from
Jan 6, 2025
145 commits
41a6ed0
Change distance search in merge_split_MEST to use BallTree query, and…
w-k-jones Nov 29, 2023
098e4d8
Fix dimension order for coordinates in merge_split
w-k-jones Nov 29, 2023
6b5cb68
Add tests for merge_split with PBCs
w-k-jones Nov 29, 2023
3deffa9
Speed up adding edges to graph using add_weighted_edges_from
w-k-jones Nov 29, 2023
2d5d366
Remove unused imports
w-k-jones Nov 29, 2023
1c7ed6f
Apply frame_len filter before assigning edges to the graph, and add n…
w-k-jones Nov 29, 2023
a51f027
Use connected_components from scipy.sparse.csgraph to link cells into…
w-k-jones Nov 29, 2023
5b4aa05
Fix handling of unassigned features in merge_split_MEST and add test …
w-k-jones Nov 29, 2023
ce36b3b
Remove unused import
w-k-jones Nov 29, 2023
eff22f6
Fix distance calculation for 2D case
w-k-jones Nov 29, 2023
de01424
Add flag for cells starting with split/ending with merge
w-k-jones Nov 29, 2023
0b75bae
Rename cell_starts_with_merge flag
w-k-jones Nov 29, 2023
c888bf9
Add test for 3D merge/split and fix coordinate stack axis
w-k-jones Nov 29, 2023
81bdcce
Fix application of dxy and dz to 3D feature locations in filter_min_d…
w-k-jones Nov 29, 2023
71c4b39
Switch to using scipy.sparse.csgraph implementation of minimum spanni…
w-k-jones Nov 29, 2023
c63632a
Cast start_node_cells and end_node_cells to np.int32 to avoid type mi…
w-k-jones Nov 29, 2023
fa4659c
Use groupby instead of bincount to count number of child cells/features
w-k-jones Nov 30, 2023
e9ec119
Recalculate start and end node cells after applying minimum spanning …
w-k-jones Nov 30, 2023
68f2978
Added notebook for big datasets
freemansw1 Feb 16, 2024
7584bd4
Merge remote-tracking branch 'upstream/RC_v1.5.x' into tobathon_big_d…
freemansw1 Mar 13, 2024
08a5420
Merge branch 'RC_v1.5.x' into tobathon_big_dataset_update
freemansw1 May 9, 2024
adabf96
updated docstrings for n_min_threshold
Jun 21, 2024
3a18cfb
black formatting
Jun 21, 2024
900f72b
Add optional typing for default None keywords
w-k-jones Jun 24, 2024
cdfc8dd
Clip values to range of hdim size in feature_position, or hdim size +…
w-k-jones Jun 24, 2024
8d0fcb8
Reformatting
w-k-jones Jun 24, 2024
294ced5
Remove erroneous docstring entry
w-k-jones Jun 24, 2024
39ae935
removed index.rst and black-formatted the notebook and documentation …
freemansw1 Jul 18, 2024
61371a6
Update to improve visibility of example notebooks
freemansw1 Jul 18, 2024
a9c74c7
black format all notebooks
freemansw1 Jul 18, 2024
0928de9
fix code block
freemansw1 Jul 18, 2024
fd025d0
fix warning statement when feature IDs are not unique
JuliaKukulies Jul 23, 2024
729ebd0
Merge remote-tracking branch 'upstream/RC_v1.5.x' into tobathon_big_d…
freemansw1 Jul 23, 2024
adb864e
fix for sphinx_toolbox
freemansw1 Jul 23, 2024
ed5407a
allow for different ID column name at locations where it was not fixe…
JuliaKukulies Jul 23, 2024
cf6f4c8
added parameters for bulk statistic tests and an extra warning when f…
JuliaKukulies Jul 23, 2024
4ab4342
black formatting
JuliaKukulies Jul 23, 2024
ab260b3
black formatting with right version
JuliaKukulies Jul 23, 2024
337235a
add sphinx_toolbox to requirements
freemansw1 Jul 23, 2024
1cafeba
fix to just import sphinx toolbox code module
freemansw1 Jul 23, 2024
af97821
fixed message
JuliaKukulies Jul 23, 2024
abdce8f
fix code block
freemansw1 Jul 23, 2024
f902c65
and one more formatting
JuliaKukulies Jul 23, 2024
e85c446
Merge remote-tracking branch 'upstream/RC_v1.5.x' into tobathon_big_d…
freemansw1 Jul 24, 2024
e5f64c4
Merge remote-tracking branch 'upstream/RC_v1.5.x' into bulk_stats_bug…
JuliaKukulies Jul 24, 2024
2ede64b
made warning message more readable
JuliaKukulies Jul 24, 2024
5551fb7
fixed docstrings and type hint according to Fabian's comments
JuliaKukulies Aug 14, 2024
7d7503e
Merge pull request #432 from JuliaKukulies/n_min_thresholds_docs
JuliaKukulies Aug 14, 2024
67f86cf
Merge pull request #408 from freemansw1/tobathon_big_dataset_update
freemansw1 Aug 15, 2024
dadad41
Merge pull request #437 from JuliaKukulies/bulk_stats_bug_fix
JuliaKukulies Aug 15, 2024
4fa1efa
Merge pull request #434 from w-k-jones/fix_interp_fp_bug
w-k-jones Sep 11, 2024
58ddbdd
fixed error message in bulk statistics when none of the feature label…
JuliaKukulies Sep 19, 2024
0cf4a21
tested with notebook
JuliaKukulies Sep 19, 2024
303d0fe
corrected warning for feature labels in every time step
JuliaKukulies Sep 20, 2024
7296875
formatting
JuliaKukulies Sep 20, 2024
dfd4a04
allows calculation of statistics on raw data
JuliaKukulies Sep 20, 2024
84c347f
black formatting
JuliaKukulies Sep 20, 2024
42db306
black formatting
JuliaKukulies Sep 20, 2024
22314f7
black formatting
JuliaKukulies Sep 20, 2024
3bd7afa
more formatting
JuliaKukulies Sep 20, 2024
b2607df
removed tests from notebook
JuliaKukulies Sep 23, 2024
26fefc7
updated reference
JuliaKukulies Sep 23, 2024
2781713
Fix bug in transform_feature_points when warning of dropped features …
w-k-jones Sep 24, 2024
1dd1722
Formatting
w-k-jones Sep 24, 2024
6470db8
Fix deprecation warnings in tests
w-k-jones Sep 24, 2024
81a580b
Add python 3.12 to matrix testing
w-k-jones Sep 24, 2024
6b3473a
Add optional use of dz or vertical_coord for specifying vertical loca…
w-k-jones Sep 24, 2024
8302b16
Remove erroneous import
w-k-jones Sep 24, 2024
b6a48ac
Merge branch 'RC_v1.5.x' of https://github.com/climate-processes/toba…
w-k-jones Sep 24, 2024
4c96374
Fix changed import location of build_distance_function
w-k-jones Sep 24, 2024
0eac773
added the main tobac publications to top of publication list
JuliaKukulies Sep 24, 2024
ad60ed7
Merge branch 'matrix_testing_fix' into reference_update
JuliaKukulies Sep 24, 2024
401ebca
Fix bug that caused statistics to be recalculated for each threshold …
w-k-jones Sep 25, 2024
fdd1ff8
Move filter_min_distance to after coordinate interpolation in feature…
w-k-jones Sep 25, 2024
b9c2cec
Enable use of vertical coordinate rather than fixed dz in filter_min_…
w-k-jones Sep 25, 2024
c9f3cb9
Add tests for minimum distance filtering using vertical coord
w-k-jones Sep 25, 2024
9c1058f
Formatting
w-k-jones Sep 25, 2024
f5e5e25
Manually reformat
w-k-jones Sep 25, 2024
3609a59
Merge pull request #451 from w-k-jones/matrix_testing_fix
w-k-jones Sep 26, 2024
c06fe82
merge in matrix test fix
JuliaKukulies Sep 26, 2024
ab0420b
Merge remote-tracking branch 'origin/RC_v1.5.x' into RC_v1.5.4
JuliaKukulies Sep 26, 2024
177904e
Merge branch 'RC_v1.5.x' of https://github.com/climate-processes/toba…
w-k-jones Sep 26, 2024
412ab97
Merge branch 'RC_v1.5.x' of https://github.com/climate-processes/toba…
w-k-jones Sep 26, 2024
0040330
directly pass data_i to statistics function instead of making a copy
JuliaKukulies Sep 26, 2024
132ea24
added accidentally removed pre-commit yaml file
JuliaKukulies Sep 26, 2024
03315ab
Merge pull request #372 from w-k-jones/merge_split_pbc
w-k-jones Sep 28, 2024
dbe40e6
corrected type hints and docstrings because we require *fields to be …
JuliaKukulies Oct 11, 2024
aa73815
allow some tolerance in the time selection for xarray in the statisti…
JuliaKukulies Oct 11, 2024
bca9cf0
added test for bulk stats to ensure all timesteps are added to featur…
JuliaKukulies Oct 12, 2024
65b0b1b
corrected notebook issue
JuliaKukulies Oct 12, 2024
c38ae1c
Merge pull request #450 from JuliaKukulies/reference_update
JuliaKukulies Oct 14, 2024
d9f2df8
Merge pull request #449 from JuliaKukulies/bulk_stats_on_raw_input
JuliaKukulies Oct 14, 2024
38f6eb9
Merge branch 'RC_v1.5.x' into min_distance_3D_fix
w-k-jones Oct 14, 2024
ca25b26
Merge remote-tracking branch 'upstream/RC_v1.5.x' into RC_v1.5.4
JuliaKukulies Oct 14, 2024
e8ccdcc
Merge pull request #448 from JuliaKukulies/RC_v1.5.4
JuliaKukulies Oct 15, 2024
17795d5
switch json schema to basic github actions python install
freemansw1 Oct 15, 2024
0a3b23f
update check_formatting job
freemansw1 Oct 15, 2024
bea540c
Re-add statistics on unsmoothed data
w-k-jones Oct 17, 2024
a987435
Merge pull request #452 from tobac-project/min_distance_3D_fix
w-k-jones Oct 17, 2024
bfe5e7d
update black formatting check
freemansw1 Oct 17, 2024
266971f
update black formatting check
freemansw1 Oct 17, 2024
c704246
remove python version requirement
freemansw1 Oct 17, 2024
a887461
switch to micromamba
freemansw1 Oct 17, 2024
478f9ff
updates to micromamba-shell
freemansw1 Oct 17, 2024
b006dd7
switch to using an environment file rather than mamba install
freemansw1 Oct 17, 2024
321ea16
reformatted files with black 24.x
freemansw1 Oct 17, 2024
a1a12bd
Merge branch 'RC_v1.5.x' into fix_json_check
freemansw1 Oct 17, 2024
2ec1642
merged latest changes and reformatted
freemansw1 Oct 17, 2024
2b7bb7c
Merge pull request #457 from freemansw1/fix_json_check
freemansw1 Oct 18, 2024
e19fe49
updated version in __init__.py
freemansw1 Oct 18, 2024
cf92f10
Merge remote-tracking branch 'upstream/RC_v1.5.x' into RC_v1.5.x
freemansw1 Oct 18, 2024
02d3a53
rerun notebooks
freemansw1 Oct 18, 2024
91eb221
updated changelog
freemansw1 Oct 18, 2024
1e526ed
fix bug with some iris/xarray combinations for the example notebook
freemansw1 Oct 18, 2024
2d708be
fix ffmpeg error in notebook
freemansw1 Oct 19, 2024
343243f
Merge pull request #459 from tobac-project/RC_v1.5.x
freemansw1 Oct 21, 2024
9a559ca
Merge remote-tracking branch 'upstream/main' into v160_merge_main
freemansw1 Oct 28, 2024
0be6ddf
fix merge conflicts and failed tests with v1.5.4
freemansw1 Oct 31, 2024
5a9190e
fix formatting
freemansw1 Oct 31, 2024
9a6ac6d
bring CI fixes from append tracking
freemansw1 Nov 1, 2024
57a77c9
add xarray pin for example notebooks
freemansw1 Nov 1, 2024
eb4cd3a
attempt to fix pylint CI
freemansw1 Nov 1, 2024
dafc026
more changes to pylint workflow.
freemansw1 Nov 1, 2024
203e4fd
add ffmpeg to example notebook requirements
freemansw1 Nov 1, 2024
706b3ee
Merge pull request #465 from freemansw1/fix_CI_issues
freemansw1 Nov 8, 2024
dd83955
fix linting syntax error
freemansw1 Nov 8, 2024
ddf7f96
update checkout version
freemansw1 Nov 8, 2024
de4b9c7
fix python versions not tested
freemansw1 Nov 13, 2024
0d06863
add future annotations
freemansw1 Nov 13, 2024
dfb21a6
add future annotations to tracking
freemansw1 Nov 13, 2024
e527b5e
add future annotations to tracking tests
freemansw1 Nov 13, 2024
0b75b46
Change .data to .values call in get_statistics_from_mask to prevent p…
w-k-jones Dec 13, 2024
cafc779
Merge pull request #468 from freemansw1/fix_typing_error
freemansw1 Dec 18, 2024
feeff81
Merge pull request #474 from w-k-jones/fix_calculate_area_dask
w-k-jones Dec 18, 2024
4556917
Update version number to 1.5.5
w-k-jones Dec 18, 2024
edec939
Update changelog
w-k-jones Dec 18, 2024
f66dce2
Add nbconvert to dev requirements file
w-k-jones Dec 18, 2024
5dba99e
Update notebooks for v1.5.5 release
w-k-jones Dec 18, 2024
a7431f6
Fix version number in changelog
w-k-jones Dec 18, 2024
be72676
Merge pull request #476 from w-k-jones/release_v1.5.5
w-k-jones Dec 24, 2024
ad8d41a
Merge remote-tracking branch 'upstream/main' into v160_merge_main
freemansw1 Jan 6, 2025
0a163ba
Merge pull request #466 from freemansw1/fix_linting_CI
freemansw1 Jan 6, 2025
848475d
Merge remote-tracking branch 'upstream/main' into v160_merge_main
freemansw1 Jan 6, 2025
62b6f26
add annotations
freemansw1 Jan 6, 2025
9d4f03e
add another future import
freemansw1 Jan 6, 2025
22 changes: 9 additions & 13 deletions .github/workflows/check_formatting.yml
@@ -1,23 +1,19 @@
name: check_formatting
name: Check Python File Formatting with Black
on: [push, pull_request]
jobs:
formatting_job:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Set up conda
uses: conda-incubator/setup-miniconda@v2
uses: mamba-org/setup-micromamba@v1
with:
miniforge-version: latest
miniforge-variant: mambaforge
channel-priority: strict
channels: conda-forge
show-channel-urls: true
use-only-tar-bz2: true

- name: Install dependencies and check formatting
shell: bash -l {0}
environment-file: environment-ci.yml
generate-run-shell: true
cache-environment: true
cache-downloads: true
- name: Check formatting
shell: micromamba-shell {0}
run:
mamba install --quiet --yes --file requirements.txt black &&
black --version &&
black tobac --check --diff
9 changes: 4 additions & 5 deletions .github/workflows/check_json.yml
@@ -8,13 +8,12 @@ jobs:
shell: bash -el {0}
steps:
- name: check out repository code
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: set up conda environment
uses: conda-incubator/setup-miniconda@v2
uses: actions/setup-python@v5
with:
auto-update-conda: true
auto-activate-base: false
activate-environment: checkjson-env
python-version: '3.12'
cache: 'pip' # caching pip dependencies
- name: Install check-jsonschema
run: |
pip install check-jsonschema
21 changes: 6 additions & 15 deletions .github/workflows/check_notebooks.yml
@@ -9,22 +9,13 @@ jobs:
steps:
- name: check out repository code
uses: actions/checkout@v3
- name: set up conda environment
uses: conda-incubator/setup-miniconda@v2
- name: set up mamba environment
uses: mamba-org/setup-micromamba@v1
with:
miniforge-version: latest
miniforge-variant: mambaforge
channel-priority: strict
channels: conda-forge
show-channel-urls: true
use-only-tar-bz2: true
auto-update-conda: true
auto-activate-base: false
activate-environment: notebook-env
- name: Install tobac dependencies
run: |
mamba install -c conda-forge --yes ffmpeg gcc jupyter pytables
mamba install -c conda-forge --yes --file example_requirements.txt
environment-file: environment-examples.yml
generate-run-shell: true
cache-environment: true
cache-downloads: true
- name: Install tobac
run: |
pip install .
20 changes: 8 additions & 12 deletions .github/workflows/codecov-CI.yml
@@ -7,24 +7,20 @@ jobs:
runs-on: ubuntu-latest
env:
OS: ubuntu-latest
PYTHON: "3.9"
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
# Similar to MetPy install-conda action
- name: Set up conda
uses: conda-incubator/setup-miniconda@v2
uses: mamba-org/setup-micromamba@v1
with:
miniforge-version: latest
miniforge-variant: mambaforge
channel-priority: strict
channels: conda-forge
show-channel-urls: true
use-only-tar-bz2: true
environment-file: environment-ci.yml
generate-run-shell: true
cache-environment: true
cache-downloads: true

- name: Install dependencies and generate report
shell: bash -l {0}
- name: Generate report
shell: micromamba-shell {0}
run:
mamba install --quiet --yes --file requirements.txt coverage pytest-cov &&
python -m coverage run -m pytest --cov=./ --cov-report=xml
- name: Upload Coverage to Codecov
uses: codecov/codecov-action@v4
4 changes: 2 additions & 2 deletions .github/workflows/matrix_ci.yml
@@ -20,7 +20,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11"]
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
os: [macos, ubuntu, windows]

steps:
@@ -35,7 +35,7 @@ jobs:
cache-downloads: true
channels: conda-forge
channel-priority: strict
python-version: ${{ matrix.python-version }}
create-args: python=${{ matrix.python-version }}

- name: Fetch all history for all tags and branches
run: |
35 changes: 15 additions & 20 deletions .github/workflows/pylint.yml
@@ -7,49 +7,44 @@ permissions:
pull-requests: write

jobs:
build:
lint-workflow:
runs-on: ubuntu-latest
defaults:
run:
shell: bash -l {0}
steps:
- name: Check out Git repository
uses: actions/checkout@v3

- name: Set up conda
uses: conda-incubator/setup-miniconda@v2
uses: actions/checkout@v4
- name: Set up mamba environment
uses: mamba-org/setup-micromamba@v1
with:
miniforge-version: latest
miniforge-variant: mambaforge
channel-priority: strict
channels: conda-forge
show-channel-urls: true
use-only-tar-bz2: true

- name: Install tobac and pylint
run: |
mamba install --yes pylint
environment-file: environment-ci.yml
generate-run-shell: true
cache-environment: true
cache-downloads: true
- name: Install tobac
run:
pip install .

- name: Store the PR branch
run: |
run:
echo "SHA=$(git rev-parse "$GITHUB_SHA")" >> $GITHUB_OUTPUT
id: git

- name: Checkout RC branch
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
ref: ${{ github.base_ref }}

- name: Get pylint score of RC branch
run: |
run:
pylint tobac --disable=C --exit-zero
id: main_score

- name: Checkout PR branch
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
ref: "${{ steps.git.outputs.SHA }}"
ref: ${{ steps.git.outputs.SHA }}

- name: Get pylint score of PR branch
run: |
42 changes: 42 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,47 @@
### Tobac Changelog

_**Version 1.5.5:**_

**Bug fixes**

- Added `from __future__ import annotations` for Python versions before 3.10 [#468](https://github.com/tobac-project/tobac/pull/468)
- Fix bulk statistics calculation when provided a dask array [#474](https://github.com/tobac-project/tobac/pull/474)

**Internal Enhancements**

- Fix matrix testing to use the specified Python versions [#468](https://github.com/tobac-project/tobac/pull/468)


_**Version 1.5.4:**_

**Enhancements for Users**

- Added the ability to use the Minimum Euclidean Spanning Tree merge/split method on data with periodic boundaries [#372](https://github.com/tobac-project/tobac/pull/372)
- Added the ability to calculate online bulk statistics during feature detection on the raw (i.e., unsmoothed) data [#449](https://github.com/tobac-project/tobac/pull/449)

**Bug fixes**

- Fixes to calculations of bulk statistics [#437](https://github.com/tobac-project/tobac/pull/437)
- Fixes to handling of PBC feature points on the PBC wraparound border [#434](https://github.com/tobac-project/tobac/pull/434)
- Fixed an error that allowed non-matching features to be used in the offline bulk statistics calculation [#448](https://github.com/tobac-project/tobac/pull/448)
- Fixed a bug that prevented using minimum distance filtering with varying vertical coordinates [#452](https://github.com/tobac-project/tobac/pull/452)

**Documentation**

- Add thumbnails to the new example gallery [#428](https://github.com/tobac-project/tobac/pull/428)
- Added documentation for developers [#281](https://github.com/tobac-project/tobac/pull/281)
- Updated documentation for the `n_min_threshold` parameter in feature detection [#432](https://github.com/tobac-project/tobac/pull/432)
- Added documentation for dealing with big datasets [#408](https://github.com/tobac-project/tobac/pull/408)
- Updated documentation to note that the *tobac* v1.5.0 paper in GMD is in its final form [#450](https://github.com/tobac-project/tobac/pull/450)

**Internal Enhancements**

- PBC Distance Function handling improved for tracking and other portions of the library that use it [#386](https://github.com/tobac-project/tobac/pull/386)
- Added tests to `tobac.utils.get_spacings` [#429](https://github.com/tobac-project/tobac/pull/429)
- Added matrix testing for Python 3.12 [#451](https://github.com/tobac-project/tobac/pull/451)
- Resolved issues around updating dependencies in `black` formatting checks and Zenodo JSON checks [#457](https://github.com/tobac-project/tobac/pull/457)


_**Version 1.5.3:**_

**Enhancements for Users**
1 change: 1 addition & 0 deletions dev_requirements.txt
@@ -13,3 +13,4 @@ pre-commit
black
pytest
typing_extensions
nbconvert
50 changes: 47 additions & 3 deletions doc/big_datasets.rst
@@ -1,11 +1,55 @@
Handling Large Datasets
-------------------------------------

Often, one desires to use *tobac* to identify and track features in large datasets ("big data"). This documentation strives to suggest various methods for doing so efficiently. Current versions of *tobac* do not allow for out-of-memory computation, meaning that these strategies may need to be employed for both computational and memory reasons.
Often, one desires to use *tobac* to identify and track features in large datasets ("big data"). This documentation strives to suggest various methods for doing so efficiently. Current versions of *tobac* do not support out-of-core (e.g., :code:`dask`) computation, meaning that these strategies may need to be employed for both computational and memory reasons.

.. _Split Feature Detection:

=======================
Split Feature Detection
Split Feature Detection and Run in Parallel
=======================
Current versions of threshold feature detection (see :doc:`feature_detection_overview`) are time independent, meaning that one can parallelize feature detection across all times (although not across space). *tobac* provides the :py:meth:`tobac.utils.combine_tobac_feats` function to combine a list of dataframes produced by a parallelization method (such as :code:`jug` or :code:`multiprocessing.pool`) into a single combined dataframe suitable to perform tracking with.
Current versions of threshold feature detection (see :doc:`feature_detection_overview`) are time independent, meaning that one can easily parallelize feature detection across all times (although not across space). *tobac* provides the :py:meth:`tobac.utils.combine_feature_dataframes` function to combine a list of dataframes produced by a parallelization method (such as :code:`jug`, :code:`multiprocessing.pool`, or :code:`dask.bag`) into a single combined dataframe suitable to perform tracking with.

Below is a snippet from a larger notebook demonstrating how to run feature detection in parallel (:doc:`big_datasets_examples/notebooks/parallel_processing_tobac`):

::

import dask.bag as db
import tobac

# build a Dask bag with one single-timestep slice of the data per element

b = db.from_sequence(
[
combined_ds["data"][x : x + 1]
for x in range(len(combined_ds["time"]))
],
npartitions=1,
)
out_feature_dfs = db.map(
lambda x: tobac.feature_detection_multithreshold(
x.to_iris(), 4000, **parameters_features
),
b,
).compute()

combined_dataframes = tobac.utils.general.combine_feature_dataframes(out_feature_dfs)
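
A combining step is needed because each independent call numbers its features within its own timestep, so feature IDs collide across timesteps. The stdlib-only sketch below illustrates that pattern under this assumption: :code:`toy_detect` and :code:`combine_features` are hypothetical stand-ins (not *tobac* functions) for :code:`tobac.feature_detection_multithreshold` and :code:`tobac.utils.combine_feature_dataframes`, and :code:`ThreadPoolExecutor` takes the place of :code:`jug`, :code:`multiprocessing.pool`, or :code:`dask.bag`.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_detect(time_index, grid, threshold):
    """Hypothetical per-timestep detector: one feature per grid point >= threshold.

    Like an independent feature detection call, it numbers its features
    from 1 within its own timestep.
    """
    features = []
    for row, values in enumerate(grid):
        for col, value in enumerate(values):
            if value >= threshold:
                features.append(
                    {
                        "feature": len(features) + 1,
                        "frame": time_index,
                        "hdim_1": row,
                        "hdim_2": col,
                        "value": value,
                    }
                )
    return features


def combine_features(per_time_features):
    """Hypothetical combiner: concatenate in time order and renumber IDs."""
    combined = []
    for features in per_time_features:
        for record in features:
            renumbered = dict(record)
            renumbered["old_feature"] = record["feature"]  # per-timestep ID
            renumbered["feature"] = len(combined) + 1  # globally unique ID
            combined.append(renumbered)
    return combined


# Three timesteps of a made-up 2 x 3 field.
fields = [
    [[0, 5, 1], [2, 0, 7]],
    [[6, 0, 0], [0, 0, 0]],
    [[0, 0, 9], [8, 0, 0]],
]

# Timesteps are independent, so they can run concurrently; executor.map
# still returns the results in input (time) order.
with ThreadPoolExecutor(max_workers=3) as executor:
    per_time = list(
        executor.map(lambda t: toy_detect(t, fields[t], 5), range(len(fields)))
    )

combined = combine_features(per_time)
```

Because :code:`executor.map` returns results in input order, the combined records stay sorted by time. For CPU-bound detection, :code:`ProcessPoolExecutor` would avoid the GIL, but it needs a module-level function instead of a lambda because its arguments are pickled.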


.. _Split Segmentation:

======================================
Split Segmentation and Run in Parallel
======================================
Recall that the segmentation mask (see :doc:`segmentation_output`) is the same size as the input grid, which results in large files when handling large input datasets. The following strategies can help reduce the output size and make segmentation masks more useful for analysis.

The first strategy is to only segment on features *after tracking and quality control*. While this will not directly impact performance, waiting to run segmentation on the final set of features (after discarding, e.g., non-tracked cells) can make analysis of the output segmentation dataset easier.
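As a minimal sketch of this filtering step, the example below drops features that were not linked into a cell and groups the rest by timestep, so each frame only segments features that survived quality control. The records are made up, and treating :code:`cell == -1` as the marker for untracked features is an assumption based on the default :code:`cell_number_unassigned` value of :code:`tobac.tracking.linking_trackpy`.

```python
from collections import defaultdict

# Made-up tracking output. cell == -1 marks a feature not linked into any
# cell (assumed to match tobac's default cell_number_unassigned=-1).
tracked = [
    {"feature": 1, "frame": 0, "cell": 1},
    {"feature": 2, "frame": 0, "cell": -1},
    {"feature": 3, "frame": 1, "cell": 1},
    {"feature": 4, "frame": 1, "cell": 2},
    {"feature": 5, "frame": 2, "cell": -1},
]

# Quality control: keep only features that belong to a tracked cell.
kept = [f for f in tracked if f["cell"] != -1]

# Group surviving features by timestep; each group can then be segmented
# on its own, and frames with no surviving features are skipped entirely.
features_by_frame = defaultdict(list)
for f in kept:
    features_by_frame[f["frame"]].append(f["feature"])
```

Frame 2 contains only an untracked feature, so it drops out and needs no segmentation run at all.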

To speed up segmentation, one can process multiple segmentation times independently and in parallel, similar to feature detection. Unlike feature detection, however, there is currently no built-in *tobac* method to combine multiple segmentation times into a single file. While one can do this using typical NetCDF tools such as :code:`ncrcat` (from NCO) or with xarray utilities such as :code:`xr.concat`, one can also leave the segmentation mask output as separate files and open them later with multi-file readers such as :code:`xr.open_mfdataset`.
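When each timestep is segmented in its own task and written to its own file, the file names determine the order in which a plain sorted listing returns them. The sketch below assumes a hypothetical :code:`segment_one` task that only returns the name of the mask file it would write; zero-padding the time index keeps lexicographic order identical to time order, which unpadded names do not.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_one(frame):
    """Hypothetical per-timestep task.

    A real version would segment one time slice and write its mask to disk;
    this sketch only returns the file name it would write.
    """
    return f"segmentation_mask_{frame:04d}.nc"


frames = [10, 2, 0, 1]  # tasks may be submitted in any order

with ThreadPoolExecutor(max_workers=2) as executor:
    written = list(executor.map(segment_one, frames))

# Zero-padded names sort chronologically.
padded_order = sorted(written)
# Unpadded names sort frame 10 before frame 2.
unpadded_order = sorted(f"segmentation_mask_{t}.nc" for t in frames)
```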


.. _Tracking Hanging:

=====================================
Tracking Hangs with too many Features
=====================================

When tracking on a large dataset, :code:`tobac.tracking.linking_trackpy` can hang with the default parameters. This is because the tracking library :code:`trackpy` searches for the next timestep's features in too large an area. This can be solved *without impact to scientific output* by lowering the :code:`subnetwork_size` parameter in :code:`tobac.tracking.linking_trackpy`.
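A rough illustration of why a large search area hurts: the number of candidate links per feature grows roughly with the square of the search radius, and a subnetwork of n features that can all reach each other has up to n! one-to-one pairings. The toy calculation below uses made-up, stationary features on a unit grid; it only illustrates the scaling and is not :code:`trackpy`'s actual linking algorithm, which prunes many of these pairings.

```python
import math


def candidates_within(points_now, points_next, search_radius):
    """For each current feature, list indices of next-step features in range."""
    return [
        [
            j
            for j, (x1, y1) in enumerate(points_next)
            if math.hypot(x1 - x0, y1 - y0) <= search_radius
        ]
        for (x0, y0) in points_now
    ]


# Made-up scene: 100 features on a 10 x 10 grid with unit spacing,
# assumed stationary between two timesteps.
grid = [(float(i), float(j)) for i in range(10) for j in range(10)]

max_candidates = {}
for radius in (0.5, 1.0, 2.0, 3.0):
    counts = [len(c) for c in candidates_within(grid, grid, radius)]
    max_candidates[radius] = max(counts)

# If n features in one frame and n in the next all lie within range of each
# other, there are n! ways to pair them one-to-one. Using the per-feature
# candidate count as n gives a rough sense of how fast this explodes.
pairings = {r: math.factorial(n) for r, n in max_candidates.items()}
```

Tripling the radius from 1 to 3 grid spacings raises the worst-case candidate count from 5 to 29, and the corresponding pairing count from 120 to 29!, which is why bounding the subnetwork size matters on dense, large datasets.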
