Skip to content

Conversation

@volcan01010
Copy link
Contributor

@volcan01010 volcan01010 commented Nov 3, 2025

Description

This pull request adds a set of tests for the behaviour of geodiff.rebase() within pygeodiff.
They are intended as an aid to addressing issue #210, where the presence of UNIQUE constraints in a database causes geodiff to throw an error.

The first group of tests covers situations where there are no conflicts or constraint violations and geodiff is expected to give the same results with or without the database constraints. The current behaviour is that these raise an error in databases with a UNIQUE constraint.

The second group of tests covers situations where edits by users have resulted in a conflicting UPDATE, a violation of a UNIQUE constraint or the violation of a FOREIGN KEY constraint. The latter two are new scenarios, and I wasn't sure what the correct behavior of geodiff should be. I have followed the pattern set by the conflicting updates.

Tests that fail with the current geodiff implementation are marked with pytest's xfail flag. This allows this code to be merged without it failing the CI pipeline. Once #210 is solved, the tests will pass and the flag can be removed.

Note that these tests only address a SQLite database. SQLite databases will only enforce a FOREIGN KEY constraint if explicitly configured to via a PRAGMA command. I have not run the tests against a PostGIS database.

Example output

$ GEODIFFLIB=$(pwd)/geodiff/build/libgeodiff.so GEODIFFCLI=$(pwd)/geodiff/build/geodiff pytest pygeodiff/tests
=================================================================== test session starts ====================================================================
platform linux -- Python 3.13.8, pytest-8.4.2, pluggy-1.6.0
rootdir: /home/user/github/MerginMaps/geodiff
configfile: pyproject.toml
collected 33 items

pygeodiff/tests/test_api_calls.py ..                                                                                                                 [  6%]
pygeodiff/tests/test_changeset_reader.py .....                                                                                                       [ 21%]
pygeodiff/tests/test_cli.py .                                                                                                                        [ 24%]
pygeodiff/tests/test_concurrent_commits.py .                                                                                                         [ 27%]
pygeodiff/tests/test_errors.py ....                                                                                                                  [ 39%]
pygeodiff/tests/test_geodiff_rebase.py .x.x.x.x..xxxx                                                                                                [ 81%]
pygeodiff/tests/test_geometry_utils.py .                                                                                                             [ 84%]
pygeodiff/tests/test_single_commit.py ...                                                                                                            [ 93%]
pygeodiff/tests/test_skip_tables.py ..                                                                                                               [100%]

============================================================== 25 passed, 8 xfailed in 0.98s ===============================================================

This pull request supersedes pull request #229

@volcan01010 volcan01010 marked this pull request as draft November 3, 2025 19:14
@volcan01010 volcan01010 changed the title DRAFT: Add fk tests DRAFT: Add tests handling rebase with database-defined UNIQUE and FOREIGN KEY constraints Nov 4, 2025
@volcan01010 volcan01010 changed the title DRAFT: Add tests handling rebase with database-defined UNIQUE and FOREIGN KEY constraints Add tests handling rebase with database-defined UNIQUE and FOREIGN KEY constraints Nov 4, 2025
@volcan01010 volcan01010 marked this pull request as ready for review November 4, 2025 12:09
@volcan01010
Copy link
Contributor Author

@wonder-sk - it may be useful to have a call to discuss this, particularly how the new scenarios where there are genuine UNIQUE or FK constraint violations.

@volcan01010 volcan01010 changed the title Add tests handling rebase with database-defined UNIQUE and FOREIGN KEY constraints Add tests for rebase with database-defined constraints Nov 4, 2025
@dvdkon
Copy link
Contributor

dvdkon commented Nov 6, 2025

Hi, I'm working on fixing #210 (see PR #232). Your tests have been helpful in checking that my patches actually work and I think it's a good idea to get them into Geodiff to guard against regressions. I have a few things that I'd like to see addressed, though:

  • The commit history is needlessly long, I'd rather squash it into one commit (but I can do that myself if need be).
  • The name test_geodiff_rebase.py is not very descriptive. Maybe test_db_constraints.py?
  • The Python file should have the same header as the others (encoding comment just for consistency + copyright information).
  • As I've now fixed the issue, the tests should no longer be marked xfail.
  • As implemented, DB constraint conflicts don't write anything into the conflicts file. This is because the constraint-level issues are only discovered at the end, when applying the changeset created while diffing.
    This means that changesets that are unappliable due to constraints will cause an exception to be thrown during rebase().

These shouldn't be too much trouble and I hope we'll soon have this issue fixed and tested.

@ximenesuk
Copy link

Hi, thank you for the work so far on issue #210 (PR #232). We have cherry-picked the commit onto this branch and can confirm that four of the tests are now XPASSing (with a renamed test file):

============================================================================ test session starts =============================================================================
platform linux -- Python 3.14.0, pytest-9.0.0, pluggy-1.6.0
rootdir: /home/colin/repos/fdc/geodiff
configfile: pyproject.toml
collected 14 items

pygeodiff/tests/test_rebase_db_constraint.py .X.X.X.X..xxxx                                                                                                            [100%]

================================================================== 6 passed, 4 xfailed, 4 xpassed in 1.36s ===================================================================

Those tests have now been unmarked.

Of the four that are still expected to fail, two are failing due to your final point above:

========================================================================= short test summary info ===========================================================================
FAILED pygeodiff/tests/test_rebase_db_constraint.py::test_geodiff_rebase_unique_constraint_violation[user_a_data_first] - pygeodiff.geodifflib.GeoDiffLibError: rebase
FAILED pygeodiff/tests/test_rebase_db_constraint.py::test_geodiff_rebase_unique_constraint_violation[user_b_data_first] - pygeodiff.geodifflib.GeoDiffLibError: rebase
====================================================================== 2 failed, 12 deselected in 0.29s ======================================================================

The final two are failing due to foreign key violations:

========================================================================== short test summary info ===========================================================================
FAILED pygeodiff/tests/test_rebase_db_constraint.py::test_geodiff_rebase_fkey_constraint_violation[user_a_data_first] - AssertionError: assert {(20251103002, 'OAK', 30), (20251103004, 'PIN', 101), (20251103001, 'MPL', 25)} == {(20251103002, 'OAK', 30), (20251103001, 'MPL', 25)}

  Extra items in the left set:
  (20251103004, 'PIN', 101)

  Full diff:
    {
        (
            20251103001,
            'MPL',
            25,
        ),
        (
            20251103002,
            'OAK',
            30,
        ),
  +     (
  +         20251103004,
  +         'PIN',
  +         101,
  +     ),
    }
FAILED pygeodiff/tests/test_rebase_db_constraint.py::test_geodiff_rebase_fkey_constraint_violation[user_b_data_first] - AssertionError: assert {('OAK', 'Oak'), ('MPL', 'Maple')} == {('OAK', 'Oak'), ('PIN', 'Pine'), ('MPL', 'Maple')}

  Extra items in the right set:
  ('PIN', 'Pine')

  Full diff:
    {
        (
            'MPL',
            'Maple',
        ),
        (
            'OAK',
            'Oak',
        ),
  -     (
  -         'PIN',
  -         'Pine',
  -     ),
    }
====================================================================== 2 failed, 12 deselected in 0.25s ======================================================================

@volcan01010
Copy link
Contributor Author

volcan01010 commented Nov 11, 2025

Thanks for this @dvdkon. Some comments from when @ximenesuk and I ran the code today.

  • We have made the changes that you suggested in the comment above.
  • The tests that now pass cover the initial scenario that was reported in Database defined UNIQUE or FOREIGN KEY constraint causes rebase to fail #210. This is great! Once this work is complete and has filtered through all the Mergin ecosystem, our syncing issues should disappear and we can fully embrace multi-user working across mobile and desktop. Thank you!
  • The remaining failing tests cover new scenarios that are introduced by the presence of constraints. Now, two individually constraint-compliant databases produce a result that violates the constraints when they are rebased. In writing those tests, I didn't know what the correct behaviour would be, so I assumed the geodiff would do the merge as best it can without raising an error, then producing a conflict file with the unmerged rows.
  • This means that the merged database contains the changes from theirs only. This is the opposite to when conflicting updates are handled and theirs is overwritten with ours. You may have a different idea for the preferred behaviour, or you may want to change how the conflicting updates are handled so that all types of conflicts prioritise the data from either theirs or mine. We can updated the tests to reflect the required specification.
  • In the conflict file, the databases are referred to as new and old. This is different to the arguments passed to the rebase function (which are modified_their and modified) and also implies relative chronology that may not reflect when the files were actually changed. In the tests, we have called the databases theirs and mine. It would be good to get consistent names through the C++, the Python and the tests.
  • The test_geodiff_rebase_unique_constraint_violation behaves as expected for now by throwing an error on the UNIQUE constraint violation.
  • The test result in @ximenesuk comment above shows that the test_geodiff_rebase_fkey_constraint_violation fails because the data in the final database are not as expected. Actually, this test should also have thrown an error due to a FOREIGN KEY violation. It doesn't because geodiff doesn't run the PRAGMA command to enable foreign key enforcement when it connects to a SQLite database. We think that it should. Without it, the data produced by a rebase contains orphaned rows.
  • Foreign keys will be enforced when geodiff runs against PostGIS, so the algorithm will need to handle them in those cases.
  • We are happy to have a call to discuss the changes if you'd like.

@volcan01010
Copy link
Contributor Author

Another point regarding the PRAGMA foreign_keys=ON: there is existing code within the SQLite driver that throws an exception if foreign keys are detected, but it is never called. Will it be removed as part of this work? See this comment on Issue 210 for details: #210 (comment).

dvdkon and others added 3 commits November 13, 2025 10:52
All the issues checked by this function are now handled.
Co-authored-by: David Koňařík <dvdkon@konarici.cz>
@dvdkon
Copy link
Contributor

dvdkon commented Nov 13, 2025

@volcan01010 Thanks for making the changes, I've slightly modified your tests to fit the current Geodiff behaviour and cherry-picked them into my PR (#232).

Also thanks for reminding me of checkCompatibleForRebase. This wasn't used in the "main" codepath and was outdated anyway, so I've fully deleted it.

I've set the SQLite driver to always enable foreign key constraints, since users couldn't enable them on their own otherwise.

Sadly we can't report conflicts caused by DB constraints in the conflicts file, because they are only discovered at the very end, after the diffs are written and when they're being applied to the database. The problematic changes are at least written as warnings to the log (which goes to stdout when GEODIFF_LOGGER_LEVEL is set appropriately).

@volcan01010
Copy link
Contributor Author

Thanks. I'll close this merge request and continue discussion on #232.

@volcan01010 volcan01010 reopened this Nov 14, 2025
@volcan01010
Copy link
Contributor Author

I've re-opened, as we can't push code to the branch in the other pull request and may have other suggested edits for the tests.

@volcan01010
Copy link
Contributor Author

volcan01010 commented Nov 14, 2025

This is great news. I'm happy that foreign keys will be checked now, because they are important to us.

Sadly we can't report conflicts caused by DB constraints in the conflicts file, because they are only discovered at the very end, after the diffs are written and when they're being applied to the database. The problematic changes are at least written as warnings to the log (which goes to stdout when GEODIFF_LOGGER_LEVEL is set appropriately).

It would be valuable to get error information back in a more direct way, and at the moment the function is called. Two possible ways are:

  • GeoDiffLibError attributes. At the moment, the GeoDiffLibError just says "rebase". When SQLite throws an error, it includes a message like UNIQUE constraint failed: species.species_id or FOREIGN KEY constraint failed. Can we make geodiff include that information in the error, and then pass that through to the GeoDiffLibError that Python sees? If some information about the changes being applied at the time could also be attached, that would be even better. Just knowing the table name would be helpful. This should be a relatively easy change.
  • rebase() return values. At the moment, the rebase function returns None. It could be made to behave a bit more like an API and return a JSON result, with status and message fields. The status could be OK, conflict, unique constraint failed etc, with messages of the conflict file contents or the database message as the code. This would be more work, but would follow the pattern of geodiff only raising an error if something unexpected happens.

My preference would be for adding more information to the GeoDiffLibErrors. You could also make a specific GeoDiffLibConstraintError. If we had a GeoDiffLibConstraintError with helpful information, the QGIS plugin could better explain to the user what has gone wrong. At the moment, users get a fright when all their points "disappear" as their geopackage is turned into a conflicted copy. As developers, we can look in .mergin/client-log.txt, but our users don't know about this. If you make this change now, then it gives us scope to contribute changes to the QGIS plugin ourselves. We are Python developers, so we can't contribute to the C++.

I expect that our users will get UNIQUE constraint errors from time to time. Our field data capture plugin uses QGIS expressions to autogenerate locality IDs, which have a unique constraint, based on the user login name. If a user works on two separate devices without syncing as they change between them, they will collect localities with matching IDs. This will result in a genuine unique constraint error when they rebase. I know that this happens, because I have done it myself! It would be nice to be able to explain this when the rebase fails.

@volcan01010
Copy link
Contributor Author

Hi @dvdkon,

I have merged your changes back onto this branch and pushed up a few more commits.

  • The first updates the tests for the UNIQUE and FOREIGN KEY scenarios so that they assert that they get a more specific error and that the message contains the underlying error from the database. This will make debugging easier if a conflict does occur. I used the existing GeoDiffLibConflictError. When these tests pass, we can be happy that Database defined UNIQUE or FOREIGN KEY constraint causes rebase to fail #210 is solved.
  • Are there any existing scenarios where a GeoDiffLibConflictError is actually raised? There aren't any in the tests. If pygeodiff encounters a conflict, it handles it without error and creates the conflict file (as seen in test_geodiff_rebase_conflicting_edits). In that case, perhaps it could become a GeoDiffLibConstraintError instead.
  • I added a couple of tests for scenarios that raise a plain GeoDiffLibError. These can be used to catch regressions in further work. I also fixed a minor typo.

@volcan01010
Copy link
Contributor Author

Looking at pygeodiff source code, I see that it doesn't get any text argument back from C++ library, only a return code. This would make it hard to give much debugging information in the error. As a minimum, just knowing whether it was a UNIQUE or FOREIGN KEY error would be valuable.

Can you add extra return codes and error types? Then geodifflib.py could do the following:

class GeoDiffLibConstraintError(GeoDiffLibError):
    pass

# keep in sync with c-library
SUCCESS = 0
ERROR = 1
CONFLICT = 2
UNSUPPORTED_CHANGE = 3
FAILED_UNIQUE_CONSTRAINT = 4
FAILED_FOREIGN_KEY_CONSTRAINT = 5


def _parse_return_code(rc, msg):
    if rc == SUCCESS:
        return
    elif rc == ERROR:
        raise GeoDiffLibError(msg)
    elif rc == CONFLICT:
        raise GeoDiffLibConflictError(msg)
    elif rc == UNSUPPORTED_CHANGE:
        raise GeoDiffLibUnsupportedChangeError(msg)
    elif rc == FAILED_UNIQUE_CONSTRAINT:
        raise GeoDiffLibConstraintError(f"{msg} (UNIQUE constraint failed)")
    elif rc == FAILED_FOREIGN_KEY_CONSTRAINT:
        raise GeoDiffLibConstraintError(f"{msg} (FOREIGN KEY constraint failed)")
    else:
        raise GeoDiffLibVersionError(
            "Internal error (enum " + str(rc) + " not handled)"
        )

Or you could have separate GeoDiffLibUniqueConstraintError and GeoDiffLibForeignKeyConstraintError classes.

@volcan01010
Copy link
Contributor Author

I've just pushed a commit that uses a custom LoggingGeoDiff class to check the log messages. This is helpful for the test, but wouldn't help with the stock GeoDiff, where it would be useful to know what kind of error was thrown.

Below is output from the constration violation tests.

$ GEODIFFLIB=$(pwd)/geodiff/build/libgeodiff.so GEODIFFCLI=$(pwd)/geodiff/build/geodiff pytest pygeodiff/tests/test_rebase_db_constraint.py -k violation -vvvs
=================================================================== test session starts ====================================================================
platform linux -- Python 3.13.8, pytest-8.4.2, pluggy-1.6.0 -- /home/user/.virtualenvs/geodiff/bin/python3.13
cachedir: .pytest_cache
rootdir: /home/user/github/MerginMaps/geodiff
configfile: pyproject.toml
collected 18 items / 14 deselected / 4 selected

pygeodiff/tests/test_rebase_db_constraint.py::test_geodiff_rebase_unique_constraint_violation[user_a_data_first] DEBUG: rebase info (base2their / old)
TABLE species
  inserted 4,
  deleted  --none --
  updated  --none --

DEBUG: mapping
  species
    4->5,

DEBUG: No conflicts present
WARNING: CONFLICT: unresolvable_conflict:
{
  "changes": [
    {
      "column": 0,
      "new": 5
    },
    {
      "column": 1,
      "new": "BCH"
    },
    {
      "column": 2,
      "new": "Beech"
    }
  ],
  "table": "species",
  "type": "insert"
}
ERROR: Could not resolve dependencies in constraint conflicts.
ERROR: Unable to perform GEODIFF_applyChangeset modified2final
PASSED
pygeodiff/tests/test_rebase_db_constraint.py::test_geodiff_rebase_unique_constraint_violation[user_b_data_first] DEBUG: rebase info (base2their / old)
TABLE species
  inserted 4,
  deleted  --none --
  updated  --none --

DEBUG: mapping
  species
    4->5,

DEBUG: No conflicts present
WARNING: CONFLICT: unresolvable_conflict:
{
  "changes": [
    {
      "column": 0,
      "new": 5
    },
    {
      "column": 1,
      "new": "BCH"
    },
    {
      "column": 2,
      "new": "Birch"
    }
  ],
  "table": "species",
  "type": "insert"
}
ERROR: Could not resolve dependencies in constraint conflicts.
ERROR: Unable to perform GEODIFF_applyChangeset modified2final
PASSED
pygeodiff/tests/test_rebase_db_constraint.py::test_geodiff_rebase_fkey_constraint_violation[user_a_data_first] DEBUG: rebase info (base2their / old)
TABLE species
  inserted --none --
  deleted  3,
  updated  --none --
TABLE trees
  inserted --none --
  deleted  3,
  updated  --none --

DEBUG: mapping
  trees
    --none --

DEBUG: No conflicts present
ERROR: Failed to release savepoint (SQLITE3 error [787]: FOREIGN KEY constraint failed)
ERROR: Unable to perform GEODIFF_applyChangeset modified2final
PASSED
pygeodiff/tests/test_rebase_db_constraint.py::test_geodiff_rebase_fkey_constraint_violation[user_b_data_first] DEBUG: rebase info (base2their / old)
TABLE trees
  inserted 4,
  deleted  --none --
  updated  --none --

DEBUG: mapping
--none --

DEBUG: No conflicts present
ERROR: Failed to release savepoint (SQLITE3 error [787]: FOREIGN KEY constraint failed)
ERROR: Unable to perform GEODIFF_applyChangeset modified2final
PASSED

@dvdkon
Copy link
Contributor

dvdkon commented Nov 18, 2025

Looking at pygeodiff source code, I see that it doesn't get any text argument back from C++ library, only a return code. This would make it hard to give much debugging information in the error. As a minimum, just knowing whether it was a UNIQUE or FOREIGN KEY error would be valuable.

I agree that GeoDiff could (and should) give more detailed errors to the caller, without hacks like reading the log. I made some changes and pushed them to a dev branch, I'll make a PR later, once the base change gets merged.

I "hijacked" the GEODIFF_CONFICTS error code that's been unused since beb141e to handle this case, and extended the API to also report the full error messages in a backwards-compatible way. The tests now check that both the right exception is thrown, and that the message string contains details as to which kind of constraint broke rebasing.

@volcan01010
Copy link
Contributor Author

Amazing! Thank you.

I've pushed up a couple more commits. They disable LoggingGeodiff and add a couple of other tests. The first checks when no changes have been made. That is just to prevent regressions.

The second adds another failure scenario, where the failure is caused by an error raised by a trigger script. The test layout matches the ones in your development branch. It will be interesting to see if the error message propagates through in the same way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants